Aug 19, 2018

So, I was just updating the details for this project, and I happened to see this blog post about reading the DC voltage input on the TS-7670v2.

I haven’t yet gotten around to finishing the power meters I was building, which would otherwise be reading these values directly, but they were basically going to connect to the TS-7670v2 via Modbus anyway.  One of its roles, aside from routing between the physical management network (IPMI and switch console access) and the main network, was to monitor the battery.

I will have to explore this.  Collectd doesn’t have a general-purpose I²C module, but it does have one for barometer modules, so with a bit of work, I could make one to measure the voltage input which would tell me what the battery is doing.
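If I do go down that path, collectd’s exec plugin is probably the path of least resistance: a script that periodically prints PUTVAL lines.  Something like the sketch below, noting that the I²C bus, device address, register and scale factor are all placeholders until I find out how the TS-7670v2 actually exposes the reading.

#!/usr/bin/env python3
# Sketch of a collectd exec plugin reporting the DC input voltage.
# The I2C bus, device address, register and scale factor are placeholders,
# NOT the real TS-7670v2 values.

import os
import socket
import time

from smbus2 import SMBus

HOST = os.environ.get('COLLECTD_HOSTNAME', socket.getfqdn())
INTERVAL = float(os.environ.get('COLLECTD_INTERVAL', 10))

I2C_BUS = 0        # assumed bus number
I2C_ADDR = 0x78    # assumed device address
VIN_REG = 0x00     # assumed register
SCALE = 0.001      # assumed ADC-counts-to-volts factor

with SMBus(I2C_BUS) as bus:
    while True:
        raw = bus.read_word_data(I2C_ADDR, VIN_REG)
        # collectd's exec plugin reads PUTVAL lines from our stdout.
        print('PUTVAL "%s/exec-battery/voltage" interval=%d N:%.3f'
              % (HOST, int(INTERVAL), raw * SCALE), flush=True)
        time.sleep(INTERVAL)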

Jun 28, 2018

So it’s been nearly a year since we installed the solar panels.

Date      Average daily kWh
Jun 2017  10.7
Sep 2017  11.0
Dec 2017  10.8
Mar 2018   9.3
Jun 2018   9.0

That’s our daily average energy consumption since then.  Moving to solar has taken about 16% off the power consumption (10.7 down to 9.0kWh/day).  This is without optimising when I run the mains charger.

Apr 28, 2018

So, a few weeks ago I installed a new battery charger, and tweaked it so that the solar did most of the leg work during the day, and the charger kept the batteries topped up at night.

I also discussed the addition of a new industrial PC to perform routing and system monitoring functions… which was to run Gentoo Linux/musl.  For now, that little PC is still running Debian Stretch, but for 45 days it was rock solid.  The addition of this box, taking on the role of router to the management network, meant I could finally achieve one of my long-term goals for the project: decommissioning the old server.

The old server is still set up with all my data and software… but now the back-up cron job calls /sbin/poweroff when it’s done, and the BIOS is set to wake the machine up in the evening, ready to receive a back-up late at night.

In its place, a virtual machine clone of the box handles my email and all the old functions of that server.  This was all done just before my father and I left for a three-week holiday in the Snowy Mountains.

I did have a couple of hiccups with Ceph OSDs crashing… but re-starting the daemons (done remotely whilst travelling through Cowra) got everything back up.  A bit of placement-group cleaning later, and everything was back online.  I had another similar hiccup coming out of Maitland, but once again, re-starting the daemons fixed it.  No idea why they crashed; that’s something I’ll have to investigate.

Other than that, the cluster itself has run well.

One thing that did momentarily kill the industrial PC though: I wandered down to the rack with a small bus-powered 2.5″ HDD, with the intent of re-starting my Gentoo builds.  This HDD had the same content as the 3.5″ HDD I had plugged in before.  I figured that, being bus-powered, it would not be dependent on mains, and could just chug away to its heart’s content.

No such luck: the moment I plugged that drive in, the little machine took great umbrage to the spinning rust vacuuming the electrons away from its core functions, and shut down abruptly.  I’ve now brought my 3.5″ drive and dock down, plugged that into the wall, and have my builds resuming.  If the power goes off, hopefully the machine handles the loss of swap gracefully; if it does crash, the watchdog will take care of it.
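The watchdog side needs nothing fancy, for what it’s worth.  Assuming the standard Linux watchdog interface, anything that periodically pets /dev/watchdog will do; if the petting stops, the hardware reboots the board:

import time

# Keep the hardware watchdog fed.  If this process dies (say, the machine
# wedges after losing its swap), the timer expires and the board resets.
with open('/dev/watchdog', 'wb', buffering=0) as wd:
    while True:
        wd.write(b'\0')   # pet the dog
        time.sleep(10)    # must be comfortably shorter than the timeout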

Thus, I have the little TS-7670 first attempting a build of gcc, to see how we go.  Fingers crossed, our power should remain up.  There was at least one outage in the time we were away, but hopefully we’ll get through this next build!

The next step I think should be to add some control of the mains charger to allow the batteries to be boosted to full charge overnight. The thinking is a simple diode-OR arrangement. Many comparators such as the LM393 have an open-collector output, which gives us this for free.

The theory is this.

The battery bank powers a simple circuit which runs off a 5V regulator.  That regulator powers a dual comparator IC and provides a reference voltage.  The comparator draws bugger all power, so I’m happy to use a linear PSU here.  It’s mainly there as a voltage reference.

Precision isn’t really the aim here, so adjustable pots will make life easier.

The voltages from the battery bank and the solar panel are fed through voltage dividers to bring the voltages down to below 5V, then those voltages are individually fed into separate pots that control the hysteresis. I can adjust all points of the system.

The idea is that should the batteries get too low, or the sun go down, one or the other (or both) comparators will go low and pull down on R2. If the batteries are high and the sun is up, nothing pulls on R2 so the REMOTE+ pin on the HEP-600C-12 is allowed to float to +5V, turning off the mains charger.

The advantage of this is there’s no programming of a microcontroller; it’s just analogue electronics.  The LM393s are pretty hardy things: the datasheet says they’ll run at 36V and can accept a maximum input voltage of VCC-1.5V, so if I run at 5V, 3.5V is my recommended maximum.  The adjustment pots should let me set threshold voltages that stay below this.
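To convince myself the numbers work, a quick back-of-the-envelope check; the resistor values and worst-case input voltages here are assumptions for illustration, not the final design:

# Sanity check: keep the comparator inputs below VCC - 1.5V.
VCC = 5.0
V_IN_MAX = VCC - 1.5  # LM393 input limit at a 5V supply

def divider_out(v_in, r_top, r_bottom):
    """Output voltage of a simple two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)

# (input, worst-case volts, R_top, R_bottom) -- all assumed values
checks = [
    ("battery", 15.0, 10e3, 2.2e3),   # AGM on boost charge, plus headroom
    ("solar",   22.0, 15e3, 2.2e3),   # 12V panel open-circuit
]

for name, v_max, r_top, r_bot in checks:
    v_out = divider_out(v_max, r_top, r_bot)
    verdict = "OK" if v_out <= V_IN_MAX else "TOO HIGH"
    print("%s: %.1fV in -> %.2fV at the comparator (%s)"
          % (name, v_max, v_out, verdict))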

I mainly need 5V for the HEP-600C-12, and for providing that stable, known voltage reference.  The LM7805 should be fine for this.

Once I’ve done that, I should be able to wind the charger back up to its factory setting of 14.4V, which will mean the batteries get brought back up to full charge overnight.

Apr 01, 2018

So yesterday I wound back the mains charger so that the solar would take on the load during the day.  Seems I wound it back a bit far, and the mains charger did almost no work overnight, leaving the battery somewhere around 11.8V.

That’s a wee bit low for my comfort.  Yes, they are deep cycle AGMs, but I’d rather not get that low.

Thus, I wound it up a bit: float at 12.8V, so Vboost at 13.6V.  That looks to be the sweet spot.  Now that the sun is up, I’m getting nice healthy amps of current down the wire from the roof.

The cluster is drawing about 8A, so that’s the cluster powered, with about 6A going into the batteries.  It intermittently peaks at around 15A.

I also found myself fine-tuning the Ethernet settings on the border router.  For some reason, its Realtek RTL8139 was happy to talk to the Cisco SG-200-08 it was connected to before, but didn’t quite get along with the Linksys LGS326-AU.  I’ve told the switch to force 100Mbps full-duplex MDIX (evidently, it’s a cross-over cable), and so far that seems to have settled things down.

Mar 24, 2018

So, I’ve now moved the ADSL modem and router onto the battery supply.  This has added an extra amp of load, but really, the solar panel handles this easily.

I dug up one of my spare switchmode PSU modules and then got to thinking about how I’d mount the thing.  In the end: double-sided tape, sticking it to a piece of old copper-clad PCB from the project graveyard with some wires soldered on, the tape also keeping the terminals of the adjustment pot from shorting.

The donor PCB already had regions cut out for terminals around the edge, so I could use those for drilling mounting holes.  I just made additional terminal pads for soldering the input and output supply rails.  Initially I tried putting a 1mF capacitor across the output, but evidently the one I grabbed was crook, as it presented a 10Ω load; I don’t think that was just it charging.  The PSU has a 220µF capacitor there already, so let’s see how it fares.

Fairly simple: +12V comes in via the orange wire into IN+, the “LM2596” steps that down to 5V, and it comes out on the red wire.  Screw terminals allow me to swap the input and output.

Before hooking it up to the ADSL modem, I made sure to dial it in to 5V.

Meh… who’s going to care about 3mV? 🙂

As it happens, the original PSU puts out 5.3V. I think I’m closer. I can always dial it up if needed.

I put the lid on the case and made up the rest of my wiring harness. One 5A blade fuse, a bit of work around the back of the rack, and it was installed.

In the meantime, I have my old server busy pushing its last daily back-up across to a newly provisioned virtual machine on the cluster.

One problem this presents is that this one VM occupies about 70% of my usable storage cluster capacity.  The cases can take one 2.5″ HDD which, unless you’re willing to risk it with Seagate (I’ve had too many of them fail), tops out at 2TB.

There are SSDs too, but I’m not made of money, and I’ve already spent the cost of a small car on this cluster as it is. My thinking is I might look at modifying the cases with a new lid to accept a 3.5″ HDD. If I make the case a wee bit taller, a 3.5″ HDD would fit in the lid, and I could add fans around it to cool it.

The other option is to make external eSATA 3.5″ DIN-rail mounted cases. I did look online, but didn’t see any for sale. That said, space is getting squeezy on that DIN rail, and I do have to be mindful of cooling.

Mar 17, 2018

Last night I got home, having made a detour on my way in to work past Jaycar Woolloongabba to replace the faulty PSU.

It was a pretty open-and-shut case: we took it out of the box, plugged it in, and sure enough, no fan.  After the saleswoman asked the advice of a co-worker, it was confirmed that the fan should be running.  It took some digging, but they found a replacement, and so it was boxed up (in the box I supplied; they didn’t have one), and I walked out the door with PSU No. 3.  I had to go straight to work, so I took the PSU with me, and that evening I loaded it into the top box to transport home on the bicycle.

I get home, and it’s the first thing on my mind.  I unlock the top box, get the PSU out, and still decked out in my cycling gear, helmet and all (I needed the headlight to see down the back of the rack anyway), I get to work.  I put the ring lugs on, plug it into the wall socket and flick the switch.  Nothing.  Toggle the switch on the front: still nothing.  Try the other socket on the outlet, unplug the load: still nothing.  Did the 10km trip from Milton to The Gap kill it?

Frustrated, I figure I’ll switch a light on.  Funny… no lights.  I wander into the study… sure enough, the router, modem and switch are dead as doornails.  I wander out to the MDB outside, see the main breaker is still on, and try hitting the test button.  Nothing.  I wander back inside, swap the bike helmet for my old hard hat (it looks as if I’ll need the headlight a bit longer), then take a sticky beak down the road to see if anyone else is facing the same issue.  Sure enough, I look down the street: everyone’s out.

So there goes my second attempt at bootstrapping Gentoo, and my old server’s uptime.  The power did return about an hour or so later.  The PSU was fine all along; you just don’t think of the mains being out as the cause of your problems.

I’ll re-start my build, but I’m not going to lose another build to failing power.  Nope, had enough of that for a joke.  I could have rigged up a UPS for the TS-7670, but I already have one, sitting in the very rack where the TS-7670 will be installed anyway.  Thus, no time like the present to install it.

I’ll have to configure the switch to present the right VLANs to the TS-7670, but once I do that, it’ll be able to take over the role of routing between the management VLAN and the main network.  I didn’t want to do this in a VM, because that would mean exposing the hosts and the VMs to the management VLAN, meaning anyone who managed to compromise a host would have direct access to the BMCs on the other nodes.  This is not a network with high bandwidth demands, and so the TS-7670 with its 100Mbps Ethernet (built into the SoC, not via USB) is an ideal machine for the task.

Having done this, all that’s left to do is create a 2GB dual-core VM which will receive the contents of the old server; then that server can be shut down after 8 years of good service.  I’ll keep it around for storing the on-site backups, but now I can keep it asleep and just wake it with Wake-on-LAN when I want to make a back-up.

This should make a dint in our electricity bill!
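Waking it needs nothing special either: a Wake-on-LAN “magic packet” is just six 0xFF bytes followed by the NIC’s MAC address repeated 16 times, sent as a UDP broadcast.  A quick sketch (the MAC address here is a placeholder):

import socket

def wake(mac):
    # Magic packet: 6 x 0xFF, then the MAC address repeated 16 times.
    payload = bytes.fromhex('FF' * 6 + mac.replace(':', '') * 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(payload, ('255.255.255.255', 9))  # UDP "discard" port

wake('00:11:22:33:44:55')  # placeholder MAC for the old server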
Other changes…

  • Looks like we’ll be upgrading the solar with the addition of another 120W panel.
  • I will be hooking my other network switches, the ADSL router and the ADSL modem up to the battery bank on the cluster; just got to get some suitable cable for doing so.
  • I have no faith in this third PSU, so already, I have a MeanWell HEP-600C coming.  We’ll wire up a suicide lead to it, and that can replace the Powertech MP-3089 + Redarc BCDC1225, as the MeanWell has a remote on/off feature I can use to control it.
Jan 17, 2018

I’ve taken the plunge and gotten a TS-7670 ordered in a DIN-rail mount for monitoring the battery.  Not sure what the shipping will be from Arizona to here, but I somehow doubt I’m up for more than AU$300 for this thing.  The unit itself will cost AU$250.

Some will argue that a Raspberry Pi or BeagleBone would be cheaper, and that would be correct, however by the time you’ve added a DIN-rail mount case, an RS-485 control board and a 12V to 5V step-down power converter, you’d be around that figure anyway.  Plus, the Raspberry Pi doesn’t give you schematics.  The BeagleBone does, but is also a more sophisticated beast.

The plan is I’ll spin a version of Gentoo Linux on it… possibly using the musl C library to keep memory usage down as I’ve gone the base model with 128MB RAM.  I’ll re-spin the kernel and U-Boot patches I have for the latest release.

There will be two functions looked after:

  • Access to the IPMI/L2 management network
  • Polling of the two DC power meters (still to be fully designed) via Modbus

It can report to a VM running on one of the hosts.  I believe collectd has the necessary bits and pieces to do this.  Failing that, I’ve written code before that polls Modbus… I write such code for a day job.
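The polling side really isn’t much code, either.  Here’s a sketch using the minimalmodbus library; the serial port, slave address and register map are placeholders, since the meters themselves are still on the drawing board:

import minimalmodbus

# Placeholders: the actual port, slave ID and register layout are TBD.
meter = minimalmodbus.Instrument('/dev/ttyAMA1', slaveaddress=1)
meter.serial.baudrate = 9600

def read_meter(inst):
    # Hypothetical map: register 0 = voltage in mV, register 1 = current in mA.
    voltage_mv = inst.read_register(0)
    current_ma = inst.read_register(1)
    return voltage_mv / 1000.0, current_ma / 1000.0

volts, amps = read_meter(meter)
print('Battery: %.3f V, %.3f A' % (volts, amps))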

Nov 19, 2017

So, this weekend I did plan to run from solar full time to see how it’d go.

Mother Nature did not co-operate.  I think there was about 2 hours of sunlight!  The 24-hour rain map from the local weather radar said it all (image credit: Bureau of Meteorology).

In the end, I opted to crimp SB50 connectors onto the old Redarc BCDC1225 and hook it up between the battery harness and the 40A power supply. It’s happily keeping the batteries sitting at about 13.2V, which is fine. The cluster ran for months off this very same power supply without issue: it’s when I introduced the solar panels that the problems started. With a separate controller doing the solar that has over-discharge protection to boot, we should be fine.

I also have mostly built up some monitoring boards based on the TI INA219Bs hooked up to NXP LPC810s.  I have not powered these up yet; the plan is to try them out with a 1Ω resistor as a stand-in for the shunt and a 3V rail… develop the firmware for reporting voltage and current… then try 9V and check nothing smokes.

If all is well, I’ll package them up and move them to the cluster.  Not sure of protocols just yet: Modbus/RTU is tempting, being a protocol I’m familiar with from work, and it would work well for this application, given I just need to represent voltage and current, both of which can be scaled to fit 16-bit registers easily (voltage in mV and current in mA would be fine).
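To illustrate the scaling: an unsigned 16-bit register counting millivolts or milliamps covers 0 to 65.535V (or 65.535A), heaps of range for a 12V system.

def to_register(value, scale=1000):
    # Volts -> mV (or amps -> mA), clamped to an unsigned 16-bit register.
    raw = int(round(value * scale))
    if not 0 <= raw <= 0xFFFF:
        raise ValueError('%r does not fit in 16 bits' % value)
    return raw

def from_register(raw, scale=1000):
    return raw / scale

assert to_register(13.2) == 13200              # 13.2V -> 13200mV
assert from_register(to_register(6.5)) == 6.5  # 6.5A survives the round trip
print('Full scale: %.3f' % from_register(0xFFFF))  # 65.535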

I just need some connectors to interface the boards to the outside world and testing will begin. I’ve ordered these and they’ll probably turn up some time this week.

Nov 05, 2017

So… with the new controller we’re able to see how much current we’re getting from the solar.  I note they omit the solar voltage, and I suspect the current reading is what’s coming out of the MPPT stage, but still, it’s more information than we had before.

With this, we noticed that on a good day, we were getting… 7A.

That’s about what we’d expect for one panel.  What’s going on?  Must be a wiring fault!

I’ll admit that when I made the mounting for the solar controller, I didn’t account for the bend radius of the 6-gauge wire I was using, and found it difficult to feed it into the controller properly.  No worries: this morning at 4AM I powered everything off, took the solar controller off, drilled 6 new holes a bit lower down, fed the wires through and screwed them back in.

Whilst it was all off, I decided I’d individually charge the batteries.  So, right-hand battery came first, I hook the mains charger directly up and let ‘er rip.  Less than 30 minutes later, it was done.

So, disconnect that, hook up the left-hand battery.  45 minutes later the charger’s still grinding away.  WTF?

Feel the battery… it is hot!  Double WTF?

It would appear that this particular battery is stuffed.  I’ve got one good one though, so for now I pull the dud out and run with just the one.

I hook everything up, do some final checks, then power the lot back up.

Things seem to go well… I do my usual post-blackout dance of connecting my laptop to the virtual instance management VLAN, waiting for the OpenNebula VM to fire up, then logging into its interface (because we’re too kewl to have a command-line tool to re-start an instance), seeing my router and gitea instances are “powered off”, and instructing the system to boot them.

They come up… I’m composing an email, hit send… “Could not resolve hostname”… WTF?  I wander downstairs and note the LED on the main switch flashing furiously (as it does on power-up); a chorus of POST beeps tells me the cluster got hard-power-cycled.  But why?  Okay, it’s up now.  Back upstairs, connect to the VLAN, re-start everything again.

About to send that email again… boompa!  Same error.  Sure enough, my router is down.  Wander downstairs, and as I get near, I hear the POST beeps again.  Battery voltage is good, about 13.2V.  WTF?

So, I’m about to re-start everything when I lose contact with my OpenNebula front-end.  Okay, something is definitely up.  I wander downstairs, and the hosts are booting again.  On a hunch, I flick the off-switch on the mains charger.  Klunk, the whole lot goes off.  There’s no connection to the battery, and so when the charger drops its power to check the battery voltage, it brings the whole lot down.

WTF once more?  I jiggle some wires… no dice.  Unplug, plug back in, power blinks on then off again.  What is going on?

Finally, I pull the right-hand battery out (the left-hand one is already out and cooling off, still very warm at this point).  13.2V between the negative and positive terminals on the battery: good.  13.2V between negative and the battery side of the isolator switch.  I unscrew the fuse holder: 13.2V between the fuse holder terminal and the negative side… but 0V between the negative side of the battery and the positive terminal on the SB50 connector.

No apparent loose connections, so I grab one of my spares and swap it with the existing fuse.  Screw the holder back together, plug the battery back in, and away it all goes.

The offending culprit turned out to be a 40A 5AG fuse, bought for its current-carrying capacity, not for the “bling factor” (gold conductors).

If I put my multimeter in continuity test mode and hold a probe on each end cap, without moving the probes, I hear it go open-circuit, closed-circuit, open-circuit, closed-circuit.  Fuses don’t normally do that.

I have a few spares of these thankfully, but I will be buying a couple more to replace the one that’s now dead.  Ohh, and it looks like I’m up for another pair of batteries, and we will have a working spare 105Ah once I get the new ones in.

On the RAM front… the firm I bought the last lot through did get back to me, with some DDR3L ECC SO-DIMMs, again made by Kingston.  Sounded close enough, and they were only a touch dearer (AU$855 for 6 vs AU$864.50).

Given that it was likely this would be an increasing problem, I thought I’d at least buy enough to ensure every node had two matched sticks in, so I told them to increase the quantity to 9 and to let me know what I owe them.

At first they sent me the updated invoice with the total amount (AU$1293.20).  No problems there.  It took a bit of back-and-forth before I finally confirmed they had received the previous amount I sent them.  Great, so into the bank I trundle on Thursday morning with the updated invoice, and I pay the remainder (AU$428.70).

Friday, I get the email to say that product was no longer available.  They instead suggested some Crucial modules which were $60 a piece cheaper.  Well, when entering a gold mine, one must prepare themselves for the shaft.

Checking the link, I found it: these were non-ECC, organised 1G×64, not 1G×72 like I had ordered.  In any case, I was over it.  I fired back an email telling them to cancel the order and return the money; I was in no mood for Internet-shopper Russian Roulette.

It turns out I can buy the original sticks through other suppliers, just not in the quantities I’m after.  So while I might be able to buy one or two from a supplier, I can’t buy 9.  Kingston have stopped making them, so what’s left is whatever companies have in stock.

So I’ll have to move to something else.  It’d be worth buying one stick of the original type so I can pair it with one of the others, but no more than that.  I’m in no mood to do this again in a few years’ time when parts are likely to be even harder to source… so I think I’ll bite the bullet and go to 16GB modules.  Due to the limits on my debit card though, I’ll have to buy them two at a time (~AU$900 a go).  The plan is:

  1. Order in two 16GB modules and an 8GB module… take existing 8GB module out of one of the compute nodes and install the 16GB modules into that node.  Install the brand new 8GB module and the recovered 8GB module into two of the storage nodes.  One compute node now has 32GB RAM, and two storage nodes are now upgraded to 16GB each.  Remaining compute node and storage node each have 8GB.
  2. Order in two more 16GB modules… pull the existing 8GB module out of the other compute node, install the two 16GB modules.  Then install the old 8GB module into the remaining storage node.  All three storage nodes now have 16GB each, both compute nodes have 32GB each.
  3. Order two more 16GB modules, install into one compute node, it now has 64GB.
  4. Order in last two 16GB modules, install into the other compute node.

Yes, expensive, but sod it.  Once I’ve done this, the two nodes doing all the work will be at their maximum capacity.  The storage nodes are doing just fine with 8GB, so 16GB should mean there’s plenty of RAM for caching.
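Tallying that up (a quick bookkeeping sketch; the slot counts are my assumption):

# The plan above as bookkeeping.  Assumes four DIMM slots per compute node,
# two per storage node; every node starts with one 8GB module.
ram = {'compute1': [8], 'compute2': [8],
       'storage1': [8], 'storage2': [8], 'storage3': [8]}

# Step 1: compute1 gets two 16GB sticks; its old 8GB plus a brand-new 8GB
# go into two of the storage nodes.
ram['compute1'] = [16, 16]
ram['storage1'].append(8)   # recovered module
ram['storage2'].append(8)   # new module

# Step 2: same again for compute2; its old 8GB fills the last storage node.
ram['compute2'] = [16, 16]
ram['storage3'].append(8)

# Steps 3 and 4: top each compute node up to four 16GB sticks.
ram['compute1'] += [16, 16]
ram['compute2'] += [16, 16]

for node, sticks in sorted(ram.items()):
    print('%s: %dGB %r' % (node, sum(sticks), sticks))
# Compute nodes end at 64GB each; storage nodes at 16GB each.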

As for virtual machine management… I’m pretty much over OpenNebula.  Dealing with libvirt directly is no fun, but at least once configured, it works!  OpenNebula has a habit of not differentiating between a VM being powered off (as in, me logging into the guest and issuing a shutdown) and a VM being forcefully turned off by the host’s power getting yanked!

With the former, there should be an event fired off by libvirt to tell OpenNebula that the VM has indeed turned itself off.  With the latter, it should observe that one moment the VM was there and the next it isn’t; the inference being that it should still be there, and that perhaps the VM should be re-started.

This could be a libvirt limitation too; I’ll have to research that.  If it is, then the answer is clear: we ditch libvirt and DIY.  I’d have to work out how to establish a quorum and schedule where VMs get put, but it should be doable without the hassle that OpenNebula has been so far, and without the utter tedium that is OpenStack.
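From a quick look at the libvirt Python bindings, the hooks do seem to exist: lifecycle events carry a detail code that distinguishes a guest-initiated shutdown from a domain that simply stopped.  A sketch, untested against OpenNebula’s setup:

import libvirt

def lifecycle_cb(conn, dom, event, detail, opaque):
    if event == libvirt.VIR_DOMAIN_EVENT_STOPPED:
        if detail == libvirt.VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN:
            print('%s: guest shut itself down; leave it off' % dom.name())
        else:
            # Destroyed, crashed, or the host went away: a re-start candidate.
            print('%s: stopped unexpectedly' % dom.name())

libvirt.virEventRegisterDefaultImpl()
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                            lifecycle_cb, None)

while True:
    libvirt.virEventRunDefaultImpl()  # pump the event loop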

Oct 24, 2017

So yeah, it seems history repeats itself.  The Redarc BCDC1225 is not reliable in switching between solar inputs and 12V input derived from the mains.

At least this morning’s wake-up call was a little later in the morning:

From: ipmi@hydrogen.ipmi.lan
Subject: IPMI hydrogen.ipmi.lan
Message-Id: <>
Date: Tue, 24 Oct 2017 05:43:05 +1000 (EST)

Incoming alert
IP :
Hostname: hydrogen.ipmi.lan
SEL_TIME:"1970/01/27 02:03:00" 
SENSOR_TYPE:"Voltage          "
SENSOR_ID:"12V             " 
EVENT_DESCRIPTION:"Lower Critical going low                                         "
EVENT SEVERITY:"non-critical"

We’re now rigging up the Xantrex charger that I was using in early testing, and will probably use that for the mains supply.  I have a box wired up with a mains SSR for switching power to it.  I think that’ll be the long-term plan; the Redarc charger will be retired from service, and perhaps we might use it in some non-critical portable station.