Nov 102018
 

Right now, the cluster is running happily with a Redarc BCDC-1225 solar controller, a Meanwell HEP-600C-12 acting as back-up supply, a small custom-made ATTiny24A-based power controller which manages the Meanwell charger.

The earlier purchased controller, a Powertech MP-3735 now is relegated to the function of over-discharge protection relay.  The device is many times the physical size of a VSR, and isn’t a particularly attractive device for that purpose.  I had tried it recently as a solar controller, but it’s fair to say, it’s rubbish at it.  On a good day, it struggles to keep the battery above “rock bottom” and by about 2PM, I’ll have Grafana pestering me about the battery slipping below the 12V minimum voltage threshold.

Actually, I’d dearly love t rip that Powertech controller apart and see what makes it tick (or not in this case).  It’d be an interesting study in what they did wrong to give such terrible results.

So, if I pull that out, the question is, what will prevent an over-discharge event from taking place?  First, I wish to set some criteria, namely:

  1. it must be able to sustain a continuous load of 30A
  2. it should not induce back-EMF into either the upstream supply or the downstream load when activated or activated
  3. it must disconnect before the battery reaches 10.5V (ideally it should cut off somewhere around 11-11.5V)
  4. it must not draw excessive power whilst in operation at the full load

With that in mind, I started looking at options.  One of the first places I looked was of course, Redarc.  They do have a VSR product, the VS12 which has a small relay in it, rated for 10A, so fails on (1).  I asked on their forums though, and it was suggested that for this task, a contactor, the SBI12, be used to do the actual load shedding.

Now, deep inside the heart of the SBI12 is a big electromechanical contactor.  Many moons ago, working on an electric harvester platform out at Laidley for Mulgowie Farming Company, I recall we were using these to switch the 48V supply to the traction motors in the harvester platform.  The contactors there could switch 400A and the coils were driven from a 12V 7Ah battery, which in the initial phases, were connected using spade lugs.

One day I was a little slow getting the spade lug on, so I was making-breaking-making-breaking contact.  *WHACK*… the contactor told me in no uncertain terms it was not happy with my hesitation and hit me with a nice big back-EMF spike!  I had a tingling arm for about 10 minutes.  Who knows how high that spike was… but it probably is higher than the 20V absolute maximum rating of the MIC29712s used for power regulation.  In fact, there’s a real risk they’ll happily let such a rapidly rising spike straight through to the motherboards, frying about $12000 worth of computers in the process!

Hence why I’m keen to avoid a high back-EMF.  Supposedly the SBI12 “neutralises” this … not sure how, maybe there’s a flywheel diode or MOV in there (like this), or maybe instead of just removing power in a step function, they ramp the current down over a few seconds so that the back-EMF is reduced.  So this isn’t an issue for the SBI12, but may be for other electromechanical contactors.

The other concern is the power consumption needed to keep such a beast activated.  The other factor was how much power these things need to stay actuated.  There’s an initial spike as the magnetic field ramps up and starts drawing the armature of the contactor closed, then it can drop down once contact has been made.  The figures on the SBI12 are ~600mA initially, then ~160mA when holding… give or take a bit.

I don’t expect this to be turned on frequently… my nodes currently have up-times around 172 days.  So while 600mA (7~8W at 12V nominal) is high, that’ll only be for a second at most.  Much of the current will be holding current at, let’s call it 200mA to be safe, so about 2~3W.

That 2-3W is going to be the same, whether my nodes collectively draw 10mA, 10A or 100A.

It seemed like a lot, but then I thought, what about a SSR?  You can buy a 100A DC SSR like this for a lot less money than the big contactors.  Whack a nice big heat-sink on it, and you’re set.  Well, why the heat-sink?  These things have a voltage drop and on resistance.  In the case of the Jaycar one, it’s about 350mV and the on resistance is about 7mΩ.

Suppose we were running flat chat at our predicted 30A maximum…

  • MOSFET switch voltage drop: 30A × 350mV = 10.5W
  • Ron resistance voltage drop: (30A)² × 7mΩ = 6.3W
  • Total power dissipation: 10.5W + 6.3W = 16.8W OUCH!

16.8W is basically the power of an idle compute node.  The 3W of the SBI12 isn’t looking so bad now!  But can we do better?

The function of a solid-state relay, amongst other things, is to provide electrical isolation between the control and switching components.  The two are usually galvanically isolated.  This is a feature I really don’t need, so I could reduce costs by just using a bare MOSFET.

The earlier issues I had with the body diode won’t be a problem here as there’s a definite “source” and “load”, there’ll be no current to flow out of the load back to the source to confuse some sensing circuit on the source side.  This same body diode might be an issue for dual-battery systems, as the auxiliary battery can effectively supply current to a starter motor via this body diode, but in my case, it’s strictly switching a load.

I also don’t have inductive loads on my system, so a P-channel MOSFET is an option.  One candidate for this is the Infineon AUIRFS3004-7P.  The Ron on these is supposedly in the realm of 900µΩ-1.25mΩ, and of course, being that it’s a bare MOSFET and not a SSR, there’s no voltage drop.  Thus my power dissipation at 30A is predicted to be a little over 1W.

There are others too with even smaller Ron values, but they are in teeny tiny 5mm square surface-mount packages.  The AUIRFS3004-7P looks dead-buggable, just bend up the gate pin so I can solder direct to it, and treat the others as single “pins”, then strap the sucker to a big heatsink (maybe an old PIII heatsink will do the trick).

I can either drive this MOSFET with something of my own creation, or with the aforementioned Redarc VS12.  The VS12 still does contain a (much smaller) electromechanical relay, but at 30mA (~400mW), it’s bugger all.

The question though was what else could be done?  @WIRING_SOLUTIONS suggested some units made by Victron Energy.  These do have a nice feature in that they also have over-voltage protection, and conveniently, it’s 16V, which is the maximum recommended for the MIC29712s I’m using.  They’re not badly priced, and are solid-state.

However, what’s the Ron, what’s the voltage drop?  Victron don’t know.  They tell me it’s “minimal”, but is that 100nV, 100mV, 1V?  At 30A, 100mV drop equates to 3W, on par with the SBI12.  A 500mV drop would equate to a whopping 15W!

I had a look at the suppliers for Victron Energy products, and via those, found a few other contenders such as this one by Baintech and the Projecta LVD30.  I haven’t asked about these, but again, like the Victron BatteryProtect, neither of these list a voltage drop or Ron.

There’s also this one from Jaycar, but given this is the same place that sold me the Powertech MP-3735, and sold me the original Powertech MP-3089, provided a replacement for that first one, then also replaced the replacement under RMA.  The Jaycar VSR also has practically no specs… yeah, I think I’ll pass!

Whitworths marine sell this, it might be worth looking at but the cut-out voltage is a little high, and they don’t actually give the holding current (330mA “engage” current sounds like it’s electromechanical), so no idea how much power this would dissipate either.

The power controller isn’t doing a job dissimilar to a VSR… in fact it could be repurposed as one, although I note its voltage readings seem to drift quite a lot.  I suspect this is due to the choice of 5% tolerance resistors on the voltage sensing circuit and my use of the ~1.1V internal voltage reference.  The resistors will drift a little bit, and the voltage reference can be anywhere from 1.0 to 1.2V.

Would a LM311N with good quality 1% resistors and a quality voltage reference be “better”?  Who knows?  Maybe I should try an experiment, see if I can get minimal drift out of a LM311N.  It’s either the resistors, the voltage reference, or a combination of the two that’s responsible for the power controller’s drift.

Perhaps I need to investigate which is causing the problem and see what can be done in the design to reduce it.  If I can get acceptable results, then maybe the VS12 can be dispensed with.  I may be able to do it with another ATTiny24A, or even just a simple LM311N.

Nov 192017
 

So, this weekend I did plan to run from solar full time to see how it’d go.

Mother nature did not co-operate.  I think there was about 2 hours of sunlight!  This is what the 24 hour rain map looks like from the local weather radar (image credit: Bureau of Meteorology):

In the end, I opted to crimp SB50 connectors onto the old Redarc BCDC1225 and hook it up between the battery harness and the 40A power supply. It’s happily keeping the batteries sitting at about 13.2V, which is fine. The cluster ran for months off this very same power supply without issue: it’s when I introduced the solar panels that the problems started. With a separate controller doing the solar that has over-discharge protection to boot, we should be fine.

I also have mostly built-up some monitoring boards based on the TI INA219Bs hooked up to NXP LPC810s. I have not powered these up yet, plan is to try them out with a 1ohm resistor as the stand-in for the shunt and a 3V rail… develop the firmware for reporting voltage/current… then try 9V and check nothing smokes.

If all is well, then I’ll package them up and move them to the cluster. Not sure of protocols just yet. Modbus/RTU is tempting and is a protocol I’m familiar with at work and would work well for this application, given I just need to represent voltage and current… both of which can be scaled to fit 16-bit registers easy (voltage in mV, current in mA would be fine).

I just need some connectors to interface the boards to the outside world and testing will begin. I’ve ordered these and they’ll probably turn up some time this week.

Nov 132017
 

So, at present I’ve been using a two-charger solution to keep the batteries at full voltage.  On the solar side is the Powertech MP3735, which also does over-discharge protection.  On the mains side, I’m using a Xantrex TC2012.

One thing I’ve observed is that the TC2012, despite being configured for AGM batteries, despite the handbook saying it charges AGM batteries to a maximum 14.3V, has a happy knack of applying quite high charging voltages to the batteries.

I’ve verified this… every meter I’ve put across it has reported it at one time or another, more than 15V across the terminals of the charger.  I’m using SB50 connectors rated at 50A and short runs of 6G cable to the batteries.  So a nice low-resistance path.

The literature I’ve read says 14.8V is the maximum.  I think something has gone out of calibration!

This, and the fact that the previous set-up over-discharged the batteries twice, are the factors that lead to the early failure of both batteries.

The two new batteries (Century C12-105DA) are now sitting in the battery cases replacing the two Giant Energy batteries, which will probably find themselves on a trip to the Upper Kedron recycling facility in the near future.

The Century batteries were chosen as I needed the replacements right now and couldn’t wait for shipping.  This just happened to be what SuperCheap Auto at Keperra sell.

The Giant Energy batteries took a number of weeks to arrive: likely because the seller (who’s about 2 hours drive from me) had run out of stock and needed to order them in (from China).  If things weren’t so critical, I might’ve given those batteries another shot, but I really didn’t have the time to order in new ones.

I have disconnected the Xantrex TC2012.  I really am leery about using it, having had one bad experience with it now.  The replacement batteries cost me $1000.  I don’t want to be repeating the exercise.

I have a couple of options:

  1. Ditch the idea of using mains power and just go solar.
  2. Dig out the Redarc BCDC1225 charger I was using before and hook that up to a regulated power supply.
  3. Source a new 20A mains charger to hook in parallel with the batteries.
  4. Hook a dumb fixed-voltage power supply in parallel with the batteries.
  5. Hook a dumb fixed-voltage power supply in parallel with the solar panel.

Option (1) sounds good, but what if there’s a run of cloudy days?  This really is only an option once I get some supervisory monitoring going.  I have the current shunts fitted and the TI INA219Bs for measuring those shunts arrived a week or so back, just haven’t had the time to put that into service.  This will need engineering time.

Option (2) could be done right now… and let’s face it, its problem was switching from solar to mains.  In this application, it’d be permanently wired up in boost mode.  Moreover, it’s theoretically impossible to over-discharge the batteries now as the MP3735 should be looking after that.

Option (3) would need some research as to what would do the job.  More money to spend, and no guarantee that the result will be any better than what I have now.

Option (4) I’m leery about, as there’s every possibility that the power supply could be overloaded by inrush current to the battery.  I could rig up a PWM circuit in concert with the monitoring I’m planning on putting in, but this requires engineering time to figure out.

Option (5) I’m also leery about, not sure how the panels will react to having a DC supply in parallel to them.  The MP3735 allegedly can take an input DC supply as low as 9V and boost that up, so might see a 13.8V switchmode PSU as a solar panel on a really cloudy day.  I’m not sure though.  I can experiment, plug it in and see how it reacts.  Research gives mixed advice, with this Stack Exchange post saying yes and this Reddit thread suggesting no.

I know now that the cluster averages about 7A.  In theory, I should have 30 hours capacity in the batteries I have now, if I get enough sun to keep them charged.

This I think, will be a week-end experiment, and maybe something I’ll try this weekend.  Right now, the cluster itself is running from my 40A switchmode PSU, and for now, it can stay there.

I’ll let the solar charger top the batteries up from the sun this week.  With no load, the batteries should be nice and full, ready come Friday evening, when I can, gracefully, bring the cluster down and hook it up to the solar charger load output.  If, at the end of the weekend, it’s looking healthy, I might be inclined to leave it that way.

Nov 122017
 

So, yesterday I had this idea of building an IC storage unit to solve a problem I was facing with the storage of IC tubes, and to solve an identical problem faced at HSBNE.

It turned out to be a longish project, and by 11:30PM, I had gotten far, but still had a bit of work to do.  Rather than slog it out overnight, I thought I’d head home and resume it the next day.  Instead of carting the lot home, and back again, I decided to leave my bicycle trailer with all the project gear and my laptop, stashed at HSBNE’s wood shop.

By the time I had closed up the shop and gotten going, it was after midnight.  That said, the hour of day was a blessing: there was practically no traffic, so I road on the road most of the way, including the notorious Kingsford-Smith Drive.  I made it home in record time: 1 hour, 20 minutes.  A record that stood until later this morning coming the other way, doing the run in 1:10.

I was exhausted, and was thinking about bed, but wheeling the bicycle up the drive way and opening the garage door, I caught a whiff.  What’s that smell?  Sulphur??

Remember last post I had battery trouble, so isolated the crook battery and left the “good” one connected?

The charger was going flat chat, and the battery case was hot!  I took no chances, I switched the charger off at the wall and yanked the connection to the battery wiring harness.  I grabbed some chemical handling gloves and heaved the battery case out.  Yep, that battery was steaming!  Literally!

This was the last thing I wanted to deal with at nearly 2AM on a Sunday morning.  I did have two new batteries, but hadn’t yet installed them.  I swapped the one I had pulled out last fortnight, and put in one of the new ones.  I wanted to give them a maintenance charge before letting them loose on the cluster.

The other dud battery posed a risk though, with the battery so hot and under high pressure, there was a good chance that it could rupture if it hadn’t already.  A shower of sulphuric acid was not something I wanted.

I decided there was nothing running on the cluster that I needed until I got up later today, so left the whole kit off, figuring I’d wait for that battery to cool down.

5AM, I woke up, checked the battery, still warm.  Playing it safe, I dusted off the 40A switchmode PSU I had originally used to power the Redarc controller, and plugged it directly into the cluster, bypassing the batteries and controller.  That would at least get the system up.

This evening, I get home (getting a lift), and sure enough, the battery has cooled down, so I swap it out with another of the new batteries.  One of the new batteries is charging from the mains now, and I’ll do the second tomorrow.

See if you can pick out which one is which…

Oct 282017
 

So, this morning I decided to shut the whole lot down and switch to the new solar controller.  There’s some clean-up work to be done, but for now, it’ll do.  The new controller is a Powertech MP3735.  Supposedly this one can deliver 30A, and has programmable float and bulk charge voltages.  A nice feature is that it’ll disconnect the load when it drops below 11V, so finding the batteries at 6V should be a thing of the past!  We’ll see how it goes.

I also put in two current shunts, one on the feed into/out of the battery, and one to the load.  Nothing is connected to monitor these as yet, but some research suggested that while in theory it is just an op-amp needed, that op-amp has to deal with microvolt differences and noise.

There are instrumentation amplifiers designed for that, and a handy little package is TI’s INA219B.  This incorporates aforementioned amplifier, but also adds to that an ADC with an I²C interface.  Downside is that I’ll need an MCU to poll it, upside is that by placing the ADC and instrumentation amp in one package, it should cut down noise, further reduced if I mount the chip on a board bolted to the current shunt concerned.  The ADC measures bus voltage and temperature as well.  Getting this to work shouldn’t be hard.  (Yes, famous last words I know.)

A few days ago, I also placed an order for some more RAM for the two compute nodes.  I had thought 8GB would be enough, and in a way it is, except I’ve found some software really doesn’t work properly unless it has 2GB RAM available (Gitea being one, although it is otherwise a fantastic Git repository manager).  By bumping both these nodes to 32GB each (4×8GB) I can be less frugal about memory allocations.

I can in theory go to 16GB modules in these boxes, but those were hideously expensive last time I looked, and had to be imported.  My debit card maxes out at $AU999.99, and there’s GST payable on anything higher anyway, so there goes that idea.  64GB would be nice, but 32GB should be enough.

The fun bit though, Kingston no longer make DDR3 ECC SO-DIMMs.  The mob I bought the last lot though informed me that the product is no longer available, after I had sent them the B-Pay payment.  Ahh well, I’ve tossed the question back asking what do they have available that is compatible.

Searching for ECC SODIMMs is fun, because the search engines will see ECC and find ECC DIMMs (i.e. full-size).  When looking at one of these ECC SODIMM unicorns, they’ll even suggest the full-size version as similar.  I’d love to see the salespeople try to fit the suggested full-size DIMM into the SODIMM socket and make it work!

The other thing that happens is the search engine sees ECC and see that that’s a sub-string of non-ECC.  Errm, yeah, if I meant non-ECC, I’d have said so, and I wouldn’t have put ECC there.

Crucial and Micron both make it though, here’s hoping mixing and matching RAM from different suppliers in the same bank won’t cause grief, otherwise the other option is I pull the Kingston sticks out and completely replace them.

The other thing I’m looking at is an alternative to OpenNebula.  Something that isn’t a pain in the arse to deploy (like OpenStack is, been there, done that), that is decentralised, and will handle KVM with a Ceph back-end.

A nice bonus would be being able to handle cross-architecture QEMU VMs, in particular for ARM and MIPS targets.  This is something that libvirt-based solutions do not do well.

I’m starting to think about ways I can DIY that solution.  Blockchain was briefly looked at, and ruled out on the basis that while it’d be good for an audit log, there’s no easy way to index it: reading current values would mean a full-scan of the blockchain, so not a solution on its own.

CephFS is stable now, but I’m not sure how file locking works on it.  Then there’s object storage itself, librados.  I’m not sure if there’s a database engine that can interface to that, or maybe to Amazon S3 storage (radosgw can emulate that), that’ll be the next step.  Lots to think about.

Oct 242017
 

So yeah, it seems history repeats itself.  The Redarc BCDC1225 is not reliable in switching between solar inputs and 12V input derived from the mains.

At least this morning’s wake-up call was a little later in the morning:

From: ipmi@hydrogen.ipmi.lan
To: stuartl@longlandclan.id.au
Subject: IPMI hydrogen.ipmi.lan
Message-Id: <20171023194305.72ECB200C625@atomos.longlandclan.id.au>
Date: Tue, 24 Oct 2017 05:43:05 +1000 (EST)

Incoming alert
IP : xxx.xxx.xxx.xxx
Hostname: hydrogen.ipmi.lan
SEL_TIME:"1970/01/27 02:03:00" 
SENSOR_NUMBER:"30"
SENSOR_TYPE:"Voltage          "
SENSOR_ID:"12V             " 
EVENT_DESCRIPTION:"Lower Critical going low                                         "
EVENT_DIRECTION:"Assertion  "
EVENT SEVERITY:"non-critical"

We’re now rigging up the Xantrex charger that I was using in early testing and will probably use that for mains. I have a box wired up with a mains SSR for switching power to it.  I think that’ll be the long-term plan and the Redarc charger will be retired from service, perhaps we might use it in some non-critical portable station.

Oct 222017
 

So I’ve now had the solar panels up for a month now… and so far, we’ve had a run of very overcast or wet days.

Figures… and we thought this was the “sunshine state”?

I still haven’t done the automatic switching, so right now the mains power supply powers the relay that switches solar to mains.  Thus the only time my cluster runs from solar is when either I switch off the mains power supply manually, or if there’s a power interruption.

The latter has not yet happened… mains electricity supply here is pretty good in this part of Brisbane, the only time I recall losing it for an extended period of time was back in 2008, and that was pretty exceptional circumstances that caused it.

That said, the political football of energy costs is being kicked around, and you can bet they’ll screw something up, even if for now we are better off this side of the Tweed river.

A few weeks back, with predictions of a sunny day, I tried switching off the mains PSU in the early morning and letting the system run off the solar.  I don’t have any battery voltage logging or current logging as yet, but the system went fine during the day.  That evening, I turned the mains back on… but the charger, a Redarc BCDC1225, seemingly didn’t get that memo.  It merrily let both batteries drain out completely.

The IPMI BMCs complained bitterly about the sinking 12V rail at about 2AM when I was sound asleep.  Luckily, I was due to get up at 4AM that day.  When I tried checking a few things on the Internet, I first noticed I didn’t have a link to the Internet.  Look up at the switch in my room and saw the link LED for the cluster was out.

At that point, some choice words were quietly muttered, and I wandered downstairs with multimeter in hand to investigate.  The batteries had been drained to 4.5V!!!

I immediately performed some load-shedding (ripped out all the nodes’ power leads) and power-cycled the mains PSU.  That woke the charger up from its slumber, and after about 30 seconds, there was enough power to bring the two Ethernet switches in the rack online.  I let the voltage rise a little more, then gradually started re-connecting power to the nodes, each one coming up as it was plugged in.

The virtual machine instances I had running outside OpenNebula came up just fine without any interaction from me, but  it seems OpenNebula didn’t see it fit to re-start the VMs it was responsible for.  Not sure if that is a misconfiguration, or if I need to look at an alternate solution.

Truth be told, I’m not a fan of libvirt either… overly complicated for starting QEMU VMs.  I might DIY a solution here as there’s lots of things that QEMU can do which libvirt ignores or makes more difficult than it should be.

Anyway… since that fateful night, I have on two occasions run the cluster from solar without incident.  On the off-chance though, I have an alternate charger which I might install at some point.  The downside is it doesn’t boost the 12V input like the other one, so I’d be back to using that Xantrex charger to charge from mains power.

Already, I’m thinking about the criteria for selecting a power source.  It would appear there are a few approaches I can take, I can either purely look at the voltages seen at the solar input and on the battery, or I can look at current flow.

Voltage wise, I tried measuring the solar panel output whilst running the cluster today.  In broad daylight, I get 19V off the panels, and at dusk it’s about 16V.

Judging from that, having the solar “turn on” at 18V and “turn off” at 15V seems logical.  Using the comparator approach, I’d need to set a reference of 16.5V and tweak the hysteresis to give me a ±3V swing.

However, this ignores how much energy is actually being produced from solar in relation to how much is being consumed.  It is possible for a day to start off sunny, then for the weather to cloud over.  Solar voltage in that case might be sitting at the 16V mentioned.

If the current is too low though, the cluster will drain more power out than is going in, and this will result in the exact conditions I had a few weeks ago: a flat battery bank.  Thus I’m thinking of incorporating current shunts both on the “input” to the battery bank, and to the “output”.  If output is greater than input, we need mains power.

There’s plenty of literature about interfacing to current shunts.  I’ll have to do some research, but immediately I’m thinking an op-amp running from the battery configured as a non-inverting DC gain block with the inputs going to either side of the current shunt.

Combining the approaches is attractive.  So turn on when solar exceeds 18V, turn off when battery output current exceeds battery input current.  A dual op-amp, a dual comparator, two current shunts, a R-S flip-flop and a P-MOSFET for switching the relay, and no hysteresis calculations needed.