Recently, I had a failure in the cluster, namely one of my nodes deciding to go the way of the dodo. I think I’ve mostly recovered everything from that episode.

I bought some new nodes which I can theoretically deploy as spare nodes, Core i5 Intel NUCs, and for now I’ve temporarily decommissioned one of my compute nodes (lithium) to re-purpose its motherboard to get the downed storage node back on-line. Whilst I was there, I went and put a new 2TB HDD in… and of course I left the 32GB RAM in, so it’s pretty much maxxed out.

I’d like to actually make use of these two new nodes, however I am out of switch capacity, with all 26 ports of the Linksys LGS-326AU occupied or otherwise reserved. I did buy a Netgear GS748T with the intention of moving across to it, but never got around to doing so.

The principle matter here being that the Netgear requires a wee bit more power. AC power ratings are 100-250V, 1.5A max. Now, presumably the 1.5A applies at the 100V scale, that’s ~150W. Some research suggested that internally, they run 12V, that corresponds to about 8.5A maximum current.

This is a bit beyond the capabilities of the MIC29712s.

I wound up buying a DC-DC power supply, an isolated one as that’s all I could get: the Meanwell SD-100A-12. This theoretically can take 9-18V in, and put out 12V at up to 8.5A. Perfect.

Due to lack of time, it sat there. Last week-end though, I realised I’d probably need to consider putting this thing to use. I started by popping open the cover and having a squiz inside. (Who needs warranties?)

I identified the power connections. A probe around with the multimeter revealed that, like the Linksys, it too had paralleled conductors. There were no markings on the PSU module, but un-plugging it from the mainboard and hooking up the multimeter whilst powering it up confirmed it was a 12V output, and verified the polarity. The colour scheme was more sane: Red/Yellow were positive, Black/Blue were negative.

I made a note of the pin-out inside the case.

There’s further DC-DC converters on-board near the connector, what their input range is I have no idea. The connector on the mainboard intrigued me though… I had seen that sort of connector before on ATX power supplies.

At the other end of the cable was a simple 4-pole “KK”-like connector with a wider pin spacing (I think ~3mm). Clearly designed with power capacity in mind. I figured I had three options:

1. Find a mating connector for the mainboard socket.
2. Find a mating header for the PSU connector.
3. Ram wires into the plug and hot-glue in place.

As it happens, option (1) turned out easier than I thought it would be. When I first bought the parts for the cluster, the PicoPSU modules came with two cables: one had the standard SATA and Molex power connectors for powering disk drives, the other came out to a 4-pin connector not unlike the 6-pole version being used in the switch.

Now you’ll note of those 6 poles, only 4 are actually populated. I still had the 4-pole connectors, so I went digging, and found them this evening.

As it happens, the connectors do fit un-modified, into the wrong 4 holes — if used unmodified, they would only make contact with 2 of the 4 pins. To make it fit, I had to do a slight modification, putting a small chamfer on one of the pins with a sharp knife.

The wire gauge is close to that used by the original cable, and the colour coding is perfect… black corresponds to 0V, yellow to +12V. I snipped off the JST-style connector at the other end.

I thought about pulling out the original PSU, but then realised that there was a small hole meant for a Kensington-style lock which I wasn’t using. No sharp edges, perfect for feeding the DC cables through. I left the original PSU in-situ, and just unplugged its DC output.

The DC input leads snake through the hole that Netgear helpfully provided.

Before putting the screws in, I decided to give this a test on the bench supply. The switch current fluctuates a bit when booting, but it seems to settle on about 1.75A or so. Not bad.

Terminating this, I decided to use XT-60 connectors. I wanted something other than the 30A “powerpoles” and their larger 50A cousins that are dotted throughout the cluster, as this needed to be regulated 12V. I did not want to get it mixed up with the raw 12V feed from the batteries.

I ran some heavier gauge cable to the DC-DC PSU, terminated with the mating XT-60 connector and hooked that up to my PSU. Providing it with 12V, I dialled the output to 12V exactly. I then gave it a no-load test: it held the output voltage pretty good.

Next, I hooked the switch up to the new PSU. It fired up and I measured the voltage now under load: it still remained at 12V. I wound the voltage down to 9V, then up to 15V… the voltage output never shifted. At 9V, the current consumption jumps up to about 3.5A, as one would expect.

Otherwise, it seemed to be content to draw under 2A so the efficiency of the DC-DC converter is pretty good.

I’ll need to wire in a new fuse box to power everything, but likely the plan will be to decommission the 16-port 100Mbps switch I use for the management network, slide the 48-port switch in its place, then gradually migrate everything across to the new switch.

Overall, the modding of this model switch was even less invasive than that of the Linksys. It’s 100% reversible. I dare say having posted this, there’ll be a GS748Tv6 that’ll move the 240V PSU to the mainboard, but for now at least, this is definitely a switch worth looking at if 12V operation is needed.

Right now, the cluster is running happily with a Redarc BCDC-1225 solar controller, a Meanwell HEP-600C-12 acting as back-up supply, a small custom-made ATTiny24A-based power controller which manages the Meanwell charger.

The earlier purchased controller, a Powertech MP-3735 now is relegated to the function of over-discharge protection relay.  The device is many times the physical size of a VSR, and isn’t a particularly attractive device for that purpose.  I had tried it recently as a solar controller, but it’s fair to say, it’s rubbish at it.  On a good day, it struggles to keep the battery above “rock bottom” and by about 2PM, I’ll have Grafana pestering me about the battery slipping below the 12V minimum voltage threshold.

Actually, I’d dearly love t rip that Powertech controller apart and see what makes it tick (or not in this case).  It’d be an interesting study in what they did wrong to give such terrible results.

So, if I pull that out, the question is, what will prevent an over-discharge event from taking place?  First, I wish to set some criteria, namely:

1. it must be able to sustain a continuous load of 30A
2. it should not induce back-EMF into either the upstream supply or the downstream load when activated or activated
3. it must disconnect before the battery reaches 10.5V (ideally it should cut off somewhere around 11-11.5V)
4. it must not draw excessive power whilst in operation at the full load

With that in mind, I started looking at options.  One of the first places I looked was of course, Redarc.  They do have a VSR product, the VS12 which has a small relay in it, rated for 10A, so fails on (1).  I asked on their forums though, and it was suggested that for this task, a contactor, the SBI12, be used to do the actual load shedding.

Now, deep inside the heart of the SBI12 is a big electromechanical contactor.  Many moons ago, working on an electric harvester platform out at Laidley for Mulgowie Farming Company, I recall we were using these to switch the 48V supply to the traction motors in the harvester platform.  The contactors there could switch 400A and the coils were driven from a 12V 7Ah battery, which in the initial phases, were connected using spade lugs.

One day I was a little slow getting the spade lug on, so I was making-breaking-making-breaking contact.  *WHACK*… the contactor told me in no uncertain terms it was not happy with my hesitation and hit me with a nice big back-EMF spike!  I had a tingling arm for about 10 minutes.  Who knows how high that spike was… but it probably is higher than the 20V absolute maximum rating of the MIC29712s used for power regulation.  In fact, there’s a real risk they’ll happily let such a rapidly rising spike straight through to the motherboards, frying about \$12000 worth of computers in the process!

Hence why I’m keen to avoid a high back-EMF.  Supposedly the SBI12 “neutralises” this … not sure how, maybe there’s a flywheel diode or MOV in there (like this), or maybe instead of just removing power in a step function, they ramp the current down over a few seconds so that the back-EMF is reduced.  So this isn’t an issue for the SBI12, but may be for other electromechanical contactors.

The other concern is the power consumption needed to keep such a beast activated.  The other factor was how much power these things need to stay actuated.  There’s an initial spike as the magnetic field ramps up and starts drawing the armature of the contactor closed, then it can drop down once contact has been made.  The figures on the SBI12 are ~600mA initially, then ~160mA when holding… give or take a bit.

I don’t expect this to be turned on frequently… my nodes currently have up-times around 172 days.  So while 600mA (7~8W at 12V nominal) is high, that’ll only be for a second at most.  Much of the current will be holding current at, let’s call it 200mA to be safe, so about 2~3W.

That 2-3W is going to be the same, whether my nodes collectively draw 10mA, 10A or 100A.

It seemed like a lot, but then I thought, what about a SSR?  You can buy a 100A DC SSR like this for a lot less money than the big contactors.  Whack a nice big heat-sink on it, and you’re set.  Well, why the heat-sink?  These things have a voltage drop and on resistance.  In the case of the Jaycar one, it’s about 350mV and the on resistance is about 7mΩ.

Suppose we were running flat chat at our predicted 30A maximum…

• MOSFET switch voltage drop: 30A × 350mV = 10.5W
• Ron resistance voltage drop: (30A)² × 7mΩ = 6.3W
• Total power dissipation: 10.5W + 6.3W = 16.8W OUCH!

16.8W is basically the power of an idle compute node.  The 3W of the SBI12 isn’t looking so bad now!  But can we do better?

The function of a solid-state relay, amongst other things, is to provide electrical isolation between the control and switching components.  The two are usually galvanically isolated.  This is a feature I really don’t need, so I could reduce costs by just using a bare MOSFET.

The earlier issues I had with the body diode won’t be a problem here as there’s a definite “source” and “load”, there’ll be no current to flow out of the load back to the source to confuse some sensing circuit on the source side.  This same body diode might be an issue for dual-battery systems, as the auxiliary battery can effectively supply current to a starter motor via this body diode, but in my case, it’s strictly switching a load.

I also don’t have inductive loads on my system, so a P-channel MOSFET is an option.  One candidate for this is the Infineon AUIRFS3004-7P.  The Ron on these is supposedly in the realm of 900µΩ-1.25mΩ, and of course, being that it’s a bare MOSFET and not a SSR, there’s no voltage drop.  Thus my power dissipation at 30A is predicted to be a little over 1W.

There are others too with even smaller Ron values, but they are in teeny tiny 5mm square surface-mount packages.  The AUIRFS3004-7P looks dead-buggable, just bend up the gate pin so I can solder direct to it, and treat the others as single “pins”, then strap the sucker to a big heatsink (maybe an old PIII heatsink will do the trick).

I can either drive this MOSFET with something of my own creation, or with the aforementioned Redarc VS12.  The VS12 still does contain a (much smaller) electromechanical relay, but at 30mA (~400mW), it’s bugger all.

The question though was what else could be done?  @WIRING_SOLUTIONS suggested some units made by Victron Energy.  These do have a nice feature in that they also have over-voltage protection, and conveniently, it’s 16V, which is the maximum recommended for the MIC29712s I’m using.  They’re not badly priced, and are solid-state.

However, what’s the Ron, what’s the voltage drop?  Victron don’t know.  They tell me it’s “minimal”, but is that 100nV, 100mV, 1V?  At 30A, 100mV drop equates to 3W, on par with the SBI12.  A 500mV drop would equate to a whopping 15W!

I had a look at the suppliers for Victron Energy products, and via those, found a few other contenders such as this one by Baintech and the Projecta LVD30.  I haven’t asked about these, but again, like the Victron BatteryProtect, neither of these list a voltage drop or Ron.

There’s also this one from Jaycar, but given this is the same place that sold me the Powertech MP-3735, and sold me the original Powertech MP-3089, provided a replacement for that first one, then also replaced the replacement under RMA.  The Jaycar VSR also has practically no specs… yeah, I think I’ll pass!

Whitworths marine sell this, it might be worth looking at but the cut-out voltage is a little high, and they don’t actually give the holding current (330mA “engage” current sounds like it’s electromechanical), so no idea how much power this would dissipate either.

The power controller isn’t doing a job dissimilar to a VSR… in fact it could be repurposed as one, although I note its voltage readings seem to drift quite a lot.  I suspect this is due to the choice of 5% tolerance resistors on the voltage sensing circuit and my use of the ~1.1V internal voltage reference.  The resistors will drift a little bit, and the voltage reference can be anywhere from 1.0 to 1.2V.

Would a LM311N with good quality 1% resistors and a quality voltage reference be “better”?  Who knows?  Maybe I should try an experiment, see if I can get minimal drift out of a LM311N.  It’s either the resistors, the voltage reference, or a combination of the two that’s responsible for the power controller’s drift.

Perhaps I need to investigate which is causing the problem and see what can be done in the design to reduce it.  If I can get acceptable results, then maybe the VS12 can be dispensed with.  I may be able to do it with another ATTiny24A, or even just a simple LM311N.

So last week, I came home to no power, which of course meant no Internet because the ADSL service is still on mains power.

This is something that’s been on my TO-DO list for a while now, and I’ve been considering how to go about it.

One way was to run 12V from the server rack to the study where the ADSL is. I’d power the study switch (a Cisco SG-208), the ADSL modem/router (a TP-Link TD-8817) and the border router (an Advantech UNO-1150G).

The border router, being a proper industrial PC is happy with any voltage between 9 and 32V, but will want up to 24W, so there’s 2A. The ADSL modem needs 5V 1A… easy enough, and the switch needs 12V, not sure what power rating. I’m not sure if it’ll take 15V, I’d be more comfortable putting it on an LDO like I did for the Linksys switch and the cluster nodes. (Thanks to @K.C. Lee for the suggestion on those LDOs.)

With all that, we’re looking at 3-4A of current at 12V, over a distance of about 5 metres. The 6 AWG cable I used to hook panels to solar controller is obviously massive overkill here, but CAT5e is not going to cut it… it needs to be something around the realm of 12 AWG… 20 at the smallest.

I have some ~14AWG speaker cable that could do it, but that sounds nasty.

The other approach is to move the ADSL. After finding a CAT3 6P4C keystone insert, I dug out some CAT5e (from a box that literally fell off the back of a truck), slapped my headlamp onto my hard hat, plonked that on my head and got to work.

It took me about an hour to install the new cable. I started by leaving the network-end unterminated, but with enough loose cable to make the distance… worked my way back to the socket location, cut my cable to length, fitted the keystone insert, then went back to the ADSL splitter and terminated the new run.

There was a momentary blip on the ADSL (or maybe that was co-incidence), then all was good.

After confirming I still had ADSL on the old socket, I shut down the router and ADSL modem, and re-located those to sit on top of the rack. Rather than cut new cables, I just grabbed a power board and plugged that in behind the rack, and plugged the router and modem into it. I rummaged around and found a suitably long telephone cable (with 6P6C terminations), and plugged that in. Lo and behold, after a minute or two, I had Internet.

The ugly bit though is that the keystone insert didn’t fit the panel I had, so for now, it’s just dangling in the air. No, not happy about that, but for now, it’ll do. At worst, it only has to last another 3 years before we’ll be ripping it out for the NBN.

The other 3 pairs on that CAT5e are spare.  If I want a 56kbps PSTN modem port, I can wire up one of those to the voice side of the ADSL splitter and terminate it here.

I think tomorrow, I’ll make up a lead that can power the border router directly from the battery.  I have two of these “LM2596HV” DC-DC converter modules.  I’m thinking put an assortment of capacitors (a few beefy electrolytics and some ceramics) to smooth out the DC output, and I can rummage around for a plug that fits the ADSL modem/router and adjust the supply for 5V.  I’ll daisy-chain this off the supply for the border router.

We’re slated for Hybrid Fibre Coax for NBN, when that finally arrives.  I’ll admit I am nowhere near as keen as I was on optic fibre.  Largely because the coax isn’t anywhere near as future-proofed, plus in the event of a lightning strike hitting the ground, optic fibre does not conduct said lightning strike into your equipment; anything metallic, will.

By moving the ADSL to here though, switching to the NBN in the next 12-24 months should be dead easy.  We just need to run it from the junction box outside, nailing it to the joists under the floor boards in our garage through to where the rack is.  No ceiling/wall cavities or confined spaces to worry about.  If the NBN modem needs a different voltage or connector, we just give that DC-DC converter a tweak and replace the output cable to suit.

We of course wait before switching the DC supply until after we’ve proven it working from mains power in the presence of the installer.  Keep the original PSU handy and intact for “debugging” purposes. 😉

There is an existing Foxtel cable, from the days when Foxtel was an analogue service, and I remember the ol’e tug-o-war the installer had with that cable.  It is installed in the lounge room, which is an utterly useless location for the socket, and given the abuse the cable suffered (a few channels were a bit marginal after install), I have no faith in it for an Internet connection.  Thus, a new cable would be best.  I’ll worry about that when the time comes.

On the power supply front… I have my replacement.  The big hold-up with installing it though is I’ll need to get a suicide lead wired up to the mains end, then I need to figure out some way to protect that from accidental contact.  There’s a little clear plastic cover that slips over the contacts, but it is minimal at best.

I’m thinking a 3D printed or molded two-part cover, one part which is glued to the terminal block and provides the anchor point for the second part which can house a grommet and screw into the first block.  That will make the mains end pretty much as idiot-resistant as it’s possible to be.  We’ll give that some thought over the weekend.

The other end, is 15V at most, I’m not nearly so worried about that, as it won’t kill you unless you do something incredibly stupid.

So, as promised, the re-design of the charge controller. … now under the the influence of a few glasses of wine, so this should be interesting…

As I mentioned in my last post, it was clear that the old logic just wasn’t playing nice with this controller, and that using this controller to maintain the voltage to the nodes below 13.6V was unrealistic.

The absolute limits I have to work with are 16V and 11.8V.

The 11.8V comes from the combination of regulator voltage drop and ATX PSU power range limits: they don’t operate below about 10.8V, if you add 700mV, you get 11.5V … you want to allow yourself some head room. Plus, cycling the battery that deep does it no good.

As for the 16V limit… this is again a limitation of the LDOs, they don’t operate above 16V. In any case, at 16V, the poor LDOs are dropping over 3V, and if the node is running flat chat, that equates to 15W of power dissipation in the LDO. Again, we want some headroom here.

The Xantrex charger likes pumping ~15.4V in at flat chat, so let’s go 15.7V as our peak.

Those are our “extreme” ranges.

At the lower end, we can’t disconnect the nodes, but something should be visible from the system firmware on the cluster nodes themselves, and we can thus do some proactive load shedding, hibernating virtual instances and preparing nodes for a blackout.

Maybe I can add a small 10Mbps Ethernet module to an AVR that can wake the nodes using WOL packets or IPMI requests. Perhaps we shut down two nodes, since the Ceph cluster will need 2/3 up, and we need at least one compute node.

At the high end, the controller has the ability to disconnect the charger.

So that’s worked out. Now, we really don’t want the battery getting that critically low. Thus the time to bring the charger in will be some voltage above the 11.8V minimum. Maybe about 12V… perhaps a little higher.

We want it at a point that when there’s a high load, there’s time to react before we hit the critical limit.

The charger needs to choose a charging source, switch that on, then wait … after a period check the voltage and see if the situation has improved. If there’s no improvement, then we switch sources and wait a bit longer. Wash, rinse, repeat. When the battery ceases to increase in voltage, we need to see if it’s still in need of a charge, or whether we just call it a day and run off the battery for a bit.

If the battery is around 14.5~15.5V, then that’s probably good enough and we should stop. The charger might decide this for us, and so we should just watch for that: if the battery stops charging, and it is at this higher level, just switch to discharge mode and watch for the battery hitting the low threshold.

Thus we can define four thresholds, subject to experimental adjustment:

 Symbol Description Threshold $V_{CH}$ Critical high voltage 15.7V $V_H$ High voltage 15.5V $V_L$ Low voltage 12.0V $V_{CL}$ Critical low voltage 11.8V

Now, our next problem is the waiting… how long do we wait for the battery to change state? If things are in the critical bands, then we probably want to monitor things very closely, outside of this, we can be more relaxed.

For now, I’ll define two time-out settings… which we’ll use depending on circumstances:

 Symbol Description Period $t_{LF}$ Low-frequency polling period 15 sec $t_{HF}$ High-frequency polling period 5 sec

In order to track the state, I need to define some variables… we shall describe the charger’s state in terms of the following variables:

 Symbol Description Initial value $V_{BL}$ Last-known battery voltage, set at particular points. 0V $V_{BN}$ The current battery voltage, as read by the ADC using an interrupt service routine. 0V $t_d$ Timer delay… a timer used to count down until the next event. $t_{HF}$ $S$ Charging source, an enumeration: 0: No source selected 1: Main charging source (e.g. solar) 2: Back-up charging source (e.g. mains power) 0

The variable names in the actual code will be a little more verbose and I’ll probably use #defines for the enumeration.

Below is the part-state-machine part-flow-chart diagram that I came up with. It took a few iterations to try and describe this accurately, I was going to use a state machine syntax similar to what my workplace uses, but in the end, found the ye olde flow chart shows it best.

In this diagram, a filled in dot represents the entry point, a dot with an X represents an exit point, and a dot in a circle represents a point where the state machine re-enters the state and waits for the main loop to iterate once more.

You’ll note that for this controller, we only care about one voltage, the battery voltage. That said, the controller will still have temperature monitoring duties, so we still need some logic to switch the ADC channel, throw away dummy samples (as per the datasheet) and manage sample storage. The hardware design does not need to change.

We can use quiescent voltages to detect the presence of a charging source, but we do not need to, as we can just watch the battery voltage rise, or not, to decide whether we need to take further action.

So, having knocked the regulation on the LDOs down a few pegs… I am closer to the point where I can leave the whole rig running unattended.

One thing I observed prior to the adjustment of the LDOs was that the controller would switch in the mains charger, see the voltage shoot up quickly to about 15.5V, before going HOLYCRAP and turning the charger off.

I had set a set point at about 13.6V based on two facts:

• The IPMI BMCs complained when the voltage raised above this point
• The battery is nominally 13.8V

As mentioned, I’m used to my very simple, slow charger, that trickle charges at constant voltage with maximum current output of 3A. The Xantrex charger I’m using is quite a bit more grunty than that. So re-visiting the LDOs was necessary, and there, I have some good results, albeit with a trade-off in efficiency.

Ahh well, can’t have it all.

I can run without the little controller, as right now, I have no panels. Well, I’ve got one, a 40W one, which puts out 3A on a good day. A good match for my homebrew charger to charge batteries in the field, but not a good match for a cluster that idles at 5A. I could just plug the charger directly into the battery and be done with it for now, defer this until I get the panels.

But I won’t.

I’ve been doing some thought about two things, the controller and the rack. On the rack, I found I can get a cheap one for \$200. That is cheap enough to be considered disposable, and while sure it’s meant for DJ equipment, two thoughts come to mind:

• AV equipment with all its audio transformers and linear power supplies, is generally not light
• It’s on wheels, meant to be moved around… think roadies and such… not a use case that is gentle on equipment

Thus I figure it’ll be rugged enough to handle what I want, and is open enough to allow good airflow. I should be able to put up to 3 AGM batteries in the bottom, the 3-channel charger bolted to the side, with one charge controller per battery. There are some cheap 30A schottky diodes, which would let me parallel the batteries together to form one redundant power supply.

Downside being that would drop about 20-25W through the diode. Alternatively, I make another controller that just chooses the highest voltage source, with a beefy capacitor bank to handle the switch-over. Or I parallel the batteries together, something I am not keen to do.

I spent some time going back to the drawing board on the controller. The good news, the existing hardware will do the job, so no new board needed. I might even be able to simplify logic, since it’s the battery voltage that matters, not the source voltages. But, right now I need to run. So perhaps tomorrow I’ll go through the changes. 😉

Okay, so now the searing heat of the day has dissipated a bit, I’ve dragged the cluster out and got everything running.

No homebrew charge controller, we have:

240v mains → 20A battery charger → battery → volt/current meter → cluster.

Here’s the volt meter, showing the battery voltage mid-charge:

… and this is what the IPMI sensor readings display…

Okay, the 3.3V and 5V rails are lower than I’d expect, but that’s the duty of the PSU/motherboard, not my direct problem.

The nodes also vary a bit… here’s another one. Same set-up, and this time I’ll show the thresholds (which are the same on all nodes):

Going forward… I need to get the cluster solar ready. Running it all from 12V is half the story, but I need to be able to manage switching between mains and solar.

The battery I am using at the moment is a second-hand 100Ah (so more realistically ~70Ah) AGM cell battery. I have made a simple charger for my LiFePO₄ packs that I use on the bicycle, there I just use a LM2576 switchmode regulator to put out a constant voltage at 3A and leave the battery connected to “trickle charge”. Crude, but it works. When at home, I use a former IBM laptop power supply to provide 16V 3A… when camping I use a 40W “12V” solar panel. I’m able to use either to charge my batteries.

The low output means I can just leave it running. 3A is well below the maximum inrush current capacity of the batteries I use (typically 10 or 20Ah) which can often handle more than 1C charging current.

Here, I’m using an off-the-shelf charger made by Xantrex, and it is significantly more sophisticated, using PWM control, multi-stage charging, temperature compensation, etc. It also puts out a good bit more power than my simple charger.

Consequently I see a faster rise in the voltage, and that is something my little controller will have to expect.

In short, I am going to need a more sophisticated state machine… one that leaves the cut-off voltage decision to the charger. One that notices the sudden drop from ~15V to ~14V and shuts off or switches only after it remains at that level for some time (or gets below some critical limit).

So… in the last test, I tried setting up the nodes with the ATTiny24A power controller attempting to keep the battery between 11.8 and 13.8V.

This worked… moreover it worked without any smoke signals being emitted.

The trouble was that the voltage on the battery shot up far faster than I was anticipating. During a charge, as much as 15.5V is seen across the battery terminals, and the controller was doing exactly as programmed in this instance, it was shutting down power the moment it saw the high voltage set-point exceeded.

This took all of about 2 seconds. Adding a timeout helped, but it still cycled on-off-on-off over a period of 10 seconds or so. Waay too fast.

So I’m back to making the nodes more tolerant of high voltages.

The MIC29712s are able to tolerate up to 16V being applied with peaks at 20V, no problem there, and they can push 7.5A continuous, 15A peak. I also have them heatsinked, and the nodes so far chew a maximum of 3A.

I had set them up to regulate down to approximately 13.5V… using a series pair of 2.7kΩ and 560Ω resistors for R1, and a 330Ω for R2. Those values were chosen as I had them on hand… 5% tolerance ¼W carbon film resistors. Probably not the best choice… I wasn’t happy about having two in series, and in hindsight, I should have considered the possibility of value swing due to temperature.

Thinking over the problem over the last week or so… the problem seemed to lay in this set point: I was too close to the upper bound, and so the regulator was likely to overshoot it. I needed to knock it back a peg. Turns out, there were better options for my resistor selections without resorting to a trim pot.

Normally I stick to the E12 range, which I’m more likely to have laying around. The E12 series goes …2.7, 3.3, 3.9, 4.7, 5.6… so the closest I could get was by combining resistors. The E24 range includes values like 3.0 and 3.6.

Choosing R1=3.6kΩ and R2=390Ω gives Vout ~= 12.7V. Jaycar sell 1% tolerance packs of 8 resistors at 55c each. While I was there today, I also picked up some 10ohm 10W wire wound resistors… before unleashing this on an unsuspecting AU\$1200 computer, I’d try it out with a dummy load made with four of these resistors in parallel… making a load that would consume about 5A for testing.

Using a variable voltage power supply, I found that the voltage could hit 12.7V but no higher… and was at worst .7V below the input. Good enough.

At 16V, the regulator would be dropping 3.3V, passing a worst case 3A current for a power dissipation of 9W out of the total 48W consumption. About 80% efficiency.

Not quite what I had hoped for… but this is a worst case scenario, with the nodes going flat chat and the battery charger pumping all the electrons it can. The lead acid battery has a nominal voltage of 13.8V… meaning we’re dropping 1.1V.

On a related note, I also overlooked this little paragraph in the motherboard handbook:

(*Do not use the 4-pin DC power @J1 when the 24-pin ATX Power @JPW1 is connected to the power supply. Do not plug in both J1 and JPW1 at the same time.)

Yep, guess what I’ve done. Being used to motherboards that provide both and needed both, I plugged them both in.

No damage done as all nodes work fine… (or they did last time I tried them… yet to fire them up since this last bit of surgery). It is possible there is no isolation between the on-motherboard PSU and the external ATX one and that if you did plug in power from two differing sources, you could get problems.

In a way if I had spotted this feature before, I could have done without those little PSUs after all, just needing a Molex-style power adaptor cable to plug into the motherboard.

Still… this works, so I’m not changing it. I have removed that extra connection though, and they’ve been disconnected from the PSUs so they won’t cause confusion in future.

I might give this a try when things cool down a bit … BoM still reports it being about 32°C outside (I have a feeling where I live is a few degrees hotter than that) and so I don’t feel energetic enough to drag my cluster out to the workbench just now. (Edit: okay, I know…those in NSW are facing far worse. Maybe one of the mob in New Holland should follow the advice of Crowded House and take the weather with them over here to the east coast! Not all of it of course, enough to cool us off and reduce their flood.)

Well, I’ve finally dragged this project out and plugged everything in to test the new power modules out.

I’ll be hooking up the laptop and getting the nodes to do something strenuous in a moment, but for now, I have them just idling on the battery, with the battery charger being switched by the charge controller, built around an ATTiny24A and this time, a separate power module rather than it being integrated.

I’ve had it going for a few hours now… and so far, so good. The PSU is getting turned on and off more often than I’d like, but at least the smoke isn’t escaping now. The heatsink for the power modules is warm, but still not at the “burn your fingers off” stage.

That to me suggests the largish heatsink was the right one to use.

Two things I need to probably address though:

• In spite of the LDOs, the acceptable voltage range of the computers is still rather narrow… I’m not sure if it’s just the IPMI BMC being fussy or if the LDOs need to be knocked down a peg to keep the voltage within limits. Perhaps I should use the same resistor values as I did for the Ethernet switch.
• The thresholds seem to get reached very quickly which means the timeouts still need lengthening. Addressing the LDO settings should help with this, as it’ll mean I can bump my thresholds higher.

If I can nail those last two issues, then I might be at risk of having the hardware aspect of this project done and having a workable cluster to do the software side of the project. Shock horror!

So, late yesterday afternoon, I devised a light test of the controller to see how it would perform.

For this I disconnected all but one of the nodes, and hooked up one of my old 10Ah LiFePO₄ packs and my 3A charger hooked to mains. The LM2576-based charger is just able to hold this load and provide 1A charging current.

The first thing I noticed is that the fan seemed to turn on and off a lot… this could be a difference in the temperature sensors between the DIP version of the ATTiny24A that the prototype used and the SOIC version which the new controller used.

The test ran overnight. The node basically was idling, as were the two Ethernet switches. But, it served the purpose. I now know the logic is sound, although I might want to adjust my set-points a little.

That’s the output data from a small digital power meter that was hooked up in circuit. This device is unable to display negative current, so the points at which the battery was charging is shown as 0A. Left axis is voltage, right is current. You can see that the charger gets brought in when the battery dips below 12V and clicks off just before 13.2V.

I can probably go a little higher than that, maybe about 13.6V. I may also need to re-visit the fixed resistor settings on the linear regs inside the nodes to knock them down a few more pegs to prevent the BMCs whining about the high voltage.

Next weekend, I might consider hooking up the 20A mains charger and giving it a full load test.

Well, I finally got busy with the soldering iron again. This time, installing the regulators in the cluster nodes and in the 26-port switch.

I had a puzzle as to where to put the regulator, I didn’t want it exposed, as they’re a static-sensitive device, so better to keep them enclosed. It needed somewhere where the air would be flowing, and looking around, I found the perfect spot, just in behind the CPU heatsink. There’s a small gap where the air will be flowing past to cool the CPU, and it’s sufficiently near the ATX PSU to feed the power cabling past.

I found I was able to tap M3 threads into the tops of the heatsinks and fix them to the “front” of the case near where the DIN rail brackets fit in. So from the outside, it looks all neat and tidy.

After installing those, I turned my attention to the switch. Now I had an educated guess that the switch would be stepping down from 12V, so being close to that was not so critical, however going above it would stretch the friendship.

Rather than feeding it 13.1V like the compute nodes, I decided I’d find some alternate resistor values that’d be closer to 12V. Those wound up being R1=3.3kΩ and R2=390Ω, which gave about 11.8V. Close enough. It was then a matter of polarity. The wiring inside this switch uses a non-standard colour code, and as I suspected, the conductors are just paralleled, it’s the one feed of 12V.

Probing with a multimeter revealed the pin pairs were shorted, and removing the PSU confirmed this. I pulled out the switch mainboard and probed around the electrolytics which had their negative sides marked. Sure enough, it’s the Australian Olympic team colours that give away the 0V side.

I’ve shown the original colour code here as coloured dots, but essentially, green and yellow are the 0V side, and red and black are the +12V side. So I had everything necessary. I grabbed a bit of scrap PCB, used the old PSU as a template for drilling out the holes, used a hacksaw to divide the PCB surface up then dead-bugged the rest. To position the heatsink, I drilled a 3mm hole in the bottom of the case and screwed a 10mm M3 stand-off there. Yes, this means there’s an annoying lump on the bottom, I should use a countersunk M3 screw, I’ll fix that later if it bothers me, I’ll be rack-mounting it anyway.

On the input to the regulator, I have a 330uF electrolytic capacitor and 100nF monolithic capacitor in parallel, on the output, it’s a 470uF and a 100nF. A third 100nF hooks the adjust pin to 0V to reduce noise. I de-soldered the original PSUs socket and used that on the new board. It fits beautiful. 100-240V? Not any more Linksys.

So now, the whole lot runs off a single 12V battery supply. The remainder of this project is the charging of that battery and the software configuration of the cluster.

At present, the whole cluster’s doing an `emerge @system`, with distcc running, and drawing about 7.5A with the battery sitting at 12.74V (~95W). Edit: Now that they’ve properly fired up, I’m seeing a drain of 10.3A (126W). Looks that’s going to be the “worst case scenario”.