Mar 11 2017
 

So, as promised, the re-design of the charge controller… now under the influence of a few glasses of wine, so this should be interesting…

As I mentioned in my last post, it was clear that the old logic just wasn’t playing nice with this controller, and that using this controller to maintain the voltage to the nodes below 13.6V was unrealistic.

The absolute limits I have to work with are 16V and 11.8V.

The 11.8V comes from the combination of regulator voltage drop and ATX PSU power range limits: the PSUs don’t operate below about 10.8V, and adding the regulator’s 700mV drop gives 11.5V. You want to allow yourself some headroom on top of that. Plus, cycling the battery that deep does it no good.

As for the 16V limit… this is again a limitation of the LDOs, they don’t operate above 16V. In any case, at 16V, the poor LDOs are dropping over 3V, and if the node is running flat chat, that equates to 15W of power dissipation in the LDO. Again, we want some headroom here.

The Xantrex charger likes pumping ~15.4V in at flat chat, so let’s go 15.7V as our peak.

Those are our “extreme” ranges.

At the lower end, we can’t disconnect the nodes, but something should be visible from the system firmware on the cluster nodes themselves, and we can thus do some proactive load shedding, hibernating virtual instances and preparing nodes for a blackout.

Maybe I can add a small 10Mbps Ethernet module to an AVR that can wake the nodes using WOL packets or IPMI requests. Perhaps we shut down two nodes, since the Ceph cluster will need 2/3 up, and we need at least one compute node.
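For reference, a Wake-on-LAN "magic packet" payload is just six 0xFF bytes followed by the target MAC address repeated sixteen times, which is small enough for an AVR to assemble. Below is a minimal sketch in C of building that payload. The function and macro names are made up for illustration, and how the bytes actually get sent (typically as a broadcast frame or UDP datagram) depends on whichever Ethernet module ends up being used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WOL_MAC_LEN     6
#define WOL_MAC_REPEATS 16
/* 6 sync bytes + 16 copies of the MAC = 102 bytes */
#define WOL_PACKET_LEN  (WOL_MAC_LEN + WOL_MAC_REPEATS * WOL_MAC_LEN)

/* Fill `out` (at least WOL_PACKET_LEN bytes) with the magic packet
 * payload for `mac`. Returns the number of bytes written. */
size_t wol_build_payload(const uint8_t mac[WOL_MAC_LEN], uint8_t *out)
{
    /* Synchronisation stream: six bytes of 0xFF. */
    memset(out, 0xFF, WOL_MAC_LEN);

    /* Then the target MAC address, sixteen times over. */
    for (size_t i = 0; i < WOL_MAC_REPEATS; i++) {
        memcpy(out + WOL_MAC_LEN + i * WOL_MAC_LEN, mac, WOL_MAC_LEN);
    }
    return WOL_PACKET_LEN;
}
```

Calling `wol_build_payload()` with a node's MAC address and a 102-byte buffer gives the payload to hand to the network driver.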

At the high end, the controller has the ability to disconnect the charger.

So that’s worked out. Now, we really don’t want the battery getting that critically low. Thus the time to bring the charger in will be some voltage above the 11.8V minimum. Maybe about 12V… perhaps a little higher.

We want it at a point that when there’s a high load, there’s time to react before we hit the critical limit.

The charger needs to choose a charging source, switch that on, then wait … after a period check the voltage and see if the situation has improved. If there’s no improvement, then we switch sources and wait a bit longer. Wash, rinse, repeat. When the battery ceases to increase in voltage, we need to see if it’s still in need of a charge, or whether we just call it a day and run off the battery for a bit.

If the battery is around 14.5~15.5V, then that’s probably good enough and we should stop. The charger might decide this for us, and so we should just watch for that: if the battery stops charging, and it is at this higher level, just switch to discharge mode and watch for the battery hitting the low threshold.

Thus we can define four thresholds, subject to experimental adjustment:

| Symbol | Description | Threshold |
|---|---|---|
| V_{CH} | Critical high voltage | 15.7V |
| V_H | High voltage | 15.5V |
| V_L | Low voltage | 12.0V |
| V_{CL} | Critical low voltage | 11.8V |

Now, our next problem is the waiting… how long do we wait for the battery to change state? If things are in the critical bands, then we probably want to monitor things very closely; outside of these, we can be more relaxed.

For now, I’ll define two time-out settings… which we’ll use depending on circumstances:

| Symbol | Description | Period |
|---|---|---|
| t_{LF} | Low-frequency polling period | 15 sec |
| t_{HF} | High-frequency polling period | 5 sec |
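As a rough sketch of how the two periods might be chosen, here is a small C helper. It assumes "critical bands" means the battery is at or below V_L, or at or above V_H; that is one reading of the text, not something the design pins down. Voltages are in millivolts to keep the arithmetic integer-only, and all the names are placeholders.

```c
#include <stdint.h>

/* Thresholds from the table above, in millivolts. */
#define V_CH_MV 15700u  /* V_CH: critical high */
#define V_H_MV  15500u  /* V_H:  high */
#define V_L_MV  12000u  /* V_L:  low */
#define V_CL_MV 11800u  /* V_CL: critical low */

/* Polling periods, in seconds. */
#define T_LF_SEC 15u    /* t_LF: low-frequency polling */
#define T_HF_SEC 5u     /* t_HF: high-frequency polling */

/* Pick the polling period for a given battery voltage.
 * Assumption: poll fast at or beyond V_L / V_H, slowly in between. */
uint8_t poll_period_sec(uint16_t v_bn_mv)
{
    if (v_bn_mv <= V_L_MV || v_bn_mv >= V_H_MV) {
        return T_HF_SEC;
    }
    return T_LF_SEC;
}
```

With these numbers a battery at 13.0V is polled every 15 seconds, while one at 11.9V or 15.6V is polled every 5 seconds.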

In order to track the state, I need to define some variables… we shall describe the charger’s state in terms of the following variables:

| Symbol | Description | Initial value |
|---|---|---|
| V_{BL} | Last-known battery voltage, set at particular points. | 0V |
| V_{BN} | The current battery voltage, as read by the ADC using an interrupt service routine. | 0V |
| t_d | Timer delay… a timer used to count down until the next event. | t_{HF} |
| S | Charging source, an enumeration: 0 = no source selected; 1 = main charging source (e.g. solar); 2 = back-up charging source (e.g. mains power) | 0 |

The variable names in the actual code will be a little more verbose and I’ll probably use #defines for the enumeration.
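A minimal sketch of what those definitions could look like in C, using the initial values given above. The names are placeholders (the real code will differ), and storing voltages as millivolts in 16-bit integers is an assumption made here for convenience.

```c
#include <stdint.h>

#define T_HF_SEC 5u      /* t_HF: high-frequency polling period */

/* S: charging source enumeration */
#define SRC_NONE   0u    /* no source selected */
#define SRC_MAIN   1u    /* main charging source (e.g. solar) */
#define SRC_BACKUP 2u    /* back-up charging source (e.g. mains power) */

struct charger_state {
    uint16_t          v_bl_mv;  /* V_BL: last-known battery voltage */
    volatile uint16_t v_bn_mv;  /* V_BN: current voltage, written by the ADC ISR */
    uint8_t           t_d_sec;  /* t_d: countdown until the next event */
    uint8_t           source;   /* S: selected charging source */
};

/* Load the initial values: 0V, 0V, t_HF and "no source". */
void charger_state_init(struct charger_state *st)
{
    st->v_bl_mv = 0;
    st->v_bn_mv = 0;
    st->t_d_sec = T_HF_SEC;
    st->source  = SRC_NONE;
}
```

`v_bn_mv` is marked `volatile` because it is updated from an interrupt service routine while the main loop reads it.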

Below is the part-state-machine part-flow-chart diagram that I came up with. It took a few iterations to try and describe this accurately. I was going to use a state machine syntax similar to what my workplace uses, but in the end found that ye olde flow chart shows it best.

In this diagram, a filled in dot represents the entry point, a dot with an X represents an exit point, and a dot in a circle represents a point where the state machine re-enters the state and waits for the main loop to iterate once more.

You’ll note that for this controller, we only care about one voltage, the battery voltage. That said, the controller will still have temperature monitoring duties, so we still need some logic to switch the ADC channel, throw away dummy samples (as per the datasheet) and manage sample storage. The hardware design does not need to change.
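The channel-switching bookkeeping can be sketched independently of the hardware. The C below is only an illustration: the number of channels and the number of dummy samples to discard are assumptions (the datasheet has the real figure), and the function names are invented. On the real part, the returned channel number would be written to the ADC multiplexer register.

```c
#include <stdint.h>

#define ADC_NUM_CHANNELS  2u  /* assumption: battery voltage + temperature */
#define ADC_DUMMY_SAMPLES 1u  /* assumption: consult the datasheet */

struct adc_scan {
    uint8_t  channel;                   /* channel currently selected */
    uint8_t  discard;                   /* dummy samples still to drop */
    uint16_t latest[ADC_NUM_CHANNELS];  /* last good reading per channel */
};

void adc_scan_init(struct adc_scan *s)
{
    s->channel = 0;
    s->discard = ADC_DUMMY_SAMPLES;
    for (uint8_t i = 0; i < ADC_NUM_CHANNELS; i++) {
        s->latest[i] = 0;
    }
}

/* Call once per completed conversion (e.g. from the ADC ISR).
 * Returns the channel to select for the next conversion. */
uint8_t adc_scan_on_sample(struct adc_scan *s, uint16_t raw)
{
    if (s->discard > 0) {
        s->discard--;                /* first reading(s) after a switch: drop */
        return s->channel;
    }
    s->latest[s->channel] = raw;     /* keep the good reading */
    s->channel = (uint8_t)((s->channel + 1u) % ADC_NUM_CHANNELS);
    s->discard = ADC_DUMMY_SAMPLES;  /* the new channel starts with dummies */
    return s->channel;
}
```

Each channel therefore costs one discarded conversion plus one stored conversion before the scan moves on.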

We can use quiescent voltages to detect the presence of a charging source, but we do not need to, as we can just watch the battery voltage rise, or not, to decide whether we need to take further action.
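To make the "watch the battery voltage rise, or not" idea concrete, here is a rough C sketch of one decision step, run each time the delay timer expires. It is one reading of the prose description, not a transcription of the flow chart. In particular it assumes that a stalled voltage at or below V_L means "try the other source", that a stalled voltage above V_L means "call it a day", and that reaching V_H always disconnects the charger. Names and the millivolt representation are placeholders, and the timer reload is left out.

```c
#include <stdint.h>

#define V_H_MV 15500u  /* V_H: high voltage */
#define V_L_MV 12000u  /* V_L: low voltage */

#define SRC_NONE   0u
#define SRC_MAIN   1u
#define SRC_BACKUP 2u

struct charger_state {
    uint16_t          v_bl_mv;  /* V_BL: last-known battery voltage */
    volatile uint16_t v_bn_mv;  /* V_BN: current voltage from the ADC ISR */
    uint8_t           t_d_sec;  /* t_d: countdown (not touched here) */
    uint8_t           source;   /* S: selected charging source */
};

/* One evaluation of the charging logic. Returns the source to enable. */
uint8_t charger_step(struct charger_state *st)
{
    /* On an 8-bit AVR this 16-bit read would need interrupts masked. */
    uint16_t v_now = st->v_bn_mv;

    if (st->source == SRC_NONE) {
        if (v_now <= V_L_MV) {
            st->source = SRC_MAIN;       /* battery low: start charging */
        }
    } else if (v_now >= V_H_MV) {
        st->source = SRC_NONE;           /* high threshold: stop charging */
    } else if (v_now <= st->v_bl_mv) {
        /* No rise since the last check. */
        if (v_now <= V_L_MV) {
            /* Still needs charge: try the other source. */
            st->source = (st->source == SRC_MAIN) ? SRC_BACKUP : SRC_MAIN;
        } else {
            st->source = SRC_NONE;       /* good enough: run off the battery */
        }
    }

    st->v_bl_mv = v_now;                 /* remember for the next comparison */
    return st->source;
}
```

The main loop would call `charger_step()` when t_d reaches zero, switch the returned source on (or everything off for `SRC_NONE`), and reload the timer.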

Feb 11 2017
 

So… in the last test, I tried setting up the nodes with the ATTiny24A power controller attempting to keep the battery between 11.8 and 13.8V.

This worked… moreover it worked without any smoke signals being emitted.

The trouble was that the voltage on the battery shot up far faster than I was anticipating. During a charge, as much as 15.5V is seen across the battery terminals, and the controller was doing exactly as programmed in this instance: it was shutting down power the moment it saw the high voltage set-point exceeded.

This took all of about 2 seconds. Adding a timeout helped, but it still cycled on-off-on-off over a period of 10 seconds or so. Waay too fast.

So I’m back to making the nodes more tolerant of high voltages.

The MIC29712s are able to tolerate up to 16V being applied with peaks at 20V, no problem there, and they can push 7.5A continuous, 15A peak. I also have them heatsinked, and the nodes so far chew a maximum of 3A.

I had set them up to regulate down to approximately 13.5V… using a series pair of 2.7kΩ and 560Ω resistors for R1, and a 330Ω for R2. Those values were chosen as I had them on hand… 5% tolerance ¼W carbon film resistors. Probably not the best choice… I wasn’t happy about having two in series, and in hindsight, I should have considered the possibility of value swing due to temperature.

Thinking over the problem over the last week or so… the problem seemed to lie in this set point: I was too close to the upper bound, and so the regulator was likely to overshoot it. I needed to knock it back a peg. Turns out, there were better options for my resistor selections without resorting to a trim pot.

Normally I stick to the E12 range, which I’m more likely to have lying around. The E12 series goes …2.7, 3.3, 3.9, 4.7, 5.6… so the closest I could get was by combining resistors. The E24 range includes values like 3.0 and 3.6.

Choosing R1=3.6kΩ and R2=390Ω gives Vout ≈ 12.7V. Jaycar sell 1% tolerance packs of 8 resistors at 55c each. While I was there today, I also picked up some 10Ω 10W wire wound resistors… before unleashing this on an unsuspecting AU$1200 computer, I’d try it out with a dummy load made with four of these resistors in parallel… making a load that would consume about 5A for testing.

Using a variable voltage power supply, I found that the voltage could hit 12.7V but no higher… and was at worst .7V below the input. Good enough.

At 16V, the regulator would be dropping 3.3V, passing a worst case 3A current for a power dissipation of about 10W out of the total 48W consumption. About 80% efficiency.
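Working those worst-case figures through, ignoring the regulator's own ground-pin current (an assumption that slightly flatters the result):

```latex
\begin{align*}
P_{LDO} &= (V_{IN} - V_{OUT}) \times I = (16 - 12.7)\,\mathrm{V} \times 3\,\mathrm{A} = 9.9\,\mathrm{W} \\
P_{IN}  &= V_{IN} \times I = 16\,\mathrm{V} \times 3\,\mathrm{A} = 48\,\mathrm{W} \\
\eta    &= \frac{V_{OUT}}{V_{IN}} = \frac{12.7}{16} \approx 79\%
\end{align*}
```

So the dissipation comes in just under 10W, and for a linear regulator the efficiency is simply the ratio of output to input voltage.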

Not quite what I had hoped for… but this is a worst case scenario, with the nodes going flat chat and the battery charger pumping all the electrons it can. The lead acid battery has a nominal voltage of 13.8V… meaning we’re dropping 1.1V.

On a related note, I also overlooked this little paragraph in the motherboard handbook:

(*Do not use the 4-pin DC power @J1 when the 24-pin ATX Power @JPW1 is connected to the power supply. Do not plug in both J1 and JPW1 at the same time.)

Yep, guess what I’ve done. Being used to motherboards that provide both and needed both, I plugged them both in.

No damage done as all nodes work fine… (or they did last time I tried them… yet to fire them up since this last bit of surgery). It is possible there is no isolation between the on-motherboard PSU and the external ATX one and that if you did plug in power from two differing sources, you could get problems.

In a way if I had spotted this feature before, I could have done without those little PSUs after all, just needing a Molex-style power adaptor cable to plug into the motherboard.

Still… this works, so I’m not changing it. I have removed that extra connection though, and they’ve been disconnected from the PSUs so they won’t cause confusion in future.

I might give this a try when things cool down a bit … BoM still reports it being about 32°C outside (I have a feeling where I live is a few degrees hotter than that) and so I don’t feel energetic enough to drag my cluster out to the workbench just now. (Edit: okay, I know…those in NSW are facing far worse. Maybe one of the mob in New Holland should follow the advice of Crowded House and take the weather with them over here to the east coast! Not all of it of course, enough to cool us off and reduce their flood.)

Apr 23 2016
 

Well, I finally got busy with the soldering iron again. This time, installing the regulators in the cluster nodes and in the 26-port switch.

I had a puzzle as to where to put the regulator. I didn’t want it exposed, as these are static-sensitive devices, so better to keep it enclosed. It needed somewhere where the air would be flowing, and looking around, I found the perfect spot, just in behind the CPU heatsink. There’s a small gap where the air will be flowing past to cool the CPU, and it’s sufficiently near the ATX PSU to feed the power cabling past.

I found I was able to tap M3 threads into the tops of the heatsinks and fix them to the “front” of the case near where the DIN rail brackets fit in. So from the outside, it looks all neat and tidy.

After installing those, I turned my attention to the switch. Now I had an educated guess that the switch would be stepping down from 12V, so being close to that was not so critical, however going above it would stretch the friendship.

Rather than feeding it 13.1V like the compute nodes, I decided I’d find some alternate resistor values that’d be closer to 12V. Those wound up being R1=3.3kΩ and R2=390Ω, which gave about 11.8V. Close enough. It was then a matter of polarity. The wiring inside this switch uses a non-standard colour code, and as I suspected, the conductors are just paralleled, it’s the one feed of 12V.

Probing with a multimeter revealed the pin pairs were shorted, and removing the PSU confirmed this. I pulled out the switch mainboard and probed around the electrolytics which had their negative sides marked. Sure enough, it’s the Australian Olympic team colours that give away the 0V side.

I’ve shown the original colour code here as coloured dots, but essentially, green and yellow are the 0V side, and red and black are the +12V side. So I had everything necessary. I grabbed a bit of scrap PCB, used the old PSU as a template for drilling out the holes, used a hacksaw to divide the PCB surface up, then dead-bugged the rest. To position the heatsink, I drilled a 3mm hole in the bottom of the case and screwed a 10mm M3 stand-off there. Yes, this means there’s an annoying lump on the bottom. I should use a countersunk M3 screw; I’ll fix that later if it bothers me, but I’ll be rack-mounting it anyway.

On the input to the regulator, I have a 330uF electrolytic capacitor and 100nF monolithic capacitor in parallel; on the output, it’s a 470uF and a 100nF. A third 100nF hooks the adjust pin to 0V to reduce noise. I de-soldered the original PSU’s socket and used that on the new board. It fits beautifully. 100-240V? Not any more Linksys.

So now, the whole lot runs off a single 12V battery supply. The remainder of this project is the charging of that battery and the software configuration of the cluster.

At present, the whole cluster’s doing an `emerge @system`, with distcc running, and drawing about 7.5A with the battery sitting at 12.74V (~95W). Edit: Now that they’ve properly fired up, I’m seeing a drain of 10.3A (126W). Looks like that’s going to be the “worst case scenario”.

Apr 16 2016
 

I figured, rather than letting these loose directly on the nodes themselves, I’d give them a try with a throw-away dummy load. For this, I grabbed an old Philips car cassette player that’s probably older than I am and hooked that up. I shoved some old cassette in.

The datasheet for the regulators defines the output voltage as: V_{OUT}=1.240 \big({R_1 \over R_2} + 1\big)

Playing with some numbers, I came out with R1 being a 2.7kΩ and a 560Ω resistor in series, and R2 being a 330Ω. So I scratched around for those resistors, grabbed one of the MIC29712s and hooked it all up for a test.
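The datasheet formula is easy enough to play with in code. Below is a small C sketch that evaluates it, plus a crude worst-case spread for a given resistor tolerance. The function names are invented, and the tolerance model (R1 and R2 at opposite extremes) is a simplifying assumption that ignores the tolerance on the 1.240V reference itself.

```c
/* Reference voltage from the datasheet formula quoted above. */
#define MIC_VREF 1.240

/* Nominal output voltage for a given R1/R2 pair (ohms). */
double mic_vout(double r1_ohm, double r2_ohm)
{
    return MIC_VREF * (r1_ohm / r2_ohm + 1.0);
}

/* Highest output: R1 at +tol, R2 at -tol (tol = 0.05 for 5% parts). */
double mic_vout_max(double r1_ohm, double r2_ohm, double tol)
{
    return mic_vout(r1_ohm * (1.0 + tol), r2_ohm * (1.0 - tol));
}

/* Lowest output: R1 at -tol, R2 at +tol. */
double mic_vout_min(double r1_ohm, double r2_ohm, double tol)
{
    return mic_vout(r1_ohm * (1.0 - tol), r2_ohm * (1.0 + tol));
}
```

For R1 = 2.7kΩ + 560Ω = 3260Ω and R2 = 330Ω, `mic_vout(3260, 330)` comes out at roughly 13.49V. With 5% resistors the spread from `mic_vout_min()` to `mic_vout_max()` is about 12.3V to 14.8V, so the exact output depends a fair bit on the individual resistors.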

The battery here is not the one I’ll use in the cluster ultimately, I have a 100Ah AGM cell battery for that. The charger seen there is what will be used, initially as a sole power source, then in combination with solar when I get the panels. It’s capable of producing 20A of current, and runs off mains power.

This is the power drain from the battery, with the charger turned on.

Not as much as I thought it’d be, but still a moderate amount.

This is what the output side of the regulator looked like:

So from 14.8V down to 13.1V. It also showed 13.1V when I had the charger unplugged, so it’s doing its job pretty well I think. That’s a drop of 1.7V, so dissipating about 600mW. Efficiency is therefore about 89% (13.1V out for 14.8V in), not bad for linear regulators.

Apr 09 2016
 

So, I’ve been doing a bit of research about how I can stabilise the battery voltage which will drift between around 11V and 14.6V. It’s a deep-cycle type battery, so it’s actually capable of going down to 10V, but I really don’t want to push it that far.

Once I get below 12V, that’s the time to signal to the VM hosts to start hibernating the VMs and preparing for a blackout, until such time as the voltage picks back up again.

The rise above 13.5V is a challenge due to the PicoPSU limitations. @Vlad Conut rightly pointed out that the M3-ATX-HV PSUs sold by the same company would have been a better choice. For about $20 more, or an extra $100 for the cluster, I’d have something that’d work 6-30V. I’d still have to solve the problem with the switch, but it’d just be that one device, not 6.

Maybe it was because they were out of stock that I went the PicoPSU route. I also wasn’t sure about power demands: I knew the CPU needed 20W, but wasn’t sure about everything else. So I over-dimensioned everything. Hindsight is 20:20.

One option I considered was a regulator in front of each node. I had mentioned the LM7812 because I knew of it. I also knew it was a crap choice for this task, the 1.5V drop, with a 5A load would result in about 7.5W dissipated thermally. So 20W would jump to nearly 28W — not good.

That of course assumes a 7812 would handle 5A, which it won’t.

LDOs were the way to go for anything linear, otherwise I was looking at a switchmode power supply. The LM2576 has similar requirements to the LM7812, but is much more efficient being a buck converter. If 1.5V was fine, I’d be looking for a 5A-capable equivalent.

The other option would be to have one single power supply regulate power for all nodes. I mentioned in my previous log about the Redarc DC-DC power supply, and that is certainly still worthy of consideration. It is a single point of failure however, but then again, Redarc aren’t known for making crap, and the unit has a 2 year warranty anyway. I’d have downtime, but would not lose data if this went down.

@K.C. Lee pointed me to one LDO that would be a good candidate though, and is cheap enough to be worth the experiment: the Micrel MIC29750. 7.5A current, and comes in an adjustable form up to 16V. I’d imagine if I set this near 13.5V, it’d dissipate maybe 2.5W max at 5A, or 1W at 2A. Much better.

Not as good as Redarc’s solution of course, and that’s still an option, but cheap enough to try out.

Apr 07 2016
 

Well, I’ve been researching the problem. I have a battery that could be floating anywhere from 10V to 14.6V, depending on the input from the charger.

I have a computer PSU, that is not happy with voltages outside of 10.5V—13.5V.

What are my options?

  • Linear regulator: the standard ones have a 1.5V drop across them, which at the full rated input current of the PSU (8A) is 12W. Per node.
  • Low drop-out regulators get a little lower, but I’d still lose a few watts per node.
  • Buck converters can do better, but a lot still need at least 1V difference.

So I really need to boost it first, then regulate down. One thing I was not looking forward to, was designing then winding the transformer/inductor needed. An off-the-shelf solution therefore seems attractive, even if I miss out on kudos points for a DIY solution.

Redarc make a couple, and this unit looks like it’ll do exactly what is needed. Not cheap, but it seems comparable to what I’ve seen elsewhere. I’ll have a look and see what else there is, but this might be the most time-economical way to solve the problem and the efficiency is pretty good.

Apr 03 2016
 

Of course, there’s always something there to throw a spanner in the works, and for me, it’s the PicoPSU.

It seems to work great, however, there is an Achilles heel with these things: they have a fairly narrow band of tolerable voltages they’ll operate at. Namely 10.5—13.5V.

Now, 10.5V is fair enough, but 13.5V? Typical lead acid batteries are 13.8V nominal voltage, and will get to 14.5V when charging. So I need some preregulator that will handle when the voltage is up around 13.5V or above, and drop it down just a little, passing through up to 2A (5A to be safe).

It still has to be stable when the current changes, “turned off” on these computers means a drain of about 200mA for the IPMI. So our operating parameters are summed up as 10.5—13.5V and 200mA—5A.

It needs to continue operating when the battery gets to ~11.5V.

So what are my options?

  • LM2576 simple switcher? 12V in will produce 10.5V out.
  • LM7812 has the same problem, and will chew more power.

A series regulator built on a zener/NPN might work. The voltage drop across the NPN ordinarily is going to be fairly low, however there will still be a drop of about 0.7V or so. That’s possibly “good enough”, since at 11.5V input, we should still see about 10.8V out which is within range.
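The headroom argument for these options can be checked with a very simplified model: a series regulator holds its set point while it has enough input, and otherwise its output follows the input minus the drop. The C sketch below encodes that, along with the PicoPSU's 10.5V to 13.5V window. The function names are illustrative, and the 12V set point used in the usage note is an assumption, since no zener value is chosen here.

```c
#include <stdint.h>

/* PicoPSU input window, in millivolts. */
#define PSU_MIN_MV 10500u
#define PSU_MAX_MV 13500u

/* Simplified series-regulator model: the output sits at the set point
 * when there is enough headroom, else at input minus the drop. */
uint16_t reg_out_mv(uint16_t v_in_mv, uint16_t v_set_mv, uint16_t drop_mv)
{
    if (v_in_mv <= drop_mv) {
        return 0;                        /* nothing left to pass through */
    }
    uint16_t v_max = (uint16_t)(v_in_mv - drop_mv);
    return (v_max < v_set_mv) ? v_max : v_set_mv;
}

/* Non-zero if the voltage is inside the PSU's input window. */
int psu_in_range(uint16_t v_mv)
{
    return v_mv >= PSU_MIN_MV && v_mv <= PSU_MAX_MV;
}
```

With a 0.7V drop and an assumed 12V set point, `reg_out_mv(11500, 12000, 700)` gives 10800, which is inside the window. With a 1.5V drop, a 12V input only just scrapes in at 10500, and an 11.5V input falls outside.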

Two diodes in series, with a relay to short them out when the voltage drops below 12V would work too. That’d need a comparator and voltage reference to drive the relay. It’s a cheap solution too.

Another prospect is a beefy DC-DC converter on the battery, so we don’t actually care what the battery voltage is, we boost it say to 15V then regulate it back down to 12V. A 30A-capable flyback or boost-buck converter would do it. This is more complex, and much more expensive to do off-the-shelf, so I think that’d be a method of last resort.