Jun 142018

So, last Sunday we did a trip up the Brisbane Valley to do a rekkie for the Yarraman to Wulkuraka bike ride that Brisbane WICEN will be assisting in at the end of next month.

The area is known to be quite patchy where phone reception is concerned, with Linville shown to be highly unreliable… Telstra recommends external antennas are required to get any sort of service.  So it seemed a good place to take the Kite and try it out in a weak signal area.

3G coverage in Linville, with external antenna.

4G coverage in Linville, with external antenna.

4GX coverage in Linville, with external antenna.

Sadly, I didn’t get as much time as I would have liked to perform these tests, and it would have been great to compare against a few others… but I was able to take some screenshots on the way up of the three phones, all on the same network (Telstra), using their internal antennas (and the small whip in the case of the Kite).  However, we got there in the afternoon, and there were clouds gathering, so we had to get to Moore.

In any case, Telstra seems to have pulled their socks up since those maps were updated… as I found I was getting reasonable coverage on the T83.  The Kite was in the car at the time, I didn’t want it getting damaged if I came off the bike or if the heavens opened up.

I did manage to take some screenshots on the three phones on the way up.

This is not that scientific, and a bit crude since I couldn’t take the screenshots at exactly the same moment.  Plus, we were travelling at 100km/hr for much of the run.  There was one point where we stopped for breakfast at Fernvale, I can’t recall exactly what time that was or whether I got a screenshot from all three phones at that time.

The T84 is the only phone out of the three that can do the 4GX 700MHz band.

Time ZTE T83 ZTE T84 iSquare Mobility Kite v1 Notes
2018-06-10T06:08:16 t83 at 2018-06-10T06:08:16 Leaving Brisbane
2018-06-10T06:09:24 kite at 2018-06-10T06:09:24
2018-06-10T06:09:33 t83 at 2018-06-10T06:09:33
2018-06-10T06:26:17 t83 at 2018-06-10T06:26:17
2018-06-10T06:26:25 kite at 2018-06-10T06:26:25
2018-06-10T07:30:27 t84 at 2018-06-10T07:30:27 A rare moment where the T84 beats the others.  My guess is this is a 4GX (700MHz) cell.
2018-06-10T07:30:34 kite at 2018-06-10T07:30:34
2018-06-10T07:30:39 t83 at 2018-06-10T07:30:39
2018-06-10T07:41:48 kite at 2018-06-10T07:41:48
2018-06-10T07:41:54 t84 at 2018-06-10T07:41:54 HSPA coverage… one of the few times we see the T84 drop back to 3G.
2018-06-10T07:42:01 t83 at 2018-06-10T07:42:01
2018-06-10T07:51:34 t83 at 2018-06-10T07:51:34 Patchy coverage at times en route to Moore.
2018-06-10T07:51:45 kite at 2018-06-10T07:51:45
2018-06-10T08:24:57 kite at 2018-06-10T08:24:57 For grins, trying out Optus coverage on the Kite at Moore.  There’s a tower at Benarkin, not sure if there’s one closer to Moore.
2018-06-10T08:25:39 kite at 2018-06-10T08:25:39
2018-06-10T08:54:28 t84 at 2018-06-10T08:54:28
2018-06-10T08:54:35 kite at 2018-06-10T08:54:35 En route to Benarkin, we lose contact with Telstra on all three devices.
2018-06-10T08:54:39 t83 at 2018-06-10T08:54:39
2018-06-10T09:35:14 kite at 2018-06-10T09:35:14 In Benarkin.
2018-06-10T09:35:22 t83 at 2018-06-10T09:35:22
2018-06-10T10:25:27 kite at 2018-06-10T10:25:27
2018-06-10T10:25:48 t83 at 2018-06-10T10:25:48

So what does the above show?  Well, for starters, it is apparent that the T83 gets left in the dust by both devices.  This is interesting as my T83 definitely was the more reliable on our last trip into the Snowy Mountains, regularly getting a signal in places where the T84 failed.

Two spots I’d love to take the Kite would be Dumboy Creek (4km outside Delungra on the Gwydir Highway) and Sawpit Creek (just outside Jindabyne), but both are a bit far for a day trip!  It’s unlikely I’ll be venturing that far south again this year.

On this trip up the Brisbane Valley though, I observed that when the signal got weak, the Kite was more willing to drop back to 3G, whereas the two ZTE phones hung onto that little scrap of 4G.  Yes, 4G might give clearer call quality and faster speeds in ideal conditions, but these conditions are not ideal, we’re in fringe coverage.

The 4G standards use much more dense forms of modulation (QPSK, 16-QAM or 64-QAM) than 3G (QPSK only) trading off spectral efficiency for signal-to-noise performance, thus lean more heavily on forward error correction to achieve communications in adverse conditions.  When a symbol is corrupted, more data is lost with these standards.  3G might be slower, but sometimes slow and steady wins the race, fast and flaky is a recipe of frustration.

A more scientific experiment, where we are stationary, and can let each device “settle” before taking a reading, would be worthwhile.  Without a doubt, the Kite runs rings around the T83.  The T84 is less clear: the T84 and the Kite both run the same chipset; the Qualcomm MSM8916.  The T83 runs the older MSM8930.

By rights, the T84 and Kite should perform nearly identical, with the Kite having the advantage of a high-gain whip antenna instead of a more conventional patch panel antenna.  The only edge the T84 has, is the 700MHz band, which isn’t that heavily deployed here in Australia right now.

The T83 and T84 can take an external antenna, but the socket is designed for cradle use and isn’t as rugged or durable as the SMA connector used on the Kite.  It’s soldered to the PCB, and when a cable is plugged in, it disconnects the internal antenna.

Thus damage to this connector can render these phones useless.  The SMA connector on the Kite however is a pigtail to an IPX socket inside … a readily available off-the-shelf (mail-order) part.  People may not like the whip sticking out though.

The Kite does ship with a patch antenna, which is about 75% efficient; so maybe 0dBi at best, however I think making the case another 10mm longer and incorporating the whip into the top of the phone so the antenna can tuck away when not needed, is a better plan.  It would not be hard to make the case accommodate it so it’s invisible and can fold out, or be replaced with a coax connection to an external antenna.

If there’s time, I’ll try to get some more conclusive tests done, but there’s no guarantees on that.

Jun 042018

So, recently there was a task at my work to review enabling gzip compression on our nginx HTTP servers to compress the traffic.

Now, in principle it seemed like a good idea, but having been exposed to the security world a little bit, I was familiar with some of the issues with this, notably, CRIME, BEAST and BREACH.  Of these, only BREACH is unmitigated at the browser end.

The suggested mitigations, in order of effectiveness are:

  1. Disabling HTTP compression
  2. Separating secrets from user input
  3. Randomizing secrets per request
  4. Masking secrets (effectively randomizing by XORing with a random secret per request)
  5. Protecting vulnerable pages with CSRF
  6. Length hiding (by adding random number of bytes to the responses)
  7. Rate-limiting the requests

Now, we’ve effectively being doing (1) by default… but (2), (3) and (4) make me wonder how protocols like OAuth2 are supposed to work.  That got me thinking about a little toy I was given for attending the 2011 linux.conf.au… it’s a YubiKey, one of the early model ones.  The way it operates is that Yubico’s servers, and your key, share a secret AES key (I think it’s AES-128), some static data, and a counter.  Each time you generate a one-time pad with the key, it increments its counter, encrypts the value with the static data, then encodes the output as a hexdump using a keyboard-agnostic encoding scheme to be “typed” into the computer.

Yubico receive this token, decrypt it, then compare the counter value.  If it checks out, and is greater than the existing counter value at their end, they accept it, and store that new counter value.

The same made me wonder if that could work for requests from a browser… that is, you agree on a shared secret over HTTPS, or using Diffie Hellman.  You synchronise counters (either using your new shared secret, or over HTTPS at the same time as you make the shared key), then from there on, each request to your API made by the browser, is then accompanied by a one-time pad, generated by encrypting the counter value and the static data and sending that in the HTTP headers.

There are a few libraries that do AES in the browser, such as JSAES (GPLv3) and aes-js (MIT).

This is going to be expensive to do, so a compromise might be to use this every N requests, where N is small enough that BREACH doesn’t have a sufficient number of requests from which it can derive a secret.  By the time it figures out that secret, the token is expired.  Or they could be bulk-generated at the browser end in the background so there’s a ready supply.

I haven’t gone through the full in’s and out’s of this, and I’m no security expert, but that’s just some initial thinking.

Feb 132018

So, over the last few years we’ve seen a big shift in the way websites operate.

Once upon a time, JavaScript was a nice-to-have, and you as a web developer better be prepared for it to not be functional; the DOM was non-existent, and we were ooohing and ahhing over the de facto standard in Internet multimedia; MacroMedia Flash.  The engine we now call WebKit was still a primitive and quite basic renderer called KHTML in a little-known browser called Konqueror.  Mozilla didn’t exist as an open-source project yet; it was Netscape and Microsoft duelling it out together.

Back then, XMLHTTPRequest was so new, it wasn’t a standard yet; Microsoft had implemented the idea as an ActiveX control in IE5, no one else had it yet.  So if you wanted to update a page, you had to re-load the whole lot and render it server-side.  We had just shaken off our FONT tags for CSS (thank god!), but if you wanted to make an image change as the mouse cursor hovered over it, you still needed those onmouseover/onmouseout event handlers to swap the image.  Ohh, and scalable graphics?  Forget it.  Render as a GIF or JPEG and hope you picked the resolution right.

And bear in mind, the expectation was that, a user running an 800×600 pixel screen resolution, and connected via a 28.8kbps dial-up modem, should be able to load your page up within about 30 seconds, and navigate without needing to resort to horizontal scroll bars.  That meant images had to be compressed to be no bigger than 30kB.

That was 17 years ago.  Man I feel old!

This gets me thinking… today, the expectation is that your Internet connection is at least 256kbps.  Why then do websites take so long to load?

It seems our modern web designers have forgotten the art of how to pack down a website to minimise the amount of data needed to be transmitted so that the page is functional.  In this modern age of “pretty” web design, we’ve forgotten how to make a page practical.

Today, if you want to show an icon on a page, and have it fill the entire browser window, you can fire up Inkscape or Adobe Illustrator, let the creative juices flow and voilá, out pops a scalable vector graphic, which can be dropped straight into your HTML.  Turn on gzip compression on the web server, and that graphic will be on that 28.8kbps user’s screen in under 3 seconds, and can still be as big as they want.

If you want to make a page interactive, there’s no need to reload the entire page; XMLHTTPRequest is now a W3C standard, and implemented in all the major browsers.  Websockets means an end to any kind of polling; you can get updates as they happen.

It seems silly, but in spite of all the advancements, website page loads are not getting faster, they’re getting slower.  The “everybody has broadband” and “everybody has full-HD screens” argument is being used as an excuse for bloat and sloppy design practices.

More than once I’ve had to point someone to the horizontal scroll bar because the web designer failed to test their website at the rather common 1366×768 screen resolution of a typical laptop.  If I had a dollar for every time that’s happened in the last 12 months, I’d be able to buy the offending companies out and sack the web designers responsible!

One of the most annoying, from a security perspective, is the proliferation of “content distribution networks”.  It seems they’ve realised these big bulky blobs of JavaScript take a long time to load even on fast links.  So, what do the bright sparks do?  “I know… instead of loading it from one server, I’ll put it on 10 and increase my upload capacity 10-fold!”  Yes, they might have 1Gbps on each host.  1Gbps × 10 = 10Gbps, so the page will load at 10Gbps, right?

Cue sad tuba sound effect.

At my workplace, we have a 20Mbps Ethernet (not ADSL[2], fibre or cable; Ethernet) link to the Internet.  On that link, I’ve been watching the web get slower and slower… and I do not think our ISP is completely to blame, as I see the same issue at home too.  One where we feel the pain a lot, is Atlassian’s system, particularly Jira and Confluence.  To give you how bad they drink the CDN cool-aid, check out the number of sites I have to whitelist in order to get the page functional:

Atlassian’s JIRA… failing in spite of a crapton of scripts being loaded.

That’s 17 different hosts my web browser must make contact with, and download content from, before the page will function.  17 separate HTTP connections, which must fight with all other IP traffic on that 20Mbps Ethernet link for bandwidth.  20Mbps is the maximum that any one connection will do, and I can guarantee it will not reach even half that!

Interestingly, despite allowing all those scripts to load, they still failed to come up with the goods after a pregnant pause.  So the extra trashing of the link was for naught.  Then there’s the security implications.

At least 3 of those, are pages that Atlassian do not control.  If someone compromised ravenjs.com for example; they could inject any JavaScript they want on the JIRA site, and take control of a user’s account.  Atlassian are relying on these third partys’ promises and security practices, to ensure their site stays secure, and stays in their (third party’s) control.  Suppose someone forgets to renew the domain subscription, the result could be highly embarrassing!

So, I’m left wondering what they teach these days.  For a multitude of reasons, sites should be blazingly quick to load, partly because modern techniques ought to permit vastly improved efficiency of content representation and delivery; and that network link speeds are steadily improving.  However it seems the reverse is true… why are we failing so badly?

Dec 252017

So, I’m home now for the Christmas break… and the fan in my power supply decided it would take a Christmas break itself.

The power supply was purchased brand new in June… it still works as a power supply, but with the fan seized up, it represents an overheating risk.  Unfortunately, the only real options I have are the Xantrex charger, which cooked my last batteries, or a 12V 20A linear PSU I normally use for my radio station.  20A is just a touch light-on, given the DC-DC converter draws 25A.  It’ll be fine to provide a top-up, but I wouldn’t want to use it for charging up flat batteries.

Now, I can replace the faulty fan.  However, that PSU is under warranty still, so I figure, back it goes!

In the meantime, an experiment.  What happens if I just turn the mains off and rely on the batteries?  Well, so far, so good.  Saturday afternoon, the batteries were fully charged, I unplugged the mains supply.  Battery voltage around 13.8V.

Sunday morning, battery was down to 12.1V, with about 1A coming in off the panels around 7AM (so 6A being drained from batteries by the cluster).

By 10AM, the solar panels were in full swing, and a good 15A was being pumped in, with the cluster drawing no more than 8A.  The batteries finished the day around 13.1V.

This morning, batteries were slightly lower at 11.9V.   Just checking now, I’m seeing over 16A flowing in from the panels, and the battery is at 13.2V.

I’m in the process of building some power meters based on NXP LPC810s and TI INA219Bs.  I’m at two minds what to use to poll them, whether I use a Raspberry Pi I have spare and buy a case, PSU and some sort of serial interface for it… or whether I purchase a small industrial PC for the job.

The Technologic Systems TS-7670 is one that I am considering, given they’ll work over a wide range of voltages and temperatures, they have plenty of UARTs including RS-485 and RS-232, and while they ship with an old Linux kernel, yours truly has ported both U-Boot and the mainline Linux kernel.  Yes, it’s ARMv5, but it doesn’t need to be a speed demon to capture lots of data, and they work just fine for Barangaroo where they poll Modbus (via pymodbus) and M-bus (via python-mbus).

Dec 082017

So, I have two compute nodes.  I’ll soon have 32GB RAM in each one, currently one has 32GB and the other has its original 8GB… with 5 8GB modules on the way.

I’ve tested these, and they work fine in the nodes I have, they’ll even work along side the Kingston modules I already have, so one storage node will have a mixture.  That RAM is expected to arrive on Monday.

Now, it’d be nice to have HA set up so that I can power down the still-to-be-upgraded compute node, and have everything automatically fire up on the other compute node.  OpenNebula supports this. BUT I have two instances that are being managed outside of OpenNebula that I need to handle: one being the core router, the other being OpenNebula itself.

My plan was to use corosync.  I have an identical libvirt config for both VMs, allowing me to move the VMs manually between the hosts.  VM Disk storage is using RBDs on Ceph.  Thus, HA by default.

As an experiment, I thought, what would happen if I fired up two instances of the VM that pointed to the same RBD image?  I was expecting one of two things to happen:

  • The image would be locked by the first started image, locking out the second.  One instance would boot, the other would fail to boot.
  • Both instances would boot… the split-brain scenario.

So, I created a libvirt domain on one node, slapped Ubuntu on there (I just wanted a basic OS for testing, so command line, nothing fancy).  As that was installing, I dumped out the “XML config” and imported that to the second node, but didn’t start it yet.

Once I had the new VM booted on node 1, I booted it on node 2.

To my horror, it started booting, and booted straight to a log-in prompt. Great, I had manually re-created the split-brain scenario I specifically hoped to avoid.  Thankfully, it is a throw-away VM specifically for testing this behaviour.  To be sure, I logged in on both, then hard-resetted one.  It boots to GRUB, then immediately GRUB goes into panic mode.  I hard reset the other VM, it boots past GRUB, but then systemd goes into panic mode.  This is expected: the two VMs are stomping on each others’ data oblivious to each others’ existence, a recipe for disaster.

So for this to work, I’m going to have to work on my fencing.  I need to ensure beyond all possible doubt, that the VM is running in one place and one place ONLY.

libvirt supports VM hooks to do this, and there’s an example here, however this thread seems to suggest this is not a reliable way of doing things.  RBD locking is what I hoped libvirt would do implicitly, but it seems not, and it appears that the locks are not removed when a client dies, which could lead to other problems.

A distributed lock manager would handle this, and this is something I need to research.  Possibilities include HashiCorp Consul, Apache ZooKeeper, CoreOS etcd and Redis, among others.  I can also try to come up with my own, perhaps built on PAXOS or Raft.

The state needs to only be kept in memory, persistence on disk is not required.  It’s safe to assume that if the cluster doesn’t know about a VM, it isn’t running anywhere else.  Once told of that VMs existence though, it should ensure only one instance runs at a time.

If a node loses contact with the remaining group, it should terminate everything it has, as it’s a fair bet, the others have noticed its absence and have re-started those instances already.

There’s lots to think about here, so I’ll leave this post at this point and ponder this some more.

Oct 102017

So, over the last few years, computing power has gotten us to the point where remotely operated aerial vehicles are not only a thing, but are cheap and widely available.

There are of course, lots of good points about these toys, lots of tasks in which they can be useful.  No, I don’t think Amazon Prime is one of them.

They come with their risks though, and there’s a big list of do’s and don’ts regarding their use.  For recreational use, CASA for example, have this list of rules.  This includes amongst other things, staying below 120m altitude, and 30m away from any person.

For a building, that might as well be 30m from the top of the roof, as you cannot tell if there are people within that building, or where in that building those people reside, or from what entrance they may exit.

I in principle have no problem with people playing around with them.  I draw the line where such vehicles enter a person’s property.

The laws are rather lax about what is considered trespass with regards to such vehicles.  The no-brainer is if the vehicle enters any building or lands (controlled or otherwise) on any surface within the property.  A big reason for this is that the legal system often trails technological advancement.

This does not mean it is valid to fly over someone’s property.  For one thing, you had better ensure there is absolutely no chance that your device might malfunction and cause damage or injury to any person or possession on that property.

Moreover, without speaking to the owner of said property, you make it impossible for that person to take any kind of preventative action that might reduce the risk of malfunction, or alert you to any risks posed on the property.

In my case, I operate an amateur radio station.  My transmitting equipment is capable of 100W transmit power between 1.8MHz and 54MHz, 50W transmit power between 144MHz and 148MHz, and 20W transmit power between 420MHz and 450MHz, using FM, SSB, AM and CW, and digital modes built on these analogue modulation schemes.

Most of my antennas are dipoles, so 2.2dBi, I do have some higher-gain whips, and of course, may choose to use yagis or even dish antennas.  The stations that I might choose to work are mostly terrestrial in nature, however, airborne stations such as satellites, or indeed bouncing off objects such as the Moon, are also possibilities.

Beyond the paperwork that was submitted when applying for my radio license (which for this callsign, was filed about 9 years ago now, or for my original callsign was filed back in December 2007), there is no paperwork required to be submitted or filled out prior to me commencing transmissions.  Not to the ACMA, not to CASA, not to registered drone operators in the local area, not anybody.

While I’ve successfully operated this station with no complaints from my neighbours for nearly 10 years… it is worth pointing out that the said neighbours are a good distance away from my transmitting equipment.  Far enough away that the electromagnetic fields generated are sufficiently diminished to pose no danger to themselves or their property.

Any drone that enters the property, is at risk of malfunction if it strays too close to transmitting antennas.  If you think I will cease activity because you are in the area, think again.  There is no expectation on my part that I should alter my activities due to the presence of a drone.  It is highly probable that, whilst being inside, I am completely unaware of your device’s presence.  I cannot, and will not, take responsibility for your device’s electromagnetic immunity, or lack thereof.

In the event that it does malfunction though… it will be deemed to have trespassed if it falls within the property, and may be confiscated.  If it causes damage to any person or possession within the property, it will be confiscated, and the owner will be expected to pay damages prior to the device’s return.

In short, until such time as the laws are clarified on the matter, I implore all operators of these devices, to not fly over any property without the express permission of the owner of that property.  At least then, we can all be on the same page, we can avoid problems, and make the operation safer for all.

Sep 172017

So we’ve got a free weekend where there’ll be two of us to do a solar installation… thus the parts have now been ordered for that installation.

First priority will be to get the panels onto the roof and bring the feed back to where the cluster lives.  The power will come from 3 12V 120W solar panels that will be mounted on the roof over the back deck.  Theoretically these can push about 7A of current with a voltage of 17.6V.

We’ve got similar panels to these on the roof of a caravan, those ones give us about 6A of current when there’s bright sunlight.  The cluster when going flat-chat needs about 10A to run, so with three panels in broad daylight, we should be able to run the cluster and provide about 8A to top batteries up with.

We’ll be running individual feeds of 8-gauge DC cable from each panel down to a fused junction box under the roof on the back deck.  From there, it’ll be 6-gauge DC cable down to the cluster’s charge controller.

Now, we have a relay that switches between mains-sourced DC and the solar, and right now it’s hard-wired to be on when the mains supply is switched on.

I’m thinking that the simplest solution for now will be to use a comparator with some hysteresis.  That is, an analogue circuit.  When the solar voltage is greater than the switchmode DC power supply, we use solar.  We’ll need the hysteresis to ensure the relay doesn’t chatter when the solar voltage gets near the threshold.

The other factor here is that the solar voltage may get as high as 22V or so, thus resistor dividers will be needed both sides to ensure the inputs to the comparator are within safe limits.

The current consumption of this will be minimal, so a LM7809 will probably do the trick for DC power regulation to power the LM311.  If I divide all inputs by 3, 22V becomes ~7.3V, giving us plenty of head room.

I can then use the built-in NPN to drive a P-channel MOSFET that controls the relay.  The relay would connect between MOSFET drain and 0V, with the MOSFET source connecting to the switchmode PSU (this is where the relay connects now).

The solar controller also connects its control line to the MOSFET drain.  To it, the MOSFET represents the ignition switch on a vehicle, starting the engine would connect 12V to the relay and the solar controller control input, connecting the controller’s DC input to the vehicle battery and telling the controller to boost this voltage up for battery charging purposes.

By hooking it up in this manner, and tuning the hysteresis on the comparator, we should be able to handle automatic switch-over between mains power and solar with the minimum of components.

Sep 132017

So it seems that the Same Sex Marriage postal votes are finally being sent around.  This is good news in a way: we get to have a say in the matter and hopefully put the matter to bed one way or the other.

No more umming and arring, which I’m frankly sick and tired of, as I feel there are more pressing needs.  Yes, it’s important, but we have two nuclear armed crazy-haired nutters at opposite sides of the Pacific ready to light the planet up like a neon light!

I’m in support of the legislation changing by the way.  I think same-sex couples are entitled to the same rights, and it wasn’t that long ago that marriage was restricted to those not just of the opposite sex, but also had to be of the same “race” and religion.

To quote a song by John Williamson: “They’d chain you up to a boab tree, for kissing an Aborigine!”

So to my way of thinking, society changes.  What was taboo yesterday, we don’t think twice about today.  An Anglican family sending their children to a Catholic school would be heresy years ago… but for my sister and I, that is exactly what happened.  The world doesn’t seem to have imploded as a result.

The status quo regarding marriage is a hang-over from when the Church was the only place where you could get married, and ruled with far greater weight than today.  This is no longer the case, thus it no longer makes sense to hang onto this concept.

Anyway… my opinions on this are beside the point.  In spite of the good intentions, it looks as if the postal vote envelopes overlook one serious flaw: with sufficient light they are see through!

So my proposal: Put a thin piece of card in with the postal vote to block the light.  Not thick enough that it might cause the envelope to jam or interfere with sorting equipment, just opaque enough to prevent the contents being visible.  A small piece of black paper would likely do the job nicely.

Sure the ABS will have a little bit more paper to dispose of, but then at least, our votes are secure and people can’t “manipulate” the vote by snooping on sealed envelopes and discard the ones that disagree with their opinions.  At least then we won’t be wasting $122M.

Jun 292017

So, there’s some work still to be done, for example making some extension leads for the run between the battery link harness, load power distribution and the charger… and to generally tidy things up, but it is now up and running.

On the floor, is the 240V-12V power supply and the charger, which right now is hard-wired in boost mode. In the bottom of the rack are the two 105Ah 12V AGM batteries, in boxes with fuses and isolation switches.

The nodes and switching is inside the rack, and resting on top is the load power distribution board, which I’ll have to rewire to make things a little neater. A prospect is to mount some of this on the back.

I had a few introductions to make, introducing the existing pair of SG-200 switches to the newcomer and its VLANs, but now at least, I’m able to SSH into the nodes, access the IPMI BMC and generally configure the whole box and dice.

With the exception of the later upgrade to solar, and the aforementioned wiring harness clean-ups, the hardware-side of this dual hardware/software project, is largely complete, and this project now transitions to being a software project.

The plan from here:

  • Update the OSes… as all will be a little dated. (I might even blow away and re-load.)
  • Get Ceph storage up and running. It actually should be configured already, just a matter of getting DNS hostnames sorted out so they can find eachother.
  • Investigating the block caching landscape: when I first started the project at work, it was a 3-horse race between Facebook’s FlashCache, bcache and dmcache. Well, FlashCache is no more, replaced by EnhancedIO, and I’m not sure about the rest of the market. So this needs researching.
  • Management interfaces: at my workplace I tried Ganeti, OpenNebula and OpenStack. This again, needs re-visiting. OpenNebula has moved a long way from where it was and I haven’t looked at the others in a while. OpenStack had me running away screaming, but maybe things have improved.
Jun 252017

So, having got the rack mostly together, it is time to figure out how to connect everything.

I was originally going to have just one battery and upgrade later… but when it was discovered that the battery chosen was rather sick, the decision was made that I’d purchase two new batteries. So rather than deferring the management of multiple batteries, I’d have to deal with it up-front.

Rule #1 with paralleling batteries: don’t do it unless you have to. In a perfect world, you can do it just fine, but reality doesn’t work that way. There’s always going to be an imbalance that upsets things. My saving grace is that my installation is fixed, not mobile.

I did look at alternatives, including diodes (too much forward voltage drop), MOSFET switching (complexity), relay switching (complexity again, plus contact wear), and DIY uniselectors. Since I’m on a tight deadline, I decided, stuff it, I’ll parallel them.

That brings me to rule #2 about paralleling batteries: keep everything as close to matched as possible. Both batteries were bought in the same order, and hopefully are from the same batch. Thus, characteristics should be very close. The key thing here, I want to keep cable lengths between the batteries, load and charger, all equal so that the resistances all balance out. That, and using short runs of thick cables to minimise resistance.

I came up with the following connection scheme:

You’ll have to forgive the poor image quality here. On reflection, photographing a whiteboard has always been challenging.

Both batteries are set up in an identical fashion: 40A fuse on the positive side, cable from the negative side, going to an Andersen SB50/10. (Or I might put the fuse on the negative side … haven’t decided fully yet, it’ll depend on how much of each colour wire I have.) The batteries themselves are Giant Power 105Ah 12V AGM batteries. These are about as heavy as I can safely manage, weighing about 30kg each.

The central harness is what I built this afternoon, as I don’t yet have the fuse holders for the two battery harnesses.

The idea being that the resistance between the charger and each battery should be about the same. Likewise, the resistance between the load and each battery should be about the same

The load uses a distribution box and a bus bar. You’ve seen it before, but here’s how it’s wired up… pretty standard:

You might be able to make out the host names there too (periodic table naming scheme, why, because they’re Intel Atoms) … the 5 nodes are on the left and the two switches to the right of the distribution box. I have 3 spare positions.

In heavy black is the 0V bus bar.

This is what I’ve been spending much of my pondering, doing. Part of this harness is already done as it was installed that way in the car, the bit that’s missing is the circuit to the left of the relay that actually drives it. Redarc intended that the ignition key switch would drive the relay, I’ll be exploiting this feature.

Some time this week, I hope to make up the wiring harnesses for the two batteries, and get some charge into them as they’ve sat around for the past two months in their boxes steadily discharging, so I’d be better to get a charger onto them sooner rather than later.

The switch-over circuit can wait for now: just hard-wire it to the mains DC feed for now since there’s no solar yet. The principle of operation is that the comparator (an LM311) compares the solar voltage to a reference (derived from a 5V regulator) and kicks in when the voltage is high enough. (How high? No idea, maybe ~18V?). When that happens, it outputs a logic high signal that turns off the MOSFET. When too low, it pulls the MOSFET gate low, turning it on.

The MOSFET (a P-channel) provides the “ignition key switch” signal to the BCDC1225, fooling it into thinking it is connected to vehicle power, and the charger will boost as needed. The key being that the BCDC1225 makes the decision as to whether the battery needs charging, and how much charge.

By bolting together off-the-shelf parts, we should have something that I can source replacements for should the smoke escape, and there’s no high voltages to deal with.