May 01 2016

So, after putting aside the charge controller for now, I’ve taken some time to see if I can get the software side of things into shape.

In the midst of my development, I found a small wiring fault that was responsible for blowing a couple of fuses. A small nick in the sheath of the positive wire in a power cable was letting the crimp part of a DC barrel connector contact +12V. A tweak of that crimp and things are back to normal. I’ve swapped all the 10A fuses for 5A ones, since the regulators are only rated at 7.5A.

The VLANs are assigned now, and I have bonding going between the two pairs of Ethernet devices. In spite of the switch only supporting 4 LAGs, it seems fine with me doing LACP on effectively 10 LAGs. I’ll see how it goes.
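
For the curious, the bonds are just standard 802.3ad setups; something like this (interface names and the VLAN ID are placeholders, not my actual layout) is roughly what each node ends up with:

    # Build an 802.3ad (LACP) bond from two ports, then tag a VLAN on top.
    # eth0/eth1 and VLAN 10 are placeholders for my real interfaces and IDs.
    ip link add bond0 type bond mode 802.3ad miimon 100 lacp_rate fast
    ip link set eth0 down
    ip link set eth1 down
    ip link set eth0 master bond0
    ip link set eth1 master bond0
    ip link set bond0 up
    ip link add link bond0 name bond0.10 type vlan id 10
    ip link set bond0.10 up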

The switch has 5 ports spare after plugging in all 5 nodes and a 16-port switch for the IPMI subnet. One will be used for a management interface so I can plug a laptop in, and the others will be paired with LACP for linking to my two existing Cisco SG200-8s.

One of the goals of this project is to try to push the performance of Ceph. In the office, we tried bare Ceph, and found that, while it’s fine for sequential I/O, it suffers a bit with random reads and writes, and Windows-based Hyper-V images do a lot of exactly that.
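
For anyone wanting to reproduce that kind of workload, fio’s rbd engine (assuming fio is built with RBD support; the pool, image and client names here are made up) gives a reasonable approximation of a Windows guest hammering its disk:

    # 70/30 random read/write mix of 4kB blocks against an RBD image,
    # roughly the access pattern a Windows guest produces.
    # Pool, image and client names are examples.
    fio --name=randrw --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=testimg --rw=randrw --rwmixread=70 --bs=4k \
        --iodepth=32 --runtime=60 --time_based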

Putting FlashCache in the mix really helped, but I note it’s no longer maintained. EnhanceIO had only just forked from it when I tried FlashCache; now it seems to be the official successor.

There are two alternatives to FlashCache/EnhanceIO: bcache and dm-cache.

I’ll rule out bcache now as it requires the backing image be “formatted” for use. In other words, the backing image is not a raw image, but some proprietary (to bcache) format. This isn’t unworkable, but it raises concerns with me about portability: if I migrate a VM, do I need to migrate its cache too, or is it sufficient to cleanly shut down and detach the bcache device before re-assembling it on the new host?
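
For reference, this is the sort of formatting step bcache wants before it will touch a device (device paths here are examples only):

    # bcache stamps its own superblock on both the backing and cache
    # devices; registering them together attaches them in one step.
    # Device paths are examples only.
    make-bcache -B /dev/vg0/vm1-disk -C /dev/sdb1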

By contrast, dm-cache and EnhanceIO/FlashCache work with raw backing images, making them much more attractive. Flush the cache before migration, or use write-through mode, and all should be fine. dm-cache does, however, require a separate metadata device: messy, but not unworkable. We can provision the cache-related devices we need using LVM2, and use the kernel RADOS block device (RBD) as our backing image, as sketched below.
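
A rough sketch of how that might hang together (the volume group, sizes and image names are placeholders, and the table line follows the kernel’s dm-cache documentation):

    # Carve a metadata LV and a cache LV out of an SSD-backed VG.
    # VG name, sizes and the RBD image name are placeholders.
    lvcreate -L 1G -n vm1-cache-meta ssdvg
    lvcreate -L 40G -n vm1-cache ssdvg

    # Map the Ceph image backing the VM; it appears as e.g. /dev/rbd0.
    rbd map rbd/vm1-disk

    # Assemble the dm-cache target over the raw RBD: 512-sector (256kB)
    # cache blocks, writethrough so losing the SSD can't corrupt the image.
    dmsetup create vm1-cached --table "0 $(blockdev --getsz /dev/rbd0) cache /dev/ssdvg/vm1-cache-meta /dev/ssdvg/vm1-cache /dev/rbd0 512 1 writethrough default 0"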

So I think my caching subsystem is a two-horse race: dm-cache or EnhanceIO. I guess we’ll give them a try and see how they go.

For those following along at home, if you’re running kernel >4.3, you might want to use this fork of EnhanceIO due to changes in the kernel block I/O layer.
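
From memory, setting up an EnhanceIO cache is a one-liner with its eio_cli tool; something along these lines (device paths and the cache name are examples):

    # Create a write-through EnhanceIO cache in front of the mapped RBD.
    # Device paths and the cache name are examples; check eio_cli --help.
    eio_cli create -d /dev/rbd0 -s /dev/ssdvg/vm1-cache -p lru -m wt -c vm1_cache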

To manage the OpenNebula master node, I’ve installed corosync/pacemaker. Normally these are used with DRBD; however, I figure Ceph can fulfil that role. The concepts are similar: both give you a shared block device. I’m not sure at this point whether LXC, Docker or a VM will “contain” the server, but whatever it is, it should be possible for its root FS and data to live on Ceph.
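
As a rough idea of where I’m heading, pacemaker already ships a resource agent for LXC containers; something like this (resource and container names are placeholders, and I haven’t tested it yet) ought to be the shape of it:

    # Sketch only: the OpenNebula container as a pacemaker-managed resource
    # via the stock ocf:heartbeat:lxc agent. Names are placeholders.
    pcs resource create one-master ocf:heartbeat:lxc \
        container=one-master config=/var/lib/lxc/one-master/config \
        op monitor interval=30s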

I’m leaning towards LXC for this. Time for some more experimentation.