Apr 012018
 

So yesterday I wound back the mains charger so that the solar would take on the load during the day.  Seems I wound it back a bit far, and the mains charger did almost no work overnight, leaving the battery somewhere around 11.8V.

That’s a wee bit low for my comfort.  Yes, they are deep cycle AGMs, but I’d rather not get that low.

Thus, I wound it up a bit, float at 12.8V, so Vboost at 13.6V.  That looks to be the sweet spot.  Now that the sun is up, I’m getting nice healthy amps of current down the wire from the roof:

The cluster is drawing about 8A, so that’s the cluster powered, and about 6A going to the batteries. It intermittently peaks about 15A or so.

I also found myself fine tuning the Ethernet settings on the border router. For some reason, its Realtek RTL8139 was happy to talk to the Cisco SG-200-08 it was connected to before, but didn’t quite get along with the Linksys LGS326-AU. I’ve told the switch to force 100Mbps full-duplex MDIX (evidently, it’s a cross-over cable), and so far, that seems to have settled things down.

Jul 232017
 

So, having got some instances going… I thought I better sort out the networking issues proper.  While it was working, I wanted to do a few things:

  1. Bring a dedicated link down from my room into the rack directly for redundancy
  2. Define some more VLANs
  3. Sort out the intermittent faults being reported by Ceph

I decided to tackle (1) first.  I have two 8-port Cisco SG-200 switches linked via a length of Cat5E that snakes its way from our study, through the ceiling cavity then comes up through a small hole in the floor of my room, near where two brush-tail possums call home.

I drilled a new hole next to where the existing cable entered, then came the fun of trying to feed the new cable along side the old one.  First attempt had the cable nearly coil itself just inside the cavity.  I tried to make a tool to grab the end of it, but it was well and truly out of reach.  I ended up getting the job done by taping the cable to a section of fibreglass tubing, feeding that in, taping another section of tubing to that, feed that in, etc… but then I ran out of tubing.

Luckily, a rummage around, and I found some rigid plastic that I was able to tape to the tubing, and that got me within a half-metre of my target.  Brilliant, except I forgot to put a leader cable through for next time didn’t I?

So more rummaging around for a length of suitable nylon rope, tape the rope to the Cat5E, haul the Cat5E out, then grab another length of rope and tape that to the end and use the nylon rope to haul everything back in.

The rope should be handy for when I come to install the solar panels.

I had one 16-way patch panel, so wound up terminating the rack-end with that, and just putting a RJ-45 on the end in my room and plugging that directly into the switch.  So on the shopping list will be some RJ-45 wall jacks.

The cable tester tells me I possibly have brown and white-brown switched, but never mind, I’ll be re-terminating it properly when I get the parts, and that pair isn’t used anyway.

The upshot: I now have a nice 1Gbps ring loop between the two SG-200s and the LGS326 in the rack.  No animals were harmed in the formation of this ring, although two possums were mildly inconvenienced.  (I call that payback for the times they’ve held the Marsupial Olympics at 2AM when I’m trying to sleep!)

Having gotten the physical layer sorted out, I was able to introduce the upstairs SG-200 to the new switch, then remove the single-port LAG I had defined on the downstairs SG-200.  A bit more tinkering going, and I had a nice redundant set-up: setting my laptop to ping one of the instances in the cluster over WiFi, I could unplug my upstairs trunk, wait a few seconds, plug it back in, wait some more, unplug the downstairs trunk, wait some more again, then plug in back in again, and not lose a single ICMP packet.

I moved my two switches and my AP over to the new management VLAN I had set up, along side the IPMI interfaces on the nodes.  The SG-200s were easy, aside from them insisting on one port being configured with a PVID equal to the management VLAN (I guess they want to ensure you don’t get locked out), it all went smoothly.

The AP though, a Cisco WAP4410N… not so easy.  In their wisdom, and unlike the SG-200s, the management VLAN settings page is separate from the IP interface page, so you can’t change both at the same time.  I wound up changing the VLAN, only to find I had locked myself out of it.  Much swearing at the cantankerous AP and wondering how could someone overlook such a fundamental requirement!  That, and the switch where the AP plugs in, helpfully didn’t add the management VLAN to the right port like I asked of it.

Once that was sorted out, I was able to configure an IP on the old subnet and move the AP across.

That just left dealing with the intermittent issues with Ceph.  My original intention with the cluster was to use 802.3AD so each node had two 2Gbps links.  Except: the LGS326-AU only supports 4 LAGs.  For me to do this, I need 10!

Thankfully, the bonding support in the Linux kernel has several other options available.  Switching from 802.3ad to balance-tlb, resolved the issue.

slaves_bond0="enp0s20f0 enp0s20f1"
slaves_bond1="enp0s20f2 enp0s20f3"
config_bond0="null"
config_bond1="null"
config_enp0s20f0="null"
config_enp0s20f1="null"
config_enp0s20f2="null"
config_enp0s20f3="null"
rc_net_bond0_need="net.enp0s20f0 net.enp0s20f1"
rc_net_bond1_need="net.enp0s20f2 net.enp0s20f3"
mode_bond0="balance-tlb"
mode_bond1="balance-tlb"

I am now currently setting up a core router instance (with OpenBSD 6.1) and a OpenNebula instance (with Gentoo AMD64/musl libc).

Apr 232016
 

Well, I finally got busy with the soldering iron again. This time, installing the regulators in the cluster nodes and in the 26-port switch.

I had a puzzle as to where to put the regulator, I didn’t want it exposed, as they’re a static-sensitive device, so better to keep them enclosed. It needed somewhere where the air would be flowing, and looking around, I found the perfect spot, just in behind the CPU heatsink. There’s a small gap where the air will be flowing past to cool the CPU, and it’s sufficiently near the ATX PSU to feed the power cabling past.

I found I was able to tap M3 threads into the tops of the heatsinks and fix them to the “front” of the case near where the DIN rail brackets fit in. So from the outside, it looks all neat and tidy.

After installing those, I turned my attention to the switch. Now I had an educated guess that the switch would be stepping down from 12V, so being close to that was not so critical, however going above it would stretch the friendship.

Rather than feeding it 13.1V like the compute nodes, I decided I’d find some alternate resistor values that’d be closer to 12V. Those wound up being R1=3.3kΩ and R2=390Ω, which gave about 11.8V. Close enough. It was then a matter of polarity. The wiring inside this switch uses a non-standard colour code, and as I suspected, the conductors are just paralleled, it’s the one feed of 12V.

Probing with a multimeter revealed the pin pairs were shorted, and removing the PSU confirmed this. I pulled out the switch mainboard and probed around the electrolytics which had their negative sides marked. Sure enough, it’s the Australian Olympic team colours that give away the 0V side.

I’ve shown the original colour code here as coloured dots, but essentially, green and yellow are the 0V side, and red and black are the +12V side. So I had everything necessary. I grabbed a bit of scrap PCB, used the old PSU as a template for drilling out the holes, used a hacksaw to divide the PCB surface up then dead-bugged the rest. To position the heatsink, I drilled a 3mm hole in the bottom of the case and screwed a 10mm M3 stand-off there. Yes, this means there’s an annoying lump on the bottom, I should use a countersunk M3 screw, I’ll fix that later if it bothers me, I’ll be rack-mounting it anyway.

On the input to the regulator, I have a 330uF electrolytic capacitor and 100nF monolithic capacitor in parallel, on the output, it’s a 470uF and a 100nF. A third 100nF hooks the adjust pin to 0V to reduce noise. I de-soldered the original PSUs socket and used that on the new board. It fits beautiful. 100-240V? Not any more Linksys.

So now, the whole lot runs off a single 12V battery supply. The remainder of this project is the charging of that battery and the software configuration of the cluster.

At present, the whole cluster’s doing an `emerge @system`, with distcc running, and drawing about 7.5A with the battery sitting at 12.74V (~95W). Edit: Now that they’ve properly fired up, I’m seeing a drain of 10.3A (126W). Looks that’s going to be the “worst case scenario”.

Mar 272016
 

Probably going to be easier than expected. I popped open the cover to see whether it was like the much older Netcomm switch we have.

Sure enough, Linksys do it the same way in the LGS326AU:

This PSU is an open-frame PSU made by Asian Power Devices, Inc. Model NW-20A12-BAAB. Not sure if there’s someone here who knows more about this particular PSU.

It appears it’s the one 12V feed split in two. Red/Black are negative, green/yellow are positive. I’ll double check this. I think with a nice big inductor/capacitor as an in-line filter, this can be hooked straight up to the 12V line, perhaps with a small amount of zener overvoltage protection. It appears that the voltage is further rectified on the switch mainboard: