May 242019
 

Recently, I had a failure in the cluster, namely one of my nodes deciding to go the way of the dodo. I think I’ve mostly recovered everything from that episode.

I bought some new nodes which I can theoretically deploy as spare nodes, Core i5 Intel NUCs, and for now I’ve temporarily decommissioned one of my compute nodes (lithium) to re-purpose its motherboard to get the downed storage node back on-line. Whilst I was there, I went and put a new 2TB HDD in… and of course I left the 32GB RAM in, so it’s pretty much maxxed out.

I’d like to actually make use of these two new nodes, however I am out of switch capacity, with all 26 ports of the Linksys LGS-326AU occupied or otherwise reserved. I did buy a Netgear GS748T with the intention of moving across to it, but never got around to doing so.

The principle matter here being that the Netgear requires a wee bit more power. AC power ratings are 100-250V, 1.5A max. Now, presumably the 1.5A applies at the 100V scale, that’s ~150W. Some research suggested that internally, they run 12V, that corresponds to about 8.5A maximum current.

This is a bit beyond the capabilities of the MIC29712s.

I wound up buying a DC-DC power supply, an isolated one as that’s all I could get: the Meanwell SD-100A-12. This theoretically can take 9-18V in, and put out 12V at up to 8.5A. Perfect.

Due to lack of time, it sat there. Last week-end though, I realised I’d probably need to consider putting this thing to use. I started by popping open the cover and having a squiz inside. (Who needs warranties?)

The innards of the GS-748Tv5, ruler for scale

I identified the power connections. A probe around with the multimeter revealed that, like the Linksys, it too had paralleled conductors. There were no markings on the PSU module, but un-plugging it from the mainboard and hooking up the multimeter whilst powering it up confirmed it was a 12V output, and verified the polarity. The colour scheme was more sane: Red/Yellow were positive, Black/Blue were negative.

I made a note of the pin-out inside the case.

There’s further DC-DC converters on-board near the connector, what their input range is I have no idea. The connector on the mainboard intrigued me though… I had seen that sort of connector before on ATX power supplies.

The power supply connector, close up.

At the other end of the cable was a simple 4-pole “KK”-like connector with a wider pin spacing (I think ~3mm). Clearly designed with power capacity in mind. I figured I had three options:

  1. Find a mating connector for the mainboard socket.
  2. Find a mating header for the PSU connector.
  3. Ram wires into the plug and hot-glue in place.

As it happens, option (1) turned out easier than I thought it would be. When I first bought the parts for the cluster, the PicoPSU modules came with two cables: one had the standard SATA and Molex power connectors for powering disk drives, the other came out to a 4-pin connector not unlike the 6-pole version being used in the switch.

Now you’ll note of those 6 poles, only 4 are actually populated. I still had the 4-pole connectors, so I went digging, and found them this evening.

One of my 4-pole 12V connectors, with the target in the background.

As it happens, the connectors do fit un-modified, into the wrong 4 holes — if used unmodified, they would only make contact with 2 of the 4 pins. To make it fit, I had to do a slight modification, putting a small chamfer on one of the pins with a sharp knife.

After a slight modification, the connector fits where it is needed.

The wire gauge is close to that used by the original cable, and the colour coding is perfect… black corresponds to 0V, yellow to +12V. I snipped off the JST-style connector at the other end.

I thought about pulling out the original PSU, but then realised that there was a small hole meant for a Kensington-style lock which I wasn’t using. No sharp edges, perfect for feeding the DC cables through. I left the original PSU in-situ, and just unplugged its DC output.

The DC input leads snake through the hole that Netgear helpfully provided.

Bringing the DC power input to the outside.

Before putting the screws in, I decided to give this a test on the bench supply. The switch current fluctuates a bit when booting, but it seems to settle on about 1.75A or so. Not bad.

Testing the switch running on 12V

Terminating this, I decided to use XT-60 connectors. I wanted something other than the 30A “powerpoles” and their larger 50A cousins that are dotted throughout the cluster, as this needed to be regulated 12V. I did not want to get it mixed up with the raw 12V feed from the batteries.

I ran some heavier gauge cable to the DC-DC PSU, terminated with the mating XT-60 connector and hooked that up to my PSU. Providing it with 12V, I dialled the output to 12V exactly. I then gave it a no-load test: it held the output voltage pretty good.

Next, I hooked the switch up to the new PSU. It fired up and I measured the voltage now under load: it still remained at 12V. I wound the voltage down to 9V, then up to 15V… the voltage output never shifted. At 9V, the current consumption jumps up to about 3.5A, as one would expect.

Otherwise, it seemed to be content to draw under 2A so the efficiency of the DC-DC converter is pretty good.

I’ll need to wire in a new fuse box to power everything, but likely the plan will be to decommission the 16-port 100Mbps switch I use for the management network, slide the 48-port switch in its place, then gradually migrate everything across to the new switch.

Overall, the modding of this model switch was even less invasive than that of the Linksys. It’s 100% reversible. I dare say having posted this, there’ll be a GS748Tv6 that’ll move the 240V PSU to the mainboard, but for now at least, this is definitely a switch worth looking at if 12V operation is needed.

Apr 012018
 

So yesterday I wound back the mains charger so that the solar would take on the load during the day.  Seems I wound it back a bit far, and the mains charger did almost no work overnight, leaving the battery somewhere around 11.8V.

That’s a wee bit low for my comfort.  Yes, they are deep cycle AGMs, but I’d rather not get that low.

Thus, I wound it up a bit, float at 12.8V, so Vboost at 13.6V.  That looks to be the sweet spot.  Now that the sun is up, I’m getting nice healthy amps of current down the wire from the roof:

The cluster is drawing about 8A, so that’s the cluster powered, and about 6A going to the batteries. It intermittently peaks about 15A or so.

I also found myself fine tuning the Ethernet settings on the border router. For some reason, its Realtek RTL8139 was happy to talk to the Cisco SG-200-08 it was connected to before, but didn’t quite get along with the Linksys LGS326-AU. I’ve told the switch to force 100Mbps full-duplex MDIX (evidently, it’s a cross-over cable), and so far, that seems to have settled things down.

Jul 232017
 

So, having got some instances going… I thought I better sort out the networking issues proper.  While it was working, I wanted to do a few things:

  1. Bring a dedicated link down from my room into the rack directly for redundancy
  2. Define some more VLANs
  3. Sort out the intermittent faults being reported by Ceph

I decided to tackle (1) first.  I have two 8-port Cisco SG-200 switches linked via a length of Cat5E that snakes its way from our study, through the ceiling cavity then comes up through a small hole in the floor of my room, near where two brush-tail possums call home.

I drilled a new hole next to where the existing cable entered, then came the fun of trying to feed the new cable along side the old one.  First attempt had the cable nearly coil itself just inside the cavity.  I tried to make a tool to grab the end of it, but it was well and truly out of reach.  I ended up getting the job done by taping the cable to a section of fibreglass tubing, feeding that in, taping another section of tubing to that, feed that in, etc… but then I ran out of tubing.

Luckily, a rummage around, and I found some rigid plastic that I was able to tape to the tubing, and that got me within a half-metre of my target.  Brilliant, except I forgot to put a leader cable through for next time didn’t I?

So more rummaging around for a length of suitable nylon rope, tape the rope to the Cat5E, haul the Cat5E out, then grab another length of rope and tape that to the end and use the nylon rope to haul everything back in.

The rope should be handy for when I come to install the solar panels.

I had one 16-way patch panel, so wound up terminating the rack-end with that, and just putting a RJ-45 on the end in my room and plugging that directly into the switch.  So on the shopping list will be some RJ-45 wall jacks.

The cable tester tells me I possibly have brown and white-brown switched, but never mind, I’ll be re-terminating it properly when I get the parts, and that pair isn’t used anyway.

The upshot: I now have a nice 1Gbps ring loop between the two SG-200s and the LGS326 in the rack.  No animals were harmed in the formation of this ring, although two possums were mildly inconvenienced.  (I call that payback for the times they’ve held the Marsupial Olympics at 2AM when I’m trying to sleep!)

Having gotten the physical layer sorted out, I was able to introduce the upstairs SG-200 to the new switch, then remove the single-port LAG I had defined on the downstairs SG-200.  A bit more tinkering going, and I had a nice redundant set-up: setting my laptop to ping one of the instances in the cluster over WiFi, I could unplug my upstairs trunk, wait a few seconds, plug it back in, wait some more, unplug the downstairs trunk, wait some more again, then plug in back in again, and not lose a single ICMP packet.

I moved my two switches and my AP over to the new management VLAN I had set up, along side the IPMI interfaces on the nodes.  The SG-200s were easy, aside from them insisting on one port being configured with a PVID equal to the management VLAN (I guess they want to ensure you don’t get locked out), it all went smoothly.

The AP though, a Cisco WAP4410N… not so easy.  In their wisdom, and unlike the SG-200s, the management VLAN settings page is separate from the IP interface page, so you can’t change both at the same time.  I wound up changing the VLAN, only to find I had locked myself out of it.  Much swearing at the cantankerous AP and wondering how could someone overlook such a fundamental requirement!  That, and the switch where the AP plugs in, helpfully didn’t add the management VLAN to the right port like I asked of it.

Once that was sorted out, I was able to configure an IP on the old subnet and move the AP across.

That just left dealing with the intermittent issues with Ceph.  My original intention with the cluster was to use 802.3AD so each node had two 2Gbps links.  Except: the LGS326-AU only supports 4 LAGs.  For me to do this, I need 10!

Thankfully, the bonding support in the Linux kernel has several other options available.  Switching from 802.3ad to balance-tlb, resolved the issue.

slaves_bond0="enp0s20f0 enp0s20f1"
slaves_bond1="enp0s20f2 enp0s20f3"
config_bond0="null"
config_bond1="null"
config_enp0s20f0="null"
config_enp0s20f1="null"
config_enp0s20f2="null"
config_enp0s20f3="null"
rc_net_bond0_need="net.enp0s20f0 net.enp0s20f1"
rc_net_bond1_need="net.enp0s20f2 net.enp0s20f3"
mode_bond0="balance-tlb"
mode_bond1="balance-tlb"

I am now currently setting up a core router instance (with OpenBSD 6.1) and a OpenNebula instance (with Gentoo AMD64/musl libc).

Apr 232016
 

Well, I finally got busy with the soldering iron again. This time, installing the regulators in the cluster nodes and in the 26-port switch.

I had a puzzle as to where to put the regulator, I didn’t want it exposed, as they’re a static-sensitive device, so better to keep them enclosed. It needed somewhere where the air would be flowing, and looking around, I found the perfect spot, just in behind the CPU heatsink. There’s a small gap where the air will be flowing past to cool the CPU, and it’s sufficiently near the ATX PSU to feed the power cabling past.

I found I was able to tap M3 threads into the tops of the heatsinks and fix them to the “front” of the case near where the DIN rail brackets fit in. So from the outside, it looks all neat and tidy.

After installing those, I turned my attention to the switch. Now I had an educated guess that the switch would be stepping down from 12V, so being close to that was not so critical, however going above it would stretch the friendship.

Rather than feeding it 13.1V like the compute nodes, I decided I’d find some alternate resistor values that’d be closer to 12V. Those wound up being R1=3.3kΩ and R2=390Ω, which gave about 11.8V. Close enough. It was then a matter of polarity. The wiring inside this switch uses a non-standard colour code, and as I suspected, the conductors are just paralleled, it’s the one feed of 12V.

Probing with a multimeter revealed the pin pairs were shorted, and removing the PSU confirmed this. I pulled out the switch mainboard and probed around the electrolytics which had their negative sides marked. Sure enough, it’s the Australian Olympic team colours that give away the 0V side.

I’ve shown the original colour code here as coloured dots, but essentially, green and yellow are the 0V side, and red and black are the +12V side. So I had everything necessary. I grabbed a bit of scrap PCB, used the old PSU as a template for drilling out the holes, used a hacksaw to divide the PCB surface up then dead-bugged the rest. To position the heatsink, I drilled a 3mm hole in the bottom of the case and screwed a 10mm M3 stand-off there. Yes, this means there’s an annoying lump on the bottom, I should use a countersunk M3 screw, I’ll fix that later if it bothers me, I’ll be rack-mounting it anyway.

On the input to the regulator, I have a 330uF electrolytic capacitor and 100nF monolithic capacitor in parallel, on the output, it’s a 470uF and a 100nF. A third 100nF hooks the adjust pin to 0V to reduce noise. I de-soldered the original PSUs socket and used that on the new board. It fits beautiful. 100-240V? Not any more Linksys.

So now, the whole lot runs off a single 12V battery supply. The remainder of this project is the charging of that battery and the software configuration of the cluster.

At present, the whole cluster’s doing an `emerge @system`, with distcc running, and drawing about 7.5A with the battery sitting at 12.74V (~95W). Edit: Now that they’ve properly fired up, I’m seeing a drain of 10.3A (126W). Looks that’s going to be the “worst case scenario”.

Mar 272016
 

Probably going to be easier than expected. I popped open the cover to see whether it was like the much older Netcomm switch we have.

Sure enough, Linksys do it the same way in the LGS326AU:

This PSU is an open-frame PSU made by Asian Power Devices, Inc. Model NW-20A12-BAAB. Not sure if there’s someone here who knows more about this particular PSU.

It appears it’s the one 12V feed split in two. Red/Black are negative, green/yellow are positive. I’ll double check this. I think with a nice big inductor/capacitor as an in-line filter, this can be hooked straight up to the 12V line, perhaps with a small amount of zener overvoltage protection. It appears that the voltage is further rectified on the switch mainboard: