Oct 25 2019
 

In my last post, I mentioned that I was playing around with SDR a bit more, having bought a couple. Now, my experiments to date were low-hanging fruit: use some off-the-shelf software to receive an existing signal.

One of those off-the-shelf packages was CubicSDR, which gives me AM/FM/SSB/WFM reception; the other is qt-dab, which receives DAB+. The long-term goal though is to be able to use GNURadio to make my own tools. Notably, I’d like to set up a Raspberry Pi 3 with a DRAWS board and a RTL-SDR, to control the FT-857D and implement dual-watch for emergency comms exercises, or use the RTL-SDR for DAB+ reception.

In the latter case, while I could use qt-dab, it’ll be rather cumbersome in that use case. So I’ll probably implement my own tool atop GNURadio that can talk to a small microcontroller to drive a keypad and display. As a first step, I thought I’d try a DIY FM stereo receiver. This is a mildly complex receiver that builds on what I learned at university many moons ago.

FM Stereo is actually surprisingly complex. Not DAB+ levels of complex, but still complex. The system is designed to be backward-compatible with mono FM sets. FM itself actually does not provide stereo on its own — a stereo FM station operates by multiplexing a “mono” signal, a “differential” signal, and a pilot signal. The pilot is just a plain 19kHz carrier. Both left and right channels are low-pass filtered to a band-width of 15kHz. The mono signal is generated from the summation of the left and right channels, whilst the differential is produced from the subtraction of the right from the left channel.

The pilot signal is then doubled and used as the carrier for a double-sideband suppressed carrier signal which is modulated by the differential signal. This is summed with the pilot and mono signal, and that is then frequency-modulated.
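The multiplexing described above can be sketched in plain Python (no GNURadio needed). This is only an illustration: the function names, the 10% pilot level, the 200kHz sample rate and the choice of cosine phases are all assumptions, and the final frequency-modulation step is left out.

```python
import math

def stereo_multiplex(left, right, fs, pilot_hz=19_000.0, pilot_level=0.1):
    """Build the composite baseband signal that would feed the FM modulator."""
    composite = []
    for n, (lft, rgt) in enumerate(zip(left, right)):
        t = n / fs
        mono = lft + rgt                  # what a mono set hears
        diff = lft - rgt                  # the "differential" signal
        pilot = math.cos(2 * math.pi * pilot_hz * t)
        # Carrier at twice the pilot frequency (38kHz). It is only ever
        # multiplied by diff, never added on its own, hence "suppressed".
        subcarrier = math.cos(2 * math.pi * 2 * pilot_hz * t)
        composite.append(mono + pilot_level * pilot + diff * subcarrier)
    return composite

def tone_amplitude(signal, fs, freq_hz):
    """Amplitude of a single frequency component, found by correlation."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# 10ms of a 1kHz tone in the left channel only, sampled at 200kHz.
FS = 200_000
left = [math.sin(2 * math.pi * 1_000 * n / FS) for n in range(2_000)]
right = [0.0] * len(left)
mpx = stereo_multiplex(left, right, FS)

for f in (1_000, 19_000, 37_000, 38_000, 39_000):
    print(f"{f:>6} Hz: {tone_amplitude(mpx, FS, f):.3f}")
```

With a tone in one channel only, the energy should show up at 1kHz (mono), 19kHz (pilot) and in the two sidebands at 37kHz and 39kHz, with nothing at 38kHz itself.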

For reception, older mono sets just low-pass the raw FM discriminator output (or rely on the fact that most speakers won’t reproduce >18kHz well), whilst a stereo set performs the necessary signal processing to extract the left and right channels.

Below is a flow graph in GNURadio Companion that shows this:

Flow graph for FM stereo reception

The signal comes in at the top-left via a RTL-SDR. We first low-pass filter it to receive just the station we want (in this case I’m receiving Triple M Brisbane at 104.5MHz). We then pass it through the WBFM de-modulator. At this point I pass a copy of this signal to a waterfall plot. A second copy gets low-passed at 15kHz and down-sampled to a 32kHz sample rate (my sound card doesn’t do 500kHz sample rates!).
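Going from 500kHz to 32kHz is not a whole-number division, so the rate change needs an interpolate-then-decimate pair of integers. The sketch below works those out. It assumes the demodulator output really is at 500kHz and that a rational resampler does the job; the actual flow graph may do it another way.

```python
from fractions import Fraction

def resample_ratio(rate_in_hz, rate_out_hz):
    """Return (interpolation, decimation) in lowest terms."""
    ratio = Fraction(rate_out_hz, rate_in_hz)
    return ratio.numerator, ratio.denominator

interp, decim = resample_ratio(500_000, 32_000)
print(interp, decim)  # 8 125
```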

A third copy is passed through a band-pass filter to isolate the differential signal, and a fourth is filtered to isolate the pilot at 19kHz.

The pilot in a real receiver would ordinarily be full-wave-bridge-rectified, or passed through a PLL frequency synthesizer to generate a 38kHz carrier. Here, I used the abs math function, then band-passed it again to get a nice clean 38kHz carrier. This is then mixed with the differential signal I isolated before, then the result low-pass filtered to shift that differential signal to base band.
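Why does taking the absolute value double the frequency? Rectifying a cosine folds the negative half-cycles upward, so the result repeats twice as often. The quick numerical check below (plain Python; the 200kHz sample rate and the helper name are assumptions for illustration) measures how much 19kHz and 38kHz is left after rectifying a unit-amplitude pilot.

```python
import math

def tone_amplitude(signal, fs, freq_hz):
    """Amplitude of a single frequency component, found by correlation."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

FS = 200_000
pilot = [math.cos(2 * math.pi * 19_000 * n / FS) for n in range(2_000)]
rectified = [abs(p) for p in pilot]

at_19k = tone_amplitude(rectified, FS, 19_000)
at_38k = tone_amplitude(rectified, FS, 38_000)
print(f"19kHz: {at_19k:.4f}  38kHz: {at_38k:.4f}")
print(f"Fourier series predicts 4/(3*pi) = {4 / (3 * math.pi):.4f} at 38kHz")
```

The rectified signal also carries a DC term and higher even harmonics, which is why the second band-pass filter is needed to leave a clean 38kHz carrier.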

I now have the necessary signals to construct the two channels: M + D gives us (L+R) + (L-R) = 2L, and M – D = (L+R) – (L – R) = 2R. We have our stereo channels.
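The same sum-and-difference step, written out as a small sketch in plain Python (the function name is made up for illustration). Since M + D is 2L rather than L, the halving here stands in for whatever gain adjustment the real flow graph applies.

```python
def dematrix(mono, diff):
    """Recover (left, right) sample lists from M = L+R and D = L-R."""
    left = [(m + d) / 2 for m, d in zip(mono, diff)]
    right = [(m - d) / 2 for m, d in zip(mono, diff)]
    return left, right

# Round trip: matrix two channels, then de-matrix them again.
left_in = [0.5, -0.25, 0.0]
right_in = [0.125, 0.25, -0.75]
mono = [l + r for l, r in zip(left_in, right_in)]
diff = [l - r for l, r in zip(left_in, right_in)]

left_out, right_out = dematrix(mono, diff)
print(left_out, right_out)
```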

Below are the three waterfall diagrams showing (from top to bottom) the de-modulated differential signal, the 38kHz carrier for the differential signal and the raw output from the WBFM discriminator.

The constituent components of a FM stereo radio station.

Not decoded here is the RDS carrier which can be seen just above the differential signal in the third waterfall diagram.

Oct 12 2019
 

Recently, I’ve been doing a lot of work with 6LoWPAN on the 2.4GHz band. I didn’t have anything that would receive arbitrary signals on this frequency, so I decided to splurge. I got myself my first bit of tax-deductible amateur radio equipment: a HackRF One.

It’s been handy: fire up CubicSDR, and immediately you get a picture of what’s happening on the frequency. In the future I hope to get the WIME framework working so I can decode the 802.15.4 frames and pipe them to Wireshark, but so far, this has been handy.

Since I’m not using it every day, I also put it to a second use, DAB+ reception. I used to listen to various stations a lot, and whilst FM stereo is built into my phone, I’ve got nothing that will do medium-wave AM. The HackRF stops short at 1MHz (officially 10MHz), and needs a suitable antenna to do so. However, it occurred to me that it was more than capable of doing DAB+, so after some experimentation, I managed to get qt-dab working.

Since getting that working, I bought a second SDR, a RTL-SDR v3. The idea is I’d be setting this up on the bicycle with a Raspberry Pi 3 which also has a DRAWS board fitted (the successor to the UDRC). I figured I could use this as a second receiver for amateur radio stuff, or use it for FM stereo/DAB+, maybe short wave.

So today, I was testing this: using the RTL-SDR with a Pi 3, seeing whether it would perform acceptably for that task. Interestingly, CubicSDR will de-modulate FM stereo quite happily when you’re running it via an X11 session forwarded over SSH, but it stutters its way through if you try to run it natively. I think the waterfall displays are too much for the machine to cope with: it can render them, but painting them on the screen causes too much CPU load.

qt-dab however works quite well. It occupies about 60% CPU, which means you don’t want to be doing much else. Whether I can do AX.25 packet simultaneously as planned or not is a valid question. Audio quality through the PWM output on the Pi3 is good too — I did try this with an original Pi and got an aural assault courtesy of the noisy 3.3V power rail, but it seems this problem is largely fixed on the Pi3.

In truth, I’ll probably be using the GNURadio framework directly when I get to implementing this on the bicycle. That makes a custom tailored UI a little easier to implement.

The WTF moment though was whilst putting this rig through its paces… I noticed a new station:

ELF Radio, a station dedicated to Christmas Carols

A new station, “ELF Radio”, had appeared in multiplex 9A (202.928MHz)… this is exactly what it sounds like, a station dedicated to Christmas carols. We’re not even half-way through October, and they’re already out to flog the genre to death.

Now, Christmas rage was not a thing when I was younger, it seems the marketing world is intent on ruining this tradition by making excuses for starting the sales earlier and earlier… and it seems the “ambience” is part of the package deal that they insist must start long before that Celtic tradition, Halloween! As a result, most of us are thoroughly fed up by the time December rolls around.

Here’s a hint advertisers: playing this crap so soon in the year will not result in higher sales. It’s a sales repellent!

Jul 18 2019
 

So, a few months back I had the failure of one of my storage nodes. Since I need 3 storage nodes to operate, but can get away with a single compute node, I did a board-shuffle. I just evacuated lithium of all its virtual machines, slapped the SSD, HDD and cover from hydrogen in/on it, and it became the new storage node.

Actually I took the opportunity to upgrade to 2TB HDDs at the same time, as well as adding two new storage nodes (Intel NUCs). I then ordered a new motherboard to get lithium back up again. Again, there was an opportunity to upgrade, so ~$1500 later I ordered a SuperMicro A2SDi-16C-HLN4F. 16 cores, and full-size DDR4 DIMMs, so much easier to get bits for. It also takes M.2 SATA.

The new board arrived a few weeks ago, but I was heavily snowed under with activities surrounding Brisbane Area WICEN Group and their efforts to assist the Stirling’s Crossing Endurance Club running the Tom Quilty 2019. So it got shoved to the side with the RAM I had purchased to be dealt with another day.

I found time on Monday to assemble the hardware, then had fun and games with the UEFI firmware on this board. Put simply, the legacy BIOS support on this board is totally and utterly broken. The UEFI shell is also riddled with bugs (e.g. ifconfig help describes how to bring up an interface via DHCP or statically, but doing so fails). And of course, PXE is not PXE when UEFI is involved.

I ended up using Ubuntu’s GRUB binary and netboot image to boot-strap the machine, after which I could copy my Gentoo install back in. I now have the machine back in the rack, and whilst I haven’t deployed any VMs to it yet, I will do so soon. I did, however, give it a burn-in test updating the kernel:

  LD [M]  security/keys/encrypted-keys/encrypted-keys.ko
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS      arch/x86/boot/compressed/piggy.o
  LD      arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object.
  ZOFFSET arch/x86/boot/zoffset.h
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS      arch/x86/boot/header.o
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Setup is 16444 bytes (padded to 16896 bytes).
System is 6273 kB
CRC ca5d7cb3
Kernel: arch/x86/boot/bzImage is ready  (#1)

real    7m7.727s
user    62m6.396s
sys     5m8.970s
lithium /usr/src/linux-stable # git describe
v5.1.11

7m for make -j 17 to build a current Linux kernel is not bad at all!
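As a rough cross-check of how well the 16 cores were used, dividing the CPU time (user plus sys) by the wall-clock time gives the average number of cores kept busy. A small sketch using the figures from the output above; the helper name is made up for illustration:

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style duration such as '62m6.396s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)

real = to_seconds("7m7.727s")
cpu = to_seconds("62m6.396s") + to_seconds("5m8.970s")
print(f"average cores busy: {cpu / real:.1f}")  # 9.4
```

That average covers the whole build, including the serial steps such as the final link, so it will always sit below the core count.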

Jun 02 2019
 

There’s a couple of truths in life:

  • You don’t get to choose your biological family
  • You don’t get to choose your place of birth

Now, as it happens I ordinarily do not have any real issues with my family or my place of birth, except on one matter: I have never possessed a driver’s license, and really don’t wish to obtain one.

I can get around just fine on my bicycle when I need to. That mode of transport is not nearly as limiting as people think it is. Sure, it’ll take me longer to get places, and I need to perhaps do more planning than most, but I can get where I’m needed.

Yet, time and time again, I run up against the same problem: people assume that people my age drive cars. People then make the leap to suggest that you’re a useless person if you don’t drive.

I did try to obtain a learner’s permit some time ago. I tried the written test twice: at $20 a pop, at a time when I was unemployed. I wasn’t sure how I was going to fund obtaining a vehicle and paying the necessary fees, but I figured I’d try the first step.

I failed both attempts on one question.

I decided that an identity card was more important: I researched what documentation was required, paid my dues, handed over said documentation, wandered out with a new 18+ card. I figured if I needed to try the driver’s license again, I’d be back.

That was in December 2007. The requirements for obtaining a license have since become more onerous, and let’s face it, there are too many cars on the road today. I’d be looking at taking about 200 hours off from work in order to get the necessary log-book time up and spending tens of thousands of dollars on driving lessons. It isn’t financially worth it.

I re-discovered cycling about 6 months later. I bought a folding bicycle, and started using that to get around, and realised that this was a viable mode of transport for me. Over time, I did longer and longer trips.

The longest I’ve gone unsupported was about 82km. A ride from my home at The Gap to the park at Logan Central takes about 3 hours each way with a couple of rest stops en route. I get going early, take my time, and get there without any trouble.

My work is at Milton, a run of about 10km: I can get there in an hour: faster than public transport. In the early mornings, my times tend to be closer to 45 minutes.

In short, there is just no useful purpose for me to have a car. More to the point, I’d have nowhere to park it. What limited space is available at the front of our property is occupied by a caravan and the neighbours’ numerous cars. If it weren’t for the caravan in fact, it would be all cars belonging to the neighbours.

Moreover, my body actually needs the physical exercise. It’s a fact that moving around is required to keep bodily functions working. You don’t move around enough: bowel movements slow down. I already had one bowel-related health scare this year.

I have not been riding much lately due to scheduling — and I feel my health is suffering greatly because of it.

In spite of this, I still get people, family included, shaking their metaphorical car keys in my face suggesting I should be driving too.

It’s as if, as a non-driver, you’re not welcome in this society. You’re seen as a waste of space: you don’t belong here. We’re seen as “shits” that are there wasting other people’s money.

I’ve had a lifetime of that sort of treatment for numerous reasons.

Back in the late 80s, the argument was that I had an Autism diagnosis, therefore I should be going into institutionalised care. Then the same condition was used to argue that I belonged in a special school. At high school, the same reasoning was probably used to put me in the lowest-grade maths and English classes.

I am generally able to focus on a task and do it well. This is probably the reason why I wound up doing double Bachelor-level IT/electronics degrees at uni, and passing both.

I could have instead just been institutionalised. Occupying a tax-payer funded bed. I’d be a record in the NDIS system today. Completely un-employable, generally useless. Definitely not earning >$60000/year doing full-stack software development. There is income tax being paid amongst that — whether my day job is actually worth what I get paid is a debate I’ll leave for others.

The fact remains that I work for a living, and pay my own way.

However, there is a difference between laying out a PCB or writing a code module, and manoeuvring ~600kg of metal travelling at 50+km/hr through suburban roads. The former requires focus and patience; the latter requires millisecond-level decision-making and reaction times.

I am not someone who thinks well at speed, and I would make no friends driving a car along Waterworks Road at 30km/hr in the morning peak-hour traffic. At 30-40km/hr, I can just manage on the bicycle. I can do up to 60km/hr, but I’m not comfortable at all going that speed!

In a car, you are expected to do the speed limit (50-60km/hr in the case of Waterworks Road). Brisbane’s drivers are not forgiving of anyone who can’t “keep up”.

There are people who have no place driving a car, and I would count myself as being a member of that group. I avoid being on the roads much of the time for that very reason — as a courtesy to drivers who would likely prefer to not be stuck behind a slow cyclist like myself.

Coupled with the health problems: me taking up driving would be an early death sentence. If this is really what is expected, I might as well stop now and get the dying bit over and done with, it’ll be one less person on this planet consuming ever dwindling resources.

It’ll be more humane for me to just quietly go, than to be constantly in and out of medical care for “this” medical condition, or “that” medical condition, costing my employer sick-leave, costing my health fund, occupying resources in our health system, simply because I didn’t get enough exercise.

If a non-driver like me is as useless as people make out, then I guess it won’t hurt anyone that I’m gone. … or maybe we can re-think the “non-drivers are useless” concept. One of the ideas in this paragraph is wrong. I’ve given up trying to decide which!

May 29 2019
 

It’s been on my TO-DO list now for a long time to wire in some current shunts to monitor the solar input, replace the near useless Powertech solar controller with something better, and put in some more outlets.

Saturday, I finally got around to doing exactly that. I meant to also add a low-voltage disconnect to the rig … I’ve got the parts for this but haven’t yet built or tested it. I’d like to wait until I have done both, but I needed the power capacity. So I’m running a risk without the over-discharge protection, but I think I’ll take that gamble for now.

Right now:

  • The Powertech MP-3735 is permanently out, the Redarc BCDC-1225 is back in.
  • I have nearly a dozen spare 12V outlet points now.
  • There are current shunts on:
    • Raw solar input (50A)
    • Solar controller output (50A)
    • Battery (100A)
    • Load (100A)
  • The Meanwell HEP-600C-12 is mounted to the back of the server rack, freeing space from the top.
  • The janky spade lugs and undersized cable connecting the HEP-600C-12 to the battery have been replaced with a more substantial cable.

This is what it looks like now around the back:

Rear of the rack, after re-wiring

What difference has this made? I’ll let the graphs speak. This was the battery voltage this time last week:

Battery voltage for 2019-05-22

… and this was today…

Battery voltage 2019-05-29

Chalk-and-bloody-cheese! The weather has been quite consistent, and the solar output has greatly improved just replacing the controller. The panels actually got a bit overenthusiastic and overshot the 14.6V maximum… but not by much thankfully. I think once I get some more nodes on, it’ll come down a bit.

I’ve gone from about 8 hours off-grid to nearly 12! Expanding the battery capacity is an option, and could see the cluster possibly run overnight.

I need to get the two new nodes onto battery power (the two new NUCs) and the Netgear switch. Actually I’m waiting on a rack-mount kit for the Netgear as I have misplaced the one it came with. Failing that, I’ll hack one up out of aluminium angle (it doesn’t look hard!).

A new motherboard is coming for the downed node, that will bring me back up to two compute nodes (one with 16 cores), and I have new 2TB HDDs to replace the aging 1TB drives. Once that’s done I’ll have:

  • 24 CPU cores and 64GB RAM in compute nodes
  • 28 CPU cores and 112GB RAM in storage nodes
  • 10TB of raw disk storage

I’ll have to pull my finger out on the power monitoring, there’s all the shunts in place now so I have no excuse but to make up those INA-219 boards and get everything going.
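For reference, turning an INA-219 shunt-voltage reading into a current is simple arithmetic. The sketch below assumes the shunts drop 75mV at their rated current (a common rating, but not something stated above) and leaves out the I2C read itself; the function name is made up for illustration. The INA-219's shunt-voltage register counts in 10µV steps.

```python
def shunt_current(raw_register, shunt_rated_amps, shunt_rated_mv=75.0):
    """Convert a raw INA-219 shunt-voltage register value to amps.

    shunt_rated_mv is an ASSUMED full-scale drop for the shunt; check
    the markings on the actual part.
    """
    # Interpret the 16-bit register as a two's complement number.
    if raw_register & 0x8000:
        raw_register -= 0x10000
    shunt_mv = raw_register * 0.01      # one count is 10 microvolts
    return shunt_mv * shunt_rated_amps / shunt_rated_mv

# Full scale on the 50A solar shunt, and a tiny reverse current on the
# 100A battery shunt.
print(shunt_current(7500, 50))
print(shunt_current(0xFFFF, 100))
```

A 75mV shunt would sit comfortably inside the chip's ±80mV input range setting.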

May 25 2019
 

So recently I was musing about how I might go about expanding the storage on the cluster. This was largely driven by the fact that I was about 80% full, and thus needed to increase capacity somehow.

I also was noting that the 5400RPM HDDs (HGST HTS541010A9E680), now with a bit of load, were starting to show signs of not keeping up. The cases I have can take two 2.5″ SATA HDDs, one spot is occupied by a boot drive (120GB SSD) and the other a HDD.

A few weeks ago, I had a node fail. That really did send the cluster into a spin, since due to space constraints, things weren’t as “redundant” as I would have liked, and with one disk down, I/O throughput which was already rivalling Microsoft Azure levels of slow, really took a bad downward turn.

I hastily bought two NUCs, which I’m working towards deploying… with those I also bought two 120GB M.2 SSDs (for boot drives) and two 2TB HDDs (WD Blues).

It was at that point I noticed that some of the working drives were giving off the odd read error which was throwing Ceph off, causing “inconsistent” placement groups. At that point, I decided I’d actually deploy one of the new drives (the old drive was connected to another node so I had nothing to lose), and I’ll probably deploy the other shortly. The WD Blue 2TB drives are also 5400RPM, but unlike the 1TB Hitachis I was using before, have 128MB of cache vs just 8MB.

That should boost the read performance just a little bit. We’ll see how they go. I figured this isn’t mutually exclusive to the plans of external storage upgrades, I can still buy and mod external enclosures like I planned, but perhaps with a bit more breathing room, the immediate need has passed.

I’ve since ordered another 3 of these drives, two will replace the existing 1TB drives, and a third will go back in the NUC I stole a 2TB drive from.

Thinking about the problem more, one big issue is that I don’t have room inside the case for 3 2.5″ HDDs, and the motherboards I have do not feature mSATA or M.2 SATA. I might cram a PCIe SSD in, but those are pricey.

The 120GB SSD is only there as a boot drive. If I could move that off to some other medium, I could possibly move to a bigger SSD in place of the 120GB SSD, maybe a ~500GB unit. These are reasonably priced. The issue is then where to put the OS.

An unattractive option is to shove a USB stick in and boot off that. There’s no internal USB ports, but there are two front USB ports in the case I could rig up to an internal header so they’re not sticking out like a sore thumb(-drive) begging to be broken off by a side-wards slap. The flash memory in these is usually the cheapest variety, so maybe if I went this route, I’d buy two: one for the root FS, the other for swap/logs.

The other option is a Disk-on-Module. The motherboards provide the necessary DC power connector for running these things, and there’s a chance I could cram one in there. They’re pricey, but not as bad as going NVMe SSDs, and there’s a greater chance of success squeezing this in.

Right now I’ve just bought a replacement motherboard and some RAM for it… this time the 16-core model, and it takes full-size DIMMs. It’ll go back in as a compute node with 32GB RAM (I can take it all the way to 256GB if I want to). Coupled with that and a purchase of some HDDs, I think I’ll let the bank account cool off before I go splurging more. 🙂

May 24 2019
 

Recently, I had a failure in the cluster, namely one of my nodes deciding to go the way of the dodo. I think I’ve mostly recovered everything from that episode.

I bought some new nodes which I can theoretically deploy as spare nodes, Core i5 Intel NUCs, and for now I’ve temporarily decommissioned one of my compute nodes (lithium) to re-purpose its motherboard to get the downed storage node back on-line. Whilst I was there, I went and put a new 2TB HDD in… and of course I left the 32GB RAM in, so it’s pretty much maxxed out.

I’d like to actually make use of these two new nodes, however I am out of switch capacity, with all 26 ports of the Linksys LGS-326AU occupied or otherwise reserved. I did buy a Netgear GS748T with the intention of moving across to it, but never got around to doing so.

The principal matter here being that the Netgear requires a wee bit more power. AC power ratings are 100-250V, 1.5A max. Now, presumably the 1.5A applies at the 100V scale, that’s ~150W. Some research suggested that internally, they run 12V, that corresponds to about 8.5A maximum current.

This is a bit beyond the capabilities of the MIC29712s.

I wound up buying a DC-DC power supply, an isolated one as that’s all I could get: the Meanwell SD-100A-12. This theoretically can take 9-18V in, and put out 12V at up to 8.5A. Perfect.

Due to lack of time, it sat there. Last week-end though, I realised I’d probably need to consider putting this thing to use. I started by popping open the cover and having a squiz inside. (Who needs warranties?)

The innards of the GS-748Tv5, ruler for scale

I identified the power connections. A probe around with the multimeter revealed that, like the Linksys, it too had paralleled conductors. There were no markings on the PSU module, but un-plugging it from the mainboard and hooking up the multimeter whilst powering it up confirmed it was a 12V output, and verified the polarity. The colour scheme was more sane: Red/Yellow were positive, Black/Blue were negative.

I made a note of the pin-out inside the case.

There’s further DC-DC converters on-board near the connector, what their input range is I have no idea. The connector on the mainboard intrigued me though… I had seen that sort of connector before on ATX power supplies.

The power supply connector, close up.

At the other end of the cable was a simple 4-pole “KK”-like connector with a wider pin spacing (I think ~3mm). Clearly designed with power capacity in mind. I figured I had three options:

  1. Find a mating connector for the mainboard socket.
  2. Find a mating header for the PSU connector.
  3. Ram wires into the plug and hot-glue in place.

As it happens, option (1) turned out easier than I thought it would be. When I first bought the parts for the cluster, the PicoPSU modules came with two cables: one had the standard SATA and Molex power connectors for powering disk drives, the other came out to a 4-pin connector not unlike the 6-pole version being used in the switch.

Now you’ll note of those 6 poles, only 4 are actually populated. I still had the 4-pole connectors, so I went digging, and found them this evening.

One of my 4-pole 12V connectors, with the target in the background.

As it happens, the connectors do fit un-modified, into the wrong 4 holes — if used unmodified, they would only make contact with 2 of the 4 pins. To make it fit, I had to do a slight modification, putting a small chamfer on one of the pins with a sharp knife.

After a slight modification, the connector fits where it is needed.

The wire gauge is close to that used by the original cable, and the colour coding is perfect… black corresponds to 0V, yellow to +12V. I snipped off the JST-style connector at the other end.

I thought about pulling out the original PSU, but then realised that there was a small hole meant for a Kensington-style lock which I wasn’t using. No sharp edges, perfect for feeding the DC cables through. I left the original PSU in-situ, and just unplugged its DC output.

The DC input leads snake through the hole that Netgear helpfully provided.

Bringing the DC power input to the outside.

Before putting the screws in, I decided to give this a test on the bench supply. The switch current fluctuates a bit when booting, but it seems to settle on about 1.75A or so. Not bad.

Testing the switch running on 12V

Terminating this, I decided to use XT-60 connectors. I wanted something other than the 30A “powerpoles” and their larger 50A cousins that are dotted throughout the cluster, as this needed to be regulated 12V. I did not want to get it mixed up with the raw 12V feed from the batteries.

I ran some heavier gauge cable to the DC-DC PSU, terminated with the mating XT-60 connector and hooked that up to my PSU. Providing it with 12V, I dialled the output to 12V exactly. I then gave it a no-load test: it held the output voltage pretty good.

Next, I hooked the switch up to the new PSU. It fired up and I measured the voltage now under load: it still remained at 12V. I wound the voltage down to 9V, then up to 15V… the voltage output never shifted. At 9V, the current consumption jumps up to about 3.5A, as one would expect.

Otherwise, it seemed to be content to draw under 2A so the efficiency of the DC-DC converter is pretty good.
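To put a rough number on that: if the switch still draws the 1.75A at 12V it did on the bench supply (an assumption, since the load was not re-measured behind the converter), and the converter input stays under 2A at 12V, the efficiency has a lower bound. The function names are made up for illustration.

```python
def power_w(volts, amps):
    """Electrical power in watts."""
    return volts * amps

def efficiency_lower_bound(load_w, input_w_max):
    """Lowest efficiency consistent with an upper bound on input power."""
    return load_w / input_w_max

load = power_w(12.0, 1.75)        # switch alone, as measured on the bench
input_max = power_w(12.0, 2.0)    # "under 2A" at the converter input
print(f"at least {efficiency_lower_bound(load, input_max):.1%}")  # at least 87.5%
```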

I’ll need to wire in a new fuse box to power everything, but likely the plan will be to decommission the 16-port 100Mbps switch I use for the management network, slide the 48-port switch in its place, then gradually migrate everything across to the new switch.

Overall, the modding of this model switch was even less invasive than that of the Linksys. It’s 100% reversible. I dare say having posted this, there’ll be a GS748Tv6 that’ll move the 240V PSU to the mainboard, but for now at least, this is definitely a switch worth looking at if 12V operation is needed.

May 24 2019
 

So, in my workplace we’re developing a small energy/water metering device, which runs on a 6LoWPAN network and runs OpenThread-based firmware. The device itself is largely platform-agnostic, requiring a simple CoAP gateway to provide it with a configuration blob and history write end-point. The gateway service I’m running is intended to talk to WideSky.

One thorny issue we need to solve before deploying these things in the wild, is over-the-air updates. So we need a way to transfer the firmware image to the device over the mesh network. Obviously, this firmware image needs to be digitally signed and hashed using a strong cryptographic hash — I’ve taken care of this already. My problem is downloading an image that will be up to 512kB in size.

Thankfully, the IETF has thought of this: the solution to big(gish) files over CoAP is Block-wise transfers (RFC 7959). This specification gives you the ability to break up a large payload into smaller chunks that are powers of two in size between 16 and 1024 bytes.
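RFC 7959 encodes the block size as a 3-bit size exponent (SZX), with the size being 2^(SZX+4) bytes. Values 0 to 6 are usable and 7 is reserved. The option value packs the block number, a "more blocks follow" flag and SZX into one integer. A small sketch of that arithmetic; the function names are made up for illustration and are not node-coap's API:

```python
def block_size(szx):
    """Block size in bytes for a given size exponent."""
    if not 0 <= szx <= 6:
        raise ValueError("SZX must be 0..6 (7 is reserved in RFC 7959)")
    return 1 << (szx + 4)

def block_option_value(num, more, szx):
    """Pack block number, 'more' flag and SZX into the option integer."""
    block_size(szx)                       # validates szx
    return (num << 4) | (int(more) << 3) | szx

def blocks_needed(total_bytes, szx):
    """How many requests it takes to fetch total_bytes (rounding up)."""
    return -(-total_bytes // block_size(szx))

for szx in range(7):
    print(szx, block_size(szx), blocks_needed(512 * 1024, szx))
```

For a 512kB image that is 512 requests at 1024 bytes per block, doubling with each step down in block size.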

6LoWPAN itself though has a limitation: the IEEE 802.15.4 radio specification it is built on cannot send frames bigger than 127 bytes. Thus, any message sent via this network must be that size or smaller. IPv6 has a minimum MTU of 1280 bytes, so how do they manage? They fragment the IPv6 datagram into multiple 802.15.4 frames. The end device re-assembles the fragments as it receives them.

The catch is, if a fragment is lost, you lose the entire datagram, there’s no repeats of individual fragments, the entire datagram must be re-sent. The question in my mind was this: Is it faster to rely on block-wise transfers to break the payload up and make lots of small requests, or is it faster to rely on 6LoWPAN fragmentation?
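The trade-off can be roughed out with a little probability: if every fragment must arrive for the datagram to survive, the chance of success falls off with the number of fragments. The sketch below is a back-of-envelope model only. The 96 usable bytes per frame and the 5% per-frame loss are illustrative assumptions, header overheads are ignored, and so are any link-layer retries.

```python
import math

def fragments_needed(datagram_bytes, usable_per_frame=96):
    """Number of 802.15.4 frames needed to carry one datagram."""
    return math.ceil(datagram_bytes / usable_per_frame)

def delivery_probability(datagram_bytes, frame_loss=0.05, usable_per_frame=96):
    """Chance that every fragment of the datagram arrives."""
    frames = fragments_needed(datagram_bytes, usable_per_frame)
    return (1 - frame_loss) ** frames

for size in (64, 256, 1024):
    print(size, fragments_needed(size), f"{delivery_probability(size):.2f}")
```

Under those assumptions a 1024-byte datagram needs 11 frames and arrives intact only a little over half the time, whereas a 64-byte one fits in a single frame.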

The test network here has a few parts:

  • The target device, which will be downloading a 512kB firmware image to a separate SPI flash chip.
  • The border router, which provides a secure IPv6 tunnel to a cloud host.
  • The cloud server which runs the CoAP service we’ll be talking to.

The latency between the office (in Brisbane) and the cloud server (in Sydney) isn’t bad, about 30~50ms. The CoAP service is built using node-coap and coap-polka.

My CoAP requests have some overheads:

  • The path being downloaded is about 19 bytes long.
  • There’s an authentication token given as a query string, so this adds an additional 12 bytes.

The data link is not 100% reliable, with the device itself dropping some messages. This is leading to some retransmits. The packet loss is not terrible, but probably in the region of around 5%. Over this slightly lossy link, I timed the download of my 512kB firmware image by my device with varying block size settings.

Note that node-coap seems to report a “Bad Option” error for szx=0 and szx=7. szx=0 (16-byte blocks) is legitimately within specification, whereas RFC 7959 reserves szx=7, so rejecting that one is at least defensible. (I’d expect node-coap to pass szx=7 through and allow the application to clamp it to 6, but it seems node-coap’s behaviour is to report “Bad option”, but then pass the payload through anyway.)

Size exponent (szx)   Block size (bytes)   Start time (UTC)   End time (UTC)   Effective data rate
6                     1024                 03:27:04           03:37:52         809B/s
5                     512                  03:41:25           03:53:40         713B/s
4                     256                  03:57:15           04:16:16         458B/s
3                     128                  04:17:46           04:54:17         239B/s
2                     64                   04:56:09           05:54:53         150B/s
1                     32                   23:31:33           01:39:44         68B/s
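The effective data rates follow from the start and end times. The sketch below redoes the arithmetic, assuming the image is exactly 512kB (524288 bytes); the real image may be slightly smaller, which would explain results that differ from the table by a byte or two per second. The 32-byte run crossed midnight, so that case needs handling.

```python
from datetime import datetime, timedelta

IMAGE_BYTES = 512 * 1024    # assumed exact image size

def effective_rate(start, end, size=IMAGE_BYTES):
    """Bytes per second for a transfer between two HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if elapsed < timedelta(0):
        elapsed += timedelta(days=1)    # the run crossed midnight
    return size / elapsed.total_seconds()

runs = {
    1024: ("03:27:04", "03:37:52"),
    512: ("03:41:25", "03:53:40"),
    256: ("03:57:15", "04:16:16"),
    128: ("04:17:46", "04:54:17"),
    64: ("04:56:09", "05:54:53"),
    32: ("23:31:33", "01:39:44"),
}
for block, (start, end) in runs.items():
    print(f"{block:>5} bytes: {effective_rate(start, end):.0f} B/s")
```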

It remains to be seen how much multiple hops and outdoor atmospherics affect the situation. A factor here is how quickly the device can turn-around between processing a response and sending the next request, which in this case is governed by the speed of the SPI flash and its driver.

Effects on “busy” networks

So, I haven’t actually done any hard measurements here, but doing testing on a busier network with about 30 nodes, the block size equation tips more in favour of a smaller block size.

I’ll try to quantify the delays at some point, but right now 256 byte blocks are the clear winner, with 512 and 1024 byte block transfers proving highly unreliable. The speed advantage between 1k and 512 bytes in ideal conditions was a bit over 10%… which really doesn’t count for much. At 256 bytes, the speed difference was about 43%, quite significant. You’re better off using 512-byte blocks if the network is quiet.

On a busy network, with all the retransmissions, smaller is better. No hard numbers yet, but right now at 256 byte blocks, the effective rate is around 118 bytes/sec. I’ll have to analyse the logs a bit to see where 512/1024 byte block sizes sat, but the fact they rarely completed says it all IMO.

Slow and steady beats fast and flakey!

May 18 2019
 

Seriously, if you think this is a good way to earn some yuan, think again. I just got this email this afternoon:


Dear CEO,
(It’s very urgent, please transfer this email to your CEO. If this email affects you, we are very sorry, please ignore this email. Thanks)
We are a Network Service Company which is the domain name registration center in China.
We received an application from Hua Hai Ltd on May 14, 2019. They want to register ” stuartl.longlandclan ” as their Internet Keyword and ” stuartl.longlandclan .cn “、” stuartl.longlandclan .com.cn ” 、” stuartl.longlandclan .net.cn “、” stuartl.longlandclan .org.cn ” 、” stuartl.longlandclan .asia “、domain names, they are in China and Asia domain names. But after checking it, we find ” stuartl.longlandclan ” conflicts with your company. In order to deal with this matter better, so we send you email and confirm whether this company is your distributor or business partner in China or not?
 


Best Regards
**************************************
Mike Zhang | Service Manager
Cn YG Domain (Head Office)
Contact details censored as I do not wish to promote their business
*************************************

The wording is identical to that seen in this article on squelchdesign. Knowing this to be a scam, I did two things:

  1. As per my standard policy, I forwarded it to SpamCop. The source of the email was Baidu’s own network.
  2. I figured since it’s obviously a scam and since these people seemingly do not learn from the skirmishes with others, I’d have some fun with them:

On 18/5/19 11:46 am, Mike Zhang wrote:

Dear CEO, (It’s very urgent, please transfer this email to your CEO. If this email affects you, we are very sorry, please ignore this email. Thanks)

You want this to go to my CEO? Does every individual person in China have their own personal CEO? Is that why they have such a big population? Please keep in mind what the .id.au domain suffix is for: INDIVIDUALS.

We are a Network Service Company which is the domain name registration center in China.

Ahh, so you must know the rules around domain registrations, like the .id.au domain suffix being non-commercial.

We received an application from Hua Hai Ltd on May 14, 2019. They want to register ” stuartl.longlandclan ” as their Internet Keyword and ” stuartl.longlandclan .cn “、” stuartl.longlandclan .com.cn ” 、” stuartl.longlandclan .net.cn “、” stuartl.longlandclan .org.cn ” 、” stuartl.longlandclan .asia “、domain names, they are in China and Asia domain names.

They must be rich. They also wanted bellavitosi .cn, bellavitosi.com.cn, bellavitosi.net.cn, bellavitosi.org.cn, bellavitosi.asia, formula1-dictionary.cn, formula1-dictionary.com.cn, formula1-dictionary.net.cn, formula1-dictionary.org.cn and formula1-dictionary.asia.

What does this group do? Are they a subsidiary of BaoYuan Ltd? I hear pan xiaohong has wealth that rivals Jack Ma.

But after checking it, we find ” stuartl.longlandclan ” conflicts with your company. In order to deal with this matter better, so we send you email and confirm whether this company is your distributor or business partner in China or not?

Well, this “company” does not exist, so can’t possibly have a partner in China. I say to them, go ahead and register those domain names, I dare you, it’ll cost you a lot more than it will cost me.

Errm, yeah… the SEO spammers are slowly learning not to mess with me as I’ll just report the email as spam and will tweak mail server settings to ensure you stay blocked. Or I may choose to publicly ridicule you like I have done here.

The worst they can do is actually follow through and register all those domains, which will cost them an absolute bloody fortune (.asia domains are not cheap!). My content is already well known to the search engines, and it’s not like I rely on my online presence for an income anyway, as I have a day job. Anything I do here is for self-education and training.

All this mob is doing is destroying the image of some innocent company in Hong Kong, which likely has nothing to do with this scam. Seriously guys, get a real job!

May 14 2019
 

Well, it had to happen some day, but I was hoping it’d be a few more years off… I’ve had the first node failure on the cluster.

One of my storage nodes decided to keel over this morning, some time between 5 and 8AM… sending the cluster into utter chaos. I tried power cycling the host a few times before finally yanking it from the DIN rail and trying it on the bench supply. After about 10 minutes of pulling SO-DIMMs and general mucking around trying to coax it to POST, I pulled the HDD out, put that in an external dock and connected that to one of the other storage nodes. After all, it was approaching 9AM and I needed to get to work!

A quick bit of work with ceph-bluestore-tool and I had the OSD mounted and running again. The cluster is moaning that it’s lost a monitor daemon… but it’s still got the other two so provided that I can keep O’Toole away (Murphy has already visited), I should be fine for now.

This evening I took a closer look and tried the RAM I had in different slots. Even with the RAM removed, there are no signs of life out of the host itself, when I should get beep codes with no RAM installed. I ran my multimeter across the various power rails I could get at: the 5V and 12V rails look fine. The IPMI BMC works, but that’s about as much as I get. I guess once the board is replaced, I might take a closer look at that BMC, see how hackable it is.

I’ve bought a couple of spare nodes which will probably find themselves pressed into monitor node duty: two Intel NUC7I5BNHs have been ordered, and I’ll pick these up later in the week. One is to temporarily replace the downed node until such time as I can procure a more suitable motherboard, and the other is a spare.

I have an M.2 SATA SSD I can drop in along with some DDR4 RAM I bought by mistake, and of course the HDD for that node is sitting in the dock. The NUCs are perfectly fine running from 10.8V right up to 19V (verified on a NUC6CAYS), so no 12V regulator is needed.

The only down-side with these units is the single Ethernet port; however, I think this will be fine for monitor node duty, and two additional nodes should mean the storage cluster becomes more resilient.

The likely long-term plan may be an upgrade of one of the compute nodes. For ~$1600, I can get an A2SDi-16C-HLN4F, which sports 16 cores and takes full-size DDR4 DIMMs. I can then rotate the board out of that into the downed node.

The full-size DIMMs are much more readily available in ECC format, so that should make long-term support of this cluster much easier as the supplies of the SO-DIMMs are quickly drying up.

This probably means I should pull my finger out and actually do some of the maintenance I had been planning but put off… largely due to a lack of time. It’s just typical that everything has to happen when you are least free to deal with it.