Projects

Project logs for what I’ve been working on

Leave at last

So… I’ve been busy at work lately, and that for the last few months has been my primary focus. A big reason why I’ve been keeping my head low is because a few years ago, it was pointed out that I had been physically with the company I’m working for for about 10 years.

Here in Australia, that milestone grants long-service leave; a bonus 8⅔ weeks of leave. This is something that’s automatic for full-time employees of a company, and harks back to the days of when people used to travel to Australia from England by ship to work, this gave that person an opportunity to travel back and visit family “back home”.

But at the time, I wasn’t there yet! See, for the first few years I was a contractor, so not a full-time employee. I didn’t become full-time until 2013, meaning my 10 years would not tick up until this year.

While the milestone is 99% symbolic, the thing is at my age (nearing 40), I’m unlikely to ever see that milestone come up again. If I did something that blew it or put it in jeopardy in any way, it’d be up in smoke.

There are some select cases where such leave may be granted early (between 7-10 years):

  • if the person dies, suffers total physical disability or serious illness
  • the person’s position becomes untenable
  • the person’s domestic situation forces them to leave (e.g. dropping out of work to become a carer for a family member)
  • the employer dismisses the person for reasons other than that person’s performance, conduct or capacity
  • unfair dismissal

I thought, it was worth sticking it out… after 10 years, it’s a done deal, the person is entitled to the full amount. If they booted me out after that, they’d still have to pay out that, plus the holiday leave (which I still have lots because I haven’t taken much since 2018).

Employment plans

Right now, I’m not going anywhere, I’ve got nowhere to go anyway. While doing work on things like electricity billing brings me no joy whatsoever (“bill shock as-a-service” is what it feels like), it pays the bills, and I’m not quite at the point where I can safely let it all go and coast into an early retirement.

Work has actually changed a lot in the past few years. Years ago, we did a lot of Python work, I also did some work in C. Today, it’s lots of JavaScript, which is an idiosyncratic language to say the least, and don’t get me started on the moving target that is UI frameworks for web pages!

Dealing with the disaster that is Office365 (which is threatening to invade even more into my work) doesn’t make this any easier, but sadly that piece of malware has infected most organisations now. (Just do a dig -t MX <employer-domain>, or just look at the headers of an email from their employees, many show Office365 today). I’ve so far dodged Microsoft Teams which I now flatly refuse to use as I do not consent to my likeness/voice being used in AI models and Microsoft isn’t being open about how it uses AI.

Most people my age get shepherded into management positions, really not my scene. In a new job I’d be competing with 20-somethings that are more familiar with the current software stacks. Non-technical jobs exist, but many assume you own a motor vehicle and possess the requisite license to operate it.

This pretty much means I’m unemployable if I leave or are booted out, so whatever I have in the bank balance needs to make it through until my time on this planet is done.

Thus, I must stick it out a bit longer… I might not get the 15-year bonus (4⅓ weeks), but at least I can’t lose what I have now. If excrement does meet a rotary cooling device though, simulations suggest with some creative accounting, I may just scrape through. I don’t plan on setting up a donations page and talking to Centrelink is a waste of time, I’ll die a pauper before they answer the phone.

Plans for this month

So I have holiday leave off until November. Unlike previous times I’ve taken big amounts off, I won’t be travelling far this time around. Instead, it’s a project month.

Financial work

I need to plan ahead for the possibility that I wind up in long-term unemployment. I don’t expect to live long (the planet cannot sustain everyone living to >100 years), but I do need to be around to finalise the estates of my parents and see my cat out.

That suggests I need to keep the lights on for another 20~30 years. Presently my annual expenditure has been around the $30k mark, but much of that is discretionary (most of it has been on music), and I could possibly reduce that to around the $10k mark.

I have some shares, but need to expand on this further. David Rowe posted some ideas in a two part series which provides some food for thought here.

At the moment, I’m nowhere near that 10% yield figure mentioned…that post was written in 2015 and lot has changed in 8 years. Interest rates are currently at ~5% for term deposits.

I do plan to start one though all the same. After Suncorp closed both The Gap and Ashgrove branches (forcing me all the way to Michelton), I set up an account at BOQ who have branches in both Ashgrove and The Gap… so I can do a term deposit with either, and they’re both offering a 5% 12-month term deposit.

I have a year’s worth sitting at BOQ in an interest bearing account… so that’s money that’s readily accessible. The remainder I have, I plan to split — some going into the aforementioned term deposit, the other will go into that interest bearing account in case I decide to buy more shares.

That should start building the reserves up.

Hardware refurbishment and replacement

Some of my equipment is getting a bit long in the tooth. The old desktop I bought back in 2010 is playing silly-buggers at the moment, and even the laptop I’m typing this on is nearing 10 years old. I have one desktop which used to be my office workstation before the pandemic, so between it and the old desktop, I have decent processing capacity.

The server rack needs work though. One compute node is down, and I’m actually running out of space. I also need to greatly expand the battery bank. I bought a full-height open-frame rack to replace the old one, and was gifted a new solar controller, so some time during this break, I’ll be assembling that, moving the old servers into it… and getting the replacement compute node up and running.

Software updates

I’ve been doing this to critical servers… I recently replaced the mail server with a new VM instance which made the maintenance work-load a lot lower… but there’s still some machines that need my attention.

I’m already working on getting my Mastodon instance up to release 4.2.0 (I bumped it to 4.1.9 to at least get a security patch off my back), there are a couple of OpenBSD routers that need updates and some similar remedial work.

Projects

Already mentioned is the server cluster (hardware and software), but there are some other projects that need my attention.

  • setuptools-pyproject-migration is a project that David Zaslavsky and I have been working on that is intended to help projects migrate from the old setup.py scripts in Python projects to the new pyproject.toml structure. Work has kept me busy, but the project is nearly ready for the first release. I need to help finish up the bits that are missing, and get that out there.
  • aioax25 could use some love, connected mode nearly works, plus it could do with a modernisation.
  • Brisbane WICEN‘s RFID tracking project is something I have not posted much about, but nonetheless got a lot of attention at the Tom Quilty this year, this needs further work.

Self-Training

Some things I’d like to try and get my head around, if possible…

  • Work uses NodeJS for a lot of things, but we’re butting up against its limits a lot. We use a lot of projects that are written in GoLang (e.g. InfluxDB, Grafana, Terraform, Vault), and while I did manage to hack some features into s3sync needed for work, I should get to know GoLang properly.
  • Rust interests me a lot. I should at least have a closer look at this and learn a little. It has been getting a mention around the office in the context of writing NodeJS extensions. Definitely worth looking into further.
  • I need to properly get to understand OAuth2, as I don’t think I completely understand it as it stands now. I’m not sure I’m doing it “right”.
  • COSE would have applications in both the WideSky Hub (end-to-end encryption) and in Brisbane WICEN’s RFID tracking system (digital signatures).

Physical exercise

I have not been out on the bike for some time, and it shows! I need to get out more. I intend to do quite a bit of that over the next few weeks.

Maybe I might do the odd over-nighter, but we’ll see.

6LoWHAM: TCP over sensor networks

I stumbled across this article regarding the use of TCP over sensor networks. Now, TCP has been done with AX.25 before, and generally suffers greatly from packet collisions. Apparently (I haven’t read more than the first few paragraphs of this article), implementations TCP can be tuned to improve performance in such networks, which may mean TCP can be made more practical on packet radio networks.

Prior to seeing this, I had thought 6LoWHAM would “tunnel” TCP over a conventional AX.25 connection using I-frames and S-frames to carry TCP segments with some header prepended so that multiple TCP connections between two peers can share the same AX.25 connection.

I’ve printed it out, and made a note of it here… when I get a moment I may give this a closer look. Ultimately I still think multicast communications is the way forward here: radio inherently favours one-to-many communications due to it being a shared medium, but there are definitely situations in which being able to do one-to-one communications applies; and for those, TCP isn’t a bad solution.

Comments having read the article

So, I had a read through it. The take-aways seem to be this:

  • TCP was historically seen as “too heavy” because the MCUs of the day (circa 2002) lacked the RAM needed for TCP data structures. More modern MCUs have orders of magnitude more RAM (32KiB vs 512B) today, and so this is less of an issue.
    • For 6LoWHAM, intended for single-board computers running Linux, this will not be an issue.
  • A lot of early experiments with TCP over sensor networks tried to set a conservative MSS based on the actual link MTU, leading to TCP headers dominating the lower-level frame. Leaning on 6LoWPAN’s ability to fragment IP datagrams lead to much improved performance.
    • 6LoWHAM uses AX.25 which can support 256-byte frames; vs 128-byte 802.15.4 frames on 6LoWPAN. Maybe gains can be made this way, but we’re already a bit ahead on this.
  • Much of the document considered battery-powered nodes, in which the radio transceiver was powered down completely for periods of time to save power, and the effects this had on TCP communications. Optimisations were able to be made that reduced the impact of such power-down events.
    • 6LoWHAM will likely be using conventional VHF/UHF transceivers. Hand-helds often implement a “battery saver” mode — often this is configured inside the device with no external control possible (thus it will not be possible for us to control, or even detect, when the receiver is powered down). Mobile sets often do not implement this, and you do not want to frequently power-cycle a modern mobile transceiver at the sorts of rates that 802.15.4 radios get power-cycled!
  • Performance in ideal conditions favoured TCP, with the article authors managing to achieve 30% of the raw link bandwidth (75kbps of a theoretical 250kbps maximum), with the underlying hardware being fingered as a possible cause for performance issues.
    • Assuming we could manage the same percentage; that would equate to ~360bps on 1200-baud networks, or 2.88kbps on 9600-baud networks.
  • With up to 15% packet loss, TCP and CoAP (its nearest contender) can perform about the same in terms of reliability.
  • A significant factor in effective data rate is CSMA/CA. aioax25 effectively does CSMA/CA too.

Its interesting to note they didn’t try to do anything special with the TCP headers (e.g. Van Jacobson compression). I’ll have to have a look at TCP and see just how much overhead there is in a typical segment, and whether the roughly double MTU of AX.25 will help or not: the article recommends using MSS of approximately 3× the link MTU for “fair” conditions (so ~384 bytes), and 5× in “good” conditions (~640 bytes).

It’s worth noting a 256-byte AX.25 frame takes ~2 seconds to transmit on a 1200-baud link. You really don’t want to make that a habit! So smaller transmissions using UDP-based protocols may still be worthwhile in our application.

6LoWHAM: Thoughts on how to distribute context and applications

So, one evening I was having difficulty sleeping, so like some people count sheep, turned to a different problem…6LoWPAN relies on all nodes sharing a common “context”. This is used as a short-hand to “compress” the rather lengthy IPv6 addresses for allowing two nodes to communicate with one another by substituting particular IPv6 address subnets with a “context number” which can be represented in 4 bits.

Fundamentally, this identifier is a stand-in for the subnet address. This was a sticking-point with earlier thoughts on 6LoWHAM: how do we agree on what the context should be? My thought was, each network should be assigned a 3-bit network ID. Why 3-bit? Well, this means we can reserve some context IDs for other uses. We use SCI/DCI values 0-7 and leave 8-15 reserved; I’ll think of a use for the other half of the contexts.

The node “group” also share a SSID; the “group” SSID. This is a SSID that receives all multicast traffic for the nodes on the immediate network. This might be just a generic MCAST-n SSID, where n is the network ID; or it could be a call-sign for a local network coordinator, e.g. I might decide my network will use VK4MSL-0 for my group SSID (network 0). Probably nodes that are listening on a custom SSID should still listen for MCAST-n traffic, in case a node is attempting to join without knowing the group SSID.

AX.25 allows for 16 SSIDs per call-sign, so what about the other 8? Well, if we have a convention that we reserve SSIDs 0-7 for groups; that leaves 8-15 for stations. This can be adjusted for local requirements where needed, and would not be enforced by the protocol.

Joining a network

How does a new joining node “discover” this network? Firstly, the first node in an area is responsible for “forming” the network — a node which “forms” a network must be manually programmed with the local subnet, group SSID and other details. Ensuring all nodes with “formation” capability for a given network is beyond the scope of 6LoWHAM.

When a node joins; at first it only knows how to talk to immediate nodes. It can use MCAST-n to talk to immediate neighbours using the fe80::/64 subnet. Anyone in earshot can potentially reply. Nodes simply need to be listening for traffic on a reserved UDP port (maybe 61631; there’s an optimisation in 6LoWPAN for 61616-61631). The joining node can ask for the network context, maybe authenticate itself if needed (using asymmetric cryptography – digital signatures, no encryption).

The other nodes presumably already know the answer, but for all nodes to reply simultaneously, would lead to a pile-up. Nodes should wait a randomised delay, and if nothing is heard in that period, they then transmit what they know of the context for the given network ID.

The context information sent back should include:

  • Group SSID
  • Subnet prefix
  • (Optional) Authentication data:
    • Public key of the forming network (joining node will need to maintain its own “trust” database)
    • Hash of all earlier data items
    • Digital signature signed with included public key

Once a node knows the context for its chosen network, it is officially “joined”.

Routing to non-local endpoints

So, a node may wish to send a message to another node that’s not directly reachable. This is, after-all, the whole point of using a routing protocol atop AX.25. If we knew a route, we could encode it in the digipeater path, and use conventional AX.25 source routing. Nodes that know a reliable route are encouraged to do exactly that. But what if you don’t know your way around?

APRS uses WIDEN-n to solve this problem: it’s a dumb broadcast, but it achieves this aim beautifully. n just stands for the number of hops, and it gets decremented with each hop. Each digipeater inserts itself into the path as it sends the frame on. APRS specs normally call for everyone to broadcast all at once, pile-up be damned. FM capture effect might help here, but I’m not sure its a good policy. Simple, but in our case, we can do a little better.

We only need to broadcast far enough to reach a node that knows a route. We’ll use ROUTE-n to stand for a digipeater that is no more than n hops away from the station listed in the AX.25 destination field. n must be greater than 0 for a message to be relayed. AX.25 2.0 limits the number of digipeaters to 8 (and 2.2 to 2!), so naturally n cannot be greater than 8.

So we’ll have a two-tier approach.

Routing from a node that knows a viable route

If a node that receives a ROUTE-n destination message, knows it has a good route that is n or less hops away from the target; it picks a randomised delay (maybe 0-5 seconds range), and if no reply is heard from another node; it relays the message: the ROUTE-n is replaced by its own SSID, followed by the required digipeater path to reach the target node.

Routing from a node that does not know a viable route

In the case where a node receives this same ROUTE-n destination message, does not know a route, and hasn’t heard anyone else relay that same message; it should pick a randomised delay (5-10 second range), and if it hasn’t heard the message relayed via a specific path in that time, should do one of the following:

If n is greater than 1:

Substitute ROUTE-n in the digipeater path with its own SSID followed by ROUTE-(n-1) then transmit the message.

If n is 1 (or 0):

Substitute ROUTE-n with its own SSID (do not append ROUTE-0) then transmit the message.

Routing multicast traffic

Discovering multicast listeners

I’ll have to research MLD (RFC-3810 / RFC-4604), but that seems the sensible way forward from here.

Relaying multicast traffic

If a node knows of downstream nodes that ordinarily rely on it to contact the sender of a multicast message, and it knows the downstream nodes are subscribers to the destination multicast group, it should wait a randomised period, and forward the message on (appending its SSID in the digipeater path) to the downstream nodes.

Application thoughts

I think I have done some thoughts on what the applications for this system may be, but the other day I was looking around for “prior art” regarding one-to-many file transfer applications.

One such system that could be employed is UFTP. Yes, it mentions encryption, but that is an optional feature (and could be useful in emcomm situations). That would enable SSTV-style file sharing to all participants within the mesh network. Its ability to be proxied also lends itself to bridging to other networks like AMPRnet, D-Star packet, DMR and other systems.

Afternoon project: DIY Passive 24V PoE injector

Recently, I discovered that in spite of my attempts to ensure my Internet connection would remain reliable throughout adverse conditions, I discovered a simple power outage basically left the ohh-so-wonderful HFC NBN NTD blinking and boot-looping helplessly.

In the last major storm event, the PSTN land-line was the only way we got a phone service. Sadly I was not geared up to test whether ADSL worked at that time, but the PSTN did, which was good because Telstra’s mobile network didn’t!

Armed with this knowledge, I decided to protect myself. My choices for an Internet link here are 4G and NBN. That does not give me much hope in a major calamity, but you know, do the best you can. At least in simple black-outs, 4G should stay up. 4G exclusively is too expensive, especially for a connection comparable to the NBN link I have, so the next best thing is to set up a back-up link using 4G. Since local towers may be down-and-out, best hope I have is to put the 4G antennas up as high as I possibly can. I looked at possible options, and one locally-produced option I stumbled on is the Telco Electronics T1. This is an outdoor rated 4G router, powered using PoE. The PoE scheme i simple: 24V DC nominal voltage, with the blue pair (pins 4 & 5) carrying the positive leg, and the brown pair (7 & 8) carrying the negative.

Talking with the vendor, I discovered that while these things can run down to 12V, they don’t recommend it. I guess I²R losses are a big factor there: CAT5e isn’t known for its power carrying capability. My thinking since my system is all 12V, is to simply run a 12V cable using 15A-rated DC cable alongside the Ethernet cable up to my bedroom, then from there I can split off a few 12V feeds: one for my 8-port switch, one for my access point, and one going to the 4G router.

Since the router expects 24V, I’ll use a boost converter so that the “PoE” run is as short as practical. I found an inexpensive 24V boost converter which could tolerate input voltages as low as 3.3V and input currents up to 5A. Mount this into a little wall-plate box with a couple of RJ-45 jacks and a barrel jack for the DC input, and we’d have a quick and easy boost converter.

I won’t put the wiring diagram up because honestly, it’s pretty straightforward! I haven’t tried running an Ethernet signal through this, but I’m confident it’ll work just fine. It does however power the T1 beautifully… the T1 drawing about 150mA when running at 14.4V (which is what my bench supply was set to). Some things I should possibly add would be fuses on the input and output: 1A on the input, 500mA on the output. For now I’ll just wing it. I’ll probably put the fuse at the socket in my room. There’s plenty of room to add this to the enclosure as it is now.

The business end of the PoE injector
Wiring job inside the enclosure.

Pondering audio streaming over LANs

Lately, I’ve been socially distancing a home and so there’s been a few projects that have been considered that otherwise wouldn’t ordinarily get a look in on a count of lack-of-time.

One of these has been setting up a Raspberry Pi with DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TMS320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone:144/430MHz→2.4GHz cross-band.

That’s still on the cards, but another use case came up: digital. It’d be real nice to interface this over WiFi to a stronger machine for digital modes. Sound card over network sharing. For this, Mumble would not do, I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, is asking for trouble with digital modes. PulseAudio works fine for audio (speech, music, etc). It will make assumptions though about the nature of that audio. The problem is we’re not dealing with “audio” as such, we’re dealing with modem tones. Human ears cannot detect phase easily, data modems can and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit, because although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss, but it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and just aims to do just that one thing. Digging into the code, present versions use OPUS, older versions use CELT. The use of RTP seemed promising though, it actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic content type (e.g. 96), and if this were a SIP session, there’d be a RTPMAP sent via Session Description Protocol to say content type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems that used by GQRX just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. Using UDP side-steps this, if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.

I also want some flexibility in audio streams, it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (the Philips Red-book standard audio format). In some cases, 16kHz would be fine, or even 8kHz 16-bit linear PCM. 44.1k works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp like RTP so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format

Timestamp

The timestamp field in RTP is basically measured in ticks of some clock of known frequency, e.g. for PCMU it is a 8kHz clock. It starts with some value, then increments up monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo; data will be streaming at 192kbps. We can tolerate wrap-around, and at this data rate, we’d see a 16-bit counter overflow every ~341ms, which whilst not unworkable, is getting tight. Better to use a 32-bit counter for this, which would extend that overflow to over 6 hours.

Sample rate encoding

We can either support an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports. That’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a harmonic of 8kHz or 11.025kHz, so we devise a scheme consisting of a “base” rate and multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multipler incremented by one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates and as we go up, the steps get bigger, so let’s borrow 3 multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0
B S S S M M M M

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

Examples:
  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00100000b (0x20): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)

Other settings

I primarily want to consider linear PCM types. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we could ignore it. So we could just encode the number of bytes needed for a single channel sample, minus one. Thus 0 would be 8-bits; 1 would be 16-bits; 2 would be 32-bits and 3 would be 64-bits. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but might be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.

Implementing BlueTrace: the guts of TraceTogether

COVID-SARS-2 is a nasty condition caused by COVID-19 that has seen many a person’s life cut short. The COVID-19 virus which originated from Wuhan, China has one particularly insidious trait: it can be spread by asymptomatic people. That is, you do not have to be suffering symptoms to be an infectious carrier of the condition.

As frustrating as isolation has been, it’s really our only viable solution to preventing this infectious condition from spreading like wildfire until we get a vaccine that will finally knock it on the head.

One solution that has been proposed has been to use contract tracing applications which rely on Bluetooth messaging to detect when an infected person comes into contact with others. Singapore developed the TraceTogether application. The Australian Government look like they might be adopting this application, our deputy CMO even suggesting it’d be made compulsory (before the PM poured water on that plan).

Now, the Android version of this, requires Android 5.1. My phone runs 4.1: I cannot run this application. Not everybody is in the habit of using Bluetooth, or even carries a phone. This got me thinking: can this be implemented in a stand-alone device?

The guts of this application is a protocol called BlueTrace which is described in this whitepaper. Reference implementations exist for Android and iOS.

I’ll have to look at the nitty-gritty of it, but essentially it looks like a stand-alone implementation on a ESP32 module maybe a doable proposition. The protocol basically works like this:

  • Clients register using some contact details (e.g. a telephone number) to a server, which then issues back a “user ID” (randomised).
  • The server then uses this to generate “temporary IDs” which are constructed by concatenating the “User ID” and token life-time start/finish timestamps together, encrypting that with the secret key, then appending the IV and an authentication token. This BLOB is then Base64-encoded.
  • The client pulls down batches of these temporary IDs (forward-dated) to use for when it has no Internet connection available.
  • Clients, then exchange these temporary IDs using BLE messaging.

This, looks doable in an ESP32 module. The ESP32 could be loaded up with tokens by a workstation. You then go about your daily business, carrying this device with you. When you get home, you plug the device into your workstation, and it uploads the “temporary IDs” it saw.

I’ll have to dig out my ESP32 module, but this looks like a doable proposition.

6LoWHAM: Working towards connected-mode operation in aioax25

The past few months have been quiet for this project, largely because Brisbane WICEN has had my spare time soaked up with an RFID system they are developing for tracking horse rides through the Imbil State Forest for the Stirling’s Crossing Endurance Club.

Ultimately, when we have some decent successes, I’ll probably be reporting more on this on WICEN’s website. Suffice to say, it’s very much a work-in-progress, but it has proved a valuable testing ground for aioax25. The messaging system being used is basically just plain APRS messaging, with digipeating thrown in as well.

Since I’ve had a moment to breathe, I’ve started filling out the features in aioax25, starting with connected-mode operation. The thinking is this might be useful for sending larger payloads. APRS messages are limited to a 63 character message size with only a subset of ASCII being permitted.

Thankfully that subset includes all of the Base64 character set, so I’m able to do things like tunnel NTP packets and CBOR blobs through it, so that stations out in the field can pull down configuration settings and the current time.

As for the RFID EPCs, we’re sending those in the canonical hexadecimal format, which works, but the EPC occupies most of the payload size. At 1200 bits per second, this does slow things down quite a bit. We get a slight improvement if we encode the EPCs as Base64. We’d get a 200% efficiency increase if we could send it as binary bytes instead. Sending a CBOR blob that way would be very efficient.

The thinking is that the nodes find each-other via APRS, then once they’ve discovered a path, they can switch to connected mode to send bulk transfers back to base.

Thus, I’ve been digging into connected mode operation. AX.25 2.2 is not the most well-written spec I’ve read. In fact, it is down-right confusing in places. It mixes up little-endian and big-endian fields, certain bits have different meanings in different contexts, and it uses concepts which are “foreign” to someone like myself who’s used to TCP/IP.

Right now I’m making progress, there’s an untested implementation in the connected-mode branch. I’m writing unit test cases based on what I understand the behaviour to be, but somehow I think this is going to need trials with some actual AX.25 implementations such as Direwolf, the Linux kernel stack, G8BPQ stack and the implementation on my Kantronics KPC3 and my Kenwood TH-D72A.

Some things I’m trying to get an answer to:

  • In the address fields at the start of a frame, you have what I’ve been calling the ch bit.
    On digipeater addresses, it’s called H and it is used to indicate that a frame has been digipeated by that digipeater.
    When seen in the source or destination addresses, it is called C, and it describes whether the frame is a “command” frame, or a “response” frame.

    An AX.25 2.x “command” frame sets the destination address’s C bit to 1, and the source address’s C bit to 0, whilst a “response” frame in AX.25 does the opposite (destination C is 0, source C is 1).

    In prior AX.25 versions, they were set identically. Question is, which is which? Is a frame a “command” when both bits are set to 1s and a “response” if both C bits are 0s? (Thankfully, I think my chances of meeting an AX.25 1.x station are very small!)
  • In the Control field, there’s a bit marked P/F (for Poll/Final), and I’ve called it pf in my code. Sometimes this field gets called “Poll”, sometimes it gets called “Final”. It’s not clear on what occasions it gets called “Poll” and when it is called “Final”. It isn’t as simple as assuming that pf=1 means poll and pf=0 means final. Which is which? Who knows?
  • AX.25 2.0 allowed up to 8 digipeaters, but AX.25 2.2 limits it to 2. AX.25 2.2 is supposed to be backward compatible, so what happens when it receives a frame from a digipeater that is more than 2 digipeater hops away? (I’m basically pretending the limitation doesn’t exist right now, so aioax25 will handle 8 digipeaters in AX.25 2.2 mode)
  • The table of PID values (figure 3.2 in the AX.25 2.2 spec) mentions several protocols, including “Link Quality Protocol”. What is that, and where is the spec for it?
  • Is there an “experimental” PID that can be used that says “this is a L3 protocol that is a work in progress” so I don’t blow up someone’s station with traffic they can’t understand? The spec says contact the ARRL, which I have done, we’ll see where that gets me.
  • What do APRS stations do with a PID they don’t recognise? (Hopefully ignore it!)

Right at this point, the Direwolf sources have proven quite handy. Already I am now aware of a potential gotcha with the AX.25 2.0 implementation on the Kantronics KPC3+ and the Kenwood TM-D710.

I suspect my hand-held (Kenwood TH-D72A) might do the same thing as the TM-D710, but given JVC-Kenwood have pulled out of the Australian market, I’m more like to just say F### you Kenwood and ignore the problem since these can do KISS mode, bypassing the buggy AX.25 implementation on a potentially resource-constrained device.

NET/ROM is going to be a whole different ball-game, and yes, that’s on the road map. Long-term, I’d like 6LoWHAM stations to be able to co-exist peacefully with other stations. Much like you can connect to a NET/ROM node using traditional AX.25, then issue connect commands to jump from there to any AX.25 or NET/ROM station; I intend to offer the same “feature” on a 6LoWHAM station — you’ll be able to spin up a service that accepts AX.25 and NET/ROM connections, and allows you to hit any AX.25, NET/ROM or 6LoWHAM station.

I might park the project for a bit, and get back onto the WICEN stuff, as what we have in aioax25 is doing okay, and there’s lots of work to be done on the base software that’ll keep me busy right up to when the horse rides re-start in 2020.

Solar Cluster: Second compute node operational again

So, a few months back I had the failure of one of my storage nodes. Since I need 3 storage nodes to operate, but can get away with a single compute node, I did a board-shuffle. I just evacuated lithium of all its virtual machines, slapped the SSD, HDD and cover from hydrogen in/on it, and it became the new storage node.

Actually I took the opportunity to upgrade to 2TB HDDs at the same time, as well as adding two new storage nodes (Intel NUCs). I then ordered a new motherboard to get lithium back up again. Again, there was an opportunity to upgrade, so ~$1500 later I ordered a SuperMicro A2SDi-16C-HLN4F. 16 cores, and full-size DDR4 DIMMs, so much easier to get bits for. It also takes M.2 SATA.

The new board arrived a few weeks ago, but I was heavily snowed under with activities surrounding Brisbane Area WICEN Group and their efforts to assist the Stirling’s Crossing Endurance Club running the Tom Quilty 2019. So it got shoved to the side with the RAM I had purchased to be dealt with another day.

I found time on Monday to assemble the hardware, then had fun and games with the UEFI firmware on this board. Put simply, the legacy BIOS support on this board is totally and utterly broken. The UEFI shell is also riddled with bugs (e.g. ifconfig help describes how to bring up an interface via DHCP or statically, but doing so fails). And of course, PXE is not PXE when UEFI is involved.

I ended up using Ubuntu’s GRUB binary and netboot image to boot-strap the machine, after which I could copy my Gentoo install back in. I now have the machine back in the rack, and whilst I haven’t deployed any VMs to it yet, I will do so soon. I did however, give it a burn-in test updating the kernel:

  LD [M]  security/keys/encrypted-keys/encrypted-keys.ko
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS      arch/x86/boot/compressed/piggy.o
  LD      arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object.
  ZOFFSET arch/x86/boot/zoffset.h
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS      arch/x86/boot/header.o
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Setup is 16444 bytes (padded to 16896 bytes).
System is 6273 kB
CRC ca5d7cb3
Kernel: arch/x86/boot/bzImage is ready  (#1)

real    7m7.727s
user    62m6.396s
sys     5m8.970s
lithium /usr/src/linux-stable # git describe
v5.1.11

7m for make -j 17 to build a current Linux kernel is not bad at all!

Solar Cluster: Re-wiring the rack

It’s been on my TO-DO list now for a long time to wire in some current shunts to monitor the solar input, replace the near useless Powertech solar controller with something better, and put in some more outlets.

Saturday, I finally got around to doing exactly that. I meant to also add a low-voltage disconnect to the rig … I’ve got the parts for this but haven’t yet built or tested it — I’d like to wait until I have done both,but I needed the power capacity. So I’m running a risk without the over-discharge protection, but I think I’ll take that gamble for now.

Right now:

  • The Powertech MP-3735 is permanently out, the Redarc BCDC-1225 is back in.
  • I have nearly a dozen spare 12V outlet points now.
  • There are current shunts on:
    • Raw solar input (50A)
    • Solar controller output (50A)
    • Battery (100A)
    • Load (100A)
  • The Meanwell HEP-600C-12 is mounted to the back of the server rack, freeing space from the top.
  • The janky spade lugs and undersized cable connecting the HEP-600C-12 to the battery has been replaced with a more substantial cable.

This is what it looks like now around the back:

Rear of the rack, after re-wiring

What difference has this made? I’ll let the graphs speak. This was the battery voltage this time last week:

Battery voltage for 2019-05-22

… and this was today…

Battery voltage 2019-05-29

Chalk-and-bloody-cheese! The weather has been quite consistent, and the solar output has greatly improved just replacing the controller. The panels actually got a bit overenthusiastic and overshot the 14.6V maximum… but not by much thankfully. I think once I get some more nodes on, it’ll come down a bit.

I’ve gone from about 8 hours off-grid to nearly 12! Expanding the battery capacity is an option, and could see the cluster possibly run overnight.

I need to get the two new nodes onto battery power (the two new NUCs) and the Netgear switch. Actually I’m waiting on a rack-mount kit for the Netgear as I have misplaced the one it came with, failing that I’ll hack one up out of aluminium angle — it doesn’t look hard!

A new motherboard is coming for the downed node, that will bring me back up to two compute nodes (one with 16 cores), and I have new 2TB HDDs to replace the aging 1TB drives. Once that’s done I’ll have:

  • 24 CPU cores and 64GB RAM in compute nodes
  • 28 CPU cores and 112GB RAM in storage nodes
  • 10TB of raw disk storage

I’ll have to pull my finger out on the power monitoring, there’s all the shunts in place now so I have no excuse but to make up those INA-219 boards and get everything going.

Solar Cluster: Storage replacements and upgrades

So recently I was musing about how I might go about expanding the storage on the cluster. This was largely driven by the fact that I was about 80% full, and thus needed to increase capacity somehow.

I also was noting that the 5400RPM HDDs (HGST HTS541010A9E680), now with a bit of load, were starting to show signs of not keeping up. The cases I have can take two 2.5″ SATA HDDs, one spot is occupied by a boot drive (120GB SSD) and the other a HDD.

A few weeks ago, I had a node fail. That really did send the cluster into a spin, since due to space constraints, things weren’t as “redundant” as I would have liked, and with one disk down, I/O throughput which was already rivalling Microsoft Azure levels of slow, really took a bad downward turn.

I hastily bought two NUCs, which I’m working towards deploying… with those I also bought two 120GB M.2 SSDs (for boot drives) and two 2TB HDDs (WD Blues).

It was at that point I noticed that some of the working drives were giving off the odd read error which was throwing Ceph off, causing “inconsistent” placement groups. At that point, I decided I’d actually deploy one of the new drives (the old drive was connected to another node so I had nothing to lose), and I’ll probably deploy the other shortly. The WD Blue 2TB drives are also 5400RPM, but unlike the 1TB Hitachis I was using before, have 128MB of cache vs just 8MB.

That should boost the read performance just a little bit. We’ll see how they go. I figured this isn’t mutually exclusive to the plans of external storage upgrades, I can still buy and mod external enclosures like I planned, but perhaps with a bit more breathing room, the immediate need has passed.

I’ve since ordered another 3 of these drives, two will replace the existing 1TB drives, and a third will go back in the NUC I stole a 2TB drive from.

Thinking about the problem more, one big issue is that I don’t have room inside the case for 3 2.5″ HDDs, and the motherboards I have do not feature mSATA or M.2 SATA. I might cram a PCIe SSD in, but those are pricey.

The 120GB SSD is only there as a boot drive. If I could move that off to some other medium, I could possibly move to a bigger SSD in place of the 120GB SSD, maybe a ~500GB unit. These are reasonably priced. The issue is then where to put the OS.

An unattractive option is to shove a USB stick in and boot off that. There’s no internal USB ports, but there are two front USB ports in the case I could rig up to an internal header so they’re not sticking out like a sore thumb(-drive) begging to be broken off by a side-wards slap. The flash memory in these is usually the cheapest variety, so maybe if I went this route, I’d buy two: one for the root FS, the other for swap/logs.

The other option is a Disk-on-Module. The motherboards provide the necessary DC power connector for running these things, and there’s a chance I could cram one in there. They’re pricey, but not as bad as going NVMe SSDs, and there’s a greater chance of success squeezing this in.

Right now I’ve just bought a replacement motherboard and some RAM for it… this time the 16-core model, and it takes full-size DIMMs. It’ll go back in as a compute node with 32GB RAM (I can take it all the way to 256GB if I want to). Coupled with that and a purchase of some HDDs, I think I’ll let the bank account cool off before I go splurging more. 🙂