Computing

HTML Email ought to be considered harmful: auDA shows us why

I’m the owner of two domain licenses, longlandclan.id.au and vk4msl.id.au, both purchased for personal use. The former I share with other family members where as the latter I use for my own use. Consequently, I’m on auDA’s mailing lists and receive the occasional email from them. No big deal. Lately, they’ve been pushing .au domains (i.e. dropping the .id bit out), which I’m not worried about myself, but I can see the appeal for businesses.

Anyway… I practice what I preach with regards to email: I do not send email in HTML format — and my email client is set to receive emails in plain text, not HTML, unless there is no plain-text component. This morning, I received what I consider, a textbook example of why I think HTML email is so bad for the Internet today.

From: .au Domain Administration <noreply@auda.com.au>
Subject: Notice: .au Direct Registration
Date: Wed, 10 Aug 2022 23:00:04 +0000
Reply-To: .au Domain Administration <noreply@auda.com.au>
X-Mailer: Mailchimp Mailer - **CID292f65320f63be5c3fcd**

The .au Domain Administration (auDA) recently launched Australia’s newest domain namespace – .au direct.

Dear Stuart Longland,

The .au Domain Administration (https://www.auda.org.au/)  (auDA), recently launched Australia’s newest domain namespace – .au direct. The new namespace provides eligible registrants the option to register domain names directly before the .au for the first time (e.g. forexample.au).

Registrants with an existing .au domain name licence are eligible to apply for a direct match of their .au direct domain name through the Priority Allocation Process (e.g. if you hold forexample.com.au (https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fforexample.com.au%2F&data=05%7C01%7Cprivate.address%40auda.org.au%7C95a9271d4eff4973013b08da3240a115%7C81810bc45d6845f6ba4e3d6c9fb37e43%7C0%7C0%7C637877550424818538%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=2WhPYMxV3FI9nEpXDEk8KdyJwWGyqcI%2FwRd%2FNc7DQks%3D&reserved=0) , you can apply for Priority Status to register forexample.au). Information about your existing domain name licence is available here:  https://whois.auda.org.au/. The Priority Allocation Process is now open and will close on 20 Sept 2022.

That is the email, as it appeared in my email client (I have censored the unfortunate auDA employee’s email address). I can see what happened:

Someone composed an email (likely in HTML format) that would be part of the marketing campaign they were going to send via MailChimp. The person composing the email for MailChimp clearly is using Microsoft Outlook (or maybe that should be called Microsoft LookOut!). Microsoft’s software saw what it thought was a hyperlink and thought, “I need to ‘protect’ this”, and made it a “safe” link. A link with the user’s email address embedded in it!

Funnily enough, this seems to be the only place where a link was mangled by Microsoft’s mal^H^H^Hsoftware. I think this underscores the importance of verifying that you are indeed sending what you think you are sending — and highlights how difficult HTML (and Microsoft) have made this task.

  1. don’t assume that people will only see the HTML email
  2. don’t assume that what you see in the HTML view is identical to what will be seen in plain text

Might be better to compose the plain text, get that right… then paste that into the HTML view and “make it pretty”… or perhaps don’t bother and just go back to plain-text? KISS principle!

Resurrecting an SGI O2

Years ago, I was getting into Linux on esoteric architectures, which started with a Gateway Microserver which runs the MIPS architecture… to better understand this platform I also obtained a few SiliconGraphics workstations, including this O2.

Originally a R5000SC at 180MHz and 128MB RAM, I managed to get hold of an RM5200 300MHz CPU module for it, and with the help of people on the #mipslinux channel on the old irc.freenode.net, managed to obtain a PROM update to get the RM5200 working. Aside from new HDDs (the original died just recently), it’s largely the stock hardware.

I figured it deserved to go to a new home, and a fellow on Gumtree (in WA) happened to be looking for some of these iconic machines, so I figured I might as well clean this machine up as best I can and get it over there while it’s still somewhat functioning. That said, age has not been friendly to this beast.

  • the CD-ROM drive tray gear has popped off the motor spindle, rendering the CD-ROM drive non-functional
  • in trying to fix the CD-ROM issue (I tried disassembling the drive but couldn’t get at the parts needed), the tab that holds the lid of the machine on broke
  • the PSU fan, although free to spin, does not appear to be operational
  • the machine seems to want to shut off after a few minutes of run-time

The latter two are related I think: likely things get too hot and a protection circuit kicks in to turn the machine off. There’s no dust in the machine to cause a lack of air flow, I thus suspect the fan is the issue. This will be my biggest challenge I suspect. It looks to be a fairly standard case fan, so opening up the power supply (not for the feint of heart!) to replace it with a modern one isn’t out of the question.

The CD-ROM drive is a different matter. SGI machines use 512-byte sectors on their CDs, and this requires CD-ROM firmware that supports this. I have a couple of Plextor SCSI drives that do offer this (there is a jumper marked “BLOCK”), but they won’t physically fit in the O2 (they are caddy-loading drives). Somewhere around the house I have a 68-pin SCSI cable, I should be able to link that to the back of the O2 via its external SCSI port, then cobble together a power supply to run the drive externally… but that’ll be a project for another day.

A working monitor was a possible challenge, but a happy accident: it seems some LCD montiors can do sync-on-green, and thus are compatible with the O2. I’m using a little 7″ USB-powered WaveShare LCD which I normally use for provisioning Raspberry Pi PCs. I just power the monitor via a USB power supply and use the separately-provided VGA adaptor cable to plug it into the O2. So I don’t have to ship a bulky 20″ CRT across the country.

The big issue is getting an OS onto the thing. I may have to address the sudden-shutdown issue first before I can get a reasonable chance at an OS install. The big problem being an OS that these things can run. My options today seem to be:

  • Debian Jessie and earlier (Stretch dropped support for mips4 systems, favouring newer mips64r2/mips32r2 systems)
  • Gentoo Linux (which it currently does run)
  • OpenBSD 6.9 and earlier (7.0 discontinues the sgi port)
  • NetBSD 9.2
  • IRIX 6.5

The fellow ideally wants IRIX 6.5 on the thing, which is understandable, that is the native OS for these machines. I never had a full copy of IRIX install media, and have only used IRIX once (it came installed on the Indy). I’ve only ever installed Gentoo on the O2.

Adding to the challenge, I’ll have to network boot the thing because of the duff CD-ROM drive. I had thought I’d just throw NetBSD on the thing since that is “current” and would at least prove the hardware works with a minimum of fuss… but then I stumbled on some other bits and pieces:

  • irixboot is a Vagrant-based virtual machine with tools needed to network-boot a SGI workstation. The instructions used for IP22 hardware (Indy/Indigo² should work here because IP32 hardware like the O2 also have 32-bit PROMs)
  • The Internet Archive provides CD images for IRIX 6.5, including the foundation discs which I’ve never posessed

Thus, there seems to be all the bits needed to get IRIX onto this thing, if I can get the machine to stay running long enough.

Network juju on the fly: migrating a corporate network to VPN-based connectivity

So, this week mother nature threw South East Queensland a curve-ball like none of us have seen in over a decade: a massive flood. My workplace, VRT Systems / WideSky.Cloud Pty Ltd resides at 38b Douglas Street, Milton, which is a low-lying area not far from the Brisbane River. Sister company CETA is just two doors down at no. 40. Mid-February, a rain depression developed in the Sunshine Coast Hinterland / Wide Bay area north of Brisbane.

That weather system crawled all the way south, hitting Brisbane with constant heavy rain for 5 days straight… eventually creeping down to the Gold Coast and over the border to the Northern Rivers part of NSW.

The result on our offices was devastating. (Copyright notice: these images are placed here for non-commercial use with permission of the original photographers… contact me if you wish to use these photos and I can forward your request on.)

Some of the stock still worked after the flood — the Siemens UH-40s pictured were still working (bar a small handful) and normally sell for high triple-figures. The WideSky Hubs and CET meters all feature a conformal coating on the PCBs that will make them robust to water ingress and the Wavenis meters are potted sealed units. So not all a loss — but there’s a big pile of expensive EDMI meters (Mk7s and Mk10s) though that are not economic to salvage due to approval requirements which is going to hurt!

Le Mans Motors, pictured in those photos is an automotive workshop, so would have had lots of lubricants, oils and grease in stock needed to service vehicles — much of those contaminants were now across the street, so washing that stuff off the surviving stock was the order of the day for much of Wednesday, before demolition day Friday.

As for the server equipment, knowing that this was a flood-prone area (which also by the way means insurance is non-existent), we deliberately put our server room on the first floor, well above the known flood marks of 1974 and 2011. This flood didn’t get that high, getting to about chest-height on the ground floor. Thus, aside from some desktops, laptops, a workshop (including a $7000 oscilloscope belonging to an employee), a new coffee machine (that hurt the coffee drinkers), and lots of furniture/fittings, most of the IT equipment came through unscathed. The servers “had the opportunity to run without the burden of electricity“.

We needed our stuff working, so we needed to first rescue the machines from the waterlogged building and set them up elsewhere. Elsewhere wound up being at the homes of some of our staff with beefy NBN Internet connections. Okay, not beefy compared to the 500Mbps symmetric microwave link we had, but 50Mbps uplinks were not to be snorted at in this time of need.

The initial plan was the machines that once shared an Ethernet switch, now would be in physically separate locations — but we still needed everything to look like the old network. We also didn’t want to run more VPN tunnels than necessary. Enter OpenVPN L2 mode.

Establishing the VPN server

Up to this point, I had deployed a temporary VPN server as a VPS in a Sydney data centre. This was a plain-jane Ubuntu 20.04 box with a modest amount of disk and RAM, but hopefully decent CPU grunt for the amount of cryptographic operations it was about to do.

Most of our customer sites used OpenVPN tunnels, so I migrated those first — I managed to grab a copy of the running server config as the waters rose before the power tripped out. Copying that config over to the new server, start up OpenVPN, open a UDP port to the world, then fiddled DNS to point the clients to the new box. They soon joined.

Connecting staff

Next problem was getting the staff linked — originally we used a rather aging Cisco router with its VPN client (or vpnc on Linux/BSD), but I didn’t feel like trying to experiment with an IPSec server to replicate that — so up came a second OpenVPN instance, on a new subnet. I got the Engineering team to run the following command to generate a certificate signing request (CSR):

openssl req -newkey rsa:4096 -nodes -keyout <name>.key -out <name>.req

They sent me their .req files, and I used EasyRSA v3 to manage a quickly-slapped-together CA to sign the requests. Downloading them via Slack required that I fish them out of the place where Slack decided to put them (without asking me) and place it in the correct directory. Sometimes I had to rename the file too (it doesn’t ask you what you want to call it either) so it had a .req extension. Having imported the request, I could sign it.

$ mv ~/Downloads/theclient.req pki/reqs/
$ ./easyrsa sign-req client theclient

A new file pki/issued/theclient.crt could then be sent back to the user. I also provided them with pki/ca.crt and a configuration file derived from the example configuration files. (My example came from OpenBSD’s OpenVPN package.)

They were then able to connect, and see all the customer site VPNs, so could do remote support. Great. So far so good. Now the servers.

Server connection VPN

For this, a third OpenVPN daemon was deployed on another port, but this time in L2 mode (dev tap) not L3 mode. In addition, I had servers on two different VLANs, I didn’t want to have to deploy yet more VPN servers and clients, so I decided to try tunnelling 802.1Q. This required boosting the MTU from the default of 1500 to 1518 to accommodate the 802.1Q VLAN tag.

The VPN server configuration looked like this:

port 1196
proto udp
dev tap
ca l2-ca.crt
cert l2-server.crt
key l2-server.key
dh data/dh4096.pem
server-bridge
client-to-client
keepalive 10 120
cipher AES-256-CBC
persist-key
persist-tun
status /etc/openvpn/l2-clients.txt
verb 3
explicit-exit-notify 1
tun-mtu 1518

In addition, we had to tell netplan to create some bridges, we created a vpn.conf in /etc/netplan/vpn.yaml that looked like this:

network:
    version: 2
    ethernets:
        # The VPN tunnel itself
        tap0:
            mtu: 1518
            accept-ra: false
            dhcp4: false
            dhcp6: false
    vlans:
        vlan10-phy:
            id: 10
            link: tap0
        vlan11-phy:
            id: 11
            link: tap0
        vlan12-phy:
            id: 12
            link: tap0
        vlan13-phy:
            id: 13
            link: tap0
    bridges:
        vlan10:
            interfaces:
                - vlan10-phy
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.10.1/24
                - 2001:db8:10::1/64
        vlan11:
            interfaces:
                - vlan11-phy
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.11.1/24
                - 2001:db8:11::1/64
        vlan12:
            interfaces:
                - vlan12-phy
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.12.1/24
                - 2001:db8:12::1/64
        vlan13:
            interfaces:
                - vlan13-phy
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.13.1/24
                - 2001:db8:13::1/64

Those aren’t the real VLAN IDs or IP addresses, but you get the idea. Bridge up on the cloud end isn’t strictly necessary, but it does mean we can do other forms of tunnelling if needed.

On the clients, we did something very similar. OpenVPN client config:

client
dev tap
proto udp
remote vpn.example.com 1196
resolv-retry infinite
nobind
persist-key
persist-tun
ca l2-ca.crt
cert l2-client.crt
key l2-client.key
remote-cert-tls server
cipher AES-256-CBC
verb 3
tun-mtu 1518

and for netplan:

network:
    version: 2
    ethernets:
        tap0:
            accept-ra: false
            dhcp4: false
            dhcp6: false
    vlans:
        vlan10-eth:
            id: 10
            link: eth0
        vlan11-eth:
            id: 11
            link: eth0
        vlan12-eth:
            id: 12
            link: eth0
        vlan13-eth:
            id: 13
            link: eth0
        vlan10-vpn:
            id: 10
            link: tap0
        vlan11-vpn:
            id: 11
            link: tap0
        vlan12-vpn:
            id: 12
            link: tap0
        vlan13-vpn:
            id: 13
            link: tap0
    bridges:
        vlan10:
            interfaces:
                - vlan10-vpn
                - vlan10-eth
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.10.2/24
                - 2001:db8:10::2/64
        vlan11:
            interfaces:
                - vlan11-vpn
                - vlan11-eth
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.11.2/24
                - 2001:db8:11::2/64
        vlan12:
            interfaces:
                - vlan12-vpn
                - vlan12-eth
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.12.2/24
                - 2001:db8:12::2/64
        vlan13:
            interfaces:
                - vlan13-vpn
                - vlan13-eth
            accept-ra: false
            link-local: [ ipv6 ]
            addresses:
                - 10.0.13.2/24
                - 2001:db8:13::2/64

I also tried using a Raspberry Pi with Debian, the /etc/network/interfaces config looked like this:

auto eth0
iface eth0 inet dhcp
        mtu 1518

auto tap0
iface tap0 inet manual
        mtu 1518

auto vlan10
iface vlan10 inet static
        address 10.0.10.2
        netmask 255.255.255.0
        bridge_ports tap0.10 eth0.10
iface vlan10 inet6 static
        address 2001:db8:10::2
        netmask 64

auto vlan11
iface vlan11 inet static
        address 10.0.11.2
        netmask 255.255.255.0
        bridge_ports tap0.11 eth0.11
iface vlan11 inet6 static
        address 2001:db8:11::2
        netmask 64

auto vlan12
iface vlan12 inet static
        address 10.0.12.2
        netmask 255.255.255.0
        bridge_ports tap0.12 eth0.12
iface vlan12 inet6 static
        address 2001:db8:12::2
        netmask 64

auto vlan13
iface vlan13 inet static
        address 10.0.13.2
        netmask 255.255.255.0
        bridge_ports tap0.13 eth0.13
iface vlan13 inet6 static
        address 2001:db8:13::2
        netmask 64

Having done this, we had the ability to expand our virtual “L2” network by simply adding more clients on other home Internet connections, the bridges would allow all servers to see each-other as if they were connected to the same Ethernet switch.

Building my own wireless headset interface

So, I’ve been wanting to do this for the better part of a decade… but lately, the cost of more capable embedded devices has come right down to make this actually feasible.

It’s taken a number of incarnations, the earliest being the idea of DIYing it myself with a UHF-band analogue transceiver. Then the thought was to pair a I²S audio CODEC with a ESP8266 or ESP32.

I don’t want to rely on technology that might disappear from the market should relations with China suddenly get narky, and of course, time marches on… I learn about protocols like ROC. Bluetooth also isn’t what it was back when I first started down this path — back then A2DP was one-way and sounded terrible, HSP was limited to 8kHz mono audio.

Today, Bluetooth headsets are actually pretty good. I’ve been quite happy with the Logitech Zone Wireless for the most part — the first one I bought had a microphone that failed, but Logitech themselves were good about replacing it under warranty. It does have a limitation though: it will talk to no more than two Bluetooth devices. The USB dongle it’s supplied with, whilst a USB Audio class device, also occupies one of those two slots.

The other day I spent up on a DAB+ radio and a shortwave radio — it’d be nice to listen to these via the same Bluetooth headset I use for calls and the tablet. There are Bluetooth audio devices that I could plug into either of these, then pair with my headset, but I’d have to disconnect either the phone or the tablet to use it.

So, bugger it… the wireless headset interface will get an upgrade. The plan is a small pocket audio swiss-army-knife that can connect to…

  • an analogue device such as a wired headset or radio receiver/transceiver
  • my phone via Bluetooth
  • my tablet via Bluetooth
  • the aforementioned Bluetooth headset
  • a desktop PC or laptop over WiFi

…and route audio between them as needs require.

The device will have a small LCD display for control with a directional joystick button for control, and will be able to connect to a USB host for management.

Proposed parts list

The chip crisis is actually a big limitation, some of the bits aren’t as easily available as I’d like. But, I’ve managed to pull together the following:

The only bit that’s old stock is the LCD, it’s been sitting on my shelf gathering dust for over a decade. Somewhere in one of my junk boxes I’ve got some joystick buttons also bought many years ago.

Proposed software

For the sake of others looking to duplicate my efforts, I’ll stick with Raspberry Pi OS. As my device is an ARMv6 device, I’ll have to stick with the 32-bit release. Not that big a deal, and long-term I’ll probably look at using OpenEmbedded or Gentoo Embedded long-term to make a minimalist image that just does what I need it to do.

The starter kit came with a SD card loaded with NOOBS… I ignored this and just flashed the SD card with a bare minimum Debian Bullseye image. The plan is I’ll get PipeWire up and running on this for its Bluetooth audio interface. Then we’ll try and get the hardware bits going.

Right now, I have the zero booting up, connecting to my local WiFi network, and making itself available via SSH. A good start.

Data sheet for the LCD

The LCD will possibly be one of the more challenging bits. This is from a phone that was new last century! As it happens though, Bergthaller Iulian-Alexandru was kind enough to publish some details on a number of LCD screens. Someone’s since bought and squatted the domain, but The Wayback Machine has an archive of the site.

I’ve mirrored his notes on various Ericsson LCDs here:

The diagrams on that page appear to show the connections as viewed from the front of the LCD panel. I guess if I let magic smoke out, too bad! The alternative is I do have two Nokia 3310s floating around, so harvest the LCDs out of them — in short, I have a fallback plan!

PipeWire on the Pi Zero

This will be the interesting bit. Not sure how well it’ll work, but we’ll give it a shot. The trickiest bit is getting binaries for the device, no one builds for armhf yet. There are these binaries for Ubuntu AMD64, and luckily there are source packages available.

I guess worst case scenario is I put the Pi Zero W aside and get a Pi Zero 2 W instead. Key will be to test PipeWire first before I warm up the soldering iron, let’s at least prove the software side of things, maybe using USB audio devices in place of the AudioInjector board.

I’m going through and building the .debs for armhf myself now, taking notes as I go. I’ll post these when I’m done.

Adventures with UniFi controllers and APs

We’ve had WiFi in one form or another for some years on this network. Originally it started with an interest in the Brisbane Mesh metropolitan area network which more-or-less imploded around 2006 or so. Back then, I think I had one of the few WiFi access points in The Gap. 2.4GHz was basically microwave ovens and not much else. The same is not true today.

WiFi networks in my local area, 2.4GHz isn’t as quiet as it once was.

Since then, the network has changed a bit: from a little D-Link 802.11b AP, we moved to a Prism54g WiFi card (that I still have) with hostapd, using OpenVPN to provide security. That got replaced by a Telstra-branded Netcomm WiFi router which I figured out supported WPA-Enterprise, so I went down the rabbit hole of setting up FreeRADIUS, and we ran that until a lightning strike blew it up. The next consumer AP that replaced it was a miserable failure, so it’s been business APs since then.

Initially a Cisco WAP4410N, which was a great little AP… worked reliably for years, but about 12 months ago I noticed it was dropping packets occasionally and getting a bit intermittent. Thinking that maybe the device is past its prime, I bought a replacement: a WAP150, which proved to be a bit disappointing. Range wasn’t as good compared to the WAP4410N, and I soon found myself moving the WAP150 downstairs to service the network there and re-instating the WAP4410N.

In particular, one feature I liked about the two Cisco units is they support 802.1Q VLANs, with the ability to assign a different WiFi SSID to each. The 4410N could do 4 SSIDs, the 150 8. This is a feature that consumer APs don’t do, and it is a handy feature here as it enables me to have a “work” LAN (with VPN to my workplace) and a “home” LAN which everybody else uses.

Years ago, our Internet usage was over a 512kbps/128kbps ADSL link, and it was mostly Internet browsing… so intermittent packet loss wasn’t a big deal… one AP did just fine. Now with the move to NBN, our telephone service is a VoIP service, and I’m finding that WiFi IP phones are very picky about APs. We have three IP phones and an ATA… the ATA (Grandstream HT814) is Ethernet of course, as is one of the IP phones (Grandstream GXP1615), but the other two IP phones are WiFi (Aristel Wi-Fi Genius X1+ and Grandstream WP810).

The Aristel device in particular, was really choppy… and the first one sent out seemed to be a DoA, with poor performance even when right beside the AP. A replacement was provided under RMA, and this one performed much better, but still suffered intermittent loss. The Grandstream WP810 in general worked, but there were noticeable dead spots in a few areas around the house.

The final straw with the existing pair of APs came at the last Brisbane WICEN meeting, conducted over Zoom… both APs seem to suffer a problem where they started dropping packets and glitching badly. A power-cycle “fixes” the problem, but the issue returns after a week or two. Clearly, they were no longer up to snuff.

The replacements

APs

I procured the following replacements:

I went the long-range one for upstairs since it’s in a high spot (sitting atop a stereo speaker on a top shelf in my room) so would be able to “radiate” over a long distance to hopefully reach down the drive way and into the back-yard.

The other one is to fill in dead spots downstairs, and since it’s going to be pretty much sitting at waist level, there’s no point in it being “long range”.

The devices I bought were purchased through mWave (here and here), as they had them in stock at the time.

Power injectors

These are 48V passive PoE devices… so to make them go, you need a separate power injector. The “standard” Ubiquiti power injector was out-of-stock, but I wanted these to work on 12V anyway, so I looked around for a suitable option. Core Electronics do have some step-up converters which work great for 24V devices, but the range available doesn’t quite reach 48V. I did find though that Telco Antennas sell these 48V PoE injectors. (They also sell the APs here and here, but were out-of-stock at the time of purchase.)

Admittedly, they’re 10/100Mbps only, which means you don’t quite get the full throughput out of the WiFi6 APs, but meh, it’s good enough… if the IP phones need more than 100Mbps, they’ll run up against the 25Mbps limit of the NBN link!

Controller

These APs, unlike the Cisco devices they’re replacing (and everything else I’ve used prior), these have no built-in management interface, they talk to a network controller device… normally the UniFi Cloud Key. I had a run-in with the first generation of these at the Stirling’s Crossing Endurance Centre. For a big network, the idea of a central device does make a lot of sense (that site has 5 UAP-AC-Ms and 3 8-port PoE switches), but for a two-AP network like mine it seemed overkill.

One thing I learned, is these things positively DO NOT like being power-cycled! Repeated power-cycling corrupts the database in very short order, and you find yourself restoring configurations from a back-up soon after. So I was squeamish about buying one of these. The second generation version has its own back-up battery, but reports suggest they can be just as unreliable. In any case, they were out of stock everywhere, and I didn’t want to spring the extra cash for the “plus” model (that has a HDD… not much use to me) or the Dream Machine router.

I did consider using a Raspberry Pi 3, in fact that was my original plan… I had one spare, and so started down the path of setting it up as a UniFi controller… however, ran into two road blocks:

  • UniFi controller at this time requires Java v8… Debian Bullseye ships with v11 minimum
  • UniFi controller needs MongoDB 3.4, which isn’t available on Debian Bullsye on ARM64

I could compile MongoDB, but Java is a whole other issue, and lots of people have complained loudly about this very limitation. If there was one big gripe I’ve got, this would be it.

I did some further research: Ubuntu 20.04 does offer a Java 8 runtime, and on AMD64, I can use existing binaries for MongoDB. I looked around and purchased this small-form-factor PC. Windows 10 went bye byes once I managed to hit F1 at the right point in the BIOS set-up, and Ubuntu 20.04 was PXE-loaded. I could then follow the standard instructions to install via APT. The controller seems to be working fine using OpenJDK JRE v8. I’d recommend this over the licensing quagmire that is using Oracle JRE.

Installation

With a controller, and all the requisite bits, things went smoothly. I found at first, the controller insisted on using 192.168.1.0/24 addresses to talk to the APs… so wound up setting that up in the netplan config. I later found that the UniFi controller won’t let you set a network subnet address unless you turn off Auto Scale Network.

Setting the network subnet is not possible until “Auto Scale Network” is disabled.

So maybe from here-on-in, new APs will appear in the correct subnet, but to be honest, it’s no big deal either way, unless an AP has an untimely end, I shouldn’t need to buy new ones for a while!

Auto-negotiation quirks with Cisco switches

One oddity I noticed was the upstairs (U6 LR) AP was reluctant to communicate via Ethernet, instead funnelling its traffic via the downstairs AP. While it’s handy they can do that, means I don’t necessarily need to worry about powering the upstairs switches in a power outage, the AP should be able to use its Ethernet back-end.

The downstairs one was having no problems, and the set-up was similar: switch port → PoE injector → AP, via short cables. I tried a few different cables with no change. Logged into the switch and had a look, it was set to auto-negotiate, which was working fine downstairs. The downstairs switch is a Netgear GS748T, whereas the one upstairs is a Cisco SG200-08 (not the P version that does PoE).

I found I could log into the AP over SSH (you can provide your SSH key via the UniFi controller)… so I logged in as root and had a look around. They run Linux with (a sadly tainted due to ubnthal.ko and ubnt_common.ko) kernel 4.4, and a Busybox/musl environment with an ARM64 CPU. (Well, the U6 LRs are ARM64, the U6 Lites are MediaTek MT7621s… mipsel32r2 with kernel 5.4.0 and not tainted.) ip told me that eth0 was up, and that the AP’s IP address was assigned to br0 which was also up. brctl told me that eth0 was enslaved by br0. Curiously, /sys/class/net/eth0/carrier was reporting 1, which disagreed with what the switch was telling me.

On a hunch, I tried turning off auto-negotiation, forcing instead 100Mbps full-duplex. Bingo, a link LED appeared. The topology showed the AP was now wired, not talking via downstairs.

Network topology shown in the UniFi Controller UI

Switched back to auto-negotiation, and the AP switched to being a wireless extender with the link LED disappearing from the switch. This may be a quirk of the PoE injectors I’m using, which do not handle 100Mbps, and maybe the switch hasn’t realised this because the AP otherwise “advertises” 1Gbps link capability. For now, I’m leaving that switch port locked at 100Mbps full-duplex. If you have problems with an AP showing up via Ethernet, here’s a place that is worth checking.

Demise of classic hardware

So, this year I had a new-year’s resolution of sorts… when we first started this “work from home” journey due to China’s “gift”, I just temporarily set up on the dinner table, which was of course, meant to be another few months.

Well, nearly 2 years later, we’re still working from home, and work has expanded to the point that a move to the office, on any permanent basis, is pretty much impossible now unless the business moves to a bigger building. With this in mind, I decided I’d clear off the dinner table, and clean up my room sufficiently to set up my workstation in there.

That meant re-arranging some things, but for the most part, I had the space already. So some stuff I wasn’t using got thrown into boxes to be moved into the garage. My CD collection similarly got moved into the garage (I have it on the computer, but need to retain the physical discs as they represent my “personal use license”), and lo and behold, I could set up my workstation.

The new workspace

One of my colleagues spotted the Indy and commented about the old classic SGI logo. Some might notice there’s also an O2 lurking in the shadows. Those who have known me for a while, will remember I did help maintain a Linux distribution for these classic machines, among others, and had a reasonable collection of my own:

My Indy, O2 and Indigo2 R10000
The Octane, booting up

These machines were all eBay purchases, as is the Sun monitor pictured (it came with the Octane). Sadly, fast forward a number of years, and these machines are mostly door stops and paperweights now.

The Octane’s and Indigo2’s demises

The Octane died when I tried to clean it out with a vacuum cleaner, without realising the effect of static electricity generated by the vacuum cleaner itself. I might add mine was a particularly old unit: it had a 175MHz R10000 CPU, and I remember the Linux kernel didn’t recognise the power management circuitry in the PSU without me manually patching it.

The Indigo2 mysteriously stopped working without any clear reason why, I’ve never actually tried to diagnose the issue.

That left the Indy and the O2 as working machines. I haven’t fired them up in a long time until today. I figured, what the hell, do they still run?

Trying the Indy and O2 out

Plug in the Indy, hit the power button… nothing. Dead as a doornail. Okay, put it aside… what about the O2?

I plug it in, shuffle it closer to the monitor so I can connect it. ‘Lo and behold:

The O2 lives!

Of course, the machine was set up to use a serial console as its primary interface, and boot-up was running very slow.

Booting up… very slowly…

It sat there like that for a while, figuring the action was happening on a serial port, I went to go get my null modem cable, only to find a log-in prompt by the time I got back.

Next was remembering what password I was using when I last ran this machine. We had the OpenSSL heartbleed vulnerability happen since then, and at about that time, I revoked all OpenPGP keys and changed all passwords, so it isn’t what I use today. I couldn’t get in as root, but my regular user account worked, and I was able to change the root password via sudo.

Remembering my old log-in credentials, from 22 years ago it seems

The machine soon crashed after that. I tried rebooting, this time I tweaked some PROM settings (and yes, I was rusty remembering how to do it) to be able to see what was going. (I had the null modem cable in hand, but didn’t feel like trying to blindly find the serial port at the back of my desktop.)

Changing PROM settings
The subsequent boot, and crash

Evidently, I had a dud disk. This did not surprise me in the slightest. I also noticed the PSU fan was not spinning, possibly seized after years of non-use.

Okay, there were two disks installed in this machine, both 80-pin SCA SCSI drives. Which one was it? I took a punt and tried the one furtherest away from the I/O ports.

Success, she boots now

I managed to reset the root password, before the machine powered itself off (possibly because of overheating). I suspect the machine will need the dust blown out of it (safely! — not using the method that killed the Octane!), and the HDDs will need replacements. The guilty culprit was this one (which I guessed correctly first go):

a 4GB HDD was a big drive back in 1998!

The computer I’m typing this on has a HDD that stores 1000 of these drives. Today, there are modern alternatives, such as SCSI2SD that could get this machine running fully if needed. The tricky bit would be handling the 80-pin hot-swap interface. There’d be some hardware hacking needed to connect the two, but AU$145 plus an adaptor seems like a safer bet than trying some random used HDD.

So, replacement for the HDDs, a clean-out, and possibly a new fan or two, and that machine will be back to “working” state. Of course the Linux landscape has moved on since then, Debian no longer support the MIPS4 ISA that the RM5200 CPU understands, Gentoo still could work on this though, and maybe OpenBSD still support this too. In short, this machine is enough of a “go-er” that it should not be sent to land-fill… yet.

Turning my attention back to the Indy

So the Indy was my first SGI machine. Bought to better understand the MIPS processor architecture, and perhaps gain enough understanding to try and breathe life into a Cobalt Qube II server appliance (remember those?), it did teach me a lot about Linux and how things vary between platforms.

I figured I might as well pop the cover and see if there’s anything “obviously” wrong. The procedure I was rusty on, but I recalled there was a little catch on the back of the case that needed to be release before the cover slid off. So I lugged the 20″ CRT off the top of the machine, pulled the non-functioning Indy out, and put it on the table to inspect further.

Upon trying to pop the cover (gently!), the top of the case just exploded. Two pieces of the top cover go flying, and the RF shield parts company with the cover’s underside.

The RF shield parted company with the underside of the lid

I was left with a handful of small plastic fragments that were the heat-set posts holding the RF shield to the inside of the lid.

Some of the fragments that once held the RF shield in place

Clearly, the plastic has become brittle over the years. These machines were released in 1993, I think this might be a 1994-model as it has a slightly upgraded R4600 CPU in it.

As to the machine itself, I had a quick sticky-beak, there didn’t seem to be any immediately obvious things, but to be honest, I didn’t do a very thorough check. Maybe there’s some corrosion under the motherboard I didn’t spot, maybe it’s just a blown fuse in the PSU, who knows?

The inside of the Indy

This particular machine had 256MB RAM (a lot for its day), 8-bit Newport XL graphics, the “Indy Presenter” LCD interface (somewhere, we have the 15″ monitor it came with — sadly the connecting cable has some damaged conductors), and the HDD is a 9.1GB HDD I added some time back.

Where to now?

I was hanging on to these machines with the thinking that someone who was interested in experimenting with RISC machines might want them — find them a new home rather than sending them to landfill. I guess that’s still an option for the O2, as it still boots: so long as its remaining HDD doesn’t die it’ll be fine.

For the others, there’s the possibility of combining bits to make a functional frankenmachine from lots of parts. The Indy will need a new PROM battery if someone does manage to breathe life into it.

The Octane had two SCSI interfaces, one of which was dead — a problem that was known-of before I even acquired it. The PROM would bitch and moan about the dead SCSI interface for a good two minutes before giving up and dumping you in the boot menu. Press 1, and it’d hand over to arcload, which would boot a Linux kernel from a disk on the working controller. Linux would see the dead SCSI controller, and proceed to ignore it, booting just fine as if nothing had happened.

The Indigo2 R10000 was always the red-hedded step child: an artefact of the machine’s design. The IP22 design (Indy and Indigo2) was never designed with the intent of being used with a R10000 CPU, and the speculative execution features played havoc with caching on this design. The Octane worked fine because it was designed from the outset to run this CPU. The O2 could be made to work because of a quirk of its hardware design, but the Indigo2 was not so flexible, so kernel-space code had to hack around the problem in software.

I guess I’d still like to see the machines go to a good home, but no idea who that’d be in this day and age. Somewhere, I have a partial disc set of Irix 6.5, and there’s also a 20″ SGI GDM5410 monitor (not the Sun monitor pictured above) that, at last check, did still work.

It’ll be a sad day when these go to the tip.

Half-arsed integration

You’d be hard pressed to find a global event that has brought as much pandamonium as this COVID-19 situation has in the last two years. Admittedly, Australia seems to have come out of it better than most nations, but not without our own tortise and hare moment on the vaccination “stroll-out”.

One area where we’re all slowly trying to figure out a way to get along, is in contact tracing, and proving vaccination status.

Now, it’s far from a unique problem. If Denso Wave were charging royalties each time a QR code were created or scanned, they’d be richer than Microsoft, Amazon and Apple put together by now. In the beginning of the pandemic, when a need for effective contact tracing was first proposed, we initially did things on paper.

Evidently though, at least here in Queensland, our education system has proven ineffective at teaching today’s crop of adults how to work a pen, with a sufficient number seemingly being unable to write in a legible manner. And so, the state government here mandated that all records shall be electronic.

Now, this wasn’t too bad, yes a little time-consuming, but by-in-large, most of the check-in systems worked with just your phone’s web browser. Some even worked by SMS, no web browser or fancy check-in software needed. It was a bother if you didn’t have a phone on you (e.g. maybe you don’t like using them, or maybe you can’t for legal reasons), but most of the places where they were enforcing this policy, had staff on hand that could take down your details.

The problems really started much later on when first, the Queensland Government decided that there shall be one software package, theirs. This state was not unique in doing this, each state and territory decided that they cannot pool resources together — wheels must be re-invented!

With restrictions opening up, they’re now making vaccination status a factor in deciding what your restrictions are. Okay, no big issue with this in principle, but once again, someone in Canberra thought that what the country really wanted to do was to spend all evening piss-farting around with getting MyGov and ther local state/territory’s check-in application to talk to each-other.

MyGov itself is its own barrel of WTFs. Never needed to worry about it until now… it took 6 attempts with pass to come up with a password that met its rather loosely defined standards, and don’t get me started on the “wish-it-were two-factor” authentication. I did manage to get an account set-up, and indeed, the COVID-19 certificate is as basic as they come; a PDF genrated using the Eclipse BIRT Report Engine, on what looks to be a Linux machine (or some Unix-like system with a /opt directory anyway). The PDF itself just has the coat-of-arms in the background, and some basic text describing whom the certificate is for, what they got poked with and when. Nothing there that would allow machine-verification whatsoever.

The International version (which I don’t have as I lack a passport), embeds a rather large and complicated QR-code which embeds a JSON data structure (perhaps JOSE? I didn’t check) that seems to be digitally signed with an ECC-based private key. That QR code pushes the limits of what a standard QR code can store, but provided the person scanning it has a copy of the corresponding public key, all the data is there for verification.

The alternative to QRZilla, is rather to make an opaque token, and have that link through to a page with further information. This is, after all, what all the check-in QR codes do anyway. Had MyGov embedded such a token on the certificate, it’d be a trivial matter for the document to be printed out, screen-shotted or opened in, an application that needs to check it, and have that direct whatever check-in application to make an API call to the MyGov site to verify the certificate.

But no, they instead have on the MyGov site in addition to the link that gives you the rather bland PDF, a button that “shares with” the check-in applications. To see this button, you have to be logged in on the mobile device running the check-in application(s). For me, that’s the tablet, as my phone is too old for this check-in app stuff.

When you tap that button, it brings you to a page showing you the smorgasboard of check-in applications you can theoretically share the certificate with. Naturally, “Check-in Queensland” is one of those; tapping it, it takes you to a legal agreement page to which you must accept, and after that, magic is supposed to happen.

As you can gather, magic did not happen. I got this instead.

I at least had the PDF, which I’ve since printed, and stashed, so as far as I’m concerned, I’ve met the requirements. If some business owner wants to be a technical elitist, then they can stick it where it hurts.

In amongst the instructions, it makes two curious points:

  • iOS devices, apparently Safari won’t work, they need you to use Chrome on iOS (which really is just Safari pretending to be Chrome)
  • Samsung’s browser apparently needs to be told to permit opening links in third-party applications

I use Firefox for Android on my tablet as I’m a Netscape user from way-back. I had a look at the settings to see if something could help there, and spotted this:

Turning the Open links in apps option on, I wondered if I could get this link-up to work. So, dug out the password, logged in, navigated to the appropriate page… nada, nothing. They changed the wording on the page, but the end result was the same.

So, I’m no closer than I was; and I think I’ll not bother from here on in.

As it is, I’m thankful I don’t need to go interstate. I’ve got better things to do than to muck around with a computer every time I need to go to the shops! Service NSW had a good idea in that, rather than use their application, you could instead go to a website (perhaps with the aide of someone who had the means), punch in your details, and print out some sort of check-in certificate that the business could then scan. Presumably that same certificate could mention vaccination status.

Why this method of checking-in hasn’t been adopted nation-wide is a mystery to me. Seems ridiculous that each state needs to maintain its own database and software, when all these tools are supposed to be doing the same thing.

In any case, it’s a temporary problem: I for one, will be uninstalling any contact-tracing software at some point next year. Once we’re all mingling out in public, sharing coronaviruses with each-other, and internationally… it’ll be too much of a flood of data for each state’s contact tracers to keep up with everyone’s movements.

I’m happy to just tell my phone, tablet or GPS to record a track-log of where I’ve been, and maybe keep a diary — for the sake of these contact tracers. Not hard when they make an announcement that ${LOCATION} is a contact site; me to check, “have I been to ${LOCATION}?” and get in touch if I have, turning over my diary/track logs for contact tracers to do their work. It’ll probably be more accurate than what all these silly applications can give them anyway.

We need to move on, and move forward.

Making InfluxDB’s init script more patient

Recently, I noticed my network monitoring was down… I hadn’t worried about it because I had other things to keep me busy, and thankfully, my network monitoring, whilst important, isn’t mission critical.

I took a look at it today. The symptom was an odd one, influxd was running, it was listening on the back-up/RPC port 8088, but not 8086 for queries.

It otherwise was generating logs as if it were online. What gives?

Tried some different settings, nothing… nada… zilch. Nothing would make it listen to port 8086.

Tried updating to 1.8 (was 1.1), still nothing.

Tried manually running it as root… sure enough, if I waited long enough, it started on its own, and did begin listening on port 8086. Hmmm, I wonder. I had a look at the init scripts:

#!/bin/bash -e

/usr/bin/influxd -config /etc/influxdb/influxdb.conf $INFLUXD_OPTS &
PID=$!
echo $PID > /var/lib/influxdb/influxd.pid

PROTOCOL="http"
BIND_ADDRESS=$(influxd config | grep -A5 "\[http\]" | grep '^  bind-address' | cut -d ' ' -f5 | tr -d '"')
HTTPS_ENABLED_FOUND=$(influxd config | grep "https-enabled = true" | cut -d ' ' -f5)
HTTPS_ENABLED=${HTTPS_ENABLED_FOUND:-"false"}
if [ $HTTPS_ENABLED = "true" ]; then
  HTTPS_CERT=$(influxd config | grep "https-certificate" | cut -d ' ' -f5 | tr -d '"')
  if [ ! -f "${HTTPS_CERT}" ]; then
    echo "${HTTPS_CERT} not found! Exiting..."
    exit 1
  fi
  echo "$HTTPS_CERT found"
  PROTOCOL="https"
fi
HOST=${BIND_ADDRESS%%:*}
HOST=${HOST:-"localhost"}
PORT=${BIND_ADDRESS##*:}

set +e
max_attempts=10
url="$PROTOCOL://$HOST:$PORT/health"
result=$(curl -k -s -o /dev/null $url -w %{http_code})
while [ "$result" != "200" ]; do
  sleep 1
  result=$(curl -k -s -o /dev/null $url -w %{http_code})
  max_attempts=$(($max_attempts-1))
  if [ $max_attempts -le 0 ]; then
    echo "Failed to reach influxdb $PROTOCOL endpoint at $url"
    exit 1
  fi
done
set -e

Ahh right, so start the server, check every second to see if it’s up, and if not, just abort and let systemd restart the whole shebang. Because turning the power on-off-on-off-on-off is going to make it go faster, right?

I changed max_attempts to 360 and the sleep to 10.

Having fixed this, I am now getting data back into my system.

Peer-to-peer LAN VPN using WireGuard

So, the situation: I have two boxes that must replicate data between themselves and generally keep in contact with one another over a network (Ethernet or WiFi) that I do not control. I want the two to maintain a peer-to-peer VPN over this potentially hostile network: ensuring confidentiality and authenticity of data sent over the tunnelled link.

The two nodes should be able to try and find each-other via other means, such as mDNS (Avahi).

I had thought of just using OpenVPN in its P2P mode, but I figured I’d try something new, WireGuard. Both machines are running Debian 10 (Buster) on AMD64 hardware, but this should be reasonably applicable to lots of platforms and Linux-based OSes.

This assumes WireGuard is in fact, installed: sudo apt-get install -y wireguard will do the deed on Debian/Ubuntu.

Initial settings

First, having installed WireGuard, I needed to make some decisions as to how the VPN would be addressed. I opted for using an IPv6 ULA. Why IPv6? Well, remember I mentioned I do not control the network? They could be using any IPv4 subnet, including the one I hypothetically might choose for my own network. This is also true of ULAs, however the probabilities are ridiculously small: parts per billion chance, enough to ignore!

So, I trundled over to a ULA generator site and generated a ULA. I made up a MAC address for this purpose. For the purposes of this document let’s pretend it gave me 2001:db8:aaaa::/48 as my address (yes, I know this is not a ULA, this is in the documentation prefix). For our VPN, we’ll statically allocate some addresses out of 2001:db8:aaaa:1000::/64, leaving the other address blocks free for other use as desired.

For ease of set-up, we also picked a port number for each node to listen on, WireGuard’s Quick Start guide uses port 51820, which is as good as any, so we used that.

Finally, we need to choose a name for the VPN interface, wg0 seemed as good as any.

Summarising:

  • ULA: 2001:db8:aaaa::/48
  • VPN subnet: 2001:db8:aaaa:1000::/64
  • Listening port number: 51820
  • WireGuard interface: wg0

Generating keys

Each node needs a keypair for communicating with its peers. I did the following:

( umask 077 ; wg genkey > /etc/wg.priv )
wg pubkey < /etc/wg.priv > /etc/wg.pub

I gathered all the wg.pub files from my nodes and stashed them locally

Creating settings for all nodes

I then made some settings files for some shell scripts to load. First, a description of the VPN settings for wg0, I put this into /etc/wg0.conf:

INTERFACE=wg0
SUBNET_IP=2001:db8:aaaa:1000::
SUBNET_SZ=64
LISTEN_PORT=51820
PERSISTENT_KEEPALIVE=60

Then, in a directory called wg.peers, I added a file with the following content for each peer:

pubkey=< node's /etc/wg.pub content >
ip=<node's VPN IP >

The VPN IP was just allocated starting at ::1 and counting upwards, do whatever you feel is appropriate for your virtual network. The IPs only need to be unique and within that same subnet.

Both the wg.peers and wg0.conf were copied to /etc on all nodes.

The VPN clean-up script

I mention this first, since it makes debugging the set-up script easier since there’s a single command that will bring down the VPN and clean up /etc/hosts:

#!/bin/bash

. /etc/wg0.conf

if [ -d /sys/class/net/${INTERFACE} ]; then
	ip link set ${INTERFACE} down
	ip link delete dev ${INTERFACE}

	sed -i -e "/^${SUBNET_IP}/ d" /etc/hosts
fi

This checks for the existence of wg0, and if found, brings the link down and deletes it; then cleans up all VPN IPs from the /etc/hosts file. Copy this to /usr/local/sbin, make permissions 0700.

The VPN set-up script

This is what establishes the link. The set-up script can take arguments that tell it where to find each peer: e.g. peernode.local=10.20.30.40 to set a static IP, or peernode.local=10.20.30.40:12345 if an alternate port is needed.

Giving peernode.local=listen just tells the script to tell WireGuard to listen for an incoming connection from that peer, where-ever it happens to be.

If a peer is not mentioned, it tries to discover the address of the peer using getent: the peer must have a non-link-local, non-VPN address assigned to it already: this is due to getent not being able to tell me which interface the link-local address came from. If it does, and it can ping that address, it considers the node up and adds it.

Nodes that do not have a static address configured, are set to listen, or are not otherwise locatable and reachable, are dropped off the list for VPN set-up. For two peers, this makes sense, since we want them to actively seek each-other out; for three nodes you might want to add these in “listen” mode, an exercise I leave for the reader.

#!/bin/bash

set -e

. /etc/wg0.conf

# Pick up my IP details and private key
ME=$( uname -n ).local
MY_IP=$( . /etc/wg.peers/${ME} ; echo ${ip} )

# Abort if we already have our interface
if [ -d /sys/class/net/${INTERFACE} ]; then
	exit 0
fi

# Gather command line arguments
declare -A static_peers
while [ $# -gt 0 ]; do
	case "$1" in
	*=*)	# Peer address
		static_peers["${1%=*}"]="${1#*=}"
		shift
		;;
	*)
		echo "Unrecognised argument: $1"
		exit 1
	esac
done

# Gather the cryptography configuration settings
peers=""
for peerfile in /etc/wg.peers/*; do
	peer=$( basename ${peerfile} )
	if [ "${peer}" != ${ME} ]; then
		# Derive a host name for the endpoint on the VPN
		host=${peer%.local}
		vpn_hostname=${host}.vpn

		# Do we have an endpoint IP given on the command line?
		endpoint=${static_peers[${peer}]}

		if [ -n "${endpoint}" ] && [ "${endpoint}" != listen ]; then
			# Given an IP/name, add brackets around IPv6, add port number if needed.
			endpoint=$(
				echo "${endpoint}" | sed \
					-e 's/^[0-9a-f]\+:[0-9a-f]\+:[0-9a-f:]\+$/[&]/' \
					-e "s/^\\(\\[[0-9a-f:]\\+\\]\\|[0-9\\.]\+\\)\$/\1:${LISTEN_PORT}/"
			)
		elif [ -z "${endpoint}" ]; then
			# Try to resolve the IP address for the peer
			# Ignore link-local and VPN tunnel!
			endpoint_ip=$(
				getent hosts ${peer} \
					| cut -f 1 -d' ' \
					| grep -v "^\(fe80:\|169\.\|${SUBNET_IP}\)"
			)

			if ping -n -w 20 -c 1 ${endpoint_ip}; then
				# Endpoint is reachable.  Construct endpoint argument
				endpoint=$( echo ${endpoint_ip} | sed -e '/:/ s/^.*$/[&]/' ):${LISTEN_PORT}
			fi
		fi

		# Test reachability
		if [ -n "${endpoint}" ]; then
			# Pick up peer pubkey and VPN IP
			. ${peerfile}

			# Add to peers
			peers="${peers} peer ${pubkey}"
			if [ "${endpoint}" != "listen" ]; then
				peers="${peers} endpoint ${endpoint}"
			fi
			peers="${peers} persistent-keepalive ${PERSISTENT_KEEPALIVE}"
			peers="${peers} allowed-ips ${SUBNET_IP}/${SUBNET_SZ}"

			if ! grep -q "${vpn_hostname} ${host}\\$" /etc/hosts ; then
				# Add to /etc/hosts
				echo "${ip} ${vpn_hostname} ${host}" >> /etc/hosts
			else
				# Update /etc/hosts
				sed -i -e "/${vpn_hostname} ${host}\\$/ s/^[^ ]\+/${ip}/" \
					/etc/hosts
			fi
		else
			# Remove from /etc/hosts
			sed -i -e "/${vpn_hostname} ${host}\\$/ d" \
				/etc/hosts
		fi
	fi
done

# Abort if no peers
if [ -z "${peers}" ]; then
	exit 0
fi

# Create the interface
ip link add ${INTERFACE} type wireguard

# Configre the cryptographic settings
wg set ${INTERFACE} listen-port ${LISTEN_PORT} \
	private-key /etc/wg.priv ${peers}

# Bring the interface up
ip -6 addr add ${MY_IP}/${SUBNET_SZ} dev ${INTERFACE}
ip link set ${INTERFACE} up

This is run from /etc/cron.d/vpn:

* * * * * root /usr/local/sbin/vpn-up.sh >> /tmp/vpn.log 2>&1

6LoWHAM: TCP over sensor networks

I stumbled across this article regarding the use of TCP over sensor networks. Now, TCP has been done with AX.25 before, and generally suffers greatly from packet collisions. Apparently (I haven’t read more than the first few paragraphs of this article), implementations TCP can be tuned to improve performance in such networks, which may mean TCP can be made more practical on packet radio networks.

Prior to seeing this, I had thought 6LoWHAM would “tunnel” TCP over a conventional AX.25 connection using I-frames and S-frames to carry TCP segments with some header prepended so that multiple TCP connections between two peers can share the same AX.25 connection.

I’ve printed it out, and made a note of it here… when I get a moment I may give this a closer look. Ultimately I still think multicast communications is the way forward here: radio inherently favours one-to-many communications due to it being a shared medium, but there are definitely situations in which being able to do one-to-one communications applies; and for those, TCP isn’t a bad solution.

Comments having read the article

So, I had a read through it. The take-aways seem to be this:

  • TCP was historically seen as “too heavy” because the MCUs of the day (circa 2002) lacked the RAM needed for TCP data structures. More modern MCUs have orders of magnitude more RAM (32KiB vs 512B) today, and so this is less of an issue.
    • For 6LoWHAM, intended for single-board computers running Linux, this will not be an issue.
  • A lot of early experiments with TCP over sensor networks tried to set a conservative MSS based on the actual link MTU, leading to TCP headers dominating the lower-level frame. Leaning on 6LoWPAN’s ability to fragment IP datagrams lead to much improved performance.
    • 6LoWHAM uses AX.25 which can support 256-byte frames; vs 128-byte 802.15.4 frames on 6LoWPAN. Maybe gains can be made this way, but we’re already a bit ahead on this.
  • Much of the document considered battery-powered nodes, in which the radio transceiver was powered down completely for periods of time to save power, and the effects this had on TCP communications. Optimisations were able to be made that reduced the impact of such power-down events.
    • 6LoWHAM will likely be using conventional VHF/UHF transceivers. Hand-helds often implement a “battery saver” mode — often this is configured inside the device with no external control possible (thus it will not be possible for us to control, or even detect, when the receiver is powered down). Mobile sets often do not implement this, and you do not want to frequently power-cycle a modern mobile transceiver at the sorts of rates that 802.15.4 radios get power-cycled!
  • Performance in ideal conditions favoured TCP, with the article authors managing to achieve 30% of the raw link bandwidth (75kbps of a theoretical 250kbps maximum), with the underlying hardware being fingered as a possible cause for performance issues.
    • Assuming we could manage the same percentage; that would equate to ~360bps on 1200-baud networks, or 2.88kbps on 9600-baud networks.
  • With up to 15% packet loss, TCP and CoAP (its nearest contender) can perform about the same in terms of reliability.
  • A significant factor in effective data rate is CSMA/CA. aioax25 effectively does CSMA/CA too.

Its interesting to note they didn’t try to do anything special with the TCP headers (e.g. Van Jacobson compression). I’ll have to have a look at TCP and see just how much overhead there is in a typical segment, and whether the roughly double MTU of AX.25 will help or not: the article recommends using MSS of approximately 3× the link MTU for “fair” conditions (so ~384 bytes), and 5× in “good” conditions (~640 bytes).

It’s worth noting a 256-byte AX.25 frame takes ~2 seconds to transmit on a 1200-baud link. You really don’t want to make that a habit! So smaller transmissions using UDP-based protocols may still be worthwhile in our application.