In that article, the discussion is about one TCP connection being tunnelled over another TCP connection. Basically it comes down to the lower layer buffering and re-sending the TCP datagrams just as the upper layer gives up on hearing a reply and re-sends its own attempt.
Now, end-to-end ACKs have been done on long chains of AX.25 networks before. It’s generally accepted to be an unreliable mechanism. UDP for sure can benefit, but then many protocols that use UDP already do their own handling of lost messages. CoAP for instance does its own ARQ, as does TFTP.
This latter document was apparently the inspiration for 6LoWPAN. Section 4.4.3 discusses the approaches to handling ARQ in TCP. Section 9.6 goes into further detail on how ARQ might be handled elsewhere in the network.
Thankfully in our case, it’s only the network that’s constrained, the nodes themselves will be no smaller than a Raspberry Pi which would have held its own against the PC that Adam Dunkels used to write that thesis!
In short, it looks as if just routing IP packets is not going to cut it, we need to actually handle the TCP side of things as well. As for other protocols like CoAP, I guess the answer is be patient. The timeout settings defined in RFC-7252 are usually tuneable, and it may be desirable to back those off just a little for use over AX.25.
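To get a feel for what backing those timers off actually means, here's a sketch of the worst-case retransmission schedule for a confirmable CoAP message, using the parameter names from RFC-7252 section 4.8. The backed-off value for AX.25 at the end is purely my own guess, not anything the RFC recommends.

```python
# Worst-case retransmission schedule for a CoAP confirmable message.
# Defaults are from RFC-7252 section 4.8; the AX.25 figure is a guess.

def retransmit_schedule(ack_timeout=2.0, random_factor=1.5, max_retransmit=4):
    """Return the worst-case wait (seconds) before each retransmission."""
    timeout = ack_timeout * random_factor   # upper bound of the initial timeout
    waits = []
    for _ in range(max_retransmit + 1):
        waits.append(timeout)
        timeout *= 2                        # binary exponential back-off
    return waits

print(retransmit_schedule())                # [3.0, 6.0, 12.0, 24.0, 48.0]
# The sum is 93 s: RFC-7252's MAX_TRANSMIT_WAIT.

# A hypothetical back-off for a 1200-baud AX.25 link:
print(retransmit_schedule(ack_timeout=8.0)) # [12.0, 24.0, 48.0, 96.0, 192.0]
```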
So, for the past few weeks I’ve been running a Redarc BCDC-1225 solar controller to keep the batteries charged. I initially found I had to make my little power controller back off on the mains charger a bit, but was finally able to prove conclusively that the Redarc was able to operate in both boost and float modes.
In the interests of science, I have plugged the Powertech back in. I have changed nothing else. What I’m interested to see, is if the Powertech in fact behaves itself, or whether it will go back to its usual tricks.
The following is the last 6 hours.
Next week, particularly Thursday and Friday, are predicted to have similar weather patterns to today. Today’s not a good test, since the battery started at a much higher voltage, so I expect that the solar controller will be doing little more than keeping the battery voltage up to the float set-point.
For reference, the settings on the MP-3735 are: Boost voltage 14.6V, Float voltage 13.8V. These are the recommended settings according to Century’s datasheets for the batteries concerned.
Interestingly, no sooner do I wire this up than the power controller reaches for the mains. The MP-3735 definitely likes to flip-flop. Here's a video of its behaviour shortly after connecting up the solar (and after I turned off the mains charger at the wall).
Looking now, it's producing about 10A, much better than the 2A it was doing whilst filming. So it can charge properly when it wants to, but it's intermittent, and inside you can sometimes hear a quiet clicking noise as if it's switching a relay. At 2A it's wasting time, as the cluster draws nearly 5× that.
The hesitation was so bad that the power controller kicked the mains charger in for about 30 minutes. After that, the MP-3735 seems to be behaving itself. I guess the answer is, see what it does tomorrow, and later this week without me intervening.
If it behaves itself, I’m happy to leave it there, otherwise I’ll be ordering a VSR, pulling out the Powertech MP-3735 and re-instating the Redarc BCDC-1225 with the VSR to protect against over-discharge.
Update 2018-10-28… okay, overcast for a few hours this morning, but by 11AM it had fined up. The solar performance however was abysmal.
Let’s see how it goes this week… but I think I might be ordering that VSR and installing the Redarc permanently now.
Each one of those vertical lines was accompanied by a warning email.
Today it seems, the IT gremlins have been out to get me. At my work I have a desktop computer (personal hardware) consisting of a Ryzen 7 1700, 16GB RAM, a 240GB Intel M.2 SATA SSD (540 series) and a 4TB Western Digital HDD.
The machine has been pretty reliable, if not rock-solid: in particular, compiling gcc sometimes segfaulted for reasons unknown (the RAM checks out okay according to memtest86), but for what I was doing, it mostly ran fine. I put up with the minor niggles with the view of solving those another day. Today though, I come in and find X has crashed.
Okay, no big deal, re-start the display manager, except that crashed too.
Hmm, okay, log in under my regular user account and try startx: No dice, there’s no room on /.
Ahh, that might explain a few things. We clean up some log files, truncate a 500MB file, and manage to free up 50GB (!).
The machine dual-boots two OSes: Debian 9 and Gentoo. It’s been running the latter for about 12 months now, I used Debian 9 to get things rolling so I could use the machine at work (did try Ubuntu 16.04, but it didn’t like my machine), and later, used that to get Gentoo running before switching over. So there was a 40GB partition on the SSD that had a year-old install of Debian that wasn’t being used. I figured I’d ditch it, and re-locate my Gentoo partition to occupy that space.
So I pull out an Ubuntu 18.04 disc, boot that up, and get gparted going. It’s happily copying, until WHAM, I was hit with an I/O error:
Failed re-location of partition
Clicking any of the three buttons resulted in the same message. Brilliant. I had just copied over the first 15GB of the partition, so the Debian install would be hosed (I was deleting it anyway), but my Gentoo root partition should still be there intact at its old location. Of course the partition table was updated, so no rolling back there. At this point, I couldn’t do anything with the SSD, it had completely stalled, and I just had to cut my losses and kill gparted.
I managed to make some room on the 4TB drive shuffling some partitions around so I could install Ubuntu 18.04 there. My /home partition was btrfs on the 4TB drive (first partition), the rest of that drive was LVM. I just shrank my /home down by 40GB and slipped it in there. The boot-loader didn’t install (no EFI partition), but who cares, I know enough of grub to boot from the DVD and bootstrap the machine that way. At first it wouldn’t boot because in their wisdom, they created the root partition with a @ subvolume. I worked around that by making the @ subvolume the default.
Then there was momentary panic when the /home partition I had specified lacked my usual files. Turned out, they had created a @home subvolume on my existing /home partition. Why? Who knows? Debian/Ubuntu seem to do strange things with btrfs which do nothing but complicate matters and I do not understand the reasoning. Editing /etc/fstab to remove the subvolume argument for /home and re-booting fixed that.
I set up a LVM volume that would receive a DD dump of the mangled partition to see what could be saved. GNU’s ddrescue managed to recover most of the raw partition, and so now I just had to find where the start was. If I had the output of fdisk -l before I started, I’d be right, but I didn’t have that foresight. (Maybe if I had just formatted a LVM volume and DD’d the root fs before involving gparted? Never mind!)
I figured there'd be some kind of magic bytes I could "grep" for. Something that would tell me "BTRFS was here". Sure enough, the information is stashed in the superblock. At 0x00010040 from the start of the partition, I should see the magic bytes 5f 42 48 52 66 53 5f 4d ("_BHRfS_M" in ASCII). I just needed to grep for these. To speed things up I made an educated guess on the start-location. The screenshot says the old partition was about 37.25GB in size, so that was a hint to maybe try skipping that bit and see what could be seen.
Sure enough, I found what looked to be the superblock:
Whilst waiting for this to complete, I double-checked my findings by inspecting the other fields. From the screenshot, I know my filesystem UUID was 6513682e-7182-4474-89e6-c0d1c71866ad. Looking at the superblock, sure enough I see that listed:
Looks promising! After an agonising wait, the dd finishes. I can check the filesystem:
root@vk4msl-ws:~# btrfsck /dev/scratch/gentoo-root
Checking filesystem on /dev/scratch/gentoo-root
checking free space cache
block group 111690121216 has wrong amount of free space
failed to load free space cache for block group 111690121216
block group 161082245120 has wrong amount of free space
failed to load free space cache for block group 161082245120
checking fs roots
checking root refs
found 107544387643 bytes used, no error found
total csum bytes: 99132872
total tree bytes: 6008504320
total fs tree bytes: 5592694784
total extent tree bytes: 271663104
btree space waste bytes: 1142962475
file data blocks allocated: 195274670080
Okay, it complained that the free space was wrong (which I’ll blame on gparted prematurely growing the partition), but the data is there! This is confirmed by mounting the volume and doing a ls:
root@vk4msl-ws:~# mount /dev/scratch/gentoo-root /mnt/
root@vk4msl-ws:~# ls /mnt/ -l
drwxr-xr-x 1 root root 1020 Oct 7 14:13 bin
drwxr-xr-x 1 root root 18 Jul 21 2017 boot
drwxr-xr-x 1 root root 16 May 28 10:29 dbus-1
drwxr-xr-x 1 root root 1686 May 31 2017 dev
drwxr-xr-x 1 root root 3620 Oct 19 18:53 etc
drwxr-xr-x 1 root root 0 Jul 14 2017 home
lrwxrwxrwx 1 root root 5 Sep 17 09:20 lib -> lib64
drwxr-xr-x 1 root root 1156 Oct 7 13:59 lib32
drwxr-xr-x 1 root root 4926 Oct 13 05:13 lib64
drwxr-xr-x 1 root root 70 Oct 19 11:52 media
drwxr-xr-x 1 root root 28 Apr 23 13:18 mnt
drwxr-xr-x 1 root root 336 Oct 9 07:27 opt
drwxr-xr-x 1 root root 0 May 31 2017 proc
drwx------ 1 root root 390 Oct 22 06:07 root
drwxr-xr-x 1 root root 10 Jul 6 2017 run
drwxr-xr-x 1 root root 4170 Oct 9 07:57 sbin
drwxr-xr-x 1 root root 10 May 31 2017 sys
drwxrwxrwt 1 root root 6140 Oct 22 06:07 tmp
drwxr-xr-x 1 root root 304 Oct 19 18:20 usr
drwxr-xr-x 1 root root 142 May 17 12:36 var
root@vk4msl-ws:~# cat /mnt/etc/gentoo-release
Gentoo Base System release 2.4.1
Yes, I’ll be backing this up properly RIGHT NOW. But, my data is back, and I’ll be storing this little data recovery technique for next time.
The real lesson here is:
KEEP RELIABLE BACKUPS! You never know when something will fail.
Catch the copy process before it starts overwriting your source data! If there’s no overlap between the old and new locations, you’re fine, but if there is and it starts overwriting the start of your original volume, it’s all over red rover! You might be lucky with a superblock back-up, but don’t bet on it!
Make note of the filesystem type and its approximate location. The fact that I knew roughly where to look, and what sort of filesystem I was looking for meant I could look for magic bytes that say “I’m a BTRFS filesystem”. The magic bytes for EXT4, XFS, etc will differ, but the same concepts are there, you just have to look up the documentation on how your particular filesystem structures its data.
So, I placed an order to Mouser the other day to actually get some parts into my hands so I can better design the boards.
In that, I discovered the screw terminals I was planning on using are discontinued. So, I found something that was able to take the same gauge wire: Phoenix 1017526s. Turns out, these will not fit alongside the current shunts on the board as planned.
There’s just no way I’ll be fitting these on a 5×5cm board and have room to spare for a shunt in between. Since this is really application-specific, it might be better off board. We’ll put the INA219 and PCA9615 together on the board so we have a nice self-contained sensor board that can be mounted close to the current shunt, wherever that lives, and have nice noise-resistant links back to the controller.
This does mean I can do things like put a current shunt in the fuse box where the solar panels connect, and run CAT5 down to the controller from there.
To make routing easier, I've gone to a 4-layer board. The board has solder-jumpers for setting the I²C address of the INA219, and I've documented all the termination and pull resistors. I'm not sure which ones are needed yet, so there's space at every point where I could envisage one being needed.
There’s two power planes in the inner layers, one for VCC the other for 0V.
Next step, I’ll print out the board designs and test fit everything before ordering the boards, which I hope to have ordered this afternoon.
So, doing some more digging here. One question people might ask is what kind of applications would I use over this network?
HTTP really isn’t designed for low-bandwidth links, as Steve Netting demonstrated:
The page itself is bad enough, but even then, it’s loaded after a minute. The real slow bit is the 20kB GIF.
So yeah, slow-scan television, the ability to send weather radar images over, that is something I was thinking of, but not like that!
That request is 508 bytes and the response headers are 216 bytes. It’d be inappropriate on 6LoWPAN as you’d be fragmenting that packet left right and centre in order to squeeze it into the 128-byte 802.15.4 frames.
In that video, ICMP echo requests were also demonstrated, and those weren’t bad! Yes, a little slow, but workable. So to me, it’s not the packet network that’s the problem, it’s just that something big like HTTP is just not appropriate for a 1200-baud radio link.
It might work on 9600 baud packet … maybe. My Kantronics KPC3 doesn’t do 9600 baud over the air.
CoAP was designed for tight messages. It is UDP based, so your TCP connection overhead disappears, and the “options” are encoded as individual bytes in many cases. There are other UDP-based protocols that would work fine too, as well as older TCP protocols such as Telnet.
A request, and reply in CoAP look something like this:
That there also shows another tool for data packing: CBOR. CBOR is basically binary JSON. Just like JSON it is schemaless; it has objects, arrays, strings, booleans, nulls and numbers (CBOR differentiates between integers of various sizes and floats). Unlike JSON, it is tight. The CBOR blob in this response would look like this as JSON (in the most compact representation possible):
The entire exchange is 190 bytes, less than a quarter of the size of just the HTTP request alone. I think that would work just fine over 1200 baud packet. As a bonus, you can also multicast, try doing that with HTTP.
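To give a feel for where that tightness comes from, here's a toy CBOR encoder covering just small maps, short text strings and small unsigned integers, compared against compact JSON. Real code would use a proper library; the sensor reading is made up for illustration.

```python
# Toy CBOR encoder: small maps, short text strings, unsigned ints < 65536.
import json

def cbor(obj):
    if isinstance(obj, dict):
        out = bytes([0xA0 | len(obj)])            # map, length < 24
        for k, v in obj.items():
            out += cbor(k) + cbor(v)
        return out
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return bytes([0x60 | len(data)]) + data   # text string, length < 24
    if isinstance(obj, int) and obj >= 0:
        if obj < 24:
            return bytes([obj])                   # tiny unsigned int: one byte
        if obj < 256:
            return bytes([0x18, obj])             # uint8
        return bytes([0x19]) + obj.to_bytes(2, "big")  # uint16
    raise TypeError(obj)

reading = {"t": 23, "h": 76}                      # made-up sensor reading
print(len(cbor(reading)))                                # 8 bytes
print(len(json.dumps(reading, separators=(",", ":"))))   # 15 bytes
```

Half the size of even minified JSON, and the gap widens as keys and values grow.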
So you’d be writing higher-level services that would use this instead of JSON-REST interfaces. There’s a growing number of libraries that can consume this sort of thing, and IoT is pushing that further. I think it’s doable.
Now, on the routing front, I’ve been digging up a bit on Net/ROM. Net/ROM is actually two parts, Net/ROM Level 3 does the routing and level 4 does the circuit switching. It’s the “Level 3” bit we want.
Coming up with a definitive specification of the protocol has been a bit tough; it doesn't help that there is a company called NetROM, but I did manage to find this document. In a way, if I could make my software behave like a Net/ROM node, I could piggy-back off that to discover neighbours. Thus this protocol would co-exist alongside Net/ROM networks that may be completely oblivious to TCP/IP.
This is preferable to just re-inventing the wheel…yes I know non-circular wheels are so much fun! Really, once Net/ROM L3 has figured out where everyone is, IP routing just becomes a matter of correctly addressing the AX.25 frame so the next hop receives the message.
VK4RZB at Mt. Coot-tha is one such node running TheNet. Easy enough to do tests on, as it's a mere stone's throw away from my home QTH.
There’s a little consideration to make about how to label the AX.25 frame. Obviously, it’ll be a UI frame, but what PID field should I use? My instinct suggests that I should just label it as “ARPA Internet Protocol”, since it is Internet Protocol traffic, just IPv6 instead of v4. Not all the codes are taken though, 0xc9 is free, so I could be cheeky and use that instead. If the idea takes off, we can talk with the TAPR then.
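To illustrate where the PID sits, here's a sketch of building such a UI frame. The address encoding (each callsign character shifted left one bit, SSID in bits 4-1 of the seventh octet, extension bit on the last address) follows AX.25, though I've glossed over the command/response bits; 0xC9 is the cheeky hypothetical choice above, with 0xCC being the assigned "ARPA Internet Protocol" value.

```python
# Sketch of an AX.25 UI frame header carrying an IPv6 payload.

def ax25_addr(callsign, ssid, last=False):
    """Encode one 7-octet AX.25 address entry (command/response bits ignored)."""
    field = bytes(ord(c) << 1 for c in callsign.ljust(6)[:6].upper())
    return field + bytes([0x60 | ((ssid & 0x0F) << 1) | (1 if last else 0)])

def ui_frame(dest, src, pid, payload):
    """Build an unnumbered-information frame (flags and FCS not included)."""
    control = 0x03                              # UI frame, poll/final clear
    return (ax25_addr(*dest) + ax25_addr(*src, last=True)
            + bytes([control, pid]) + payload)

frame = ui_frame(("VK4RZB", 0), ("VK4MSL", 1), 0xC9, b"...compressed IPv6...")
```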
So, part of me wants to consider the idea of using amateur radio as a transmission mechanism for 6LoWPAN. The idea being that we use NET/ROM and AX.25 or similar schemes as a transport mechanism for delivering shortened IPv6 packets. Over this, we can use standard TCP/IP programming to write applications.
Protocols designed for low-bandwidth constrained networks are ideal here, so things like CoAP where emphasis is placed on compact representation. 6LoWPAN normally runs over IEEE 802.15.4 which has a payload limit of 128 bytes. AX.25 has a limit of 256 bytes, so is already doing better.
The thinking is that I “encode” the call-sign into a “hardware” address. MAC addresses are nominally 48-bits, although the IEEE is trying to phase that out in favour of 64-bit EUIs. Officially the IEEE looks after this, so we want to avoid doing things that might clash with their system.
An EUI-48 (MAC) address is 6 bytes long, where the first 3 bytes identify the organisation and the latter 3 bytes identify an individual device. The least significant two bits of the first byte are flags that decide whether the address is unicast or multicast, and whether it is globally administered (by the IEEE) or locally administered.
To avoid complications, we should probably keep the multicast (I/G) bit cleared to indicate that these addresses are unicast addresses.
Some might argue that the ITU assigns prefixes to countries, and these countries have national bodies that hand out callsigns, thus we could consider callsigns as “globally administered”. Truth is, the IEEE has nothing to do with the process, and could very legitimately assign the EUI-48 prefix 56-4b-34 to a company… in that hypothetical scenario, there goes all the addresses that might represent amateur operators stationed in Queensland. So let’s call these “locally administered”, since there are suffixes the user may choose (e.g. “/P”).
That gives us 46 bits to play with. 7-bit ASCII just fits 6 characters, which would just fit the callsigns used in AX.25 with enough room for a 4-bit SSID. We don't need all 128 characters though, and a scheme based on DEC's Radix-50 can pack in far more.
We can get 8 arbitrary Radix-50 characters into 43 bits, which gives us 3 left over which can be used as the user wishes. We'll probably call it the SSID, but unlike AX.25, it will be limited to 0-7. The user can always use the least significant character in their callsign field for an additional 6 bits, which gives them 9 bits to play with (i.e. "VK4MSL-1"#0 to encode the AX.25 SSID "VK4MSL-10").
Flip the multicast bit, and we’ve got a group address.
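As a sanity check on the bit budget, here's a sketch of the packing. The 40-symbol alphabet is my own stand-in (DEC's actual Radix-50 table differs), and putting the top six data bits alongside the two flag bits in the first octet is just one possible layout; the point is that it round-trips.

```python
# Pack an up-to-8-character callsign plus a 3-bit SSID into a
# locally-administered, unicast EUI-48.  40^8 < 2^43, so 8 Radix-50-style
# characters fit in 43 bits, leaving 3 bits for the SSID: 46 bits of data.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/?"   # 40 symbols (my choice)

def callsign_to_mac(callsign, ssid):
    assert len(callsign) <= 8 and 0 <= ssid <= 7
    value = 0
    for ch in callsign.upper().ljust(8):
        value = value * 40 + ALPHABET.index(ch)
    data = (value << 3) | ssid                  # 46 bits total
    first = ((data >> 40) << 2) | 0x02          # top 6 bits + U/L=1, I/G=0
    return bytes([first]) + (data & ((1 << 40) - 1)).to_bytes(5, "big")

def mac_to_callsign(mac):
    data = ((mac[0] >> 2) << 40) | int.from_bytes(mac[1:], "big")
    value, ssid = data >> 3, data & 0x7
    chars = []
    for _ in range(8):
        value, idx = divmod(value, 40)
        chars.append(ALPHABET[idx])
    return "".join(reversed(chars)).rstrip(), ssid

mac = callsign_to_mac("VK4MSL", 1)
print(mac.hex(":"), mac_to_callsign(mac))
```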
SLAAC derives the IPv6 address from the EUI-48, so the IPv6 address will effectively encode the callsigns of the two communicating stations. If both are on the same “mesh”, then we can probably borrow ideas from 6LoWPAN for shortening that address.
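For reference, the standard SLAAC derivation (the modified EUI-64 of RFC-4291) from an EUI-48 looks like this; the example MAC is an arbitrary locally-administered one, not a real station.

```python
# Derive a SLAAC link-local IPv6 address from an EUI-48: insert ff:fe in
# the middle, invert the universal/local bit, and prefix with fe80::/64.
import ipaddress

def link_local_from_mac(mac: bytes) -> ipaddress.IPv6Address:
    assert len(mac) == 6
    eui64 = bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:]
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + eui64)

print(link_local_from_mac(bytes.fromhex("0200c0ffee01")))
# fe80::c0ff:feff:ee01
```

Since the callsign is recoverable from the MAC, it's recoverable from the low 64 bits of the address too.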
So, I’ve designed the sensor board, this is basically a break-out of an INA219 coupled with a PCA9615, for extended I²C range. If I was to use one of these on the cluster, it’s theoretically possible for me to put one up in the fuse box on the back deck, and run CAT5e down to the server rack to help measure voltage drop across that long run. Doing that with regular I²C would be insane.
Again, I’ve gone crazy with pull-up, pull-down and termination resistances, not knowing what would be needed. The schematic is nothing special.
The board wound up bigger than I’d expected, but largely because it had to accommodate fairly heavy power traces. I think I’ve got the footprint for the screw terminal blocks right. I’ve managed to cram it onto a 5cm×5cm board (two layer).
As always, you’ve got two ways of dealing with the current shunt, either hook one up externally, which means you don’t bother with the beefy power connection footprints, or you fit a surface-mount shunt on.
You’ve got full flexibility there, as well as what address to set the board to via the jumpers.
I’ll probably order some of the connectors and other parts in question, print out the board layout and test-fit everything. I’m not happy about the fact that NXP only make the PCA9615 in TSSOP, but I guess I should be thankful the part has legs.
No idea what the month-on-month usage is (I haven’t spotted it), but this is a scan from our last bill:
GreenPower? We need no stinkin’ GreenPower!
This won’t take into consideration my tweaks to the controller where I now just bring the mains power in to do top-ups of the battery. These other changes should see yet further reductions in the power bill.
So, I’ve been running the Redarc controller for a little while now, and we’ve had some good days of sunshine to really test it out.
Recall in an earlier posting with the Powertech solar controller I was getting this in broad daylight:
Note the high amount of “noise”, this is the Powertech solar controller PWMing its output. I’m guessing output filtering is one of the corners they cut, I expect to see empty footprints for juicy big capacitors that would have been in the “gold” model sent for emissions testing. It’ll be interesting to tear that down some day.
I’ve had to do some further tweaks to the power controller firmware, so this isn’t an apples-to-apples comparison, maybe next week we’ll try switching back and see what happens, but this was Tuesday, on the Redarc controller:
You can see that overnight, the Meanwell 240V charger was active until a little after 5AM, when my power controller decided the sun should take over. There’s a bit of discharging, until the sun crept up over the roof of our back-fence-neighbour’s house at about 8AM. The Redarc basically started in “float” mode, because the Meanwell had done all the hard work overnight. It remains so until the sun drops down over the horizon around 4PM, and the power controller kicks the mains back on around 6PM.
I figured that, if the Redarc controller saw the battery get below the float voltage at around sunrise, it should boost the voltage.
The SSR controlling the Meanwell was “powered” by the solar, meaning that by default, the charge controller would not be able to inhibit the mains charger at night as there was nothing to power the SSR. I changed that last night, powering it from the battery. Now, the power controller only brings in the mains charger when the battery is below about 12.75V. It’ll remain on until it’s been at above 14.4V for 30 minutes, then turn off.
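That logic amounts to a small hysteresis state machine. Here's an illustrative sketch with the thresholds from above; it's not my actual power controller firmware, just the shape of the behaviour.

```python
# Mains charger hysteresis: on below ~12.75V, off only after the battery
# has been above 14.4V for 30 minutes straight.
ON_BELOW = 12.75      # volts: bring the mains charger in
OFF_ABOVE = 14.40     # volts: candidate for switching off...
HOLD_SECS = 30 * 60   # ...once held this long

class MainsCharger:
    def __init__(self):
        self.on = False
        self.above_since = None

    def update(self, volts, now):
        """Feed in a voltage sample at time `now` (seconds); return SSR state."""
        if not self.on:
            if volts < ON_BELOW:
                self.on = True
        elif volts >= OFF_ABOVE:
            if self.above_since is None:
                self.above_since = now          # start the 30-minute clock
            elif now - self.above_since >= HOLD_SECS:
                self.on = False
                self.above_since = None
        else:
            self.above_since = None             # dipped below: restart the clock
        return self.on
```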
In the last 24 hours, this is what the battery voltage looks like.
I made the change at around 8PM (can you tell?), and so the battery immediately started discharging, then the charge-discharge cycles began. I’m gambling on the power being always available to give the battery a boost here, but I think the gamble is a safe one. You can see what happened 12 hours later when the sun started hitting the panels: the Redarc sprang into action and is on a nice steady trend to a boost voltage of 14.6V.
We’re predicted to get rain and storms tomorrow and Saturday, but maybe Monday, I might try swapping back to the Powertech controller for a few days and we’ll be able to compare the two side-by-side with the same set-up.
So, I’ll admit to looking at AX.25 with the typical modems available (the classical 1200-baud AFSK and the more modern G3RUH modem which runs at a blistering 9600 baud… look out 5G!) years ago and wondering “what’s the point”?
It was Brisbane Area WICEN’s involvement in the International Rally of Queensland that changed my view somewhat. This was an event that, until CAMS knocked it on the head, ran annually in the Imbil State Forest up in the Sunshine Coast hinterland.
There, WICEN used it for forwarding the scores of drivers as they passed through each stage of the rally. A checkpoint would be at the start and finish of each stage, and a packet network would be set up with digipeaters in strategic locations and a base station, often located at the Imbil school.
The organisers of IRoQ did experiment with other ways of getting scores through, including hiring bandwidth on satellites, flying planes around in circles over the area, and other shenanigans. Although these systems had faster throughput, the one thing they had that we did not was latency: with packet, the score would arrive back at base long before the car had left the checkpoint.
In addition to this kind of work, WICEN also help out with horse endurance rides. Traditionally we've just relied on good ol' analogue FM radio, but in events such as the Tom Quilty, there has been a desire to use packet as a mechanism for reporting when horses arrive at given checkpoints, and to perhaps enable autonomous stations that can detect horses via RFID and report those "back to base" to deter riders from cheating.
The challenge of AX.25 is two-fold:
With the exception of Linux, no OS has any kind of baked-in support for it, so writing applications that can interact with it means either implementing your own AX.25 stack or interfacing to some third-party stack such as BPQ.
Due to the specialised stack, applications often have to run as privileged applications, can have problems with firewalling, etc.
The AX.25 protocol itself only does static (source) routing. It offers connected-mode links (like TCP) and a connectionless mode (like UDP), and there are at least two routing protocols I know of that allow for dynamic routing (ROSE, Net/ROM). There is a standard for doing IPv4 over AX.25, but you still need to manage the allocation of addresses and other details; it isn't plug-and-play.
Net/ROM would make an ideal way to forward 6LoWPAN traffic, except it only does connected mode, and doing IP over a “TCP-like” link is really a bad idea. (Anything that does automatic repeat requests really messes with TCP/IP.)
I have no idea whether ROSE does the connectionless mode, but the idea of needing to come up with a 10-digit numeric “address” is a real turn-off.
If the address used can be derived off the call-sign of the operator, that makes life a lot easier.
The IPv6 address format has enough bits to do that. To me the most obvious way would be to derive a MAC address from a call-sign and an arbitrarily chosen digit (0-7). It would be reversible of course, and since the MAC address is used in SLAAC, you would see the station’s call-sign in the IPv6 address.
The thinking is that there’s a lot of problems that have been solved in 6LoWPAN. Discovery of services for example is handled using mechanisms like mDNS and CoRE RD. We don’t need to forward Internet traffic, although being able to pull up the Mt. Kanigan and Mt. Stapylton radars over such a network would be real handy at times (yes, I know it’ll be slow).
The OS will view the packet network like a VPN, and so writing applications that can talk over packet will be no different to writing any other kind of network software. Any consumer desktop OS written in the last 16 years has the necessary infrastructure to support it (even Windows 2000, there was a downloadable add-on for it).
Linking two separate “mesh” networks via point-to-point links is also trivial. Each mesh will of course see the other as “external” but participants on both can nonetheless communicate.
The guts of 6LoWPAN is in RFC-4944. This specifies how the IPv6 datagram is encoded as an IEEE 802.15.4 payload, and how the infrastructure within 802.15.4 is used to route IPv6. Gnarly details, like fragmenting a 1280-byte IPv6 datagram into pieces that fit 802.15.4's 128-byte maximum frames, are handled here. For what it's worth, AX.25 allows 256 bytes by default, so we're ahead there.
Crucially, it is assumed that the 802.15.4 layer can figure out how to get from node A to node Z via B… C…, etc. 802.15.4 networks are managed by a PAN coordinator, which provides various services to the network.
AX.25 makes this “our problem”. Yes the sender of a frame can direct which digipeaters a frame should be passed to, but they have to figure that out. It’s like sending an email by UUCP, you need a map of the Internet to figure out what someone’s address is relative to your site.
Plain AX.25 digipeaters will of course be part of the mix, so the ability for a node stuck on one side of such a digipeater to participate would be worth having, but ultimately, the aim here will be to put a route discovery mechanism in place that, knowing a few static digipeater routes, can figure out who is able to hear whom, and route traffic accordingly.