tcpip

6LoWHAM: TCP over sensor networks

I stumbled across this article regarding the use of TCP over sensor networks. Now, TCP has been done with AX.25 before, and generally suffers greatly from packet collisions. Apparently (I haven’t read more than the first few paragraphs of this article), implementations TCP can be tuned to improve performance in such networks, which may mean TCP can be made more practical on packet radio networks.

Prior to seeing this, I had thought 6LoWHAM would “tunnel” TCP over a conventional AX.25 connection using I-frames and S-frames to carry TCP segments with some header prepended so that multiple TCP connections between two peers can share the same AX.25 connection.

I’ve printed it out, and made a note of it here… when I get a moment I may give this a closer look. Ultimately I still think multicast communications is the way forward here: radio inherently favours one-to-many communications due to it being a shared medium, but there are definitely situations in which being able to do one-to-one communications applies; and for those, TCP isn’t a bad solution.

Comments having read the article

So, I had a read through it. The take-aways seem to be this:

  • TCP was historically seen as “too heavy” because the MCUs of the day (circa 2002) lacked the RAM needed for TCP data structures. More modern MCUs have orders of magnitude more RAM (32KiB vs 512B) today, and so this is less of an issue.
    • For 6LoWHAM, intended for single-board computers running Linux, this will not be an issue.
  • A lot of early experiments with TCP over sensor networks tried to set a conservative MSS based on the actual link MTU, leading to TCP headers dominating the lower-level frame. Leaning on 6LoWPAN’s ability to fragment IP datagrams lead to much improved performance.
    • 6LoWHAM uses AX.25 which can support 256-byte frames; vs 128-byte 802.15.4 frames on 6LoWPAN. Maybe gains can be made this way, but we’re already a bit ahead on this.
  • Much of the document considered battery-powered nodes, in which the radio transceiver was powered down completely for periods of time to save power, and the effects this had on TCP communications. Optimisations were able to be made that reduced the impact of such power-down events.
    • 6LoWHAM will likely be using conventional VHF/UHF transceivers. Hand-helds often implement a “battery saver” mode — often this is configured inside the device with no external control possible (thus it will not be possible for us to control, or even detect, when the receiver is powered down). Mobile sets often do not implement this, and you do not want to frequently power-cycle a modern mobile transceiver at the sorts of rates that 802.15.4 radios get power-cycled!
  • Performance in ideal conditions favoured TCP, with the article authors managing to achieve 30% of the raw link bandwidth (75kbps of a theoretical 250kbps maximum), with the underlying hardware being fingered as a possible cause for performance issues.
    • Assuming we could manage the same percentage; that would equate to ~360bps on 1200-baud networks, or 2.88kbps on 9600-baud networks.
  • With up to 15% packet loss, TCP and CoAP (its nearest contender) can perform about the same in terms of reliability.
  • A significant factor in effective data rate is CSMA/CA. aioax25 effectively does CSMA/CA too.

Its interesting to note they didn’t try to do anything special with the TCP headers (e.g. Van Jacobson compression). I’ll have to have a look at TCP and see just how much overhead there is in a typical segment, and whether the roughly double MTU of AX.25 will help or not: the article recommends using MSS of approximately 3× the link MTU for “fair” conditions (so ~384 bytes), and 5× in “good” conditions (~640 bytes).

It’s worth noting a 256-byte AX.25 frame takes ~2 seconds to transmit on a 1200-baud link. You really don’t want to make that a habit! So smaller transmissions using UDP-based protocols may still be worthwhile in our application.

6LoWHAM: ARQ over a “mesh” network

So earlier, I had mentioned that it’s really not desirable to have ARQ (automatic repeat request) on a link carrying TCP datagrams.  My comment is based on this observation:

http://sites.inka.de/bigred/devel/tcp-tcp.html

In that article, the discussion is about one TCP connection being tunnelled over another TCP connection.  Basically it comes down to the lower layer buffering and re-sending the TCP datagrams just as the upper layer gives up on hearing a reply and re-sends its own attempt.

Now, end-to-end ACKs have been done on long chains of AX.25 networks before.  It’s generally accepted to be an unreliable mechanism.  UDP for sure can benefit, but then many protocols that use UDP already do their own handling of lost messages.  CoAP for instance does its own ARQ, as does TFTP.

Gerald Wagenknecht, Markus Anwander and Torsten Braun discuss some of the impacts of this on a 802.15.4 network in their thesis “Hop-to-Hop Reliability in IP-based Wireless Sensor Networks – a Cross-Layer Approach“.  In this, they talk about a variant of TCP called TSS: TCP Support for Sensor Networks.  This was discussed at depth in a thesis by Adam Dunkels, “Towards TCP/IP for Wireless Sensor Networks“.

This latter document, was apparently the inspiration for 6LoWPAN.  Section 4.4.3 discusses the approaches to handling ARQ in TCP.  Section 9.6 goes into further detail on how ARQ might be handled elsewhere in the network.

Thankfully in our case, it’s only the network that’s constrained, the nodes themselves will be no smaller than a Raspberry Pi which would have held its own against the PC that Adam Dunkels used to write that thesis!

In short, it looks as if just routing IP packets is not going to cut it, we need to actually handle the TCP side of things as well.  As for other protocols like CoAP, I guess the answer is be patient.  The timeout settings defined in RFC-7252 are usually tuneable, and it may be desirable to back those off just a little for use over AX.25.

6LoWHAM Research

So, doing some more digging here.  One question people might ask is what kind of applications would I use over this network?

Bear in mind that it’s running at 1200 baud!  If we use HTTP at all, tiny is the word!  No bloated images, and definitely no big heavy JavaScript frameworks like ReactJS, Angular, DoJo or JQuery.  You can forget watching Netflicks in 4k over this link.

HTTP really isn’t designed for low-bandwidth links, as Steve Netting demonstrated:

The page itself is bad enough, but even then, it’s loaded after a minute.  The real slow bit is the 20kB GIF.

So yeah, slow-scan television, the ability to send weather radar images over, that is something I was thinking of, but not like that!

HTTP uses pretty verbose headers:

GET /qld/forecasts/brisbane.shtml?ref=hdr HTTP/1.1
Host: www.bom.gov.au
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-AU,en-GB;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://www.bom.gov.au/products/IDR664.loop.shtml
Cookie: bom_meteye_windspeed_units_knots=yes
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Server: Apache
Vary: Accept-Encoding
Content-Length: 6321
Date: Sat, 20 Oct 2018 10:56:12 GMT
Connection: keep-alive

That request is 508 bytes and the response headers are 216 bytes.  It’d be inappropriate on 6LoWPAN as you’d be fragmenting that packet left right and centre in order to squeeze it into the 128-byte 802.15.4 frames.

In that video, ICMP echo requests were also demonstrated, and those weren’t bad!  Yes, a little slow, but workable.  So to me, it’s not the packet network that’s the problem, it’s just that something big like HTTP is just not appropriate for a 1200-baud radio link.

It might work on 9600 baud packet … maybe.  My Kantronics KPC3 doesn’t do 9600 baud over the air.

CoAP was designed for tight messages.  It is UDP based, so your TCP connection overhead disappears, and the “options” are encoded as individual bytes in many cases.  There are other UDP-based protocols that would work fine too, as well as older TCP protocols such as Telnet.

A request, and reply in CoAP look something like this:

Hex dump of request:
00000000  40 01 00 01 3b 65 78 61  6d 70 6c 65 2e 63 6f 6d   @...;exa mple.com
00000010  81 63 03 52 46 77 11 3c                            .c.RFw.< 

Hex dump of response:
    00000000  60 45 00 01 c1 3c ff a1  1a 00 01 11 70 a1 01 a3   `E...<.. ....p...
    00000010  04 18 64 02 6b 31 39 32  2e 31 36 38 2e 30 2e 31   ..d.k192 .168.0.1
    00000020  03 64 65 74 68 30                                  .deth0

Or in more human readable form:

Request:
Constrained Application Protocol, Confirmable, GET, MID:1
    01.. .... = Version: 1
    ..00 .... = Type: Confirmable (0)
    .... 0000 = Token Length: 0
    Code: GET (1)
    Message ID: 1
    Opt Name: #1: Uri-Host: example.com
        Opt Desc: Type 3, Critical, Unsafe
        0011 .... = Opt Delta: 3
        .... 1011 = Opt Length: 11
        Uri-Host: example.com
    Opt Name: #2: Uri-Path: c
        Opt Desc: Type 11, Critical, Unsafe
        1000 .... = Opt Delta: 8
        .... 0001 = Opt Length: 1
        Uri-Path: c
    Opt Name: #3: Uri-Path: RFw
        Opt Desc: Type 11, Critical, Unsafe
        0000 .... = Opt Delta: 0
        .... 0011 = Opt Length: 3
        Uri-Path: RFw
    Opt Name: #4: Content-Format: application/cbor
        Opt Desc: Type 12, Elective, Safe
        0001 .... = Opt Delta: 1
        .... 0001 = Opt Length: 1
        Content-type: application/cbor
    [Uri-Path: coap://example.com/c/RFw]

Response:
Constrained Application Protocol, Acknowledgement, 2.05 Content, MID:1
    01.. .... = Version: 1
    ..10 .... = Type: Acknowledgement (2)
    .... 0000 = Token Length: 0
    Code: 2.05 Content (69)
    Message ID: 1
    Opt Name: #1: Content-Format: application/cbor
        Opt Desc: Type 12, Elective, Safe
        1100 .... = Opt Delta: 12
        .... 0001 = Opt Length: 1
        Content-type: application/cbor
    End of options marker: 255
    Payload: Payload Content-Format: application/cbor, Length: 31
        Payload Desc: application/cbor
        [Payload Length: 31]
Concise Binary Object Representation
    Map: (1 entries)
        Unsigned Integer: 70000
            Map: (1 entries)
                ...0 0001 = Unsigned Integer: 1
                    Map: (3 entries)
                        ...0 0100 = Unsigned Integer: 4
                            Unsigned Integer: 100
                        ...0 0010 = Unsigned Integer: 2
                            Text String: 192.168.0.1
                        ...0 0011 = Unsigned Integer: 3
                            Text String: eth0

That there, also shows another tool to data packing: CBOR.  CBOR is basically binary JSON.  Just like JSON it is schemaless, it has objects, arrays, strings, booleans, nulls and numbers (CBOR differentiates between integers of various sizes and floats).  Unlike JSON, it is tight.  The CBOR blob in this response would look like this as JSON (in the most compact representation possible):

{70000:{4:100,2:"192.168.0.1",3:"eth0"}}

The entire exchange is 190 bytes, less than a quarter of the size of just the HTTP request alone.  I think that would work just fine over 1200 baud packet.  As a bonus, you can also multicast, try doing that with HTTP.

So you’d be writing higher-level services that would use this instead of JSON-REST interfaces.  There’s a growing number of libraries that can consume this sort of thing, and IoT is pushing that further.  I think it’s doable.

Now, on the routing front, I’ve been digging up a bit on Net/ROM.  Net/ROM is actually two parts, Net/ROM Level 3 does the routing and level 4 does the circuit switching.  It’s the “Level 3” bit we want.

Coming up with a definitive specification of the protocol has been a bit tough, it doesn’t help that there is a company called NetROM, but I did manage to find this document.  In a way, if I could make my software behave like a Net/ROM node, I could piggy-back off that to discover neighbours.  Thus this protocol would co-exist along side Net/ROM networks that may be completely oblivious to TCP/IP.

This is preferable to just re-inventing the wheel…yes I know non-circular wheels are so much fun!  Really, once Net/ROM L3 has figured out where everyone is, IP routing just becomes a matter of correctly addressing the AX.25 frame so the next hop receives the message.

VK4RZB at Mt. Coot-tha is one such node running TheNet.  Easy enough to do tests on as it’s a mere stone throw away from my home QTH.

There’s a little consideration to make about how to label the AX.25 frame.  Obviously, it’ll be a UI frame, but what PID field should I use?  My instinct suggests that I should just label it as “ARPA Internet Protocol”, since it is Internet Protocol traffic, just IPv6 instead of v4.  Not all the codes are taken though, 0xc9 is free, so I could be cheeky and use that instead.  If the idea takes off, we can talk with the TAPR then.

6LoWHAM Background

So, I’ll admit to looking at AX.25 with the typical modems available (the classical 1200-baud AFSK and the more modern G3RUH modem which runs at a blistering 9600 baud… look out 5G!) years ago and wondering “what’s the point”?

It was Brisbane Area WICEN’s involvement in the International Rally of Queensland that changed my view somewhat.  This was an event that, until CAMS knocked it on the head, ran annually in the Imbil State Forest up in the Sunshine Coast hinterland.

There, WICEN used it for forwarding the scores of drivers as they passed through each stage of the rally.  A checkpoint would be at the start and finish of each stage, and a packet network would be set up with digipeaters in strategic locations and a base station, often located at the Imbil school.

The organisers of IRoQ did experiment with other ways of getting scores through, including hiring bandwidth on satellites, flying planes around in circles over the area, and other shenanigans.  Although these systems had faster throughput speeds, one thing they had which we did not have, was latency.  The score would arrive back at base long before the car had left the check point.

This freed up the analogue FM network for reporting other more serious matters.

In addition to this kind of work, WICEN also help out with horse endurance rides.  Traditionally we’ve just relied on good ol’e analogue FM radio, but in events such as the Tom Quilty, there has been a desire to use packet as a mechanism for reporting when horses arrive at given checkpoints and to perhaps enable autonomous stations that can detect horses via RFID and report those “back to base” to deter riders from cheating.

The challenge of AX.25 is two-fold:

  1. With the exception of Linux, no other OS has any kind of baked-in support for it, so writing applications that can interact with it means either implementing your own AX.25 stack or interfacing to some third-party stack such as BPQ.
  2. Due to the specialised stack, applications often have to run as privileged applications, can have problems with firewalling, etc.

The AX.25 protocol does do static routing.  It offers connected-mode links (like TCP) and a connectionless-mode (like UDP), and there are at least two routing protocols I know of that allow for dynamic routing (ROSE, Net/ROM).  There is a standard for doing IPv4 over AX.25, but you still need to manage the allocation of addresses and other details, it isn’t plug-and-play.

Net/ROM would make an ideal way to forward 6LoWPAN traffic, except it only does connected mode, and doing IP over a “TCP-like” link is really a bad idea.  (Anything that does automatic repeat requests really messes with TCP/IP.)

I have no idea whether ROSE does the connectionless mode, but the idea of needing to come up with a 10-digit numeric “address” is a real turn-off.

If the address used can be derived off the call-sign of the operator, that makes life a lot easier.

The IPv6 address format has enough bits to do that.  To me the most obvious way would be to derive a MAC address from a call-sign and an arbitrarily chosen digit (0-7).  It would be reversible of course, and since the MAC address is used in SLAAC, you would see the station’s call-sign in the IPv6 address.

The thinking is that there’s a lot of problems that have been solved in 6LoWPAN.  Discovery of services for example is handled using mechanisms like mDNS and CoRE RD.  We don’t need to forward Internet traffic, although being able to pull up the Mt. Kanigan and Mt. Stapylton radars over such a network would be real handy at times (yes, I know it’ll be slow).

The OS will view the packet network like a VPN, and so writing applications that can talk over packet will be no different to writing any other kind of network software.  Any consumer desktop OS written in the last 16 years has the necessary infrastructure to support it (even Windows 2000, there was a downloadable add-on for it).

Linking two separate “mesh” networks via point-to-point links is also trivial.  Each mesh will of course see the other as “external” but participants on both can nonetheless communicate.

The guts of 6LoWPAN is in RFC-4944.  This specifies details about how the IPv6 datagram is encoded as a IEEE 802.15.4 payload, and how the infrastructure within 802.15.4 is used to route IPv6.  Gnarly details like how fragmentation of a 1280-byte IPv6 datagram into something that will fit the 128-byte maximum 802.15.4 frames is handled here.  For what it’s worth, AX.25 allows 255 bytes (or was it 256?), so we’re ahead there.

Crucially, it is assumed that the 802.15.4 layer can figure out how to get from node A to node Z via B… C…, etc.  802.15.4 networks are managed by a PAN coordinator, which provides various services to the network.

AX.25 makes this “our problem”.  Yes the sender of a frame can direct which digipeaters a frame should be passed to, but they have to figure that out.  It’s like sending an email by UUCP, you need a map of the Internet to figure out what someone’s address is relative to your site.

Plain AX.25 digipeaters will of course be part of the mix, so having the ability for a node stuck on one side of such a digipeater would be worth having, but ultimately, the aim here will be to provide a route discovery mechanism in place that, knowing a few static digipeater routes, can figure out who is able to hear whom, and route traffic accordingly.