Sep 19, 2021

I stumbled across this article regarding the use of TCP over sensor networks. Now, TCP has been done with AX.25 before, and generally suffers greatly from packet collisions. Apparently (I haven’t read more than the first few paragraphs of this article), TCP implementations can be tuned to improve performance in such networks, which may mean TCP can be made more practical on packet radio networks.

Prior to seeing this, I had thought 6LoWHAM would “tunnel” TCP over a conventional AX.25 connection using I-frames and S-frames to carry TCP segments with some header prepended so that multiple TCP connections between two peers can share the same AX.25 connection.

I’ve printed it out, and made a note of it here… when I get a moment I may give this a closer look. Ultimately I still think multicast communication is the way forward here: radio inherently favours one-to-many communication due to it being a shared medium, but there are definitely situations in which one-to-one communication applies; and for those, TCP isn’t a bad solution.

Comments having read the article

So, I had a read through it. The take-aways seem to be these:

  • TCP was historically seen as “too heavy” because the MCUs of the day (circa 2002) lacked the RAM needed for TCP data structures. More modern MCUs have orders of magnitude more RAM (32KiB vs 512B) today, and so this is less of an issue.
    • For 6LoWHAM, intended for single-board computers running Linux, this will not be an issue.
  • A lot of early experiments with TCP over sensor networks tried to set a conservative MSS based on the actual link MTU, leading to TCP headers dominating the lower-level frame. Leaning on 6LoWPAN’s ability to fragment IP datagrams led to much improved performance.
    • 6LoWHAM uses AX.25 which can support 256-byte frames; vs 128-byte 802.15.4 frames on 6LoWPAN. Maybe gains can be made this way, but we’re already a bit ahead on this.
  • Much of the document considered battery-powered nodes, in which the radio transceiver was powered down completely for periods of time to save power, and the effects this had on TCP communications. Optimisations were able to be made that reduced the impact of such power-down events.
    • 6LoWHAM will likely be using conventional VHF/UHF transceivers. Hand-helds often implement a “battery saver” mode — often this is configured inside the device with no external control possible (thus it will not be possible for us to control, or even detect, when the receiver is powered down). Mobile sets often do not implement this, and you do not want to frequently power-cycle a modern mobile transceiver at the sorts of rates that 802.15.4 radios get power-cycled!
  • Performance in ideal conditions favoured TCP, with the article authors managing to achieve 30% of the raw link bandwidth (75kbps of a theoretical 250kbps maximum), with the underlying hardware being fingered as a possible cause for performance issues.
    • Assuming we could manage the same percentage, that would equate to ~360bps on 1200-baud networks, or 2.88kbps on 9600-baud networks.
  • With up to 15% packet loss, TCP and CoAP (its nearest contender) can perform about the same in terms of reliability.
  • A significant factor in effective data rate is CSMA/CA. aioax25 effectively does CSMA/CA too.

It’s interesting to note they didn’t try to do anything special with the TCP headers (e.g. Van Jacobson compression). I’ll have to have a look at TCP and see just how much overhead there is in a typical segment, and whether the roughly double MTU of AX.25 will help or not: the article recommends using an MSS of approximately 3× the link MTU for “fair” conditions (so ~384 bytes), and 5× in “good” conditions (~640 bytes).

It’s worth noting a 256-byte AX.25 frame takes ~2 seconds to transmit on a 1200-baud link. You really don’t want to make that a habit! So smaller transmissions using UDP-based protocols may still be worthwhile in our application.
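To put rough numbers on the header-overhead and airtime points above, here’s a back-of-envelope sketch. It assumes uncompressed IPv6 + TCP headers (40 + 20 bytes, no options) and ignores HDLC framing and 6LoWPAN header compression, both of which shift the numbers somewhat:

```python
# Back-of-envelope numbers for TCP over AX.25 (a sketch only).
AX25_MTU = 256          # bytes per AX.25 I-frame payload
HEADERS = 40 + 20       # uncompressed IPv6 + TCP header, no options

def overhead_pct(mss):
    """Percentage of each segment's bytes spent on headers."""
    return 100.0 * HEADERS / (mss + HEADERS)

def airtime_s(total_bytes, baud=1200):
    """Seconds of airtime, ignoring framing and key-up overhead."""
    return total_bytes * 8 / baud

# MSS choices: 1x MTU (naive), ~3x ("fair"), ~5x ("good")
for mss in (AX25_MTU, 3 * AX25_MTU, 5 * AX25_MTU):
    print(f"MSS {mss:4d}: {overhead_pct(mss):5.1f}% headers, "
          f"{airtime_s(mss + HEADERS):5.1f}s at 1200 baud")
```

At MSS 256 nearly a fifth of every segment is headers and the frame already takes over 2 seconds of airtime, which is why the bigger MSS values only make sense on quieter channels.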

Sep 16, 2021

So, one evening I was having difficulty sleeping, so, like some people count sheep, I turned to a different problem… 6LoWPAN relies on all nodes sharing a common “context”. This is used as a short-hand to “compress” the rather lengthy IPv6 addresses, allowing two nodes to communicate with one another by substituting particular IPv6 address subnets with a “context number” which can be represented in 4 bits.

Fundamentally, this identifier is a stand-in for the subnet address. This was a sticking-point with earlier thoughts on 6LoWHAM: how do we agree on what the context should be? My thought was, each network should be assigned a 3-bit network ID. Why 3-bit? Well, this means we can reserve some context IDs for other uses. We use SCI/DCI values 0-7 and leave 8-15 reserved; I’ll think of a use for the other half of the contexts.

Nodes in a “group” also share an SSID: the “group” SSID. This is an SSID that receives all multicast traffic for the nodes on the immediate network. This might be just a generic MCAST-n SSID, where n is the network ID; or it could be a call-sign for a local network coordinator, e.g. I might decide my network will use VK4MSL-0 for my group SSID (network 0). Probably nodes that are listening on a custom SSID should still listen for MCAST-n traffic, in case a node is attempting to join without knowing the group SSID.

AX.25 allows for 16 SSIDs per call-sign, so what about the other 8? Well, if we have a convention that we reserve SSIDs 0-7 for groups; that leaves 8-15 for stations. This can be adjusted for local requirements where needed, and would not be enforced by the protocol.
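The conventions above are simple enough to sketch in a few lines. Nothing here is a settled spec and the function names are my own; it just encodes the 3-bit network ID, the MCAST-n (or coordinator call-sign) group SSID, and the reserved upper half of the 4-bit context space:

```python
# Sketch of the proposed 6LoWHAM addressing conventions (names are mine).

def group_ssid(network_id, coordinator=None):
    """Group SSID for a network: MCAST-n, or a coordinator call with SSID n."""
    if not 0 <= network_id <= 7:
        raise ValueError("network IDs are 3-bit (0-7)")
    base = coordinator if coordinator else "MCAST"
    return f"{base}-{network_id}"

def context_id(network_id):
    """6LoWPAN SCI/DCI value: 0-7 map directly; 8-15 stay reserved."""
    if not 0 <= network_id <= 7:
        raise ValueError("network IDs are 3-bit (0-7)")
    return network_id    # a 4-bit field; the upper half is reserved

print(group_ssid(0, "VK4MSL"))   # VK4MSL-0
print(group_ssid(3))             # MCAST-3
```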

Joining a network

How does a new joining node “discover” this network? Firstly, the first node in an area is responsible for “forming” the network — a node which “forms” a network must be manually programmed with the local subnet, group SSID and other details. Ensuring all nodes with “formation” capability for a given network are programmed with consistent settings is beyond the scope of 6LoWHAM.

When a node joins, at first it only knows how to talk to immediate nodes. It can use MCAST-n to talk to immediate neighbours using the fe80::/64 subnet. Anyone in earshot can potentially reply. Nodes simply need to be listening for traffic on a reserved UDP port (maybe 61631; there’s an optimisation in 6LoWPAN for ports 61616-61631). The joining node can ask for the network context, and maybe authenticate itself if needed (using asymmetric cryptography – digital signatures, no encryption).

The other nodes presumably already know the answer, but if all nodes replied simultaneously, it would lead to a pile-up. Nodes should wait a randomised delay, and if nothing is heard in that period, then transmit what they know of the context for the given network ID.

The context information sent back should include:

  • Group SSID
  • Subnet prefix
  • (Optional) Authentication data:
    • Public key of the forming network (joining node will need to maintain its own “trust” database)
    • Hash of all earlier data items
    • Digital signature signed with included public key

Once a node knows the context for its chosen network, it is officially “joined”.
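A rough sketch of assembling such a context reply follows. The field layout, separator and function names are all my own invention (not a wire format); the hash covers the earlier fields as described, the signing step is left abstract since key management is out of scope, and the prefix shown is the IPv6 documentation prefix:

```python
# Sketch of the context reply described above (layout is illustrative only).
import hashlib

def build_context_reply(group_ssid, prefix, pubkey=None, sign=None):
    """Serialise the reply; the hash covers all fields before it."""
    body = f"{group_ssid}|{prefix}".encode()
    if pubkey is None:
        return body                      # unauthenticated reply
    digest = hashlib.sha256(body + pubkey).digest()
    signature = sign(digest) if sign else b""
    return body + b"|" + pubkey + b"|" + digest + b"|" + signature

# Unauthenticated reply, using the documentation prefix as a stand-in:
reply = build_context_reply("VK4MSL-0", "2001:db8::/64")
print(reply)   # b'VK4MSL-0|2001:db8::/64'
```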

Routing to non-local endpoints

So, a node may wish to send a message to another node that’s not directly reachable. This is, after-all, the whole point of using a routing protocol atop AX.25. If we knew a route, we could encode it in the digipeater path, and use conventional AX.25 source routing. Nodes that know a reliable route are encouraged to do exactly that. But what if you don’t know your way around?

APRS uses WIDEN-n to solve this problem: it’s a dumb broadcast, but it achieves this aim beautifully. n just stands for the number of hops, and it gets decremented with each hop. Each digipeater inserts itself into the path as it sends the frame on. APRS specs normally call for everyone to broadcast all at once, pile-up be damned. FM capture effect might help here, but I’m not sure it’s a good policy. Simple, but in our case, we can do a little better.

We only need to broadcast far enough to reach a node that knows a route. We’ll use ROUTE-n to stand for a digipeater that is no more than n hops away from the station listed in the AX.25 destination field. n must be greater than 0 for a message to be relayed. AX.25 2.0 limits the number of digipeaters to 8 (and 2.2 to 2!), so naturally n cannot be greater than 8.

So we’ll have a two-tier approach.

Routing from a node that knows a viable route

If a node that receives a ROUTE-n destination message knows it has a good route that is n or fewer hops away from the target, it picks a randomised delay (maybe 0-5 seconds range), and if no reply is heard from another node, it relays the message: the ROUTE-n is replaced by its own SSID, followed by the digipeater path required to reach the target node.

Routing from a node that does not know a viable route

In the case where a node receives this same ROUTE-n destination message, does not know a route, and hasn’t heard anyone else relay that same message, it should pick a randomised delay (5-10 second range), and if it hasn’t heard the message relayed via a specific path in that time, do one of the following:

If n is greater than 1:

Substitute ROUTE-n in the digipeater path with its own SSID followed by ROUTE-(n-1) then transmit the message.

If n is 1 (or 0):

Substitute ROUTE-n with its own SSID (do not append ROUTE-0) then transmit the message.
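The relay cases above can be sketched as a single path-rewriting function. This is only an illustration (the list representation and names are mine, and the randomised-delay and duplicate-suppression logic is omitted); it shows the ROUTE-n substitutions for a node that knows a route, and for one that doesn’t:

```python
# Sketch of the ROUTE-n relay rules. The digipeater path is a list of SSID
# strings; my_ssid is the relaying station; known_route, if given, is the
# digipeater path this node believes reaches the target.

def relay_path(path, my_ssid, known_route=None):
    """Rewrite a digipeater path containing a ROUTE-n entry, or return None."""
    for i, entry in enumerate(path):
        if not entry.startswith("ROUTE-"):
            continue
        n = int(entry.split("-", 1)[1])
        if n < 1:
            return None                  # n must be > 0 to be relayed
        if known_route is not None and len(known_route) <= n:
            # We know a viable route: our SSID, then the route to the target.
            return path[:i] + [my_ssid] + known_route + path[i + 1:]
        if n > 1:
            # No route known: insert ourselves and decrement the hop count.
            return path[:i] + [my_ssid, f"ROUTE-{n - 1}"] + path[i + 1:]
        # n == 1: substitute our SSID only; do not append ROUTE-0.
        return path[:i] + [my_ssid] + path[i + 1:]
    return None                          # no ROUTE-n entry present

print(relay_path(["ROUTE-3"], "VK4MSL-9"))                # ['VK4MSL-9', 'ROUTE-2']
print(relay_path(["ROUTE-1"], "VK4MSL-9"))                # ['VK4MSL-9']
print(relay_path(["ROUTE-3"], "VK4MSL-9", ["VK4BWI-8"]))  # ['VK4MSL-9', 'VK4BWI-8']
```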

Routing multicast traffic

Discovering multicast listeners

I’ll have to research MLD (RFC-3810 / RFC-4604), but that seems the sensible way forward from here.

Relaying multicast traffic

If a node knows of downstream nodes that ordinarily rely on it to contact the sender of a multicast message, and it knows the downstream nodes are subscribers to the destination multicast group, it should wait a randomised period, and forward the message on (appending its SSID in the digipeater path) to the downstream nodes.

Application thoughts

I think I have jotted down some thoughts before on what the applications for this system may be, but the other day I was looking around for “prior art” regarding one-to-many file transfer applications.

One such system that could be employed is UFTP. Yes, it mentions encryption, but that is an optional feature (and could be useful in emcomm situations). That would enable SSTV-style file sharing to all participants within the mesh network. Its ability to be proxied also lends itself to bridging to other networks like AMPRnet, D-Star packet, DMR and other systems.

Dec 31, 2020

So, this last 2 years, I’ve been trying to keep multiple projects on the go, then others come along and pile their own projects on top. It kinda makes a mess of one’s free time, including for things like keeping on top of where things have been put.

COVID-19 has not helped here, as it’s meant I’ve lugged a lot of gear that belongs to my workplace, or belongs at my workplace, home, to use there. This all needs tracking to ensure nothing is lost.

Years ago, I threw together a crude parts catalogue system. It was built on Django, django-mptt and PostgreSQL, and basically abused the admin part of Django to manage electronic parts storage.

I later re-purposed some of its code for an estate database for my late grandmother: I just wrote a front-end so that members of the family could be given login accounts, and “claim” certain items of the estate. In that sense, the concept was extremely powerful.

The overarching principle of how both these systems worked is that you had “items” stored within “locations”. Locations were in a tree-structure (hence django-mptt) where a location could contain further “locations”… e.g. a root-level location might be a bed room, within that might be a couple of wardrobes and drawers, and there might be containers within those.

You could nest locations as deeply as you liked. In my parts database, I didn’t consider rooms, but I’d have labelled boxes like “IC Parts 1”, “IC Parts 2”, these were Plano StowAway 932 boxes… which work okay, although I’ve since discovered you don’t leave the inner boxes exposed to UV light: the plastic becomes brittle and falls apart.

The inner boxes themselves were labelled by their position within the outer box (row, column), and each “bin” inside the inner box was labelled by row and column.

IC tubes themselves were also labelled, so if I had several sitting in a box, I could identify them and their location. Some were small enough to fit inside these boxes, others were stored in large storage tubs (I have two).

If I wanted to know where I had put some LM311 comparators, I might look up the database and it’d tell me that there were 3 of them in IC Box 1/Row 2/Row 3/Column 5. If luck was on my side, I’d go to that box, pull out the inner box, open it up and find what I was looking for plugged into some anti-static foam or stashed in a small IC tube.

The parts themselves were fairly basic, just a description, a link to a data sheet, and some other particulars. I’d then have a separate table that recorded how many of each part was present, and in which location.

So from the locations perspective, it did everything I wanted, but parametric search was out of the question.

The place here looks like a tip now, so I really do need to get on top of what I have; so much so that I’m telling people no more projects until I do.

Other solutions exist. OpenERP had a warehouse inventory module, and I suspect Odoo continues this, but it’s a bit of a beast to try and figure out and it seems customisation has been significantly curtailed from the OpenERP days.

PartKeepr (if you can tolerate deliberate bad spelling) is another option. It seems to have very good parametric search of parts, but one downside is that it has a flat view of locations. There’s a proposal to enhance this, but it’s been languishing for 4 years now.

VRT used to have a semi-active track-and-trace business built on a tracking software package called P-Trak. P-Trak had some nice ideas (including a surprisingly modern message-passing back-end, even if it was a proprietary one), but is overkill for my needs, and it’s a pain to try and deploy, even if I was licensed to do so.

That doesn’t mean though I can’t borrow some ideas from it. It integrated barcode scanners as part of the user interface, something these open-source part inventory packages seem to overlook. I don’t have a dedicated barcode scanner, but I do have a phone with a camera, and a webcam on my netbook. Libraries exist to do this from a web browser, such as this one for QR codes.

My big problem right now is the need to do a stock-take to see what I’ve still got, and what I’ve added since then, along with where it has gone. I’ve got a lot of “random boxes” now which are unlabelled, and just have random items thrown in due to lack-of-time. It’s likely those items won’t remain there either. I need some frictionless way to record where things are getting put. It doesn’t matter exactly where something gets put, just so long as I record that information for use later. If something is going to move to a new location, I want to be able to record that with as little fuss as possible.

So the thinking is this:

  • Print labels for all my storage locations with UUIDs stored as barcodes
  • Enter those storage locations into a database using the UUIDs allocated
  • Expand (or re-write) my parts catalogue database to handle these UUIDs:
    • adding new locations (e.g. when a consignment comes in)
    • recording movement of containers between parent locations
    • sub-dividing locations (e.g. recording the content of a consignment)
    • (partial and complete) merging locations (e.g. picking parts from stock into a project-specific container)

The first step on this journey is to catalogue the storage containers I have now. Some are already entered into the old system, so I’ve grabbed a snapshot of that and can pick through it. Others are new boxes that have arrived since, and had additional things thrown in.

I looked at ways I could label the boxes. Previously that was a spirit pen hand-writing a label, but this does not scale. If I’m to do things efficiently, then a barcode seems the logical way to go since it uses what I already have.

Something new comes in? Put a barcode on the box, scan it, enter it into the system as a new location, then mark where that box is being stored by scanning the location barcode where I’ll put the box. Later, I’ll grab the box, open it up, and I might repeat the process with any IC tubes or packets of parts inside, marking them as being present inside that box.

Need something? Look up where it is, then “check it out” into my work area… now, ideally when I’m finished, it should go back there, but if I’m in a hurry, I just throw it in a box, somewhere, then record that I put it there. Next time I need it, I can look up where it is. Logical order isn’t needed up front, and can come later.

So, step 1 is to label all the locations. Since I’m doing this before the database is fully worked out, and I want to avoid ID clashes, I’m using UUIDs to label all the locations. Initially I thought of QR codes, but then realised some of the “locations” are DIP IC storage tubes, which do not permit large square labels. I did some experiments with Code-128, but found it was near impossible to reliably encode a UUID that way: my phone had difficulty recognising the entire barcode.

I returned to the idea of QR-codes, and found that my phone will scan a 10mm×10mm QR code printed on a page. That’s about the right height for the side of an IC tube. We had some inkjet labels kicking around, small 38.1×21.2mm labels arranged in a 5×13 grid (Avery J8651/L7651 layout). Could I make a script that generated a page full of QR codes?

Turns out, pylabels will do this. It is built on reportlab, which amongst other things embeds a barcode generator that supports various symbologies, including QR codes. @hugohadfield had contributed a pull request which demonstrated using this tool with QR codes. I just had to tweak this for my needs.

# This file is part of pylabels, a Python library to create PDFs for printing
# labels.
# Copyright (C) 2012, 2013, 2014 Blair Bonnett
#
# pylabels is free software: you can redistribute it and/or modify it under the
# terms of the GNU General Public License as published by the Free Software
# Foundation, either version 3 of the License, or (at your option) any later
# version.
#
# pylabels is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along with
# pylabels.  If not, see <http://www.gnu.org/licenses/>.

import uuid

import labels
from reportlab.graphics.barcode import qr
from reportlab.lib.units import mm

# Create an A4 portrait (210mm x 297mm) sheet with 5 columns and 13 rows of
# labels. Each label is 38.1mm x 21.2mm with a 2mm rounded corner. The margins
# are automatically calculated.
specs = labels.Specification(210, 297, 5, 13, 38.1, 21.2, corner_radius=2,
        left_margin=6.7, right_margin=3, top_margin=10.7, bottom_margin=10.7)

def draw_label(label, width, height, obj):
    # Each label gets a fresh random UUID, rendered as a QR code.
    size = 12 * mm
    label.add(qr.QrCodeWidget(
            str(uuid.uuid4()),
            barHeight=height, barWidth=size, barBorder=2))

# Create the sheet.
sheet = labels.Sheet(specs, draw_label, border=True)

sheet.add_labels(range(1, 66))

# Save the file and we are done.
sheet.save('basic.pdf')
print("{0:d} label(s) output on {1:d} page(s).".format(sheet.label_count, sheet.page_count))

The alignment is slightly off, but not severely. I’ll fine tune it later. I’m already through about 30 of those labels. It’s enough to get me started.

For the larger J8165 2×4 sheets, the following specs work. (I can see this being a database table!)

# Specifications for Avery J8165 2×4 99.1×67.7mm
specs = labels.Specification(210, 297, 2, 4, 99.1, 67.7, corner_radius=3,
        left_margin=5.5, right_margin=4.5, top_margin=13.5, bottom_margin=12.5)
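Taking the database-table idea literally, the two working specifications could live in a lookup table keyed by Avery layout code. A sketch (dimensions are exactly those from the two `Specification` calls above; all sheets A4 portrait, and the helper name is my own):

```python
# Label-sheet dimensions (mm) keyed by Avery layout code -- a sketch of the
# "database table" idea. Values copied from the working specs above.
LABEL_SPECS = {
    "L7651": dict(columns=5, rows=13, label_width=38.1, label_height=21.2,
                  corner_radius=2, left_margin=6.7, right_margin=3,
                  top_margin=10.7, bottom_margin=10.7),
    "J8165": dict(columns=2, rows=4, label_width=99.1, label_height=67.7,
                  corner_radius=3, left_margin=5.5, right_margin=4.5,
                  top_margin=13.5, bottom_margin=12.5),
}

def spec_args(avery_code):
    """Positional and keyword arguments for labels.Specification()."""
    p = LABEL_SPECS[avery_code]
    positional = (210, 297, p["columns"], p["rows"],
                  p["label_width"], p["label_height"])
    keywords = {k: v for k, v in p.items()
                if k not in ("columns", "rows", "label_width", "label_height")}
    return positional, keywords
```

Then `args, kwargs = spec_args("J8165")` followed by `labels.Specification(*args, **kwargs)` reproduces the block above.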

Later when I get the database ready (standing up a new VM to host the database and writing the code) I can enter this information in and get back on top of my inventory once again.

Jun 21, 2020

So, for a decade now, I’ve been looking at a way of un-tethering myself from VoIP and radio applications. Headsets are great, but it does mean you’re chained to whatever it’s plugged into by your head.

Two solutions exist for this: get a headset with a wireless interface, or get a portable device to plug the headset into.

The devices I’d like to use are a combination of analogue and computer-based devices. Some have Bluetooth… I did buy the BU-1 module for my Yaesu VX-8DR years ago, and still have it installed in the FTM-350AR, but found it was largely a gimmick as it didn’t work with the majority of Bluetooth headsets on the market, and was buggy with the ones it did work with.

Years ago, I hit upon a solution using a wireless USB headset and a desktop PC connected to the radio. Then, the headset was one of the original Logitech wireless USB headsets, which I still have and still works… although it is in need of a re-build as the head-band is falling to pieces and the leatherette covering on the earpieces has long perished.

One bugbear I had with it though, is that the microphone would only do 16kHz sample rates. At the time when I did that set-up, I was running a dual-Pentium III workstation with a PCI sound-card which was fairly frequency-agile, so it could work to the limitations of the headset; however newer sound cards are generally locked to 48kHz, and maybe support 44.1kHz if you’re lucky.

I later bought a Logitech G930 headset, but found it had the same problem. It also was a bit “flat” sounding: being a “surround sound” headset, it compromises on the audio output. This, and the fact that JACK just would not talk to it any higher than 16kHz, meant that it was relegated to VoIP only. I’ve been using VoIP a lot thanks to China’s little gift, even doing phone patches using JACK, Twinkle and Zoom for people who don’t have Internet access.

I looked into options. One option I considered was the AstroGaming A50 Gen 4. These are a pricey option, but one nice feature is they do have an analogue interface, so in theory, it could wire direct to my old radios. That said, I couldn’t find any documentation on the sample rates supported, so I asked.

… crickets chirping …

After hearing nothing, I decided they really didn’t want my business, and there were things that made me uneasy about sinking $500 on that set. In the end I stumbled on these.

Reading the specs, the microphone frequency range was 30Hz to 20kHz… that meant for them to not be falsely advertising, they had to be sampling at at least 40kHz. 44.1 or 48kHz was fine by me. These are pricey too, but not as bad as the A50s, retailing at AU$340.

I took the plunge:

RC=0 stuartl@rikishi ~ $ cat /proc/asound/card1/stream0 
Unknown manufacturer ATH-G1WL at usb-0000:00:14.0-4, full speed : USB Audio

Playback:
  Status: Running
    Interface = 1
    Altset = 1
    Packet Size = 200
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 1
    Altset 1
    Format: S16_LE
    Channels: 2
    Endpoint: 1 OUT (ADAPTIVE)
    Rates: 48000
    Bits: 16

Capture:
  Status: Running
    Interface = 2
    Altset = 1
    Packet Size = 100
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 2 IN (ASYNC)
    Rates: 48000
    Bits: 16

Okay, so these are locked to 48kHz… that works for me. Oddly enough, I used to get a lot of XRUNS in jackd with the Logitech sets… I don’t get any using this set, even with triple the sample rate. This set is very well behaved with jackd. Allegedly the headset is “broadcast”-grade.

Well, not sure about that, but it performs a lot better than the Logitech ones did. Audio-Technica have plenty of skin in the audio game, and have done for decades: the cartridge on my father’s turntable (an old Rotel from the late 70s) was manufactured by them. So it’s possible, but it’s also possible it’s marketing BS.

The audio quality is decent though. I’ve only used it for VoIP applications so far, people have noticed the microphone is more “bassy”.

The big difference from my end, is notifications from Slack and my music listening. Previously, since I was locked to 16kHz on the headset, it was no good listening to music there since I got basically AM radio quality. So I used the on-board sound card for that with PulseAudio. Slack though (which my workplace uses), refuses to send notification sounds there, so I missed hearing notifications as I didn’t have the headset on.

Now, I have the music going through the headset with the notification sounds. I miss nothing. PulseAudio also had a habit of “glitching”, momentary drop-outs in audio. This is gone when using JACK.

My latency is just 64msec. I can’t quite ditch PulseAudio, as without it, tools like Zoom won’t see the JACK virtual source/sink… this seems to be a limitation of the QtMultimedia back-end being used: it doesn’t list virtual sound interfaces and doesn’t let people put in arbitrary ALSA device strings (QAudioDeviceInfo just provides a list).

At the moment though, I don’t need to route between VoIP applications (other than Twinkle, which talks to ALSA direct), so I can live with it staying there for now.

Jun 01, 2020

Brisbane Area WICEN Group (Inc) lately has been caught up in this whole COVID-19 situation, unable to meet face-to-face for business meetings. Like a lot of groups, we’ve had to turn to doing things online.

Initially, Cisco WebEx was trialled, however this had significant compatibility issues, most notably under Linux — it just plain didn’t work. Zoom however, has proven fairly simple to operate and seems to work, so we’ve been using that for a number of “social” meetings and at least one business meeting so far.

A challenge we have though, is that one of our members does not have a computer or smart-phone. Mobile telephony is unreliable in his area (Kelvin Grove), and so ye olde PSTN is the most reliable service. For him to attend meetings, we need some way of patching that PSTN line into the meeting.

The first step is to get something you can patch to. In my case, it was a soft-phone and a SIP VoIP service. I used Twinkle to provide that link. You could also use others like baresip, Linphone or anything else of your choosing. This connects to your sound card at one end, and a Voice Service Provider at the other; in my case it’s my Asterisk server through Internode NodePhone.

The problem is though, while you can certainly make a call outbound whilst in a conference, the person on the phone won’t be able to hear the conference, nor will the conference attendees be able to hear the person on the phone.

Enter JACK

JACK is an audio routing framework for Unix-like operating systems that allows for audio to be routed between applications. It is geared towards multimedia production and professional audio, but since there’s a plug-in in the ALSA framework, it is very handy for linking audio between applications that would otherwise be incompatible.

For this to work, one application has to work either directly with JACK, or via the ALSA plug-in. Many support, and will use, an alternate framework called PulseAudio. Conference applications like Zoom and Jitsi almost universally rely on this as their sound card interface on Linux.

PulseAudio unfortunately is not able to route audio with the same flexibility, but it can route audio to JACK. In particular, JACKv2 and its jackdbus is the path of least resistance. Once JACK starts, PulseAudio detects its presence, and loads a module that connects PulseAudio as a client of JACK.

A limitation with this is PulseAudio will pre-mix all audio streams it receives from its clients into one single monolithic (stereo) feed before presenting that to JACK. I haven’t figured out a work-around for this, but thankfully for this use case, it doesn’t matter. For our purposes, we have just one PulseAudio application: Zoom (or Jitsi), and so long as we keep it that way, things will work.

Software tools

  • jack2: The audio routing daemon.
  • qjackctl: This is a front-end for controlling JACK. It is optional, but if you’re not familiar with JACK, it’s the path of least resistance. It allows you to configure, start and stop JACK, and to control patch-bay configuration.
  • SIP Client, in my case, Twinkle.
  • ALSA JACK Plug-in, part of alsa-plugins.
  • PulseAudio JACK plug-in, part of PulseAudio.

Setting up the JACK ALSA plug-in

To expose JACK to ALSA applications, you’ll need to configure your ${HOME}/.asoundrc file. Now, if your SIP client happens to support JACK natively, you can skip this step, just set it up to talk to JACK and you’re set.

Otherwise, have a look at guides such as this one from the ArchLinux team.

I have the following in my .asoundrc:

pcm.!default {
        type plug
        slave { pcm "jack" }
}

pcm.jack {
        type jack
        playback_ports {
                0 system:playback_1
                1 system:playback_2
        }
        capture_ports {
                # mono microphone: feed both ALSA capture channels
                # from the single JACK capture port
                0 system:capture_1
                1 system:capture_1
        }
}

The first part sets my default ALSA device to jack, then the second block defines what jack is. You could possibly skip the first block, in which case your SIP client will need to be told to use jack (or maybe plug:jack) as the ALSA audio device for input/output.

Configuring qjackctl

At this point, to test this we need a JACK audio server running, so start qjackctl. You’ll see a window like this:

qjackctl in operation

This shows it actually running, most likely for you this will not be the case. Over on the right you’ll see Setup… — click that, and you’ll get something like this:

Parameters screen

The first tab is the parameters screen. Here, you’ll want to direct this at your audio device that your speakers/microphone are connected to.

The sample rate may be limited by your audio device. In my experience, JACK hates devices that can’t do the same sample rate for input and output.

My audio device is a Logitech G930 wireless USB headset, and it definitely has this limitation: it can play audio right up to 48kHz, but will only do a meagre 16kHz on capture. JACK thus limits me to both directions running 16kHz. If your device can do 48kHz, that’d be better if you intend to use it for tasks other than audio conferencing. (If your device is also wireless, I’d be interested in knowing where you got it!)

JACK literature seems to recommend 3 periods/buffer for USB devices. The rest is a matter of experiment. 1024 samples/period seems to work fine on my hardware most of the time. Your mileage may vary. Good setups may get away with less, which will decrease latency (mine is 192ms… good enough for me).
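The quoted latency falls straight out of the buffer arithmetic: frames per period × periods ÷ sample rate. A quick check against the settings above:

```python
# JACK's nominal buffering latency from its period settings.
def jack_latency_ms(frames_per_period, periods, sample_rate):
    return 1000.0 * frames_per_period * periods / sample_rate

print(jack_latency_ms(1024, 3, 16000))   # 192.0 -- the G930 setup above
print(jack_latency_ms(1024, 3, 48000))   # 64.0  -- same buffering at 48kHz
```

So dropping to 2 periods or 512 samples/period halves the figure, at the risk of more XRUNs.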

The other tab has more settings:

Advanced settings

The things I’ve changed here are:

  • Force 16-bit: since my audio device cannot do anything but 16-bit linear PCM, I force 16-bit mode (rather than the default of 32-bit mode)
  • Channels I/O: output is stereo but input is mono, so I set 1 channel in, two channels out.

Once all is set, Apply then OK.

Now, on qjackctl itself, click the “Start” button. It should report that it has started. You don’t need to click any play buttons to make it work from here. You may have noticed that PulseAudio has detected the JACK server and will now connect to it. If you click “Graph”, you’ll see something like this:

qjackctl‘s Graph window

This is the thing you’ll use in qjackctl the most. Here, you can see the “system” boxes represent your audio device, and “PulseAudio JACK Sink”/”PulseAudio JACK Source” represent everything that’s connected to PulseAudio.

You should be able to play sound in PulseAudio, and direct applications there to use the JACK virtual sound card. pavucontrol (normally shipped with PulseAudio) may be handy for moving things onto the JACK virtual device.

Configuring your telephony client

I’ll use Twinkle as the example here. In the preferences, look for a section called Audio. You should see this:

Twinkle audio settings

Here, I’ve set my ringing device to pulse to have that ring PulseAudio. This allows me to direct the audio to my laptop’s on-board sound card so I can hear the phone ring without the headset on.

Since jack was made my default device, I can leave the others as “Default Device”. Otherwise, you’d specify jack or plug:jack as the audio device. This should be set on both Speaker and Microphone settings.

Click OK once you’re done.

Configuring Zoom

I’ll use Zoom here, but the process is similar for Jitsi. In the settings, look for the Audio section.

Zoom audio settings

Set both Speaker and Microphone to JACK (sink and source respectively). Use the “Test Speaker” function to ensure it’s all working.

The patch up

Now, it doesn’t matter whether you call first, then join the meeting, or vice versa. You can even have the PSTN caller call you. The thing is, you want to establish a link to both your PSTN caller and your conference.

The assumption is that you now have a session active in both programs: you’re hearing both the PSTN caller and the conference in your headset, and when you speak, both groups hear you. To let them hear each other, do this:

Go to qjackctl‘s patch bay. You’ll see PulseAudio is there, but you’ll also see the instance of the ALSA plug-in connected to JACK. That’s your telephony client. Both will be connected to the system boxes. You need to draw new links between those two new boxes, and the PulseAudio boxes like this:

qjackctl patching Twinkle to Zoom

Here, Zoom is represented by the PulseAudio boxes (since it is using PulseAudio to talk to JACK), and Twinkle is represented by the boxes named alsa-jack… (tip: the number is the PID of the ALSA application if you’re not sure).

Once you draw the connections, the parties should be able to hear each other. You’ll need to monitor this dialogue from time to time: if either PulseAudio or the phone client disconnects from JACK momentarily, the connections will need to be re-made. Twinkle will do this if you do a three-way conference, then one person hangs up.

Anyway, that’s the basics covered. There’s more that can be done: for example, recording the audio, or piping audio from something else (e.g. a media player), is just a case of directing it at JACK (either directly or via the ALSA plug-in) and drawing connections where you need them.

May 26 2020
 

Lately, I’ve been socially distancing at home, and so there have been a few projects considered that otherwise wouldn’t ordinarily get a look-in on account of lack of time.

One of these has been setting up a Raspberry Pi with DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TLV320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone: a 144/430MHz→2.4GHz cross-band link, in effect.

That’s still on the cards, but another use case came up: digital modes. It’d be real nice to interface this over WiFi to a more powerful machine for digital work: sound card sharing over the network, in effect. For this, Mumble would not do; I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, it is asking for trouble with digital modes. PulseAudio works fine for ordinary audio (speech, music, etc), but it makes assumptions about the nature of that audio. The problem is we’re not dealing with “audio” as such; we’re dealing with modem tones. Human ears cannot detect phase easily; data modems can, and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, it is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit: although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss. But it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and aims to do just that one thing. Digging into the code, present versions use OPUS and older versions use CELT. The use of RTP seemed promising though. It actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic payload type (e.g. 96); if this were a SIP session, an rtpmap attribute would be sent via the Session Description Protocol to say payload type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems, such as that used by GQRX, just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. UDP side-steps this: if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.
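As a sanity check, the fire-and-forget approach is only a few lines with stdlib sockets. A sketch; the loopback addresses and the frame contents are purely illustrative:

```python
import socket

# Fire-and-forget in miniature: one socket sends a datagram of PCM,
# the other receives whatever arrives; a lost datagram is simply
# forgotten rather than re-sent.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(('127.0.0.1', 0))            # let the OS pick a free port
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

frame = b'\x00\x01' * 160            # pretend 16-bit PCM, 160 samples
tx.sendto(frame, rx.getsockname())   # send it and move on

data, addr = rx.recvfrom(2048)       # receiver picks up the 320 bytes
tx.close()
rx.close()
```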

I also want some flexibility in audio streams: it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (Red Book CD audio). In some cases 16kHz would be fine, or even 8kHz 16-bit linear PCM; 44.1kHz works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp, like RTP, so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format

Timestamp

The timestamp field in RTP is basically measured in ticks of some clock of known frequency, e.g. for PCMU it is an 8kHz clock. It starts at some value, then increments monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo, data will be streaming at 192 kilobytes per second. We can tolerate wrap-around, but with the timestamp ticking at the 48kHz sample rate, a 16-bit counter would overflow every ~1.4 seconds, which whilst not unworkable, is getting tight. Better to use a 32-bit counter, which extends that overflow to over 24 hours.
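Tolerating wrap-around just means doing the timestamp arithmetic modulo 2^32. A sketch of a wrap-safe comparison, assuming any two timestamps being compared are less than half the counter range apart:

```python
def ts_delta(newer, older):
    """Difference between two 32-bit timestamps, tolerating wrap-around."""
    d = (newer - older) & 0xffffffff
    # Values in the upper half of the range mean `newer` is actually
    # behind `older`, i.e. the difference is negative.
    return d - 0x100000000 if d >= 0x80000000 else d
```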

Sample rate encoding

We can either use an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports; that’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a multiple of either 8kHz or 11.025kHz, so we can devise a scheme consisting of a “base” rate and a multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multiplier plus one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates, and as we go up, the steps get bigger; so let’s borrow 3 of the multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0
B S S S M M M M

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

Examples:
  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00010010b (0x12): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)
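For what it’s worth, the scheme is easy to express in code. A quick Python sketch (the function names are mine; note the encoding is not unique — 16kHz, for instance, can be written as either 0x01, multiplier 2, or 0x10, shift 1, and both decode to the same rate):

```python
def encode_rate(rate):
    """Find a byte encoding the given sample rate, or raise ValueError.

    Layout per the table above: bit 7 = base (0: 8000Hz, 1: 11025Hz),
    bits 6-4 = shift, bits 3-0 = multiplier - 1.
    """
    for base_bit, base in ((0, 8000), (1, 11025)):
        for shift in range(8):
            for mult in range(1, 17):
                if (base << shift) * mult == rate:
                    return (base_bit << 7) | (shift << 4) | (mult - 1)
    raise ValueError('rate %r not representable' % rate)

def decode_rate(byte):
    """Recover the sample rate in Hz: (base << S) * (M + 1)."""
    base = 11025 if byte & 0x80 else 8000
    shift = (byte >> 4) & 0x07
    mult = (byte & 0x0f) + 1
    return (base << shift) * mult
```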

Other settings

I primarily want to consider linear PCM types here. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we can ignore it. So we could just encode the size of a single-channel sample as a power of two: 0 would be 8-bit, 1 would be 16-bit, 2 would be 32-bit and 3 would be 64-bit. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but they might later be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.
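Putting the pieces together, a hypothetical header could be packed like this. The MAGIC value, field order and bit positions within the format byte are all placeholders of my own choosing, not a settled format:

```python
import struct

# A made-up identifier; any 16-bit magic would do for illustration.
MAGIC = 0x6C48

def pack_header(timestamp, rate_byte, sample_bytes, channels):
    """Pack an 8-byte, 32-bit-aligned header.

    sample_bytes: bytes per single-channel sample (1, 2, 4 or 8);
    channels: 1..16; rate_byte: as per the rate encoding above.
    """
    size_code = {1: 0, 2: 1, 4: 2, 8: 3}[sample_bytes]
    # bits 6-7 reserved; bits 4-5 sample size code; bits 0-3 channels - 1
    fmt = (size_code << 4) | ((channels - 1) & 0x0f)
    return struct.pack('>HBBI', MAGIC, fmt, rate_byte,
                       timestamp & 0xffffffff)
```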

May 03 2020
 

This afternoon, I was pondering about how I might do text-to-speech, but still have the result sound somewhat natural. For what use case? Well, two that come to mind…

The first being for doing “strapper call” announcements at horse endurance rides. A horse endurance ride is where competitors and their horses traverse a long (sometimes as long as 320km) trail through a wilderness area. Usually these rides (particularly the long ones) are broken up into separate stages or “legs”.

Upon arrival back at base, the competitor has a limited amount of time to get the horse’s vital signs into acceptable ranges before they must present to the vet. If the horse’s temperature or heart rate is too high, the competitor is “vetted out”.

When the competitor reaches the final check-point, ideally you want to let that competitor’s support team know they’re on their way back to base so they can be there to meet the competitor and begin their work with the horse.

Historically, this was done over a PA system; however, this isn’t always possible for the people at base to achieve. So having an automated mechanism to do this would be great. In recent times, Brisbane WICEN has been developing a public display that people can see real-time results on, and this also doubles as a strapper-call display.

Getting the information to that display is something of a work in progress, but it’s recognised that if you miss the message popping up on the display, there’s no repeat. A better solution would be to “read out” the message: then you don’t have to be watching the screen, you can go about your business. This could be done over a PA system; alternatively, one location has an extensive WiFi network, so streaming via Icecast is possible.

But how do you get the text into speech?

Enter flite

flite is a minimalist speech synthesizer from the Festival project. Out of the box it includes 3 voices, mostly male American voices. (I think the rms one might be Richard M. Stallman, but I could be wrong on that!) There’s a couple of demos there that can be run direct from the command line.

So, for the sake of argument, let’s try something simple, I’ll use the slt voice (a US female voice) and just get the program to read out what might otherwise be read out during a horse ride event:

$ flite_cmu_us_slt -t 'strapper call for the 160 kilometer event competitor numbers 123 and 234' slt-strapper-nopunctuation-digits.wav
slt-strapper-nopunctuation-digits.ogg

Not bad, but not that great either. Specifically, the speech is probably a little quick. The question is, how do you control this? Turns out there’s a bit of hidden functionality.

There is an option marked -ssml which tells flite to interpret the text as SSML. However, if you try it, you may find it does little to improve matters; I don’t think flite actually implements much of it.

Things are improved if we spell everything out. So if you instead replace the digits with words, you do get a better result:

$ flite_cmu_us_slt -t 'strapper call for the one hundred and sixty kilometer event competitor number one two three and two three four' slt-strapper-nopunctuation-words.wav
slt-strapper-nopunctuation-words.ogg

Definitely better. It could use some pauses. Now, we don’t have very fine-grained control over those pauses, but we can introduce some punctuation to have some control nonetheless.

$ flite_cmu_us_slt -t 'strapper call.  for the one hundred and sixty kilometer event.  competitor number one two three and two three four' slt-strapper-punctuation.wav
slt-strapper-punctuation.ogg

Much better. Of course, it still sounds somewhat robotic. I’m not sure how to adjust the cadence as a whole, but presumably we can just feed the text in piece-wise, render the pieces to individual .wav files, then stitch them together with the pauses we want.

How about other changes though? If you look at flite --help, there are feature options which can control the synthesis. There’s no real documentation on what these do; what I’ve found so far was found by grep-ing through the flite source code. Tip: do a grep for feat_set_, and you’ll see a whole heap.

Controlling pitch

There are two parameters for the pitch: int_f0_target_mean controls the “centre” frequency of the speech in Hertz, and int_f0_target_stddev controls the deviation. For the slt voice, the mean seems to sit around 160Hz and the deviation is about 20Hz.

So we can say, set the frequency to 90Hz and get a lower tone:

$ flite_cmu_us_slt --setf int_f0_target_mean=90 -t 'strapper call' slt-strapper-mean-90.wav
slt-strapper-mean-90.ogg

… or 200Hz for a higher one:

$ flite_cmu_us_slt --setf int_f0_target_mean=200 -t 'strapper call' slt-strapper-mean-200.wav
slt-strapper-mean-200.ogg

… or we can change the variance:

$ flite_cmu_us_slt --setf int_f0_target_stddev=0.0 -t 'strapper call' slt-strapper-stddev-0.wav
$ flite_cmu_us_slt --setf int_f0_target_stddev=70.0 -t 'strapper call' slt-strapper-stddev-70.wav
slt-strapper-stddev-0.ogg
slt-strapper-stddev-70.ogg

We can’t change these values during a block of speech, but presumably we can cut up the text we want to render, render each piece at the frequency/variance we want, then stitch those together.

Controlling rate

So I mentioned we can control the rate, somewhat coarsely, using the usual punctuation devices. We can also change the rate overall by setting duration_stretch. This is basically a control of how much we want to stretch out the pronunciation of words.

$ flite_cmu_us_slt --setf duration_stretch=0.5 -t 'strapper call' slt-strapper-stretch-05.wav
$ flite_cmu_us_slt --setf duration_stretch=0.7 -t 'strapper call' slt-strapper-stretch-07.wav
$ flite_cmu_us_slt --setf duration_stretch=1.0 -t 'strapper call' slt-strapper-stretch-10.wav
$ flite_cmu_us_slt --setf duration_stretch=1.3 -t 'strapper call' slt-strapper-stretch-13.wav
$ flite_cmu_us_slt --setf duration_stretch=2.0 -t 'strapper call' slt-strapper-stretch-20.wav
slt-strapper-stretch-05.ogg
slt-strapper-stretch-07.ogg
slt-strapper-stretch-10.ogg
slt-strapper-stretch-13.ogg
slt-strapper-stretch-20.ogg

Putting it together

So it looks as if all the pieces are there, we just need to stitch them together.

RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.2 --setf int_f0_target_stddev=50.0 --setf int_f0_target_mean=180.0 -t 'strapper call' slt-strapper-call.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.1 --setf int_f0_target_stddev=30.0 --setf int_f0_target_mean=180.0 -t 'for the, one hundred, and sixty kilometer event' slt-160km-event.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.4 --setf int_f0_target_stddev=40.0 --setf int_f0_target_mean=180.0 -t 'competitors, one two three, and, two three four' slt-competitors.wav
Above files stitched together in Audacity

Here, I manually imported all three files into Audacity, arranged them, then exported the result; but there’s no reason the same could not be achieved by a program, since I’m just inserting pauses after all.

There are tools for manipulating RIFF waveform files in most languages, and generating silence is not rocket science. The voice itself could be fine-tuned, but that’s simply a matter of tweaking settings. Generating the text is basically a look-up table feeding into snprintf (or its equivalent in your programming language of choice).
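For instance, with Python’s stdlib wave module, the stitching is only a handful of lines. A sketch (assuming all the clips came from the same flite voice, so sample rate/width/channels match; the 400ms default pause is an arbitrary choice):

```python
import wave

def stitch(output, inputs, pause_ms=400):
    """Concatenate RIFF wave files with silence in between."""
    out = None
    for path in inputs:
        with wave.open(path, 'rb') as w:
            if out is None:
                out = wave.open(output, 'wb')
                out.setparams(w.getparams())
                bytes_per_ms = (w.getframerate() * w.getnchannels()
                                * w.getsampwidth()) // 1000
            else:
                # Silence is just zero-valued samples in signed PCM
                out.writeframes(b'\x00' * (bytes_per_ms * pause_ms))
            out.writeframes(w.readframes(w.getnframes()))
    if out is not None:
        out.close()
```

So e.g. `stitch('announcement.wav', ['slt-strapper-call.wav', 'slt-160km-event.wav', 'slt-competitors.wav'])` would produce much the same result as the manual Audacity session.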

It’d be nice to implement a wrapper around flite that took the full SSML or JSML text and rendered it out as speech, but this gets pretty close without writing much code at all. Definitely worth continuing with.

Feb 08 2020
 

So, lately I’ve been helping out with running the base at a few horse rides up at Imbil. This involves amongst other things, running three radios, a base computer, laptops, and other paraphernalia.

The whole kit needs to run off an unregulated 12V DC supply, consisting of two 105Ah AGM batteries which have solar and mains back-up. The outlet for this is an Anderson SB50 connector, fairly standard for caravans.

Catch being, this is temporary, so no permanent linkages: we need to be able to disconnect and pack everything away when not in use. One bugbear is having enough DC outlets for everything, especially of the 30A Anderson Power Pole variety, since most of our radios use those.

The monitor for the base computer uses a cigarette lighter adapter, while the base computer itself (an Intel NUC) has a cable terminated with a 30A power pole. There’s also a WiFi router which has a micro-USB power input — thankfully the monitor’s adaptor embeds a USB power outlet, so we can run it off that.

We need two amateur radios (one for voice comms, one for packet), and a CB set for communications with the ride organisers (who are otherwise not licensed to use amateur bands). We may also see a move to commercial frequencies, so that’s potentially another radio or two.

I started thinking about ways we could make a modular power distribution system.

The thought was, if we made PDU boxes where the inlet and outlet were nice big SB50s, configured so that they would mate when the boxes were joined up, we could have a flexible PDU system where we just clip it together like Lego bricks.

This is a work in progress, but I figured I’d post what I have so far.

Power outlets on the distribution box, yet to be wired up.

I still need to do the internal wiring, but above is basically what I was thinking of. There’s room for up to 6 consumers via the 30A power pole connections along one side, each with its own 20A breaker. (The connectors are rated at 45A.)

Originally I was aiming for 6 cigarette lighter sockets, but after receiving the parts I realised that wouldn’t fit; two seems to work okay, and we can always make a second box and slap that on the end. Each has a 15A breaker.

Protecting the upstream power source is a 50A breaker, so the total of the downstream port plus all outlets on the box itself may not exceed 50A.

The upstream and downstream ports are positioned so that boxes can just be butted up against each-other for the connectors to mate. I’ve got to fine-tune the positioning a bit, and right now the connectors are also on an angle, but this hopefully shows the concept…

The idea for maintenance is that the box will fold out. I’m not sure yet whether the connection between all the outputs on the lid will be via a bus bar or individual cables going to a tie point inside the box. Those 30A outlets are just begging for a single cable to visit each, bus-bar style. I also have to figure out how I’ll connect to the cigarette lighter sockets.

Hopefully I’ll get this done before the next ride event.

Nov 24 2019
 

The past few months have been quiet for this project, largely because Brisbane WICEN has had my spare time soaked up with an RFID system they are developing for tracking horse rides through the Imbil State Forest for the Stirling’s Crossing Endurance Club.

Ultimately, when we have some decent successes, I’ll probably be reporting more on this on WICEN’s website. Suffice to say, it’s very much a work-in-progress, but it has proved a valuable testing ground for aioax25. The messaging system being used is basically just plain APRS messaging, with digipeating thrown in as well.

Since I’ve had a moment to breathe, I’ve started filling out the features in aioax25, starting with connected-mode operation. The thinking is this might be useful for sending larger payloads. APRS messages are limited to a 67-character message size, with only a subset of ASCII being permitted.

Thankfully that subset includes all of the Base64 character set, so I’m able to do things like tunnel NTP packets and CBOR blobs through it, so that stations out in the field can pull down configuration settings and the current time.

As for the RFID EPCs, we’re sending those in the canonical hexadecimal format, which works, but the EPC occupies most of the payload size. At 1200 bits per second, this does slow things down quite a bit. We get a slight improvement if we encode the EPCs as Base64, and we’d double the efficiency if we could send them as binary bytes instead. Sending a CBOR blob that way would be very efficient.
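The savings are easy to quantify for a 96-bit EPC (the EPC value below is made up):

```python
import base64

epc = bytes(range(12))            # a made-up 96-bit EPC: 12 raw bytes

as_hex = epc.hex()                # canonical hexadecimal form
as_b64 = base64.b64encode(epc)    # Base64 form

print(len(epc), len(as_hex), len(as_b64))   # 12 24 16
```

So hex costs 2 characters per byte, Base64 4 characters per 3 bytes, and raw binary just the bytes themselves.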

The thinking is that the nodes find each-other via APRS, then once they’ve discovered a path, they can switch to connected mode to send bulk transfers back to base.

Thus, I’ve been digging into connected-mode operation. AX.25 2.2 is not the most well-written spec I’ve read. In fact, it is downright confusing in places: it mixes up little-endian and big-endian fields, certain bits have different meanings in different contexts, and it uses concepts which are “foreign” to someone like myself who’s used to TCP/IP.

Right now I’m making progress; there’s an untested implementation in the connected-mode branch. I’m writing unit test cases based on what I understand the behaviour to be, but I suspect this is going to need trials against some actual AX.25 implementations such as Direwolf, the Linux kernel stack, the G8BPQ stack, and the implementations in my Kantronics KPC3 and my Kenwood TH-D72A.

Some things I’m trying to get an answer to:

  • In the address fields at the start of a frame, you have what I’ve been calling the ch bit.
    On digipeater addresses, it’s called H and it is used to indicate that a frame has been digipeated by that digipeater.
    When seen in the source or destination addresses, it is called C, and it describes whether the frame is a “command” frame, or a “response” frame.

    An AX.25 2.x “command” frame sets the destination address’s C bit to 1, and the source address’s C bit to 0, whilst a “response” frame in AX.25 does the opposite (destination C is 0, source C is 1).

    In prior AX.25 versions, the two bits were set identically. The question is: which is which? Is a frame a “command” when both C bits are 1, and a “response” when both C bits are 0? (Thankfully, I think my chances of meeting an AX.25 1.x station are very small!)
  • In the Control field, there’s a bit marked P/F (for Poll/Final), and I’ve called it pf in my code. Sometimes this field gets called “Poll”, sometimes it gets called “Final”. It’s not clear on what occasions it gets called “Poll” and when it is called “Final”. It isn’t as simple as assuming that pf=1 means poll and pf=0 means final. Which is which? Who knows?
  • AX.25 2.0 allowed up to 8 digipeaters, but AX.25 2.2 limits it to 2. AX.25 2.2 is supposed to be backward compatible, so what happens when it receives a frame from a digipeater that is more than 2 digipeater hops away? (I’m basically pretending the limitation doesn’t exist right now, so aioax25 will handle 8 digipeaters in AX.25 2.2 mode)
  • The table of PID values (figure 3.2 in the AX.25 2.2 spec) mentions several protocols, including “Link Quality Protocol”. What is that, and where is the spec for it?
  • Is there an “experimental” PID that can be used that says “this is a L3 protocol that is a work in progress” so I don’t blow up someone’s station with traffic they can’t understand? The spec says contact the ARRL, which I have done, we’ll see where that gets me.
  • What do APRS stations do with a PID they don’t recognise? (Hopefully ignore it!)
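For what it’s worth, my current understanding of the 2.x command/response interpretation can be sketched as follows (the C/H flag lives in the most-significant bit of each address’s SSID octet; the classification names are mine):

```python
def frame_type(dest_ssid, src_ssid):
    """Classify a frame from the C bits, as I understand AX.25 2.x.

    dest_ssid/src_ssid are the SSID octets of the destination and
    source address fields.
    """
    dest_c = bool(dest_ssid & 0x80)
    src_c = bool(src_ssid & 0x80)
    if dest_c and not src_c:
        return 'command'
    if src_c and not dest_c:
        return 'response'
    return 'pre-2.x'    # both bits equal: an AX.25 1.x-era station
```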

Right at this point, the Direwolf sources have proven quite handy. Already I’m aware of a potential gotcha with the AX.25 2.0 implementation on the Kantronics KPC3+ and the Kenwood TM-D710.

I suspect my hand-held (Kenwood TH-D72A) might do the same thing as the TM-D710, but given JVC-Kenwood have pulled out of the Australian market, I’m more likely to just say F### you Kenwood and ignore the problem, since these can do KISS mode, bypassing the buggy AX.25 implementation on a potentially resource-constrained device.

NET/ROM is going to be a whole different ball-game, and yes, that’s on the road map. Long-term, I’d like 6LoWHAM stations to be able to co-exist peacefully with other stations. Much like you can connect to a NET/ROM node using traditional AX.25, then issue connect commands to jump from there to any AX.25 or NET/ROM station; I intend to offer the same “feature” on a 6LoWHAM station — you’ll be able to spin up a service that accepts AX.25 and NET/ROM connections, and allows you to hit any AX.25, NET/ROM or 6LoWHAM station.

I might park the project for a bit, and get back onto the WICEN stuff, as what we have in aioax25 is doing okay, and there’s lots of work to be done on the base software that’ll keep me busy right up to when the horse rides re-start in 2020.

Oct 25 2019
 

In my last post, I mentioned that I was playing around with SDR a bit more, having bought a couple. Now, my experiments to date were low-hanging fruit: use some off-the-shelf software to receive an existing signal.

One of those off-the-shelf packages was CubicSDR, which gives me AM/FM/SSB/WFM reception; the other is qt-dab, which receives DAB+. The long-term goal though is to be able to use GNURadio to make my own tools. Notably, I’d like to set up a Raspberry Pi 3 with a DRAWS board and an RTL-SDR, to control the FT-857D and implement dual-watch for emergency comms exercises, or use the RTL-SDR for DAB+ reception.

In the latter case, while I could use qt-dab, it’ll be rather cumbersome in that use case. So I’ll probably implement my own tool atop GNURadio that can talk to a small microcontroller to drive a keypad and display. As a first step, I thought I’d try a DIY FM stereo receiver. This is a mildly complex receiver that builds on what I learned at university many moons ago.

FM Stereo is actually surprisingly complex. Not DAB+ levels of complex, but still complex. The system is designed to be backward-compatible with mono FM sets. FM itself does not provide stereo on its own — a stereo FM station operates by multiplexing a “mono” signal, a “differential” signal, and a pilot signal. The pilot is just a plain 19kHz carrier. Both left and right channels are low-pass filtered to a bandwidth of 15kHz. The mono signal is generated from the summation of the left and right channels, whilst the differential is produced by subtracting the right channel from the left.

The pilot signal is then doubled and used as the carrier for a double-sideband suppressed carrier signal which is modulated by the differential signal. This is summed with the pilot and mono signal, and that is then frequency-modulated.

For reception, older mono sets just low-pass filter the raw FM discriminator output (or rely on the fact that most speakers won’t reproduce >18kHz well), whilst a stereo set performs the necessary signal processing to extract the left and right channels.

Below is a flow-graph in GNURadio Companion that shows this:

Flow graph for FM stereo reception

The signal comes in at the top-left via a RTL-SDR. We first low-pass filter it to receive just the station we want (in this case I’m receiving Triple M Brisbane at 104.5MHz). We then pass it through the WBFM de-modulator. At this point I pass a copy of this signal to a waterfall plot. A second copy gets low-passed at 15kHz and down-sampled to a 32kHz sample rate (my sound card doesn’t do 500kHz sample rates!).

A third copy is passed through a band-pass filter to isolate the differential signal, and a fourth is filtered to isolate the pilot at 19kHz.

The pilot in a real receiver would ordinarily be full-wave rectified, or passed through a PLL frequency synthesiser, to generate a 38kHz carrier. Here, I used the abs math function, then band-passed the result to get a nice clean 38kHz carrier. This is then mixed with the differential signal I isolated before, and the result low-pass filtered to shift that differential signal to baseband.

I now have the necessary signals to construct the two channels: M + D gives us (L+R) + (L-R) = 2L, and M - D gives us (L+R) - (L-R) = 2R. We have our stereo channels.
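The matrixing arithmetic itself is trivial to sanity-check outside GNURadio, ignoring all the filtering and modulation the flow-graph performs. A plain Python sketch:

```python
def matrix(left, right):
    """Transmitter side: mono is L+R, differential is L-R."""
    mono = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return mono, diff

def dematrix(mono, diff):
    """Receiver side: M+D = 2L and M-D = 2R, so halve to recover."""
    left = [(m + d) / 2 for m, d in zip(mono, diff)]
    right = [(m - d) / 2 for m, d in zip(mono, diff)]
    return left, right
```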

Below are the three waterfall diagrams showing (from top to bottom) the de-modulated differential signal, the 38kHz carrier for the differential signal and the raw output from the WBFM discriminator.

The constituent components of a FM stereo radio station.

Not decoded here is the RDS carrier which can be seen just above the differential signal in the third waterfall diagram.