Aug 082020
 

So, lately I’ve been working from home, which means amongst other things, playing music as loud as I like, not getting distracted by co-workers, and the kettle a short walking distance from my workspace.

Eventually though, I will have to return to the office. For this aim, I’ll need to be able to “drown out” the background noise. Music is good for this, but not everyone is into the same tastes as I am — I am a bit of a music luddite.

Years ago (when the ink was drying on my foundation license) I purchased a Logitech Wireless headset. Model A-00006 (yes, that is quite old now). This headset worked, but it did have two flaws:

  1. the audio isolation wasn’t great, so they tended to “leak” sound
  2. they had the dreaded asymmetric audio sample rate problem with JACK.

Now, for home use I bought a much nicer set which solves issue (2) and isn’t too bad with (1), but I’d like to keep them home. (2) isn’t a problem at work since I don’t generally have the need for the audio routing I use at home. So (2) isn’t going to be a problem.

This headset sat in a box for some years, and over time, the headband and earpads have fallen to bits. The electronics are still good. What if I bought a pair of earmuffs and stuffed the old headset guts inside those? That’d solve issue (1) nicely.

Getting inside

These things didn’t open up without a fight. I found that where the speakers are concerned, you will permanently break the housing. The two sides are joined by a 8-wire ribbon cable. The majority of the electronics is in the right-hand side. The battery is most of the guts of the left side.

You’ll need to destroy the headband to liberate the ribbon cable, and you’ll need to destroy the speakers’ housing to get at the screws behind.

I now have the un-housed headset guts sitting on the table, with the original charger plugged in charging the very flat battery, which is a single-cell 3.7V LiPo pouch cell; no idea what capacity it is, I doubt it’s more than 200mAh.

Charging

I plugged everything back together and tried the headset out. It still works, although instead of indicating a solid amber LED for charging, it was showing a slow blink.

fccid.io has a copy of the original documentation online, as well as photos of the guts here. In that, it did not discuss what a “slow blink” meant, which had me concerned that maybe the battery had been left too long and was no longer safe to charge.

Battery voltage whilst charging is sitting around 4.2V, which sounds fair for a 3.7V cell. It eventually stops blinking, going solid, but then the other LED turns RED. Disconnecting the battery reveals 0V across the pins.

So I might be up for a new battery, the PiJuice batteries look to be a similar arrangement (single cell with termistor pin) and may be a good upgrade anyway.

Next steps

I prefer the microphone on the right-hand side, so that’s one thing I’ll be looking at changing. The ribbon cable connects using small FPC connectors, so I’m thinking I might see if I can de-solder those and put a beefier 2.54″ KK-style connector in its place. This will require soldering some wire-wrap wire up to the pins, but the advantage is it’s a much easier connector to work with.

The break-out board on the left side is very simple, no components other than a momentary switch for detecting the microphone boom position, and pads to which the left speaker, microphone, mute LED and battery connect. I’ll still put the battery in the left side, so there’ll still be 5 wires running across the headband. It should be easier to interface with a new battery as well doing this.

I will also have to bring the buttons and switch out to the outside of the earcup, so I’ll probably use KK connectors for those too. The power switch is a through-hole part, so that should be easy.

I’ll probably replace the proprietary power connector with a barrel jack too. Not sure if these will charge from 5V, the original charger has a 6V output.

I think once I’ve got more hacker-friendly connectors onto this, I should be able to look at readying the new home for the electronics.

Jun 212020
 

So, for a decade now, I’ve been looking at a way of un-tethering myself from VoIP and radio applications. Headsets are great, but it does mean you’re chained to whatever it’s plugged into by your head.

Two solutions exist for this: get a headset with a wireless interface, or get a portable device to plug the headset into.

The devices I’d like to use are a combination of analogue and computer-based devices. Some have Bluetooth… I did buy the BU-1 module for my Yaesu VX-8DR years, and still have it installed in the FTM-350AR, but found it was largely a gimmick as it didn’t work with the majority of Bluetooth headsets on the market, and was buggy with the ones it did work with.

Years ago, I hit upon a solution using a wireless USB headset and a desktop PC connected to the radio. Then, the headset was one of the original Logitech wireless USB headsets, which I still have and still works… although it is in need of a re-build as the head-band is falling to pieces and the leatherette covering on the earpieces has long perished.

One bug bear I had with it though, is that the microphone would only do 16kHz sample rates. At the time when I did that set-up, I was running a dual-Pentium III workstation with a PCI sound-card which was fairly frequency-agile, so it could work to the limitations of the headset, however newer sound cards are generally locked to 48kHz, and maybe support 44.1kHz if you’re lucky.

I later bought a Logitech G930 headset, but found it had the same problem. It also was a bit “flat” sounding, given it is a “surround sound” headset, it compromises on the audio output. This, and the fact that JACK just would not talk to it any higher than 16kHz, meant that it was relegated to VoIP only. I’ve been using VoIP a lot thanks to China’s little gift, even doing phone patches using JACK, Twinkle and Zoom for people who don’t have Internet access.

I looked into options. One option I considered was the AstroGaming A50 Gen 4. These are a pricey option, but one nice feature is they do have an analogue interface, so in theory, it could wire direct to my old radios. That said, I couldn’t find any documentation on the sample rates supported, so I asked.

… crickets chirping …

After hearing nothing, I decided they really didn’t want my business, and there were things that made me uneasy about sinking $500 on that set. In the end I stumbled on these.

Reading the specs, the microphone frequency range was 30Hz to 20kHz… that meant for them to not be falsely advertising, they had to be sampling at at least 40kHz. 44.1 or 48kHz was fine by me. These are pricey too, but not as bad as the A50s, retailing at AU$340.

I took the plunge:

RC=0 stuartl@rikishi ~ $ cat /proc/asound/card1/stream0 
Unknown manufacturer ATH-G1WL at usb-0000:00:14.0-4, full speed : USB Audio

Playback:
  Status: Running
    Interface = 1
    Altset = 1
    Packet Size = 200
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 1
    Altset 1
    Format: S16_LE
    Channels: 2
    Endpoint: 1 OUT (ADAPTIVE)
    Rates: 48000
    Bits: 16

Capture:
  Status: Running
    Interface = 2
    Altset = 1
    Packet Size = 100
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 2 IN (ASYNC)
    Rates: 48000
    Bits: 16

Okay, so these are locked to 48kHz… that works for me. Oddly enough, I used to get a lot of XRUNS in jackd with the Logitech sets… I don’t get any using this set, even with triple the sample rate. This set is very well behaved with jackd. Allegedly the headset is “broadcast”-grade.

Well, not sure about that, but it performs a lot better than the Logitech ones did. AudioTechnica have plenty of skin in the audio game, have done for decades (the cartridge on my father’s turntable… an old Rotel from the late 70s) was manufactured by them. So it’s possible, but it’s also possible it’s marketing BS.

The audio quality is decent though. I’ve only used it for VoIP applications so far, people have noticed the microphone is more “bassy”.

The big difference from my end, is notifications from Slack and my music listening. Previously, since I was locked to 16kHz on the headset, it was no good listening to music there since I got basically AM radio quality. So I used the on-board sound card for that with PulseAudio. Slack though (which my workplace uses), refuses to send notification sounds there, so I missed hearing notifications as I didn’t have the headset on.

Now, I have the music going through the headset with the notification sounds. I miss nothing. PulseAudio also had a habit of “glitching”, momentary drop-outs in audio. This is gone when using JACK.

My latency is just 64msec. I can’t quite ditch PulseAudio, as without it, tools like Zoom won’t see the JACK virtual source/sink… this seems to be a limitation of the QtMultimedia back-end being used: it doesn’t list virtual sound interfaces and doesn’t let people put in arbitrary ALSA device strings (QAudioDeviceInfo just provides a list).

At the moment though, I don’t need to route between VoIP applications (other than Twinkle, which talks to ALSA direct), so I can live with it staying there for now.

Jun 082020
 

Well, it’s been a while since we’ve had a cat in the house. Our establishment has been home to many pets over the years, mostly dogs and cats.

Our last was a domestic moggy named Emma, who we had as a kitten back in 1990 and past away in 2008 a few months short of her 18th birthday. A good innings for a feline!

Recently, my maternal grandmother passed away. She had been living alone since my grandfather had to move to a nursing home, and making a red-hot go of it, but bleeding on the brain eventually caused her demise. She was looking after a cat there, a 5-year old Russian Blue / domestic short-hair cross named Sam.

Just before my grandmothers’ passing, I had raised concerns about his welfare, since no one was in the house looking after him. Apparently he was getting occasional visits to re-fill bowls and take out cat litter trays, but it wasn’t ideal. There was a definite concern that he could be “forgotten”.

My two uncles on that side both have pets that Sam likely would not get along with. My mother has twice as many cats as she’s theoretically allowed. My sister’s husband dislikes cats. Most of my cousins are in rental accommodation. I was one of the few in the family that could take on a cat.

Not ideal for us, because we do go away for WICEN events occasionally and the people we’d have left Emma with are now either passed away or in palliative care. So we’ll have to figure something out when the time comes. But, at least here he’s got human attention, he’s got food, he’s got a clean litter tray, and a bigger house to run around in.

For now, he’s in my name for welfare purposes… the estate isn’t worked out yet (there’s 6 months delay in case someone “contests” the will). I can transfer ownership of him to the person officially inheriting him, but smart money is that’ll be me anyway.

Of course one fun thing is that thanks to a little gift from China, I’m working from home. Right now, the task at hand is developing a Modbus driver, so my “workbench”^W^Wthe dining room table has my laptop, and an industrial PC that’s pretending to be a Modbus/RTU device.

Sam has discovered jumping on the keyboard gets instant attention:

Sam, trying a, errm, paw, at JavaScript
The “JavaScript” code

Yeah, why am I reminded of the FVWM Cats page?

Jun 012020
 

Brisbane Area WICEN Group (Inc) lately has been caught up in this whole COVID-19 situation, unable to meet face-to-face for business meetings. Like a lot of groups, we’ve had to turn to doing things online.

Initially, Cisco WebEx was trialled, however this had significant compatibility issues, most notably, under Linux — it just straight plain didn’t work. Zoom however, has proven fairly simple to operate and seems to work, so we’ve been using that for a number of “social” meetings and at least one business meeting so far.

A challenge we have though, is that one of our members does not have a computer or smart-phone. Mobile telephony is unreliable in his area (Kelvin Grove), and so yee olde PSTN is the most reliable service. For him to attend meetings, we need some way of patching that PSTN line into the meeting.

The first step is to get something you can patch to. In my case, it was a soft-phone and a SIP VoIP service. I used Twinkle to provide that link. You could also use others like baresip, Linphone or anything else of your choosing. This connects to your sound card at one end, and a Voice Service Provider; in my case it’s my Asterisk server through Internode NodePhone.

The problem is though, while you can certainly make a call outbound whilst in a conference, the person on the phone won’t be able to hear the conference, nor will the conference attendees be able to hear the person on the phone.

Enter JACK

JACK is a audio routing framework for Unix-like operating systems that allows for audio to be routed between applications. It is geared towards multimedia production and professional audio, but since there’s a plug-in in the ALSA framework, it is very handy for linking audio between applications that would otherwise be incompatible.

For this to work, one application has to work either directly with JACK, or via the ALSA plug-in. Many support, and will use, an alternate framework called PulseAudio. Conference applications like Zoom and Jitsi almost universally rely on this as their sound card interface on Linux.

PulseAudio unfortunately is not able to route audio with the same flexibility, but it can route audio to JACK. In particular, JACKv2 and its jackdbus is the path of least resistance. Once JACK starts, PulseAudio detects its presence, and loads a module that connects PulseAudio as a client of JACK.

A limitation with this is PulseAudio will pre-mix all audio streams it receives from its clients into one single monolithic (stereo) feed before presenting that to JACK. I haven’t figured out a work-around for this, but thankfully for this use case, it doesn’t matter. For our purposes, we have just one PulseAudio application: Zoom (or Jitsi), and so long as we keep it that way, things will work.

Software tools

  • jack2: The audio routing daemon.
  • qjackctl: This is a front-end for controlling JACK. It is optional, but if you’re not familiar with JACK, it’s the path of least resistance. It allows you to configure, start and stop JACK, and to control patch-bay configuration.
  • SIP Client, in my case, Twinkle.
  • ALSA JACK Plug-in, part of alsa-plugins.
  • PulseAudio JACK plug-in, part of PulseAudio.

Setting up the JACK ALSA plug-in

To expose JACK to ALSA applications, you’ll need to configure your ${HOME}/.asoundrc file. Now, if your SIP client happens to support JACK natively, you can skip this step, just set it up to talk to JACK and you’re set.

Otherwise, have a look at guides such as this one from the ArchLinux team.

I have the following in my .asoundrc:

pcm.!default {
        type plug
        slave { pcm "jack" }
}

pcm.jack {
        type jack
        playback_ports {
                0 system:playback_1
                1 system:playback_2
        }
        capture_ports {
                0 system:capture_1
                1 system:capture_1
        }
}

The first part sets my default ALSA device to jack, then the second block defines what jack is. You could possibly skip the first block, in which case your SIP client will need to be told to use jack (or maybe plug:jack) as the ALSA audio device for input/output.

Configuring qjackctl

At this point, to test this we need a JACK audio server running, so start qjackctl. You’ll see a window like this:

qjackctl in operation

This shows it actually running, most likely for you this will not be the case. Over on the right you’ll see Setup… — click that, and you’ll get something like this:

Parameters screen

The first tab is the parameters screen. Here, you’ll want to direct this at your audio device that your speakers/microphone are connected to.

The sample rate may be limited by your audio device. In my experience, JACK hates devices that can’t do the same sample rate for input and output.

My audio device is a Logitech G930 wireless USB headset, and it definitely has this limitation: it can play audio right up to 48kHz, but will only do a meagre 16kHz on capture. JACK thus limits me to both directions running 16kHz. If your device can do 48kHz, that’d be better if you intend to use it for tasks other than audio conferencing. (If your device is also wireless, I’d be interested in knowing where you got it!)

JACK literature seems to recommend 3 periods/buffer for USB devices. The rest is a matter of experiment. 1024 samples/period seems to work fine on my hardware most of the time. Your mileage may vary. Good setups may get away with less, which will decrease latency (mine is 192ms… good enough for me).

The other tab has more settings:

Advanced settings

The things I’ve changed here are:

  • Force 16-bit: since my audio device cannot do anything but 16-bit linear PCM, I force 16-bit mode (rather than the default of 32-bit mode)
  • Channels I/O: output is stereo but input is mono, so I set 1 channel in, two channels out.

Once all is set, Apply then OK.

Now, on qjackctl itself, click the “Start” button. It should report that it has started. You don’t need to click any play buttons to make it work from here. You may have noticed that PulseAudio has detected the JACK server and will now connect to it. If click “Graph”, you’ll see something like this:

qjackctl‘s Graph window

This is the thing you’ll use in qjackctl the most. Here, you can see the “system” boxes represent your audio device, and “PulseAudio JACK Sink”/”PulseAudio JACK Source” represent everything that’s connected to PulseAudio.

You should be able to play sound in PulseAudio, and direct applications there to use the JACK virtual sound card. pavucontrol (normally shipped with PulseAudio) may be handy for moving things onto the JACK virtual device.

Configuring your telephony client

I’ll use Twinkle as the example here. In the preferences, look for a section called Audio. You should see this:

Twinkle audio settings

Here, I’ve set my ringing device to pulse to have that ring PulseAudio. This allows me to direct the audio to my laptop’s on-board sound card so I can hear the phone ring without the headset on.

Since jack was made my default device, I can leave the others as “Default Device”. Otherwise, you’d specify jack or plug:jack as the audio device. This should be set on both Speaker and Microphone settings.

Click OK once you’re done.

Configuring Zoom

I’ll use Zoom here, but the process is similar for Jitsi. In the settings, look for the Audio section.

Zoom audio settings

Set both Speaker and Microphone to JACK (sink and source respectively). Use the “Test Speaker” function to ensure it’s all working.

The patch up

Now, it doesn’t matter whether you call first, then join the meeting, or vice versa. You can even have the PSTN caller call you. The thing is, you want to establish a link to both your PSTN caller and your conference.

The assumption is that you now have a session active in both programs, you’re hearing both the PSTN caller and the conference in your headset, when you speak, both groups hear you. To let them hear each other, do this:

Go to qjackctl‘s patch bay. You’ll see PulseAudio is there, but you’ll also see the instance of the ALSA plug-in connected to JACK. That’s your telephony client. Both will be connected to the system boxes. You need to draw new links between those two new boxes, and the PulseAudio boxes like this:

qjackctl patching Twinkle to Zoom

Here, Zoom is represented by the PulseAudio boxes (since it is using PulseAudio to talk to JACK), and Twinkle is represented by the boxes named alsa-jack… (tip: the number is the PID of the ALSA application if you’re not sure).

Once you draw the connections, the parties should be able to hear each-other. You’ll need to monitor this dialogue from time to time: if either of PulseAudio or the phone client disconnect from JACK momentarily, the connections will need to be re-made. Twinkle will do this if you do a three-way conference, then one person hangs up.

Anyway, that’s the basics covered. There’s more that can be done, for example, recording the audio, or piping audio from something else (e.g. a media player) is just a case of directing it either at JACK directly or via the ALSA plug-in, and drawing connections where you need them.

May 262020
 

Lately, I’ve been socially distancing a home and so there’s been a few projects that have been considered that otherwise wouldn’t ordinarily get a look in on a count of lack-of-time.

One of these has been setting up a Raspberry Pi with DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TMS320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone:144/430MHz→2.4GHz cross-band.

That’s still on the cards, but another use case came up: digital. It’d be real nice to interface this over WiFi to a stronger machine for digital modes. Sound card over network sharing. For this, Mumble would not do, I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, is asking for trouble with digital modes. PulseAudio works fine for audio (speech, music, etc). It will make assumptions though about the nature of that audio. The problem is we’re not dealing with “audio” as such, we’re dealing with modem tones. Human ears cannot detect phase easily, data modems can and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit, because although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss, but it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and just aims to do just that one thing. Digging into the code, present versions use OPUS, older versions use CELT. The use of RTP seemed promising though, it actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic content type (e.g. 96), and if this were a SIP session, there’d be a RTPMAP sent via Session Description Protocol to say content type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems that used by GQRX just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. Using UDP side-steps this, if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.

I also want some flexibility in audio streams, it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (the Philips Red-book standard audio format). In some cases, 16kHz would be fine, or even 8kHz 16-bit linear PCM. 44.1k works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp like RTP so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format

Timestamp

The timestamp field in RTP is basically measured in ticks of some clock of known frequency, e.g. for PCMU it is a 8kHz clock. It starts with some value, then increments up monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo; data will be streaming at 192kbps. We can tolerate wrap-around, and at this data rate, we’d see a 16-bit counter overflow every ~341ms, which whilst not unworkable, is getting tight. Better to use a 32-bit counter for this, which would extend that overflow to over 6 hours.

Sample rate encoding

We can either support an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports. That’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a harmonic of 8kHz or 11.025kHz, so we devise a scheme consisting of a “base” rate and multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multipler incremented by one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates and as we go up, the steps get bigger, so let’s borrow 3 multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0
B S S S M M M M

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

Examples:
  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00100000b (0x20): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)

Other settings

I primarily want to consider linear PCM types. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we could ignore it. So we could just encode the number of bytes needed for a single channel sample, minus one. Thus 0 would be 8-bits; 1 would be 16-bits; 2 would be 32-bits and 3 would be 64-bits. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but might be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.

May 222020
 

For the past 2 years now, there’s been quite a bit in the press about the next evolution of mobile telephony standards.

The 5G standard is supposed to bring with it higher speeds and greater user density handling. As with a lot of systems, “5G” itself, describes a family of standards… some concern the use of millimetre-wave communications for tower-to-handset communications, some cover the communications channels for more modest frequencies in the high UHF bands.

One thing that I really can’t get my head around is the so-called claims of health effects.

Now, these are as old as radio communications itself. And for sure, danger to radio transmissions does increase with frequency, proximity and transmit power. There is a reason why radio transmitter sites such as those that broadcast medium wave radio or television are fenced off: electrocution is a real risk at high power.

0G: glorified two-way radios

Mobile phones originally were little more than up-market cordless phones. They often were a luggable device if they were portable at all. Many were not, they were installed into a vehicle (hence “mobile”). No such thing as cell hand-over, and often incoming calls had to be manually switched.

Often the sets were half-duplex, and despite using a hand-set, would have a very distinctive “radio” feel to them, requiring the user use a call-sign when initiating a call, and pressing a push-to-talk button to switch between listening and talking modes.

These did not see much deployment outside the US or maybe Europe.

1G: cellular communications

Back in the late 80s, when AMPS mobile phones (1G) were little more than executive toys, there might not have been much press about, but I’m sure there’d be anecdotal evidence of people being concerned about “radiation”.

If any standard was going to cause problems, it’d have been 1G, since the sets generally used much higher transmit power to compensate for the lack of coverage. They were little more than glorified FM transceivers with a little digital control channel on the side which implemented the selective calling and cell hand-off.

This was the first standard we saw here in Australia, and was the first to be actually practical. Analogue services didn’t last that long, and because of the expense of running AMPS services, they were mostly an expensive luxury. So that did limit its up-take.

2G: voice goes digital

The next big change was 2G, which replaced the analogue FM voice channel and used digital modulation techniques. GSM (which used Gaussian Minimum Shift Keying) and CDMA (which used phase shift keying) encoded everything in a single digital transmission.

This meant audio could be compressed (with some loss in fidelity), and have forward error correction added to make the signal more robust to noise. The cells could handle more users than the 1G services could. Transmit power could be reduced, improving battery life and the sets became cheaper to make and services became more economical.

Then came all the claims that 2G was going to cause us to develop brain cancer.

Now, many of those 2G services started popping up in the mid 90s… has there been a mass pandemic of cancer cases? Nope! About the only thing GSM was bad for, was its ability to leak into any audio frequency circuit.

2G went through a few sub-revisions, but it basically was AMPS done digitally, so fundamentally worked much the same. A sore point was how data was handled. 2G and its predecessors all tried to emulate what the wired network was doing: establishing a dedicated circuit between callers.

The Internet was really starting to get popular, and people wanted a way to access it on the move. GPRS did allow for some of that, but it really didn’t work that well due to the way 2G saw the world, so things moved on.

3G: packet switching

The big change here was the move from “circuits” to sending data around in packets. This is more like how the Internet operates, and so it meant the services could better support an Internet connection.

Voice still went the old-fashioned way, dedicated circuits, since the QoS (quality of service) could be better maintained that way.

The cells could support more users than 2G could, and the packet mode meant mobile Internet finally became a “thing” for most people.

I don’t recall there being the same concern about health as there was for 2G… it was probably still simmering below the surface. Services were deployed further afield and of course, the uptake continued.

4G: bye bye circuit switching

4G or LTE is the current standard that most of us are using. The biggest change is it ditches the circuit switching used in 1G, 2G and 3G. Voice is done using VoLTE… basically the voice call is sent the same way calls are routed over the Internet.

The cell towers are no longer trying to keep a “circuit” connected to your phone as you move around, instead it’s just directing packets. It’s your handset’s problem to sort out whether it heard a given packet already, or re-arrange incoming packets if they arrive out-of-order.

To make this work, obviously the latency inherent in 3G had to be addressed. As a sweetener, the speeds were bumped up, and the voice CODEC could be updated, so we gained wide-band voice calls. (Pity Bluetooth hasn’t kept up!)

5G: new frequencies, higher speed, smaller cells

So far, the cellular standards have largely co-existed in the same frequency bands. 4G actually varies quite a bit in frequency, but basically there are bands from the low UHF around 410MHz right up to microwave at 2600MHz.

Higher frequencies

5G has been contentious because some implementations of it reach even higher. Frequency Range 1 used in the 5G NR standard is basically much the same as 4G, but frequency range 2 soars as high as 40GHz.

Now, in terms of the electromagnetic spectrum, compared to other forms of radiation that we rely on for survival (and have done ever since life first began on this planet), this might as well be DC!

Infrared radiation, which is the very bottom of the “light” spectrum, starts at 300GHz. At these frequencies, we typically forget about frequencies, and instead consider wavelengths (1mm in this case). Visible light is even higher, 430THz (yes, that’s T for tera!).

Now, where do we start to worry about radiation? The nasty stuff begins with ultraviolet radiation, specifically UVC which is at a dizzying 1.1PHz (yes, that’s peta-hertz). It’s worth noting that UVB, which is a little lower in frequency can cause problems when exposure is excessive… however none is dangerous too, you actually need UVB exposure on your body to produce vitamin D for survival!

Dielectric heating

So that’s where the danger is in terms of frequency. I did mention that danger also increases with power… this is why microwave ovens, which typically operate at a fairly modest 2.4GHz frequency, pose a risk.

No, they won’t make you develop cancer, but the danger there is when there’s a lot of power, it can cause dielectric heating. That is, it causes molecules to move around, and in doing so, collide transferring energy which is then given off as heat. It happens at all frequencies in the EM spectrum, but it starts to become more practical at microwave frequencies.

To do something like cook dinner, a microwave oven bombards your food with hundreds of watts of RF energy at it. The microwave has a thick RF shield around it for a reason! If that shield is doing what it should, you might be exposed to no more than a watt of energy escaping the shield. Not enough to cause any significant heating.

I hear that if you put a 4W power amp on a 2.4GHz WiFi access point and put your hand in front of the antenna, you can “feel” framing packets. (Never tried this myself.) That’s pretty high power for most microwave links, and would be many orders of magnitude more than what any cell phone would be capable of.

Verdict: not a health risk

In my view, there’s practically no risk in terms of health effects from 5G. I expect my reasoning above will be thoroughly rubbished by those who are protesting against the roll-out.

However, that does not mean I am in favour of 5G.

The case against 5G

So I seem to be sticking up for 5G above, but let me make one thing abundantly clear, for us here in Australia, I do not think 5G is the “right” thing for us to use. It’s perfectly safe in terms of health effects, but simply the wrong tool for the job.

Small cells

Did I mention before the cells were smaller? Compared to its predecessors, 5G cells are tiny! The whole point of 5G was to serve a large number of users in a small area. Think of 10s of thousands of people crammed into a single stadium (okay, once COVID-19 is put to bed). That’s the use case for 5G.

5G’s range when deployed on the lower bands, is about on par with 4G. Maybe a little better in certain ideal conditions with higher speeds. This is likely the variant we’re most likely to see outside of major city CBDs. How reliable it is at that higher speed remains to be seen, as there’s a crazy amount of DSP going on to make stuff work at those data rates.

5G when deployed with mmWave bands, barely makes 500 metres. This will make deployment in the suburbs prohibitively expensive. Outdoor Wi-Fi or WiMAX might not be as fast, but would be more cost-effective!

Processor load

Did I mention about the crazy amount of DSP going on? To process data streams that exceed 1Gbps, you’re doing a lot of processing to extract the data out of the radio signal. 5G leans heavily on MIMO for its higher speeds, basically dividing the high-rate stream into parts which are directed to separate antennas. This reduces the bandwidth needed to achieve a high data rate, but it does make processing the signal at the far end more complex.

Consequently, the current crop of 5G handsets run hot. How hot? Well, subject them to 29.5°C, and they shut down! Now, think about the weather we get in this country? How many days have we experienced lately where 29°C has been a daily minimum, not a maximum?

5G isn’t the future for Australia

We need a wireless standard that goes the distance, and can take the heat! 5G is not looking so great in this marathon race. Personally, I’d like to see more investment into the 4G services and getting those rolled out to more locations. There’s plenty of locations that are less than a day’s drive from most capital cities, where mobile coverage is next to useless.

Plenty of modern 4GX handsets also suffer technical elitism… they see 3G services, but then refuse to talk to them, instead dropping to -1G: brick emulation. There’s a reason I stick by my rather ancient ZTE T83 and why I had high hopes for the Kite.

I think for the most part, many of the wireless standards we see have been driven by Europe and Asia, both areas with high population densities and relatively cool annual temperatures.

It saddens me when I hear Telstra tell everybody that they “aspire” to be a technology company, when back in the early 90s, Telecom Australia very much was a technology company, and a well respected trail-blazing one at that! It’s time they pulled their finger out and returned to those days.

May 122020
 

So, the other day I pondered about whether BlueTrace could be ported to an older device, or somehow re-implemented so it would be compatible with older phones.

The Australian Government has released their version of TraceTogether, COVIDSafe, which is available for newer devices on the Google and Apple application repositories. It suffers a number of technical issues, one glaring one being that even on devices it theoretically supports, it doesn’t work properly unless you have it running in the foreground and your phone unlocked!

Well, there’s a fail right there! Lots of people, actually need to be able to lock their phones. (e.g. a condition of their employment, preventing pocket dials, saving battery life, etc…)

My phone, will never run COVIDSafe, as provided. Even compiling it for Android 4.1 won’t be enough, it uses Bluetooth Low Energy, which is a Bluetooth 4.0 feature. However, the government did one thing right, they have published the source code. A quick fish-eye over the diff against TraceTogether, suggests the changes are largely superficial.

Interestingly, although the original code is GPLv3, our government has decided to supply their own license. I’m not sure how legal that is. Others have questioned this too.

So, maybe I can run it after all? All I need is a device that can do BLE. That then “phones home” somehow, to retrieve tokens or upload data. Newer phones (almost anything Android-based) usually can do WiFi hotspot, which would work fine with a ESP32.

Older phones don’t have WiFi at all, but many can still provide an Internet connection over a Bluetooth link, likely via the LAN Access Profile. I think this would mean my “token” would need to negotiate HTTPS itself. Not fun on a MCU, but I suspect someone has possibly done it already on ESP32.

Nordic platforms are another option if we go the pure Bluetooth route. I have two nRF52840-DK boards kicking around here, bought for OpenThread development, but not yet in use. A nicety is these do have a holder for a CR2032 cell, so can operate battery-powered.

Either way, I think it important that the chosen platform be:

  1. easily available through usual channels
  2. cheap
  3. hackable, so the devices can be re-purposed after this COVID-19 nonsense blows over

A first step might be to see if COVIDSafe can be cleaved in two… with the BLE part running on a ESP32 or nRF52840, and the HTTPS part running on my Android phone. Also useful, would be some sort of staging server so I can test my code without exposing things. Not sure if there is such a beast publicly available that we can all make use of.

Guess that’ll be the next bit to look at.

May 032020
 

This afternoon, I was pondering about how I might do text-to-speech, but still have the result sound somewhat natural. For what use case? Well, two that come to mind…

The first being for doing “strapper call” announcements at horse endurance rides. A horse endurance ride is where competitors and their horses traverse a long (sometimes as long as 320km) trail through a wilderness area. Usually these rides (particularly the long ones) are broken up into separate stages or “legs”.

Upon arrival back at base, the competitor has a limited amount of time to get the horse’s vital signs into acceptable ranges before they must present to the vet. If the horse has a too-high temperature, or their horse’s heart rate is too high, they are “vetted out”.

When the competitor reaches the final check-point, ideally you want to let that competitor’s support team know they’re on their way back to base so they can be there to meet the competitor and begin their work with the horse.

Historically, this was done over a PA system, however this isn’t always possible for the people at base to achieve. So having an automated mechanism to do this would be great. In recent times, Brisbane WICEN has been developing a public display that people can see real-time results on, and this also doubles as a strapper-call display.

Getting the information to that display is something of a work-in-progress, but it’s recognised that if you miss the message popping up on the display, there’s no repeat. A better solution would be to “read out” the message. Then you don’t have to be watching the screen, you can go about your business. This could be done over a PA system, or at one location there’s an extensive WiFi network there, so streaming via Icecast is possible.

But how do you get the text into speech?

Enter flite

flite is a minimalist speech synthesizer from the Festival project. Out of the box it includes 3 voices, mostly male American voices. (I think the rms one might be Richard M. Stallman, but I could be wrong on that!) There’s a couple of demos there that can be run direct from the command line.

So, for the sake of argument, let’s try something simple, I’ll use the slt voice (a US female voice) and just get the program to read out what might otherwise be read out during a horse ride event:

$ flite_cmu_us_slt -t 'strapper call for the 160 kilometer event competitor numbers 123 and 234' slt-strapper-nopunctuation-digits.wav
slt-strapper-nopunctuation-digits.ogg

Not bad, but not that great either. Specifically, the speech is probably a little quick. The question is, how do you control this? Turns out there’s a bit of hidden functionality.

There is an option marked -ssml which tells flite to interpret the text as SSML. However, if you try it, you may find it does little to improve matters, I don’t think flite actually implements much of it.

Things are improved if we spell everything out. So if you instead replace the digits with words, you do get a better result:

$ flite_cmu_us_slt -t 'strapper call for the one hundred and sixty kilometer event competitor number one two three and two three four' slt-strapper-nopunctuation-words.wav
slt-strapper-nopunctuation-words.ogg

Definitely better. It could use some pauses. Now, we don’t have very fine-grained control over those pauses, but we can introduce some punctuation to have some control nonetheless.

$ flite_cmu_us_slt -t 'strapper call.  for the one hundred and sixty kilometer event.  competitor number one two three and two three four' slt-strapper-punctuation.wav
slt-strapper-punctuation.ogg

Much better. Of course it still sounds somewhat robotic though. I’m not sure how to adjust the cadence on the whole, but presumably we can just feed the text in piece-wise, render those to individual .wav files, then stitch them together with the pauses we want.

How about other changes though? If you look at flite --help, there is feature options which can control the synthesis. There’s no real documentation on what these do, what I’ve found so far was found by grep-ing through the flite source code. Tip: do a grep for feat_set_, and you’ll see a whole heap.

Controlling pitch

There’s two parameters for the pitch… int_f0_target_mean controls the “centre” frequency of the speech in Hertz, and int_f0_target_stddev controls the deviation. For the slt voice, …mean seems to sit around 160Hz and the deviation is about 20Hz.

So we can say, set the frequency to 90Hz and get a lower tone:

$ flite_cmu_us_slt --setf int_f0_target_mean=90 -t 'strapper call' slt-strapper-mean-90.wav
slt-strapper-mean-90.ogg

… or 200Hz for a higher one:

$ flite_cmu_us_slt --setf int_f0_target_mean=200 -t 'strapper call' slt-strapper-mean-200.wav
slt-strapper-mean-200.ogg

… or we can change the variance:

$ flite_cmu_us_slt --setf int_f0_target_stddev=0.0 -t 'strapper call' slt-strapper-stddev-0.wav
$ flite_cmu_us_slt --setf int_f0_target_stddev=70.0 -t 'strapper call' slt-strapper-stddev-70.wav
slt-strapper-stddev-0.ogg
slt-strapper-stddev-70.ogg

We can’t change these values during a block of speech, but presumably we can cut up the text we want to render, render each piece at the frequency/variance we want, then stitch those together.

Controlling rate

So I mentioned we can control the rate, somewhat coarsely using usual punctuation devices. We can also change the rate overall by setting duration_stretch. This basically is a control of how “long” we want to stretch out the pronunciation of words.

$ flite_cmu_us_slt --setf duration_stretch=0.5 -t 'strapper call' slt-strapper-stretch-05.wav
$ flite_cmu_us_slt --setf duration_stretch=0.7 -t 'strapper call' slt-strapper-stretch-07.wav
$ flite_cmu_us_slt --setf duration_stretch=1.0 -t 'strapper call' slt-strapper-stretch-10.wav
$ flite_cmu_us_slt --setf duration_stretch=1.3 -t 'strapper call' slt-strapper-stretch-13.wav
$ flite_cmu_us_slt --setf duration_stretch=2.0 -t 'strapper call' slt-strapper-stretch-20.wav
slt-strapper-stretch-05.ogg
slt-strapper-stretch-07.ogg
slt-strapper-stretch-10.ogg
slt-strapper-stretch-13.ogg
slt-strapper-stretch-20.ogg

Putting it together

So it looks as if all the pieces are there, we just need to stitch them together.

RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.2 --setf int_f0_target_stddev=50.0 --setf int_f0_target_mean=180.0 -t 'strapper call' slt-strapper-call.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.1 --setf int_f0_target_stddev=30.0 --setf int_f0_target_mean=180.0 -t 'for the, one hundred, and sixty kilometer event' slt-160km-event.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.4 --setf int_f0_target_stddev=40.0 --setf int_f0_target_mean=180.0 -t 'competitors, one two three, and, two three four' slt-competitors.wav
Above files stitched together in Audacity

Here, I manually imported all three files into Audacity, arranged them, then exported the result, but there’s no reason why the same could not be achieved by a program, I’m just inserting pauses after all.

There are tools for manipulating RIFF waveform files in most languages, and generating silence is not rocket science. The voice itself could be fine-tuned, but that’s simply a matter of tweaking settings. Generating the text is basically a look-up table feeding into snprintf (or its equivalent in your programming language of choice).

It’d be nice to implement a wrapper around flite that took the full SSML or JSML text and rendered it out as speech, but this gets pretty close without writing much code at all. Definitely worth continuing with.

Apr 192020
 

COVID-SARS-2 is a nasty condition caused by COVID-19 that has seen many a person’s life cut short. The COVID-19 virus which originated from Wuhan, China has one particularly insidious trait: it can be spread by asymptomatic people. That is, you do not have to be suffering symptoms to be an infectious carrier of the condition.

As frustrating as isolation has been, it’s really our only viable solution to preventing this infectious condition from spreading like wildfire until we get a vaccine that will finally knock it on the head.

One solution that has been proposed has been to use contract tracing applications which rely on Bluetooth messaging to detect when an infected person comes into contact with others. Singapore developed the TraceTogether application. The Australian Government look like they might be adopting this application, our deputy CMO even suggesting it’d be made compulsory (before the PM poured water on that plan).

Now, the Android version of this, requires Android 5.1. My phone runs 4.1: I cannot run this application. Not everybody is in the habit of using Bluetooth, or even carries a phone. This got me thinking: can this be implemented in a stand-alone device?

The guts of this application is a protocol called BlueTrace which is described in this whitepaper. Reference implementations exist for Android and iOS.

I’ll have to look at the nitty-gritty of it, but essentially it looks like a stand-alone implementation on a ESP32 module maybe a doable proposition. The protocol basically works like this:

  • Clients register using some contact details (e.g. a telephone number) to a server, which then issues back a “user ID” (randomised).
  • The server then uses this to generate “temporary IDs” which are constructed by concatenating the “User ID” and token life-time start/finish timestamps together, encrypting that with the secret key, then appending the IV and an authentication token. This BLOB is then Base64-encoded.
  • The client pulls down batches of these temporary IDs (forward-dated) to use for when it has no Internet connection available.
  • Clients, then exchange these temporary IDs using BLE messaging.

This, looks doable in an ESP32 module. The ESP32 could be loaded up with tokens by a workstation. You then go about your daily business, carrying this device with you. When you get home, you plug the device into your workstation, and it uploads the “temporary IDs” it saw.

I’ll have to dig out my ESP32 module, but this looks like a doable proposition.

Apr 112020
 

So, for years… decades even, our telephone service has been via the Public Switched Telephone Network. Originally intended for just voice traffic, this later became our Internet connection, using dial-up modems, then using ADSL.

The number itself gained two digits in its life-time: originally 6 digits (late 70s/early 80s), it gained a 0 at some point, then in 1996 a 3 was prepended to all Brisbane numbers.

So yeah, we’ve had our phone a long time, and the underlying technology has remained largely the same for that time period. Even the handset is the same one from all those years ago. Telecom Australia used to pay for it as a “priority service” back then as my father was working for them at the time.

By September, this will change. The National Broadband Network is in our suburb, and a little while back I migrated the ADSL2+ connection over to HFC. After a brief hiccup getting OpenBSD talking to it, we were away. It’s been pretty stable so far. Stable enough now that I haven’t had the ADSL modem connected in weeks.

At the time of migration, I could have migrated the telephone too to NBN phone, however there’s a snag: how do you access NBN phone? The answer is the ISP sends you a pre-configured ATA+Router+WiFi AP (and sometimes there’s an ADSL modem in there too). As I had decided to use my own router, I didn’t have such a device.

I asked Internode about the SIP details, and the situation is this: when they provision a NBN connection, there’s an automated process that configures their VoIP service then stores the credentials and other settings in a ACS server. They have no visibility of these credentials at all. Ordinarily, if I had purchased hardware, they would have provisioned that with credentials that would allow it to authenticate over the TR069 protocol. So without spending $200 on a device I wasn’t going to use, I wasn’t getting these credentials.

There’s another option though: VoIP. The NBN phone is actually a VoIP service, so fundamentally nothing much changes. Keeping the services separate though does give me flexibility in the future. There’s a list of VoIP providers as long as my arm that I can go with.

VoIP is notoriously difficult to set up though, I didn’t want to mess up our only incoming telephone service, so I decided to migrate the NBN phone number over to Internode’s NodePhone VoIP service which would allow me to experiment.

Requirements

So, first thing to consider is what my needs are. We have 4 devices that are plugged into the telephone line at present:

  • Telecom Australia Touchfone 200 wired telephone
  • Telstra 9200a cordless telephone base-station
  • Epson WF-7510 printer/scanner/fax
  • Maestro Jetstream 56kbps modem

Now, we don’t send many faxes (maybe two a year), and the last time that modem was used, it was to dial into a weighbridge system in Rockhampton after my workplace had moved to VoIP.

That weighbridge system was originally built in 1995 on top of computers running SCO OpenServer 5 and using SCO UUCP over dial-up lines running at 1200 baud (some of the nodes were at remote sites with dodgy phone lines). In 2013 the core servers were upgraded with new hardware and Ubuntu 12.04LTS, but the UUCP links remained as the SCO boxes at sites were slowly replaced. mgetty would answer the phone, and certain user accounts would use uucico as the “shell”. For admin purposes, we could log in, and that would give us a BASH prompt.

Any time we had a support issue at work, muggins would be the one to literally “dial in” to that site. While it’s been years since I’ve needed to touch it (and I think now there’s a VPN to that site), I wanted to retain dial-up capability if needed.

As for the faxes… the stuff we’re doing can be done over email. Getting the modem and the fax machine working is a stretch goal.

99% of the traffic will be voice traffic. I’m not sure if it’s possible to use double-adaptors with an ATA, and so for hardware I opted for the Grandstream HT814. These are available domestically (e.g. MyITHub) for under AU$130 and looked to be pretty good bang-per-buck. I just wasn’t sure how well it’d get along with the old T200.

As a contingency plan, I also ordered an IP phone, a Grandstream GXP1615 which is also available locally. That way if the T200 gave me problems, I’d just wire up Ethernet to the old point where the T200 was and put the GXP1615 there.

Since I’ve got two SIP devices, I need to be able to control incoming and outgoing call flows. So that means running a soft-PBX somewhere. Asterisk is the obvious choice here, being a free-software VoIP package with lots of flexibility. OpenBSD 6.6 ships with Asterisk 16.6.2.

Opening the account

Opening the account is simple enough. As I was porting the NBN phone number over, I just needed some details off my last Internode bill. Later when I go to do the house phone some time next financial year, I’ll be after a Telstra bill (assuming Telstra Wholesale have sorted out their CoVID-19 issues by then).

I looked at the hardware options that Internode offered… and again, it was basically the same as what they offer for their Internet service.

One thing I note is that while Internode has been a great ISP (I switched to them in 2012), their credentials as a VSP are still developing somewhat.

Initial connection

Provisioning was much less smooth than my initial ADSL connection (which was practically seamless) or the subsequent move to HFC NBN. The first bump in the road was when they emailed me:

Subject: Internode: Your Internode NodePhone VoIP service xxxxxxxxxx can’t make/receive calls yet [xxxxxxxxx]

Following activation of your NBN service, we ran some tests to make sure things were running smoothly.

We weren’t able to detect that your NodePhone VoIP service (phone number xxxxxxxxxx) has been set up successfully.

Initial contact regarding the telephone service…

Okay, fair enough, on the date I received that email I had only just received the ATA that I had purchased (through another supplier). Evidently they thought I had the hardware already. I found some time and quickly cobbled together a set-up:

First test set-up: siproxd on the border router to the HT814

I did some quick research, rather than doing NAT, I figured siproxd was closer to my eventual goals, so I installed that on the border router as there were fewer knobs and dials to deal with. I configured the HT814 with the settings as best I understood them from Internode’s guides with one difference: I set the Proxy field to my border router’s internal IP address.

This failed with Internode’s server giving me a 404: Not Found response when my HT814 sent its REGISTER request. I took some captures using tshark and reported this back to their helpdesk. I disconnected the ATA since there was no sense in banging on their front door every 20 seconds: it wasn’t working.

They got back to me a few days later and told me I had the right user name, and that my number still wasn’t registered. Plugging the ATA in, same response, 404: Not Found. In exasperation, I tried some changes:

  • Different settings on the HT814 (too many to list)
  • Trying with Twinkle on my laptop connected to the DMZ, both through via siproxd and also shutting down siproxd and setting up NAT.
  • Installing Asterisk on the border router (which was in the plans) and trying to set that up.

All three hit the same problem, 404: Not Found. I figured either I had entered the same wrong details 3 times, or it was definitely their end. So I reported that, unplugged the ATA and waited. This was on the 27th March.

On the 31st March, I get an email from Internode Provisioning to say they would be porting the number soon. After receiving this email, REGISTER worked, I was getting 200: OK in reply. Funnily enough, there was no challenge to credentials, it just saw my end and said “OK, you’re in”.

Calling outbound after this worked: the call trace showed the Internode end challenging the ATA for credentials, but once supplied, it connected the call, all worked. Inbound calls were a different matter though, a recorded (female) voice announced that the number was invalid. We were half-way there.

The next day, that message changed, it now was a recorded (male) voice announcing the number was unavailable, and offering to leave a voice-mail message. Progress. I had not changed my end, I still had T200 → HT814 → siproxd on the border router as my set-up. I was not seeing any incoming traffic being blocked by pf, although I was seeing people playing with sipvicious. I decided to firewall off my SIP ports, only exposing them to Internode’s SIP server.

Later on the help-desk got back to me. Despite their server telling me 200: OK when I sent a REGISTER, the number was still “not registered”. Confusing!

On a whim, I ripped out siproxd again and set up NAT on the border router. BINGO! We registered, and incoming calls worked. Internode use Broadsoft’s BroadWorks platform for their NodePhone VoIP service, and something about the operation of siproxd confused it. It then sent a misleading response leading my devices to think they were registered when they were not!

At this point, it was time to do two things: uninstall siproxd, and start reading up on Asterisk.

Configuring Asterisk

Asterisk is regarded as the “swiss army knife” of VoIP. It can be configured to do a lot. Officially, it is supported on Linux i386 and AMD64 platforms, but there are builds of it for platforms such as the Raspberry Pi.

OpenBSD also build and ship it in their ports. I wanted to avoid the pain of NAT, and I also wanted to avoid the telephone service going off-line if my cluster went haywire. So installation of this on the border router was a no-brainer. Yes, it’s adding a bit more attack surface to that box, but with appropriate firewalling rules, this could be managed.

As for configuration, there were two SIP channel drivers I could use, either the old chan_sip method, or the newer res_pjsip method. I had seen guides that discussed the older method (e.g. OCAU, WP), however in the back of my mind is the question: “how long do Digium plan to keep supporting two methods?” Thus from the outset I decided to use res_pjsip.

Audio CODECs

This is a pretty big aspect of configuration and should not be skimped on. You’ll find even if you buy all your equipment from one supplier, there are differences in what audio CODECs are supported. For instance the HT814 supports the OPUS CODEC (something I’d like to take advantage of eventually) but the GXP1615 does not. Some CODECs require patent licenses (e.g. SILK, SIREN7, G.722.1, G.722.2/AMR-WB), some did require patent licenses but are now “free” (G.729, G.723.1) and some are open-source (OPUS, Speex).

Some also go by multiple names. G.711a is also called PCMA and G.711u is PCMU. Pretty much everything supports these, and depending on the country you’re in, one will be preferred over the other. In my case, G.711a is the preferred option.

Your voice provider is a factor here too. Some only support particular CODECs, some will allow any CODEC. NodePhone VoIP allegedly will allow you to use whatever you like, but calls into and out of their network to non-VoIP targets may be restricted to narrow-band CODECs.

Internode-specific gotcha: G.729

One gotcha I stumbled on the hard way was that sometimes SIP implementations do not play by the rules.

Specifically, I found once I got Asterisk installed, incoming calls would be immediately hung-up on the moment I answered. Mobile or PSTN didn’t matter. I fired up tshark again and dug into the problem. The only clue I had was this error message:

[Apr  5 16:27:40] WARNING[-1][C-00000001] channel.c: Unable to find a codec translation path: (g729) -> (alaw)

Even if I specified only use G.711a (alaw), somehow I’d still get that message. I asked about it on the Asterisk forum. I was seeing this pattern in my SIP traffic:

|Time     | ${PROVIDER}                           |
|         |                   | ${ASTERISK_BOX}   |                   
|0.000000 |         INVITE SDP (g711A g7          |SIP INVITE From: <sip:${MYPSTNNUM}@${PROVIDER}29;user=phone> To:"${PROVIDER_NAME}"<sip:${MYSIPNUM}@${PROVIDER_DOMAIN}> Call-ID:BW061858648050420-1965318496@${PROVIDER}   CSeq:612303213
|         |(5060)   ------------------>  (5060)   |
|0.003443 |         100 Trying|                   |SIP Status 100 Trying
|         |(5060)   <------------------  (5060)   |
|0.025416 |         180 Ringing                   |SIP Status 180 Ringing
|         |(5060)   <------------------  (5060)   |
|0.954015 |         200 OK SDP (g711A g7          |SIP Status 200 OK
|         |(5060)   <------------------  (5060)   |
|0.986287 |         RTP (g711A)                   |RTP, 6 packets. Duration: 0.099s SSRC: 0x4F53507
|         |(27642)  <------------------  (18024)  |
|1.092729 |         RTP (g729)                    |RTP, 3 packets. Duration: 0.041s SSRC: 0xCD74F85
|         |(27642)  ------------------>  (18024)  |
|1.097523 |         ACK       |                   |SIP Request INVITE ACK 200 CSeq:612303213
|         |(5060)   ------------------>  (5060)   |
|1.099552 |         BYE       |                   |SIP Request BYE CSeq:29850
|         |(5060)   <------------------  (5060)   |
|1.149521 |         200 OK    |                   |SIP Status 200 OK
|         |(5060)   ------------------>  (5060)   |

I was reliably informed that this was definitely against the rules. So another help-desk email informing them of the problem. If you’re an Internode NodePhone VoIP customer, and you see the above behaviour, these are your options:

  1. Purchase and Install Digium’s G.729 CODEC: only an option if you are running Asterisk on a i386 or AMD64-based Linux machine.
  2. If, like me, you’re running Asterisk on something else, or you despise proprietary software, there is an open-source G.729 CODEC. On OpenBSD, install the asterisk-g729 package.
  3. Asterisk have added a work-around in later versions of their code. Patches exist for version 17, 16 and 13. It is also included in 13.32.0, 16.9.0 and 17.3.0. OpenBSD 6.7 will likely ship with a version of Asterisk that includes this work-around.
  4. You can also complain to their help-desk. We can work-around their problem, but really, it’s their end that’s doing the wrong thing, they should fix it.

Dial Plans

The other big thing to consider is your dial-plan layout. Every SIP endpoint is assigned a context which is used in extensions.conf to determine what is meant when a particular sequence of digits is entered or how a call should be routed.

The pattern syntax is documented on their Pattern Matching page. Notably, extensions are “literal” if they do not begin with an underscore (_) and a X matches any numeric digit. So an example:

exten => 123456,1,DoSomething()
exten => 123987,1,DoSomethingDifferent()
exten => _123XXX,1,DoSomethingElse()

The non-pattern extensions take precedence. So the number 123456 would exactly match the first extension listed above and would call DoSomething(). The number 123987 would exactly match the second entry and would call DoSomethingDifferent(). Any other 6-digit number beginning with 123 would trigger DoSomethingElse().

Avoiding expensive mistakes

Before doing anything, it’s worth reading Asterisk’s page on Dial plan Security. You do NOT want someone to call your PBX, then dial outbound to expensive international numbers and run up a big phone bill! In particular, you want to keep the default context as lean as possible! It’s fine to put all your internal extensions in default, but anything that dials outside, should be in a separate context for internal extensions.

Dialling more than one phone at a time

It is possible to have a dial-plan entry ring multiple phones in parallel by separating the endpoints with & symbols, but don’t put spaces around the & characters! So Dial(PJSIP/ep1&PJSIP/ep2&PJSIP/ep3), not Dial(PJSIP/ep1 & PJSIP…). The most obvious use case for this is when someone rings the home number and you want all phones to ring.

Incoming calls in the dial plan

When a call comes in from outside, the number dialled by the outside party (i.e. your number) appears as the extension dialled.

A simple option is to just ring everyone… so if your phone number was, say 0735359696, you’d create a context in your extensions.conf like this:

; Incoming calls
[incoming]
exten => 0735359696,1,Dial(PJSIP/ep1&PJSIP/ep2&…)

… then in pjsip.conf, assign that context to the endpoint:

[Provider-endpoint]
type=endpoint
context = incoming
; … etc

Outgoing calls

Usually you want to be able to ring outbound too. Some guides suggested the pattern _X! can be used to “match all” but I couldn’t get this to work.

Fax over IP

I mentioned that we had a fax machine we wanted to continue using. Whilst we don’t use it every day, it’s nice to know it is there if we want to use it.

Likewise with the dial-up modem. Even if I needed to do reduced-speed for it to work, it’d be better than nothing for situations where I need to use dial-up. It’ll never connect to the Internet again, but that doesn’t make it useless.

There are two ways to do Fax over IP. One is to use G.711u and hope for the best. The other is to use T.38, where the ATA basically “spoofs” the fax modem at the remote end, decodes the symbols being sent back to digital data, encodes that in UDP packets and transmits those over the Internet. The other end then reverses the process.

The good news is there are diagnostic tools out there for testing purposes. You don’t (yet) have to know of someone who has a working fax. Here in Australia, there’s FOLDS-B.

I’d recommend if you’ve got multiple fax-capable devices though, you plug two of them into separate ports on one or more ATAs and try faxing from one extension to the other. I tried calling FOLDS-B directly at first with no luck, then tried setting up Asterisk with an extension that calls SendFax to send a TIFF image, but had no luck — handshaking would be cut short after a second.

Using two internal endpoints saved a lot of phone calls externally and allowed me to “prove” the ATA could work for this.

Once you’ve got things working locally, you’re ready to hit the outside world. You’ll need a fairly complex image to send as it needs to be transmitting for ~70 seconds. I tried a few pages (e.g. this one) on the flatbed scanner of the WF-7510 without much luck… the fax would barely muster 20 seconds of data for them even at “fine” levels.

I had better luck using the 56kbps modem and a copy of efax-gtk. I took RCA’s test card image off Wikipedia as a SVG, loaded that up into The Gimp, rendering the SVG at 600dpi, extending the image size to A4-paper sized, rotated it to portrait mode, then converted it to 1-bit-per-pixel monochrome with patterned dithering and saved it as an uncompressed TIFF.

I then passed this through tiff2ps which gave me this file:

Sending that at 14400bps took 72 seconds. Perfect! Then the reply came back. Fax reception is very much a work-in-progress, so rather than receive on the VoIP line, I decided to use the PSTN to receive the reports. This was accomplished by specifying my PSTN phone number in the fax identity (FOLDS-B evidently doesn’t check this against caller ID).

I tried again, and this time, received the report. FOLDS-B got a garbled mess apparently. The suggestion was to set the speed to 4800bps. In efax this is accomplished by setting the modem capabilities:

CAPABILITIES
       The capabilities of the local hardware and software can be set using a string of 8 digits separated by commas:

       vr,br,wd,ln,df,ec,bf,st

       where:

       vr  (vertical resolution) =
                0 for 98 lines per inch
                1 for 196 lpi

       br  (bit rate) =
                0 for 2400 bps
                1 for 4800
                2 for 7200
                3 for 9600
                4 for 12000 (V.17)
                5 for 14400 (V.17)

So in this case, I should set the capabilities to 1,1,0,2,0,0,0,0. I tried this, but then found the call just didn’t negotiate either. On a whim again I tried 7200bps (that’s 1,2,0,2,0,0,0,0). Eureka! I got this:

A pretty much “perfect” FOLDS-B report at 7200bps.

The transmission level and SNR are pretty much spot-on. Emboldened by this, I tried again faxing the same image (which I had printed out) from the WF-7510. I chose “Photo” quality and scanned it on the flat-bed. This didn’t work so well — evidently some detail was missed and it only transmitted for 60 seconds so I tried again, and faxed something else for the second sheet.

FOLDS-B wasn’t happy about the second page, but at least it had something to go on. The fax modem in the WF-7510 doesn’t appear to have a speed control other than turning V.34 mode (33.6kbps) on/off. It appears it tried sending at 14400bps. FOLDS-B tells a sorry tale:

An ugly 14400bps transmission

Now it is possible to get near perfect results at 14400bps through an ATA that supports T.38. Maybe a longer transmission on the first page might have helped, but fair to say the speed is not helping matters.

Transmit level is very quiet, and I can’t see a way to adjust this on the WF-7510. Nor can I force it to V.29 (7200bps) mode.

There’s allegedly a “reference” test sheet you can get by “receive polling” 019725112. When I ring this number (with a fax or telephone) I get a recorded message saying the number is not valid. If anyone knows what the correct number is, or how to get a copy of this reference sheet, it’d be greatly appreciated.

So yes, Fax over IP is doable… a lot of pissing about with ATA settings… and you better hope the fax machine itself has lots of knobs and dials. Hylafax + t38modem will probably crack this nut. Then again, so will email.

Configuration Settings

Asterisk Configuration

So whilst my set-up is still a work-in-progress, I figured I’d post what I have here.

Most of these are defaults shipped with OpenBSD 6.6. Specifically of interest would be pjsip.conf and extensions.conf.

pf firewall rules

You’re going to want to pierce your firewall appropriately to expose yourself just enough for your VoIP service to work.

# Define some interfaces
internal=em0

# SIP addresses
sip_provider=203.2.134.1  # sip.internode.on.net
sip_endpoints="{ 10.0.0.3, 10.0.0.4 }"

# Allow SIP from router to SIP devices/softphones and vice versa
pass in on $internal proto udp from $sip_endpoints to self port { 5060, 10000:30000 }
pass out on $internal proto udp from self port { 5060, 10000:30000 } to $sip_endpoints

# Allow SIP from Internode to self and vice versa.  This could be
# tightened up a lot further.  Maybe try some calls both ways and log
# the traffic to see which specific ports.  "All UDP ports" is in line
# with Internode recommendations:
# https://www.internode.on.net/support/faq/phone_and_voip/nodephone/troubleshooting_nodephone/#Do_I_have_to_open_up_any_ports_i
pass in on egress inet proto udp from $sip_provider to self
pass out on egress inet proto udp from self to $sip_provider

Grandstream HT814 settings

Some notable settings first before I dump full screenshots…

Ring Cadence Settings

A special thank-you to @scottsip on the Grandstream forums for pointing me to this document from the ITU (Page 4 has the settings for Australia). Also worth mentioning is this Whirlpool forum thread and this review/teardown of the Grandstream HT802. The ones marked (?) I have no idea what they’re for.

  • System Ring Cadence: c=400/200-400/2000;
  • Dial Tone: f1=400@-12,f2=425@-12;
  • Busy Tone: f1=425,c=38/38;
  • Reorder Tone: f1=480,f2=620,c=25/25; (?)
  • Confirmation Tone: f1=350@-11,f2=440@-11,c=100/100-100/100-100/100; (?)
  • Call Waiting Tone: f1=425,c=20/20-20/440;
  • Prompt Tone: f1=350@-17,f2=440@-17,c=0/0; (?)
  • Conference Party Hangup Tone: f1=425@-15,c=600/600;

Notable fax-related settings

I changed lots of these trying to get something working, so if you set this and still don’t get any joy, check the screenshots below.

  • Fax Mode: T.38
  • Re-INVITE After Fax Tone Detected: Enabled
  • Jitter Buffer Type: Fixed
  • Jitter Buffer Length: Low (note, I can get away with this because it’s Ethernet LAN from ATA to Asterisk box)
  • Gain:
    • TX: 0dB
    • RX: 0dB
  • Disable Line Echo Canceller: Yes
  • Disable Network Echo Suppressor: Yes

Screenshots

Grandstream GXP1615 Settings

These are much the same as the HT814 above. The cadence settings use a slightly different syntax (they don’t have the @nn parts) and there are more tone settings. Again, the ones marked with (?) have an unknown purpose.

  • System Ringtone: c=400/200-400/2000;
  • Dial Tone: f1=400,f2=425;
  • Second Dial Tone: f1=450,f2=425; (?)
  • Message Waiting: f1=525;
  • Ring Back Tone: f1=1209,f2=852,c=20/20-20/200;
  • Call-Waiting Tone: f1=425,c=20/20-20/440;
    • Gain: Low
  • Busy Tone: f1=425,c=38/38;
  • Reorder Tone: f1=480,f2=620,c=25/25; (?)
GXP1615 ringtone settings

The steps forward

Things I need to figure out with this system:

  • Feature Codes: these are used to signal events like “transferring calls”, “park”, and other features you might want to initiate in the PBX. They are normally signalled using DTMF tone sequences, however this can give problems if your tones clash with some IVR you’re interacting with. I’m currently researching this area.
  • OPUS support: I mentioned the HT814 supports it, and as it’s an open CODEC I’d like to support it. There is this project which provides OPUS support to Asterisk. I’ll probably look at writing an OpenBSD port for it.
  • Voice mail and IVR… I’m thinking something along the lines of “Dial the year of birth of the person you wish to reach”… then that can ring the relevant phones or offer to leave a message.
  • I’d like to test wideband audio at some point. Work seems to have G.722 on their phones (or maybe its OPUS?), I should see if wideband pass-through in fact works.