WICEN

Earmuff headset reverse engineering

So, previously I’ve tried looking for a suitable earmuff headset, and drew blanks: they wanted $400+ for them and I wasn’t willing to spend that much. That said, there are times when such a headset is useful: if it’s raining heavily on the roof of the annex, hearing the radio is bloody difficult!

Now, my tactical headset should fulfil that, but really it was intended for less noisy environments. What if that’s not enough? Years ago, Brisbane WICEN used to help out with the car rallies, and there you really do need something that will muffle outside noise. I had tried to make my own, with limited success, but one wondered how the commercial options compared.

The other day, I spotted this earmuff headset. One thing I note, the seller seems unaware of how these things are worn — that’s a neck band fellas, not a head band. (Or maybe their ears are wider than they are tall? Mine are taller than they are wide.)

This is up the upper limit of what I was willing to spend, but what the hell… we’ll try it. They arrived today (about 6 days ahead of schedule). They’re comfortable enough — sound dampening is reasonable. I like the fact the microphone is on the right side. There’s a PTT button on the left ear-cup and a volume control on the right.

Being a behind-the-head design, it plays nice with my hard hat too. The head strap is a little tricky in my case, since my hard hat has a chin strap — I figured out I had to undo the head strap, thread the headset through the chin strap, then put hard hat and headset on together. I should be able to wear it with the coolie hat as well.

A downside is basically when I do this, I can’t take the headset off without taking the hard hat off too, but the intent of this was something that I was going to leave on anyway, and the same criticism is true of the tactical headset too.

The headset, with its packaging and cables.

One wrinkle is that this headset you can see, uses a Kenwood-style pin-out — my Kenwood TH-D72A died recently, so great timing there. Luckily for us though, it’s just a small pigtail: we can make our own up to suit this pin-out. The pig-tail actually is a 6-pin Mini-DIN: Jaycar part number PP0366. I was thinking of chopping cables, but this is even simpler, we’ll just make our own pigtail.

At the other end, it appears to be a 5-pin mini-XLR, but I haven’t tried buzzing that out at this stage. As is typical with comms headsets, this one is mono. There’s a 3.5mm jack (left earcup) for plugging in a comms receiver or media player, and it works — in mono. The microphone appears to be an electret.

For now, I’ve decided to have a look at reverse-engineering the pin-out to use the headset as-is. I rummaged around and found the aforementioned connector in my junk box, still sealed in its packet, so I opened that up, and used the little insert inside to plug into the socket, put the headset on, and went buzzing with the multimeter. Using the Kenwood pigtail for clues (since I know the Kenwood pin-out), I located the PTT, speaker and microphone connections on the headset cable.

One pin appears to be a no-connect; the PTT and microphone are commoned.

Reverse-engineered Mini-DIN 6 pin-out

Yaesu FTM-350AR audio interface: part two

Today I rummaged through my junk box and pulled out:

  • a screw-terminal type RJ-45 jack
  • a 3.5mm stereo panel-mount socket

Sadly, the socket is too big to fit inside the RJ-45 jack housing, but I was able to cobble together a test using the jack as a break-out.

As it happens, there’s some repeater traffic that I’m hearing through the tablet beautifully. I set up Mumble to just transmit on VOX activity — this seems good enough. The signal out of the radio is quiet, so VOX works well enough for detecting the incoming signal with no false detection.

There’s a fraction-second delay, which is to be expected for a cobbled together Radio-over-IP solution such as this. It’ll be annoying if you happen to be next to the radio, but probably fine if you’re away from the radio (or you use a headset so you hear the delayed signal through the headset louder than the real-time signal out of the radio).

Things I need to look into from here:

  • Computer transmit audio interface
  • Computer PTT interface
  • Mumble integration into the above

Yaesu FTM-350AR audio interface: part one

I do a lot of support for horse endurance rides up at Imbil, and that means running a base for about 3 or 4 check-points. My go-to radio for this has been the Yaesu FTM-350AR, which is a good rig, but the head unit has woefully pathetic audio output.

The mono speaker is better, and you can get pretty good results plugging an un-amplified pair of speakers into the back — I use some old “SoundBLASTER” speakers from a SoundBLASTER 16 sound card. Works great if you’re in earshot of the radio, and much better than the built-in speaker in the radio body or the pathetic excuse for speakers in the head unit.

The radio can do cross-band, and this works pretty well, assuming you can find a pair of frequencies that don’t have interference. However, cross-band means you can’t use the radio directly as a standard radio, you must use a second radio to communicate on the cross-band link. In addition, it also means you can’t monitor more than one frequency at a time in base, or run packet from the same radio.

I’ve considered whether I make an audio interface that would link the radio to Mumble. Mumble is a VoIP solution for online gaming, and amongst other things, behaves much like a PTT-style radio in operation. The thinking was to interface the audio and PTT signals between the radio and the Mumble client, so that any transmit audio on the Mumble channel is fed to the radio, and vice versa.

Computer to radio is easy enough. Some years ago when I was doing the Brisbane to Gold Coast bicycle ride (net controller, in the back seat of the control vehicle), I had made an adaptor that let me plug in my DIN-5 headsets into the microphone port. I basically ignored most of the contacts, just paying attention to 0V (GND), EXTMIC and PTT.

Microphone pin-out for the FTM-350AR

An annoyance at the time is there was nowhere to plug headphones except the external speaker port, which is a little over-powered for headphones, so still had to rely on the radio’s speaker with my headset.

For a two-way radio interface, I’ll need to tap into that speaker audio. Now, an attenuator pad for the external speaker connector is a viable option, however it has the downside that it disables the internal speakers, meaning you need either a double-adaptor break-out, or you rely on the computer forwarding the audio somehow.

I contemplated opening up the head unit and adding a headphone jack, but then came to the conclusion that the audio is unlikely to be a digital signal being sent to the head unit — it must exist in analogue form on the 8-wire link between the head-unit and main-unit. Sure enough, I found some schematics, and there it is:

FTM-350AR head unit connector, showing left and right speaker connections

This goes off to a (rather anaemic) audio amplifier, so likely are line-level signals. Looks like I should be able to make a cable that taps off pins 2, 3 and 4 bringing those out to a 3.5mm jack which I can plug into an audio interface of my choosing. I have a Behringer U202 USB audio interface, seems like a good candidate for this experiment.

I lack the PTT signal on this connector, it is likely multiplexed in the TXD line, so unless I feel like reverse-engineering Yaesu’s protocols, the easiest bet is that I only tap received audio from here, and use the microphone socket for transmit audio (which I already know how to drive).

As for the software side, talKKonnect is one such option I could employ here.

Time to rummage through the junk box and see what I have on hand.

A stereo/binaural tactical headset: part three

So, last time I 90% finished the headset I’m likely to use at horse endurance ride and other “quiet” emergency comms events in the near future. The audio quality (at least on receive) sounds great. From what I can tell between hand-helds, the transmit audio sounds good. It’s quite comfortable to wear for extended periods, and while my modifications do muffle sound slightly, it’s perfectly workable.

There are just a couple of niggles:

  1. the headset uses a dynamic microphone, thus is not compatible (microphone-wise) with the other radio interfaces I have
  2. I used solid-core CAT5 which is sure to develop a fault at some inconvenient moment
  3. the cable to the connector is way too short

CAT5 was fine for a proof-of-concept, but really, I want a stranded cable for this. Being a dynamic microphone, it’s not necessary for it to be screened, and in fact, we should not be using unbalanced coaxial-type cable like we’d use on an electret microphone. That brings up another problem: interfaces designed for an electret will not work with this microphone — the impedance is too low and they’ll supply a bias current which needs to be blocked for dynamic microphones.

Right now I use a DIN-5 connector, but this is misleading — it implies it’ll connect to any radio interface with a DIN-5, and that my electret headsets will plug into its interfaces. At most I can listen with such a set-up, but not talk. The real answer is to use a completely different connector to avoid getting them mixed up. I decided whatever I used, it should be relatively common: exotic connectors are a pain to replace when they break. My criteria is as follows:

  • As discussed, common, readily available.
  • Cheap
  • Able to carry both speaker and microphone audio in a single connector so we don’t get speaker and microphone mixed up
  • Polarised, so we can’t get a connector around the wrong way
  • Ruggedised
  • Panel and cable mount versions available

Contenders I was considering were the 240° DIN-5 (I bought some by mistake once), 5-pin XLR and mini-XLRs, and the humble “CB microphone” connector. Other options I’ve used in the past include the DIN-7/DIN-8 and DE15HD (aka “VGA” connectors). DIN-7/DIN-8s can be fiddly to solder, and are overkill for the number of contacts. Same with DE15HDs — and the DE15HDs do not like moisture!

In the end, I decided the CB microphone connector seemed like my best bet. Altronics and Jaycar both sell these. I don’t know what the official name of these things is. They were common on radio equipment made between the mid-70s through to the late 80s — my Yaesu FT-290R-II uses an 8-pin connector, my Kenwood TS-120S uses a 4-pin. They’re pretty rugged, feature a screwing locking ring, and have beefy contacts for passing current. Usually the socket is available as a panel-mount only, but I found Altronics sell a cable-mount version (and today I notice Jaycar do too). If someone knows a RS/Mouser/Element14/Digikey link for these, I’ll put it here.

The big decision was to also consider how to wire the connector up. As this is “my own” standard, I can use whatever I like, but for the sake of future-me, I’ll document what I decided as I’ve forgotten how I wired up DIN-5’s before. I did have it written down, but misplaced that scrap of paper. I ended up quickly opening up a connector and taking this photo to refresh my memory.

A photo of an actual headset connector, showing the connections.

To wit, I therefore shall commit to public record, exactly how I wired this thing, and propose a standard for dynamic microphone headsets.

The current (left) DIN-5 pin-out, and my proposed “CB microphone” pin-out — both looking into socket

Some will point out that yes, I’m creating yet another standard. In my defence, mine is aimed at stereo headsets, which traditionally have been two separate 3.5mm phone jacks. Very easy to mix up. Some might argue that there exists a new standard in the form of the 4-pole TRRS connector, however not all interfaces are compatible — at the time when I devised the DIN-5 connector, I was using a Nokia 3310 which did not like having the microphone and speaker connected to a common pin.

Keeping them separate also allows me to do balanced audio tricks for interfacing electret microphones with radios like the Yaesu FT-857D which expect a dynamic microphone. For this; I need 5 contacts — left/right speaker, speaker common, and two for the microphone. There are 5-pole TRRRS connectors, the TP-105 is one such example — but they’re not common outside of the aviation industry where they are used.

For the cabling, I’ve cut the CAT5 cabling shorter, and spliced onto the end some 4-wire telephone ribbon onto each side. That makes the headset cable a comfortable length. I began by first soldering the “CB microphone” connector, choosing colours for the speaker and microphone connections and wiring it up in a “loop”, before cutting the far end of the loop, stripping back insulation and tinning the wires. I used a multimeter to decide which was the “left” and “right” connections — then these were spliced with some heat shrink.

After a quick test on the radio, I sealed it up using some hot-melt glue. This should prevent the solder joints from flexing and thus prolong the life of the connection.

I might look at a small J-FET or BJT adaptor cable that will allow me to use this headset in place of an electret microphone headset — as it’d be nice to be able to just plug this into the tablet to listen to music or use with VoIP. I’ve got extra line-mounted sockets for that. Not sure if it’s viable to go the other direction — I’d need a small battery to power the electret I think, that or a bypass switch on the PTT cable to allow me to power an electret microphone.

That though, is a project for another day.

A stereo/binaural tactical headset: part two

So a few weeks back, a couple of tactical headsets turned up ordered from Amazon. When I tested them out, the first thing I found was the speaker audio, whilst okay for speech, was very tinny. I wanted a headset that I could tolerate wearing for horse endurance ride events where I often need to juggle a notepad, radio and maybe a tablet or keyboard. A headset works well for this. Also, if there’s a rain event and you’re under a canvas roof, hearing the radio can be a real challenge!

At the same time I wanted to be able to hear ambient noises, so I needed something that didn’t completely enclose me off. If IRoQ ever starts up again, I might be re-thinking this but for now, this is what I’m doing.

The two I bought are “bowman”-style headsets, which are normally mono. I wanted a stereo headset, so bought two, figuring they’re modular enough that I should be able to cobble them into one. I didn’t expect to have to do surgery on them, but there you go. I dug through my junk box, and found an old computer headset that was minus its microphone with 30mm drivers in it and foam ear pads. Pretty cheap set that you can probably buy at a corner-store computer shop for no more than about $15.

As a test, I grabbed the dissected headset from the previous post, and heated up the soldering iron. I de-soldered the original speaker, grabbed a speaker from this computer headset, de-soldered it from its original cabling and tacked the two wires from the bowman headset to it. I then grabbed my Alinco set and tried a little listening. BIG improvement! No, not audiophile-grade, but not crappy telephone grade either!

The speaker out of the computer headset was glued to a piece of plastic that clipped to the earcup and provided a surface for the foam padding to stretch over. As such, it didn’t quite “fit” in the space of the old one — so I trimmed the plastic back a bit and found I could jam it in there quite snugly. I then just needed something to “hold” it there. Anyway, proof of concept done, time to attack the second victim.

I tore open the second bowman headset I had, and fired up the soldering iron to liberate its speaker. It’s a similar (but not identical) one to the other headset. Also, the foam spacer is a different shape — I guess they just grab whatever is laying around the workshop. (sounds familiar!)

The one on the left was pulled out of the second headset this morning. The one on the right is from the previous teardown.

I grabbed the other speaker from the computer headset, tacked it onto the wires and tested — it too sounded a lot better. A little trimming, and it was ready for permanent installation.

Now, if I just wanted mono headsets, I could have left it there, but I wanted one stereo one. The U94 connector does have enough conductors to support this if I common the microphone and speakers, but there’s already civilian and military “standards” for these things, I don’t need to muddy the waters further with a custom one! For now I thought I’ll use my DIN-5 connector standard for this. So rummaged through the junk box, found a DIN-5 plug and socket. I also grabbed a length of CAT5 cable (solid-core, although stranded would have been better).

I de-soldered the U94 cables from both headsets, stripped the jacket off the CAT5, and separated two pairs for each side. To each headset, I soldered two of the four pairs: left side – blue/white blue to speaker, brown/white brown to microphone; right side – orange/white orange to speaker, green/white green to microphone. I then soldered the other ends to my DIN-5 plug — paralleling the two microphone connections so that I could choose which side I used the microphone on. (Or even put one microphone on each side — this does work although it looks damn silly!)

I wired up the DIN-5 socket to one of the U94 cables, bridging left/right channels. My standard actually uses electret microphones, and I suspect these headsets use dynamic microphones. When I plugged in the headset into my tablet — the microphone was not detected, so I’d say the tablet was expecting a 2kOhm electret not a 900ohm dynamic. But, plugging everything into the PTT cable for the Alinco, it all works — and sounds a lot better.

I finished up by fabricating new pieces of plastic to hold the speakers in — an old 2L milk bottle gave up some PET plastic for the job. I cut an oval-shaped piece with a hole in the centre for the speaker’s sound, and glued that over the speaker. I note the plastic now covers the openings that I was supposed to hear through, but the impact is minimal.

I still need to do something better for cable retention, but I’ll think of something. Maybe hot glue…

The two new speakers installed.

For the headband, I ditched the top-band and just used the two elastic straps — one across the front, one around the back. I find this works well — although the headsets are designed to use a single elastic strap, I suspect the strap was designed with smaller heads in mind (often the way with Made-in-China stuff) — I found it got a tad uncomfortable after a couple of hours.

Mostly Finished headset.

With the two straps on this “stereo” set, it’s a lot more balanced and comfortable. Plus, the speakers being of higher quality, listening comfort is improved — a big plus given horse ride events can go 24 hours, and I’ll likely be there operating for that entire period.

I’ll have to source an alternate 5-pin connector for these — being dynamic microphones, compatibility with devices that expect electret microphones is not a given. Maybe I need to use 120° 5-pin DINs or something. Something other than a U94 or a standard DIN-5, because this is stereo (unlike normal U94 headsets) and uses a dynamic microphone (unlike my other headsets).

Thinking about SDR on the bike

So, for close to a decade now, I’ve had a bicycle-mobile station. Originally just restricted to 2m/70cm FM, it expanded to 2m SSB with the FT-290RII, then later all-band using a FT-857D.

It’s remained largely unchanged all this time. The station is able to receive MW/SW stations as well, and with some limitations, FM broadcast as well. My recent radio purchases will expand this a bit, freeing up the FT-857D’s general-coverage receiver to just focus on amateur bands. It’s been a long-term project though to move to SDR for reception.

What I have now

Already acquired is a Raspberry Pi 4 (8GB model) and a NWDR DRAWS interface board. I actually started out with a Raspberry Pi 3 + DRAWS and was waiting for the case for it to fit into. At that stage was the idea that the FT-897D would do much as it does now, no SDR involved, and I’d put a small hand-held with its own antenna as an APRS rig being driven by the second port on the DRAWS.

Since then; I bought the HackRF One for work (I needed something that could give me a view of the 2.4GHz ISM band for development of the WideSky Hub), the SDR bug firmly bit. Initially it was just DAB+ reception, I decided to get a RTL-SDR to do that so my radio listening wouldn’t be interrupted when a colleague needed to borrow the HackRF. That RTL-SDR saw some use receiving UHF CB traffic at horse endurance ride events at Imbil — I stated to consider whether maybe this might be a better option as a receiver for more than just commercial radio broadcasts.

Hence I purchased the Pi4: I figured that’d have enough CPU grunt that it’d still be able to decode a reasonable amount even if the CPU throttled itself for thermal management purposes. A pair of SDR interfaces would allow me to monitor a couple of bands simultaneously, such as 2m and 70cm together, or 2m/70cm and one of the HF bands.

Even the RTL-SDR v3 dongles are wide enough to watch the entire 2m band. With CAT control of the FT-857D, it’d be possible for the Pi4 to switch the FT-857D to the same frequency and possibly manage some antenna switching relays as well.

A rough design

This morning I came up with this:

A rough design of the SDR set-up

A critical design feature is that this must have a “pass-through” option so that in the event the computer crashes/fails, I can bypass all the fancy stuff and still use the FT-857D safely as I do now without all the fancy SDR stuff.

So while in SDR mode: the station pushbuttons on the handlebar go to a small sequencing MCU that can report events to the Pi4, on transmit the Pi4 can then instruct that MCU to connect the antennas into bypass mode, short-out the SDR inputs to protect them, then engage the PTT on the FT-857D, and transmit audio can either be delivered direct via the analogue inputs as they are now, or over USB/WiFi/Bluetooth through the MiniDIN6 DATA port.

The thinking is to have two SDRs, one of which is “agile” between HF/6m and 2m/70cm modes.

The front-end will be handled via the tablet: a Samsung Galaxy Active3 which can connect over WiFi or USB CDC-Ethernet.

I’ve shown gain-blocks between the antennas and the receivers, this is largely for impedance matching as well as to account for the losses involved in antenna sharing. Not sure what these will technically look like.

The two on the HF side should be ideally 0-60MHz devices. If I use the AirSpy HF+ as pictured, the VHF/UHF LNA connected to it only has to concentrate on the VHF band below 260MHz (really 144-148MHz, but let’s widen that to 87-230MHz for FM broadcast, air-band and DAB+) since that’s where the AirSpy stops.

The other, for now I’m looking at a RTL-SDR since I have one spare, but that could be any VHF/UHF capable SDR including the AirSpy Mini — the LNA on it, as well as the one feeding the FT-857D in receive mode will both need to handle 144-450MHz at a minimum.

It may be these frequency bands are “too wide” for a single device, and so I need to consider band-pass filters + separate band-specific LNAs and additional switching circuitry.

SDR selection

There are a couple of options I’ve considered:

The thing I don’t like about the SDRPlay Duo is the non-free nature of its libraries which seem to be only available for i386 or AMD64. Otherwise on paper it looks like a nice option.

KerberosSDR/KrakenSDR seems like overkill. It’s basically four (or five) RTL-SDRs sharing a common oscillator which is essential for direction-finding, but let’s face it, I’ll never have enough antennas to make such an application feasible on the bicycle. It looks like an echidna now!

BladeRF looks nice, but is pricey and stops short of the HF band so would need an up-converter like the RTL-SDR — not a show-stopper. That said, it’s dual-channel and can transmit as well as receive, so cross-band repeater would be doable.

I should try this with the HackRF One some day, see if I can combine a conventional transceiver + RPi + DRAWS/UDRC + HackRF One to make a cross-band repeater.

The Airspy HF+ is available domestically, and isn’t too badly priced. It doesn’t transmit like the HackRF does, but then again I could stuff one of my Wouxun KG-UVD1Ps in there wired up to the second DRAWS port if I wanted a traditional cross-band set-up.

Next steps

It would seem the LNA / antenna sharing side of things needs consideration next. RF relays will need to be procured that can handle seeing 100W of RF. Where I’ve drawn a single switch, that’ll likely be multiple in reality — when the transmitter is connected to the antenna, the receivers should all be shorted to ground so they don’t get blown up by stray RF.

Maybe the LNAs feeding the FT-857D will need to be connected to a dummy-load to protect them, not sure. Perhaps LNAs aren’t strictly necessary, and I can “cheat” by just connecting receivers in parallel, but I’m not comfortable with this idea right now. So this is the area of research I’m focusing on right now.

Phone patching to Zoom

Brisbane Area WICEN Group (Inc) lately has been caught up in this whole COVID-19 situation, unable to meet face-to-face for business meetings. Like a lot of groups, we’ve had to turn to doing things online.

Initially, Cisco WebEx was trialled, however this had significant compatibility issues, most notably, under Linux — it just straight plain didn’t work. Zoom however, has proven fairly simple to operate and seems to work, so we’ve been using that for a number of “social” meetings and at least one business meeting so far.

A challenge we have though, is that one of our members does not have a computer or smart-phone. Mobile telephony is unreliable in his area (Kelvin Grove), and so yee olde PSTN is the most reliable service. For him to attend meetings, we need some way of patching that PSTN line into the meeting.

The first step is to get something you can patch to. In my case, it was a soft-phone and a SIP VoIP service. I used Twinkle to provide that link. You could also use others like baresip, Linphone or anything else of your choosing. This connects to your sound card at one end, and a Voice Service Provider; in my case it’s my Asterisk server through Internode NodePhone.

The problem is though, while you can certainly make a call outbound whilst in a conference, the person on the phone won’t be able to hear the conference, nor will the conference attendees be able to hear the person on the phone.

Enter JACK

JACK is a audio routing framework for Unix-like operating systems that allows for audio to be routed between applications. It is geared towards multimedia production and professional audio, but since there’s a plug-in in the ALSA framework, it is very handy for linking audio between applications that would otherwise be incompatible.

For this to work, one application has to work either directly with JACK, or via the ALSA plug-in. Many support, and will use, an alternate framework called PulseAudio. Conference applications like Zoom and Jitsi almost universally rely on this as their sound card interface on Linux.

PulseAudio unfortunately is not able to route audio with the same flexibility, but it can route audio to JACK. In particular, JACKv2 and its jackdbus is the path of least resistance. Once JACK starts, PulseAudio detects its presence, and loads a module that connects PulseAudio as a client of JACK.

A limitation with this is PulseAudio will pre-mix all audio streams it receives from its clients into one single monolithic (stereo) feed before presenting that to JACK. I haven’t figured out a work-around for this, but thankfully for this use case, it doesn’t matter. For our purposes, we have just one PulseAudio application: Zoom (or Jitsi), and so long as we keep it that way, things will work.

Software tools

  • jack2: The audio routing daemon.
  • qjackctl: This is a front-end for controlling JACK. It is optional, but if you’re not familiar with JACK, it’s the path of least resistance. It allows you to configure, start and stop JACK, and to control patch-bay configuration.
  • SIP Client, in my case, Twinkle.
  • ALSA JACK Plug-in, part of alsa-plugins.
  • PulseAudio JACK plug-in, part of PulseAudio.

Setting up the JACK ALSA plug-in

To expose JACK to ALSA applications, you’ll need to configure your ${HOME}/.asoundrc file. Now, if your SIP client happens to support JACK natively, you can skip this step, just set it up to talk to JACK and you’re set.

Otherwise, have a look at guides such as this one from the ArchLinux team.

I have the following in my .asoundrc:

pcm.!default {
        type plug
        slave { pcm "jack" }
}

pcm.jack {
        type jack
        playback_ports {
                0 system:playback_1
                1 system:playback_2
        }
        capture_ports {
                0 system:capture_1
                1 system:capture_1
        }
}

The first part sets my default ALSA device to jack, then the second block defines what jack is. You could possibly skip the first block, in which case your SIP client will need to be told to use jack (or maybe plug:jack) as the ALSA audio device for input/output.

Configuring qjackctl

At this point, to test this we need a JACK audio server running, so start qjackctl. You’ll see a window like this:

qjackctl in operation

This shows it actually running, most likely for you this will not be the case. Over on the right you’ll see Setup… — click that, and you’ll get something like this:

Parameters screen

The first tab is the parameters screen. Here, you’ll want to direct this at your audio device that your speakers/microphone are connected to.

The sample rate may be limited by your audio device. In my experience, JACK hates devices that can’t do the same sample rate for input and output.

My audio device is a Logitech G930 wireless USB headset, and it definitely has this limitation: it can play audio right up to 48kHz, but will only do a meagre 16kHz on capture. JACK thus limits me to both directions running 16kHz. If your device can do 48kHz, that’d be better if you intend to use it for tasks other than audio conferencing. (If your device is also wireless, I’d be interested in knowing where you got it!)

JACK literature seems to recommend 3 periods/buffer for USB devices. The rest is a matter of experiment. 1024 samples/period seems to work fine on my hardware most of the time. Your mileage may vary. Good setups may get away with less, which will decrease latency (mine is 192ms… good enough for me).

The other tab has more settings:

Advanced settings

The things I’ve changed here are:

  • Force 16-bit: since my audio device cannot do anything but 16-bit linear PCM, I force 16-bit mode (rather than the default of 32-bit mode)
  • Channels I/O: output is stereo but input is mono, so I set 1 channel in, two channels out.

Once all is set, Apply then OK.

Now, on qjackctl itself, click the “Start” button. It should report that it has started. You don’t need to click any play buttons to make it work from here. You may have noticed that PulseAudio has detected the JACK server and will now connect to it. If click “Graph”, you’ll see something like this:

qjackctl‘s Graph window

This is the thing you’ll use in qjackctl the most. Here, you can see the “system” boxes represent your audio device, and “PulseAudio JACK Sink”/”PulseAudio JACK Source” represent everything that’s connected to PulseAudio.

You should be able to play sound in PulseAudio, and direct applications there to use the JACK virtual sound card. pavucontrol (normally shipped with PulseAudio) may be handy for moving things onto the JACK virtual device.

Configuring your telephony client

I’ll use Twinkle as the example here. In the preferences, look for a section called Audio. You should see this:

Twinkle audio settings

Here, I’ve set my ringing device to pulse to have that ring PulseAudio. This allows me to direct the audio to my laptop’s on-board sound card so I can hear the phone ring without the headset on.

Since jack was made my default device, I can leave the others as “Default Device”. Otherwise, you’d specify jack or plug:jack as the audio device. This should be set on both Speaker and Microphone settings.

Click OK once you’re done.

Configuring Zoom

I’ll use Zoom here, but the process is similar for Jitsi. In the settings, look for the Audio section.

Zoom audio settings

Set both Speaker and Microphone to JACK (sink and source respectively). Use the “Test Speaker” function to ensure it’s all working.

The patch up

Now, it doesn’t matter whether you call first, then join the meeting, or vice versa. You can even have the PSTN caller call you. The thing is, you want to establish a link to both your PSTN caller and your conference.

The assumption is that you now have a session active in both programs, you’re hearing both the PSTN caller and the conference in your headset, when you speak, both groups hear you. To let them hear each other, do this:

Go to qjackctl‘s patch bay. You’ll see PulseAudio is there, but you’ll also see the instance of the ALSA plug-in connected to JACK. That’s your telephony client. Both will be connected to the system boxes. You need to draw new links between those two new boxes, and the PulseAudio boxes like this:

qjackctl patching Twinkle to Zoom

Here, Zoom is represented by the PulseAudio boxes (since it is using PulseAudio to talk to JACK), and Twinkle is represented by the boxes named alsa-jack… (tip: the number is the PID of the ALSA application if you’re not sure).

Once you draw the connections, the parties should be able to hear each-other. You’ll need to monitor this dialogue from time to time: if either of PulseAudio or the phone client disconnect from JACK momentarily, the connections will need to be re-made. Twinkle will do this if you do a three-way conference, then one person hangs up.

Anyway, that’s the basics covered. There’s more that can be done, for example, recording the audio, or piping audio from something else (e.g. a media player) is just a case of directing it either at JACK directly or via the ALSA plug-in, and drawing connections where you need them.

Pondering audio streaming over LANs

Lately, I’ve been socially distancing a home and so there’s been a few projects that have been considered that otherwise wouldn’t ordinarily get a look in on a count of lack-of-time.

One of these has been setting up a Raspberry Pi with DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TMS320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone:144/430MHz→2.4GHz cross-band.

That’s still on the cards, but another use case came up: digital. It’d be real nice to interface this over WiFi to a stronger machine for digital modes. Sound card over network sharing. For this, Mumble would not do, I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, is asking for trouble with digital modes. PulseAudio works fine for audio (speech, music, etc). It will make assumptions though about the nature of that audio. The problem is we’re not dealing with “audio” as such, we’re dealing with modem tones. Human ears cannot detect phase easily, data modems can and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit, because although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss, but it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and just aims to do just that one thing. Digging into the code, present versions use OPUS, older versions use CELT. The use of RTP seemed promising though, it actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic content type (e.g. 96), and if this were a SIP session, there’d be a RTPMAP sent via Session Description Protocol to say content type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems that used by GQRX just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. Using UDP side-steps this, if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.

I also want some flexibility in audio streams, it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (the Philips Red-book standard audio format). In some cases, 16kHz would be fine, or even 8kHz 16-bit linear PCM. 44.1k works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp like RTP so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format

Timestamp

The timestamp field in RTP is basically measured in ticks of some clock of known frequency, e.g. for PCMU it is a 8kHz clock. It starts with some value, then increments up monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo; data will be streaming at 192kbps. We can tolerate wrap-around, and at this data rate, we’d see a 16-bit counter overflow every ~341ms, which whilst not unworkable, is getting tight. Better to use a 32-bit counter for this, which would extend that overflow to over 6 hours.

Sample rate encoding

We can either support an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports. That’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a harmonic of 8kHz or 11.025kHz, so we devise a scheme consisting of a “base” rate and multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multipler incremented by one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates and as we go up, the steps get bigger, so let’s borrow 3 multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0
B S S S M M M M

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

Examples:
  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00100000b (0x20): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)

Other settings

I primarily want to consider linear PCM types. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we could ignore it. So we could just encode the number of bytes needed for a single channel sample, minus one. Thus 0 would be 8-bits; 1 would be 16-bits; 2 would be 32-bits and 3 would be 64-bits. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but might be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.

Playing with speech synthesis

This afternoon, I was pondering about how I might do text-to-speech, but still have the result sound somewhat natural. For what use case? Well, two that come to mind…

The first being for doing “strapper call” announcements at horse endurance rides. A horse endurance ride is where competitors and their horses traverse a long (sometimes as long as 320km) trail through a wilderness area. Usually these rides (particularly the long ones) are broken up into separate stages or “legs”.

Upon arrival back at base, the competitor has a limited amount of time to get the horse’s vital signs into acceptable ranges before they must present to the vet. If the horse has a too-high temperature, or their horse’s heart rate is too high, they are “vetted out”.

When the competitor reaches the final check-point, ideally you want to let that competitor’s support team know they’re on their way back to base so they can be there to meet the competitor and begin their work with the horse.

Historically, this was done over a PA system, however this isn’t always possible for the people at base to achieve. So having an automated mechanism to do this would be great. In recent times, Brisbane WICEN has been developing a public display that people can see real-time results on, and this also doubles as a strapper-call display.

Getting the information to that display is something of a work-in-progress, but it’s recognised that if you miss the message popping up on the display, there’s no repeat. A better solution would be to “read out” the message. Then you don’t have to be watching the screen, you can go about your business. This could be done over a PA system, or at one location there’s an extensive WiFi network there, so streaming via Icecast is possible.

But how do you get the text into speech?

Enter flite

flite is a minimalist speech synthesizer from the Festival project. Out of the box it includes 3 voices, mostly male American voices. (I think the rms one might be Richard M. Stallman, but I could be wrong on that!) There’s a couple of demos there that can be run direct from the command line.

So, for the sake of argument, let’s try something simple, I’ll use the slt voice (a US female voice) and just get the program to read out what might otherwise be read out during a horse ride event:

$ flite_cmu_us_slt -t 'strapper call for the 160 kilometer event competitor numbers 123 and 234' slt-strapper-nopunctuation-digits.wav
slt-strapper-nopunctuation-digits.ogg

Not bad, but not that great either. Specifically, the speech is probably a little quick. The question is, how do you control this? Turns out there’s a bit of hidden functionality.

There is an option marked -ssml which tells flite to interpret the text as SSML. However, if you try it, you may find it does little to improve matters, I don’t think flite actually implements much of it.

Things are improved if we spell everything out. So if you instead replace the digits with words, you do get a better result:

$ flite_cmu_us_slt -t 'strapper call for the one hundred and sixty kilometer event competitor number one two three and two three four' slt-strapper-nopunctuation-words.wav
slt-strapper-nopunctuation-words.ogg

Definitely better. It could use some pauses. Now, we don’t have very fine-grained control over those pauses, but we can introduce some punctuation to have some control nonetheless.

$ flite_cmu_us_slt -t 'strapper call.  for the one hundred and sixty kilometer event.  competitor number one two three and two three four' slt-strapper-punctuation.wav
slt-strapper-punctuation.ogg

Much better. Of course it still sounds somewhat robotic though. I’m not sure how to adjust the cadence on the whole, but presumably we can just feed the text in piece-wise, render those to individual .wav files, then stitch them together with the pauses we want.

How about other changes though? If you look at flite --help, there is feature options which can control the synthesis. There’s no real documentation on what these do, what I’ve found so far was found by grep-ing through the flite source code. Tip: do a grep for feat_set_, and you’ll see a whole heap.

Controlling pitch

There’s two parameters for the pitch… int_f0_target_mean controls the “centre” frequency of the speech in Hertz, and int_f0_target_stddev controls the deviation. For the slt voice, …mean seems to sit around 160Hz and the deviation is about 20Hz.

So we can say, set the frequency to 90Hz and get a lower tone:

$ flite_cmu_us_slt --setf int_f0_target_mean=90 -t 'strapper call' slt-strapper-mean-90.wav
slt-strapper-mean-90.ogg

… or 200Hz for a higher one:

$ flite_cmu_us_slt --setf int_f0_target_mean=200 -t 'strapper call' slt-strapper-mean-200.wav
slt-strapper-mean-200.ogg

… or we can change the variance:

$ flite_cmu_us_slt --setf int_f0_target_stddev=0.0 -t 'strapper call' slt-strapper-stddev-0.wav
$ flite_cmu_us_slt --setf int_f0_target_stddev=70.0 -t 'strapper call' slt-strapper-stddev-70.wav
slt-strapper-stddev-0.ogg
slt-strapper-stddev-70.ogg

We can’t change these values during a block of speech, but presumably we can cut up the text we want to render, render each piece at the frequency/variance we want, then stitch those together.

Controlling rate

So I mentioned we can control the rate, somewhat coarsely using usual punctuation devices. We can also change the rate overall by setting duration_stretch. This basically is a control of how “long” we want to stretch out the pronunciation of words.

$ flite_cmu_us_slt --setf duration_stretch=0.5 -t 'strapper call' slt-strapper-stretch-05.wav
$ flite_cmu_us_slt --setf duration_stretch=0.7 -t 'strapper call' slt-strapper-stretch-07.wav
$ flite_cmu_us_slt --setf duration_stretch=1.0 -t 'strapper call' slt-strapper-stretch-10.wav
$ flite_cmu_us_slt --setf duration_stretch=1.3 -t 'strapper call' slt-strapper-stretch-13.wav
$ flite_cmu_us_slt --setf duration_stretch=2.0 -t 'strapper call' slt-strapper-stretch-20.wav
slt-strapper-stretch-05.ogg
slt-strapper-stretch-07.ogg
slt-strapper-stretch-10.ogg
slt-strapper-stretch-13.ogg
slt-strapper-stretch-20.ogg

Putting it together

So it looks as if all the pieces are there, we just need to stitch them together.

RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.2 --setf int_f0_target_stddev=50.0 --setf int_f0_target_mean=180.0 -t 'strapper call' slt-strapper-call.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.1 --setf int_f0_target_stddev=30.0 --setf int_f0_target_mean=180.0 -t 'for the, one hundred, and sixty kilometer event' slt-160km-event.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.4 --setf int_f0_target_stddev=40.0 --setf int_f0_target_mean=180.0 -t 'competitors, one two three, and, two three four' slt-competitors.wav
Above files stitched together in Audacity

Here, I manually imported all three files into Audacity, arranged them, then exported the result, but there’s no reason why the same could not be achieved by a program, I’m just inserting pauses after all.

There are tools for manipulating RIFF waveform files in most languages, and generating silence is not rocket science. The voice itself could be fine-tuned, but that’s simply a matter of tweaking settings. Generating the text is basically a look-up table feeding into snprintf (or its equivalent in your programming language of choice).

It’d be nice to implement a wrapper around flite that took the full SSML or JSML text and rendered it out as speech, but this gets pretty close without writing much code at all. Definitely worth continuing with.

DC Power Distribution

So, lately I’ve been helping out with running the base at a few horse rides up at Imbil. This involves amongst other things, running three radios, a base computer, laptops, and other paraphernalia.

The whole kit needs to run off an unregulated 12V DC supply, consisting of two 105Ah AGM batteries which have solar and mains back-up. The outlet for this is a Anderson SB50 connector, fairly standard for caravans.

Catch being, this is temporary. So no permanent linkages, we need to be able to disconnect and pack everything away when not in use. One bug bear is having enough DC outlets for everything. Especially of the 30A Anderson Power Pole variety, since most of our radios use those.

The monitor for the base computer uses a cigarette lighter adapter, while the base computer itself (an Intel NUC) has a cable terminated with a 30A power pole. There’s also a WiFi router which has a micro-USB power input — thankfully the monitor’s adaptor embeds a USB power outlet, so we can run it off that.

We need two amateur radios (one for voice comms, one for packet), and a CB set for communications with the ride organisers (who are otherwise not licensed to use amateur bands). We may also see a move to commercial frequencies, so that’s potentially another radio or two.

I started thinking about ways we could make a modular power distribution system.

The thought was, if we made PDU boxes where the inlet and outlet were nice big SB50s, configured so that they would mate when the boxes were joined up, we could have a flexible PDU system where we just clip it together like Lego bricks.

This is a work in progress, but I figured I’d post what I have so far.

Power outlets on the distribution box, yet to be wired up.

I still need to do the internal wiring, but above is basically what I was thinking of. There’s room for up to 6 consumers via the 30A power pole connections along one side, each with its own 20A breaker. (The connectors are rated at 45A.)

Originally I was aiming for 6 cigarette lighter sockets, but after receiving the parts, I realised that wouldn’t fit, but two seems to work okay, and we can always make a second box and slap that on the end. Each has a 15A breaker.

Protecting the upstream power source is a 50A breaker. So total of the down-stream port + all outlets on the box itself may not exceed 50A.

The upstream and downstream ports are positioned so that boxes can just be butted up against each-other for the connectors to mate. I’ve got to fine-tune the positioning a bit, and right now the connectors are also on an angle, but this hopefully shows the concept…

The idea for maintenance is the box will fold out. Not sure if the connection between all the outputs on the lid will be via a bus bar or using individual cables going to the tie point inside the box just yet. Those 30A outlets are just begging for a single cable to visit each bus-bar style. I also have to figure out how I’ll connect to the cigarette lighter sockets too.

Hopefully I’ll get this done before the next ride event.