jack

New headset: Logitech Zone Wireless

So, recently I misplaced the headset adaptor I use for my aging ZTE T83… which is getting on nearly 6 years old now. I had the parts on hand to make a new adaptor, so whipped up a new one, but found the 3.5mm plug would not stay in the socket.

Evidently, this socket has reached the number of insert/remove cycles, and will no longer function reliably. I do like music on the go, and while I’m no fan of Bluetooth, it is a feature my phone supports. I’ve also been hacking an older Logtech headset I have so that I can re-purpose it for use at work, but so far, it’s been about 15 months since I did any real work in the office. Thanks to China’s little gift, I’ve been working at home.

At work, I was using the Logitech H800 which did both USB and Bluetooth. Handy… but one downside it had is it didn’t do both, you selected the mode via a slider switch on the back of one of the ear cups. The other downside is that being an “open ear” design, it tended to leak sound, so my colleagues got treated to the sound track of my daily work.

My father now uses that headset since he needed one for video conferencing (again, thank-you China) and it was the best-condition headset I had on hand. I switched to using a now rather tatty-looking G930 before later getting a ATH-G1WL which is doing the task at home nicely. The ATH-G1WL is better all-round for a wireless USB headset, but it’s a one-trick pony: it just does USB audio. It does it well, better than anything else I’ve used, but that’s all it does. Great for home, where I may want better fidelity and for applications that don’t like asymmetric sample rates, but useless with my now Bluetooth-only phone.

I had a look around, and found the Zone Wireless headset… I wondered how it stacked up against the H800. Double the cost, is it worth it?

Firstly, my environment: I’m mostly running Linux, but I will probably use this headset with Android a lot… maybe OpenBSD. The primary use case will be mobile devices, so my existing Android phone, and a Samsung Active3 8″ tablet I have coming. The fact this unit like the H800 does both Bluetooth and USB is an attractive feature. Another interesting advertised feature is that it can be simultaneously connected to both, unlike the H800 which is exclusively one or the other.

First impressions

So, it turned up today. In the box was a USB-C cable (probably the first real use I have for such a cable), a USB-A to USB-C adaptor (for all you young whipper-snappers with USB-C ports exclusively), the headset itself, the USB dongle, and a bag to stow everything in.

Interestingly, each of these has a set-up guide. Ohh, and at the time of writing, yes, there are 6 links titled “Setup Guide (PDF)”… the bottom-right one is the one for the headset (guess how I discovered that?). Amongst those is a set-up guide for the bag. (Who’d have thought some fabric bag with a draw-string closure needed a set-up guide?) I guess they’re aiming this at the Pointy Haired Boss.

Many functions are controlled using an “app” installed on a mobile device. I haven’t tried this as I suspect Android 4.1 is too old. Maybe I can look at that when the tablet turns up, as it should be recent enough. It’d be nice to duplicate this functionality on Linux, but ehh… enough of it works without.

Also unlike the H800… there’s nowhere on the headset to stash the dongle when not in use. This is a bit of a nuisance, but they do provide the little bag to stow it in. The assumption is I guess that it’ll permanently occupy a USB port, since the same dongle also talks to their range of keyboards and mice.

USB audio functionality

I had the Raspberry Pi 3 running as a DAB+ receiver, Triple M Classic Rock had The Beatles Seargeant Pepper’s Lonely Hearts Club Band album on… so I plugged the dongle in to see how they compared with my desktop speakers (plugged in to the “headphone” jack). Now this isn’t the best test for sound quality for two reasons: (1) this DAB+ station is broadcasting 64kbps HE-AAC, and (2) the “headphone” jack on the Pi is hardly known as high fidelity, but it gave me a rough idea.

Audio quality was definitely reasonable. No better or worse than the H800. I haven’t tried the microphone yet, but it looks as if it’s on par with the H800 as well. Like every other Logitech headset I’ve owned to date, it too forces asymmetric sample rates, if you’re looking at using JACK, consider something else:

stuartl@vk4msl-pi3:~ $ cat /proc/asound/Wireless/stream0
Logitech Zone Wireless at usb-3f980000.usb-1.4, full speed : USB Audio

Playback:
  Status: Running
    Interface = 2
    Altset = 1
    Packet Size = 192
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 2
    Endpoint: 3 OUT (NONE)
    Rates: 48000
    Bits: 16

Capture:
  Status: Stop
  Interface 1
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 3 IN (NONE)
    Rates: 16000
    Bits: 16

The control buttons seem to work, and there’s even a HID device appearing under Linux, but testing with xev reveals no reaction when I press the up/down/MFB buttons.

Bluetooth functionality

With the dongle plugged in, I reached for my phone, turned on its Bluetooth radio, pressed the power button on the headset for a couple of seconds, then told my phone to go look for the device. It detected the headset and paired straight away. Fairly painless, as you’d expect, even given the ancient Android device it was paired with. (Bluetooth 5 headset… meet Bluetooth 3 host!)

I then tried pulling up some music… the headset immediately switched streams, I was now hearing Albert Hammond – Free Electric Band. Hit pause, and I was back to DAB+ on the Raspberry Pi.

Yes, it was connected to both the USB dongle and the phone, which was fantastic. One thing it does NOT do though, at least out-of-the-box, is “mix” the two streams. Great for telephone calls I suppose, but forget listening to something in the background via your computer whilst you converse with somebody on the phone.

The audio quality was good though. Some cheaper Bluetooth headsets often sound “watery” to my hearing (probably the audio CODEC), which is why I avoided them prior to buying the H800, the H800 was the first that sounded “normal”, and this carries that on.

I’m not sure what the microphone sounds like in this mode. I suspect with my old phone, it’ll drop back to the HSP profile, which has an 8kHz sample rate, no wideband audio. I’ll know more when the tablet turns up as it should be able to better put the headset through its paces.

Noise cancellation

One nice feature of this headset is that unlike the H800, this is a closed-ear design which does a reasonable amount of passive noise suppression. So likely will leak sound less than the H800. Press the ANC button, and an announcement reports that noise cancellation is now ON… and there’s a subtle further muffling of outside noise.

It won’t pass AS/NZS:1270, but it will at least mean I’m not turning the volume up quite so loud, which is better for my hearing anyway. Doing this is at the cost of about an hour’s battery life apparently.

Left/right channel swapping

Another nice feature, this is the first headset I’ve owned where you can put the microphone on either side you like. Put the headset on either way around, flip the microphone to where your mouth is: as you pass roughly the 15° point on the boom, the channels switch so left/right channels are the correct way around for the way you’re wearing it.

This isn’t a difficult thing to achieve, but I have no idea why more companies don’t do it. There seems to be this defacto standard of microphone/controls on the left, which is annoying, as I prefer it and controls on the right. Some headsets (Logitech wired USB) were right-hand side only, but this puts the choice in the user’s hands. This should be encouraged.

Verdict

Well, it’s pricey, but it gets the job done and has a few nice features to boot. I’ll be doing some more testing when more equipment turns up, but it seems to play nice with what I have now.

The ability to switch between USB and Bluetooth sources is welcome, it’d be nice to have some software mixing possible, but that’s not the end of the world, it’s an improvement nonetheless.

Audio quality seems decent enough, with playback sample rates able to match DVD audio sample rates (at 16-bits linear PCM). Microphone sample rate should be sufficient for wideband voice (but not ultra-wideband or fullband).

It’s nice to be able to put the microphone on the side of my choosing rather than having that dictated to me.

The audio cancellation is a nice feature, and one I expect to make more use of in an open-plan environment.

The asymmetric record/playback sample rates might be a bit of a nuisance if you use software that expects these to be symmetric.

Somewhere to stash the dongle on the headset would’ve been nicer than having a carry bag.

It’d be nice if there was some sort of protocol spec for the functions offered in the “app” so that those who cannot or choose not to run it, can still access those functions.

New headset: AudioTechnica ATH-G1WL

So, for a decade now, I’ve been looking at a way of un-tethering myself from VoIP and radio applications. Headsets are great, but it does mean you’re chained to whatever it’s plugged into by your head.

Two solutions exist for this: get a headset with a wireless interface, or get a portable device to plug the headset into.

The devices I’d like to use are a combination of analogue and computer-based devices. Some have Bluetooth… I did buy the BU-1 module for my Yaesu VX-8DR years, and still have it installed in the FTM-350AR, but found it was largely a gimmick as it didn’t work with the majority of Bluetooth headsets on the market, and was buggy with the ones it did work with.

Years ago, I hit upon a solution using a wireless USB headset and a desktop PC connected to the radio. Then, the headset was one of the original Logitech wireless USB headsets, which I still have and still works… although it is in need of a re-build as the head-band is falling to pieces and the leatherette covering on the earpieces has long perished.

One bug bear I had with it though, is that the microphone would only do 16kHz sample rates. At the time when I did that set-up, I was running a dual-Pentium III workstation with a PCI sound-card which was fairly frequency-agile, so it could work to the limitations of the headset, however newer sound cards are generally locked to 48kHz, and maybe support 44.1kHz if you’re lucky.

I later bought a Logitech G930 headset, but found it had the same problem. It also was a bit “flat” sounding, given it is a “surround sound” headset, it compromises on the audio output. This, and the fact that JACK just would not talk to it any higher than 16kHz, meant that it was relegated to VoIP only. I’ve been using VoIP a lot thanks to China’s little gift, even doing phone patches using JACK, Twinkle and Zoom for people who don’t have Internet access.

I looked into options. One option I considered was the AstroGaming A50 Gen 4. These are a pricey option, but one nice feature is they do have an analogue interface, so in theory, it could wire direct to my old radios. That said, I couldn’t find any documentation on the sample rates supported, so I asked.

… crickets chirping …

After hearing nothing, I decided they really didn’t want my business, and there were things that made me uneasy about sinking $500 on that set. In the end I stumbled on these.

Reading the specs, the microphone frequency range was 30Hz to 20kHz… that meant for them to not be falsely advertising, they had to be sampling at at least 40kHz. 44.1 or 48kHz was fine by me. These are pricey too, but not as bad as the A50s, retailing at AU$340.

I took the plunge:

RC=0 stuartl@rikishi ~ $ cat /proc/asound/card1/stream0 
Unknown manufacturer ATH-G1WL at usb-0000:00:14.0-4, full speed : USB Audio

Playback:
  Status: Running
    Interface = 1
    Altset = 1
    Packet Size = 200
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 1
    Altset 1
    Format: S16_LE
    Channels: 2
    Endpoint: 1 OUT (ADAPTIVE)
    Rates: 48000
    Bits: 16

Capture:
  Status: Running
    Interface = 2
    Altset = 1
    Packet Size = 100
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 2 IN (ASYNC)
    Rates: 48000
    Bits: 16

Okay, so these are locked to 48kHz… that works for me. Oddly enough, I used to get a lot of XRUNS in jackd with the Logitech sets… I don’t get any using this set, even with triple the sample rate. This set is very well behaved with jackd. Allegedly the headset is “broadcast”-grade.

Well, not sure about that, but it performs a lot better than the Logitech ones did. AudioTechnica have plenty of skin in the audio game, have done for decades (the cartridge on my father’s turntable… an old Rotel from the late 70s) was manufactured by them. So it’s possible, but it’s also possible it’s marketing BS.

The audio quality is decent though. I’ve only used it for VoIP applications so far, people have noticed the microphone is more “bassy”.

The big difference from my end, is notifications from Slack and my music listening. Previously, since I was locked to 16kHz on the headset, it was no good listening to music there since I got basically AM radio quality. So I used the on-board sound card for that with PulseAudio. Slack though (which my workplace uses), refuses to send notification sounds there, so I missed hearing notifications as I didn’t have the headset on.

Now, I have the music going through the headset with the notification sounds. I miss nothing. PulseAudio also had a habit of “glitching”, momentary drop-outs in audio. This is gone when using JACK.

My latency is just 64msec. I can’t quite ditch PulseAudio, as without it, tools like Zoom won’t see the JACK virtual source/sink… this seems to be a limitation of the QtMultimedia back-end being used: it doesn’t list virtual sound interfaces and doesn’t let people put in arbitrary ALSA device strings (QAudioDeviceInfo just provides a list).

At the moment though, I don’t need to route between VoIP applications (other than Twinkle, which talks to ALSA direct), so I can live with it staying there for now.

Phone patching to Zoom

Brisbane Area WICEN Group (Inc) lately has been caught up in this whole COVID-19 situation, unable to meet face-to-face for business meetings. Like a lot of groups, we’ve had to turn to doing things online.

Initially, Cisco WebEx was trialled, however this had significant compatibility issues, most notably, under Linux — it just straight plain didn’t work. Zoom however, has proven fairly simple to operate and seems to work, so we’ve been using that for a number of “social” meetings and at least one business meeting so far.

A challenge we have though, is that one of our members does not have a computer or smart-phone. Mobile telephony is unreliable in his area (Kelvin Grove), and so yee olde PSTN is the most reliable service. For him to attend meetings, we need some way of patching that PSTN line into the meeting.

The first step is to get something you can patch to. In my case, it was a soft-phone and a SIP VoIP service. I used Twinkle to provide that link. You could also use others like baresip, Linphone or anything else of your choosing. This connects to your sound card at one end, and a Voice Service Provider; in my case it’s my Asterisk server through Internode NodePhone.

The problem is though, while you can certainly make a call outbound whilst in a conference, the person on the phone won’t be able to hear the conference, nor will the conference attendees be able to hear the person on the phone.

Enter JACK

JACK is a audio routing framework for Unix-like operating systems that allows for audio to be routed between applications. It is geared towards multimedia production and professional audio, but since there’s a plug-in in the ALSA framework, it is very handy for linking audio between applications that would otherwise be incompatible.

For this to work, one application has to work either directly with JACK, or via the ALSA plug-in. Many support, and will use, an alternate framework called PulseAudio. Conference applications like Zoom and Jitsi almost universally rely on this as their sound card interface on Linux.

PulseAudio unfortunately is not able to route audio with the same flexibility, but it can route audio to JACK. In particular, JACKv2 and its jackdbus is the path of least resistance. Once JACK starts, PulseAudio detects its presence, and loads a module that connects PulseAudio as a client of JACK.

A limitation with this is PulseAudio will pre-mix all audio streams it receives from its clients into one single monolithic (stereo) feed before presenting that to JACK. I haven’t figured out a work-around for this, but thankfully for this use case, it doesn’t matter. For our purposes, we have just one PulseAudio application: Zoom (or Jitsi), and so long as we keep it that way, things will work.

Software tools

  • jack2: The audio routing daemon.
  • qjackctl: This is a front-end for controlling JACK. It is optional, but if you’re not familiar with JACK, it’s the path of least resistance. It allows you to configure, start and stop JACK, and to control patch-bay configuration.
  • SIP Client, in my case, Twinkle.
  • ALSA JACK Plug-in, part of alsa-plugins.
  • PulseAudio JACK plug-in, part of PulseAudio.

Setting up the JACK ALSA plug-in

To expose JACK to ALSA applications, you’ll need to configure your ${HOME}/.asoundrc file. Now, if your SIP client happens to support JACK natively, you can skip this step, just set it up to talk to JACK and you’re set.

Otherwise, have a look at guides such as this one from the ArchLinux team.

I have the following in my .asoundrc:

pcm.!default {
        type plug
        slave { pcm "jack" }
}

pcm.jack {
        type jack
        playback_ports {
                0 system:playback_1
                1 system:playback_2
        }
        capture_ports {
                0 system:capture_1
                1 system:capture_1
        }
}

The first part sets my default ALSA device to jack, then the second block defines what jack is. You could possibly skip the first block, in which case your SIP client will need to be told to use jack (or maybe plug:jack) as the ALSA audio device for input/output.

Configuring qjackctl

At this point, to test this we need a JACK audio server running, so start qjackctl. You’ll see a window like this:

qjackctl in operation

This shows it actually running, most likely for you this will not be the case. Over on the right you’ll see Setup… — click that, and you’ll get something like this:

Parameters screen

The first tab is the parameters screen. Here, you’ll want to direct this at your audio device that your speakers/microphone are connected to.

The sample rate may be limited by your audio device. In my experience, JACK hates devices that can’t do the same sample rate for input and output.

My audio device is a Logitech G930 wireless USB headset, and it definitely has this limitation: it can play audio right up to 48kHz, but will only do a meagre 16kHz on capture. JACK thus limits me to both directions running 16kHz. If your device can do 48kHz, that’d be better if you intend to use it for tasks other than audio conferencing. (If your device is also wireless, I’d be interested in knowing where you got it!)

JACK literature seems to recommend 3 periods/buffer for USB devices. The rest is a matter of experiment. 1024 samples/period seems to work fine on my hardware most of the time. Your mileage may vary. Good setups may get away with less, which will decrease latency (mine is 192ms… good enough for me).

The other tab has more settings:

Advanced settings

The things I’ve changed here are:

  • Force 16-bit: since my audio device cannot do anything but 16-bit linear PCM, I force 16-bit mode (rather than the default of 32-bit mode)
  • Channels I/O: output is stereo but input is mono, so I set 1 channel in, two channels out.

Once all is set, Apply then OK.

Now, on qjackctl itself, click the “Start” button. It should report that it has started. You don’t need to click any play buttons to make it work from here. You may have noticed that PulseAudio has detected the JACK server and will now connect to it. If click “Graph”, you’ll see something like this:

qjackctl‘s Graph window

This is the thing you’ll use in qjackctl the most. Here, you can see the “system” boxes represent your audio device, and “PulseAudio JACK Sink”/”PulseAudio JACK Source” represent everything that’s connected to PulseAudio.

You should be able to play sound in PulseAudio, and direct applications there to use the JACK virtual sound card. pavucontrol (normally shipped with PulseAudio) may be handy for moving things onto the JACK virtual device.

Configuring your telephony client

I’ll use Twinkle as the example here. In the preferences, look for a section called Audio. You should see this:

Twinkle audio settings

Here, I’ve set my ringing device to pulse to have that ring PulseAudio. This allows me to direct the audio to my laptop’s on-board sound card so I can hear the phone ring without the headset on.

Since jack was made my default device, I can leave the others as “Default Device”. Otherwise, you’d specify jack or plug:jack as the audio device. This should be set on both Speaker and Microphone settings.

Click OK once you’re done.

Configuring Zoom

I’ll use Zoom here, but the process is similar for Jitsi. In the settings, look for the Audio section.

Zoom audio settings

Set both Speaker and Microphone to JACK (sink and source respectively). Use the “Test Speaker” function to ensure it’s all working.

The patch up

Now, it doesn’t matter whether you call first, then join the meeting, or vice versa. You can even have the PSTN caller call you. The thing is, you want to establish a link to both your PSTN caller and your conference.

The assumption is that you now have a session active in both programs, you’re hearing both the PSTN caller and the conference in your headset, when you speak, both groups hear you. To let them hear each other, do this:

Go to qjackctl‘s patch bay. You’ll see PulseAudio is there, but you’ll also see the instance of the ALSA plug-in connected to JACK. That’s your telephony client. Both will be connected to the system boxes. You need to draw new links between those two new boxes, and the PulseAudio boxes like this:

qjackctl patching Twinkle to Zoom

Here, Zoom is represented by the PulseAudio boxes (since it is using PulseAudio to talk to JACK), and Twinkle is represented by the boxes named alsa-jack… (tip: the number is the PID of the ALSA application if you’re not sure).

Once you draw the connections, the parties should be able to hear each-other. You’ll need to monitor this dialogue from time to time: if either of PulseAudio or the phone client disconnect from JACK momentarily, the connections will need to be re-made. Twinkle will do this if you do a three-way conference, then one person hangs up.

Anyway, that’s the basics covered. There’s more that can be done, for example, recording the audio, or piping audio from something else (e.g. a media player) is just a case of directing it either at JACK directly or via the ALSA plug-in, and drawing connections where you need them.

Pondering audio streaming over LANs

Lately, I’ve been socially distancing a home and so there’s been a few projects that have been considered that otherwise wouldn’t ordinarily get a look in on a count of lack-of-time.

One of these has been setting up a Raspberry Pi with DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TMS320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone:144/430MHz→2.4GHz cross-band.

That’s still on the cards, but another use case came up: digital. It’d be real nice to interface this over WiFi to a stronger machine for digital modes. Sound card over network sharing. For this, Mumble would not do, I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, is asking for trouble with digital modes. PulseAudio works fine for audio (speech, music, etc). It will make assumptions though about the nature of that audio. The problem is we’re not dealing with “audio” as such, we’re dealing with modem tones. Human ears cannot detect phase easily, data modems can and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit, because although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss, but it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and just aims to do just that one thing. Digging into the code, present versions use OPUS, older versions use CELT. The use of RTP seemed promising though, it actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic content type (e.g. 96), and if this were a SIP session, there’d be a RTPMAP sent via Session Description Protocol to say content type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems that used by GQRX just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. Using UDP side-steps this, if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.

I also want some flexibility in audio streams, it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (the Philips Red-book standard audio format). In some cases, 16kHz would be fine, or even 8kHz 16-bit linear PCM. 44.1k works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp like RTP so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format

Timestamp

The timestamp field in RTP is basically measured in ticks of some clock of known frequency, e.g. for PCMU it is a 8kHz clock. It starts with some value, then increments up monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo; data will be streaming at 192kbps. We can tolerate wrap-around, and at this data rate, we’d see a 16-bit counter overflow every ~341ms, which whilst not unworkable, is getting tight. Better to use a 32-bit counter for this, which would extend that overflow to over 6 hours.

Sample rate encoding

We can either support an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports. That’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a harmonic of 8kHz or 11.025kHz, so we devise a scheme consisting of a “base” rate and multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multipler incremented by one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates and as we go up, the steps get bigger, so let’s borrow 3 multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0
B S S S M M M M

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

Examples:
  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00100000b (0x20): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)

Other settings

I primarily want to consider linear PCM types. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we could ignore it. So we could just encode the number of bytes needed for a single channel sample, minus one. Thus 0 would be 8-bits; 1 would be 16-bits; 2 would be 32-bits and 3 would be 64-bits. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but might be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.