Dec 31 2020
 

So, these last two years, I’ve been trying to keep multiple projects on the go, and then others come along and pile their own projects on top. It makes a mess of one’s free time, not least for keeping track of where things have been put.

COVID-19 has not helped here, as it’s meant I’ve lugged home a lot of gear that belongs to, or at, my workplace, to use here. This all needs tracking to ensure nothing is lost.

Years ago, I threw together a crude parts catalogue system. It was built on Django, django-mptt and PostgreSQL, and basically abused the admin part of Django to manage electronic parts storage.

I later re-purposed some of its code for an estate database for my late grandmother: I just wrote a front-end so that members of the family could be given login accounts, and “claim” certain items of the estate. In that sense, the concept was extremely powerful.

The overarching principle of how both these systems worked is that you had “items” stored within “locations”. Locations were in a tree structure (hence django-mptt) where a location could contain further “locations”… e.g. a root-level location might be a bedroom, within that might be a couple of wardrobes and chests of drawers, and there might be containers within those.

You could nest locations as deeply as you liked. In my parts database, I didn’t consider rooms, but I’d have labelled boxes like “IC Parts 1” and “IC Parts 2”; these were Plano StowAway 932 boxes, which work okay, although I’ve since discovered you shouldn’t leave the inner boxes exposed to UV light: the plastic becomes brittle and falls apart.

The inner boxes themselves were labelled by their position within the outer box (row, column), and each “bin” inside the inner box was labelled by row and column.

IC tubes themselves were also labelled, so if I had several sitting in a box, I could identify them and their location. Some were small enough to fit inside these boxes, others were stored in large storage tubs (I have two).

If I wanted to know where I had put some LM311 comparators, I might look them up in the database, and it’d tell me there were 3 of them in IC Box 1/Row 2/Row 3/Column 5. If luck was on my side, I’d go to that box, pull out the inner box, open it up, and find what I was looking for plugged into some anti-static foam or stashed in a small IC tube.

The parts themselves were fairly basic, just a description, a link to a data sheet, and some other particulars. I’d then have a separate table that recorded how many of each part was present, and in which location.

So from the locations perspective, it did everything I wanted, but parametric search was out of the question.

The place here looks like a tip now, so I really do need to get on top of what I have; so much so that I’m telling people: no more projects until I do.

Other solutions exist. OpenERP had a warehouse inventory module, and I suspect Odoo continues this, but it’s a bit of a beast to try and figure out and it seems customisation has been significantly curtailed from the OpenERP days.

PartKeepr (if you can tolerate deliberate bad spelling) is another option. It seems to have very good parametric search of parts, but one downside is that it has a flat view of locations. There’s a proposal to enhance this, but it’s been languishing for 4 years now.

VRT used to have a semi-active track-and-trace business built on a tracking software package called P-Trak. P-Trak had some nice ideas (including a surprisingly modern message-passing back-end, even if it was a proprietary one), but it is overkill for my needs, and it’s a pain to try and deploy, even if I were licensed to do so.

That doesn’t mean, though, that I can’t borrow some ideas from it. It integrated barcode scanners as part of the user interface, something these open-source part inventory packages seem to overlook. I don’t have a dedicated barcode scanner, but I do have a phone with a camera, and a webcam on my netbook. Libraries exist to do this from a web browser, such as this one for QR codes.

My big problem right now is the need to do a stock-take to see what I’ve still got, and what I’ve added since then, along with where it has gone. I’ve got a lot of “random boxes” now which are unlabelled, and just have random items thrown in due to lack-of-time. It’s likely those items won’t remain there either. I need some frictionless way to record where things are getting put. It doesn’t matter exactly where something gets put, just so long as I record that information for use later. If something is going to move to a new location, I want to be able to record that with as little fuss as possible.

So the thinking is this:

  • Print labels for all my storage locations with UUIDs stored as barcodes
  • Enter those storage locations into a database using the UUIDs allocated
  • Expand (or re-write) my parts catalogue database to handle these UUIDs:
    • adding new locations (e.g. when a consignment comes in)
    • recording movement of containers between parent locations
    • sub-dividing locations (e.g. recording the content of a consignment)
    • (partial and complete) merging locations (e.g. picking parts from stock into a project-specific container)
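These operations boil down to a tree of UUID-keyed locations. Here’s a minimal sketch in Python of how that might look — class and method names are my own invention, nothing like the eventual schema: each location carries the UUID from its printed label, and moving a container is just re-pointing its parent.

```python
import uuid

class Location:
    """A storage location; nesting is just a parent pointer."""
    def __init__(self, name, parent=None):
        self.uid = uuid.uuid4()   # the UUID printed on the barcode label
        self.name = name
        self.parent = parent

    def path(self):
        # Walk up the tree to render a human-readable location path.
        parts, node = [], self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def move_to(self, new_parent):
        # "Scan the box, scan the shelf": movement is one pointer update.
        self.parent = new_parent

shelf = Location("Shed shelf 3")
box = Location("Random box A", parent=shelf)
tube = Location("IC tube 7", parent=box)
print(tube.path())    # Shed shelf 3/Random box A/IC tube 7

bench = Location("Workbench")
box.move_to(bench)
print(tube.path())    # Workbench/Random box A/IC tube 7
```

A “scan the box, then scan the shelf it’s going on” workflow maps directly onto `move_to()`; the history of such movement events would live in a separate table.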

The first step on this journey is to catalogue the storage containers I have now. Some are already entered into the old system, so I’ve grabbed a snapshot of that and can pick through it. Others are new boxes that have arrived since, and had additional things thrown in.

I looked at ways I could label the boxes. Previously that meant hand-writing a label with a spirit pen, but this does not scale. If I’m to do things efficiently, then a barcode seems the logical way to go, since it uses what I already have.

Something new comes in? Put a barcode on the box, scan it, enter it into the system as a new location, then mark where that box is being stored by scanning the location barcode where I’ll put the box. Later, I’ll grab the box, open it up, and I might repeat the process with any IC tubes or packets of parts inside, marking them as being present inside that box.

Need something? Look up where it is, then “check it out” into my work area… now, ideally when I’m finished, it should go back there, but if I’m in a hurry, I just throw it in a box, somewhere, then record that I put it there. Next time I need it, I can look up where it is. Logical order isn’t needed up front, and can come later.

So, step 1 is to label all the locations. Since I’m doing this before the database is fully worked out and I want to avoid ID clashes, I’m using UUIDs to label all the locations. Initially I thought of QR codes, but then realised some of the “locations” are DIP IC storage tubes, which do not permit large square labels. I did some experiments with Code-128, but found it was near impossible to reliably encode a UUID that way: my phone had difficulty recognising the entire barcode.

I returned to the idea of QR-codes, and found that my phone will scan a 10mm×10mm QR code printed on a page. That’s about the right height for the side of an IC tube. We had some inkjet labels kicking around, small 38.1×21.2mm labels arranged in a 5×13 grid (Avery J8651/L7651 layout). Could I make a script that generated a page full of QR codes?

Turns out, pylabels will do this. It is built on reportlab, which amongst other things embeds a barcode generator that supports various symbologies, including QR codes. @hugohadfield had contributed a pull request which demonstrated using this tool with QR codes. I just had to tweak this for my needs.

# This file is part of pylabels, a Python library to create PDFs for printing
# labels.
# Copyright (C) 2012, 2013, 2014 Blair Bonnett
#
# pylabels is free software: you can redistribute it and/or modify it under the
# terms of the GNU General Public License as published by the Free Software
# Foundation, either version 3 of the License, or (at your option) any later
# version.
#
# pylabels is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along with
# pylabels.  If not, see <http://www.gnu.org/licenses/>.

import uuid

import labels
from reportlab.graphics.barcode import qr
from reportlab.lib.units import mm

# Create an A4 portrait (210mm x 297mm) sheet with 5 columns and 13 rows of
# labels. Each label is 38.1mm x 21.2mm with a 2mm rounded corner. The margins
# are automatically calculated.
specs = labels.Specification(210, 297, 5, 13, 38.1, 21.2, corner_radius=2,
        left_margin=6.7, right_margin=3, top_margin=10.7, bottom_margin=10.7)

def draw_label(label, width, height, obj):
    size = 12 * mm
    label.add(qr.QrCodeWidget(
            str(uuid.uuid4()),
            barHeight=height, barWidth=size, barBorder=2))

# Create the sheet.
sheet = labels.Sheet(specs, draw_label, border=True)

sheet.add_labels(range(1, 66))

# Save the file and we are done.
sheet.save('basic.pdf')
print("{0:d} label(s) output on {1:d} page(s).".format(sheet.label_count, sheet.page_count))

The alignment is slightly off, but not severely. I’ll fine tune it later. I’m already through about 30 of those labels. It’s enough to get me started.

For the larger J8165 2×4 sheets, the following specs work. (I can see this being a database table!)

# Specifications for Avery J8165 2×4 99.1×67.7mm
specs = labels.Specification(210, 297, 2, 4, 99.1, 67.7, corner_radius=3,
        left_margin=5.5, right_margin=4.5, top_margin=13.5, bottom_margin=12.5)
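As a hint of what that table might look like, here’s a sketch with both layouts as plain dictionaries keyed by Avery part number. The field names are my own; they correspond to the values passed to labels.Specification above.

```python
# Each Avery layout's numbers, keyed by part number: a database table in embryo.
LABEL_SPECS = {
    "J8651": dict(sheet_width=210, sheet_height=297, columns=5, rows=13,
                  label_width=38.1, label_height=21.2, corner_radius=2,
                  left_margin=6.7, right_margin=3,
                  top_margin=10.7, bottom_margin=10.7),
    "J8165": dict(sheet_width=210, sheet_height=297, columns=2, rows=4,
                  label_width=99.1, label_height=67.7, corner_radius=3,
                  left_margin=5.5, right_margin=4.5,
                  top_margin=13.5, bottom_margin=12.5),
}

def labels_per_sheet(part_no):
    # Handy for working out how many UUIDs one print run yields.
    spec = LABEL_SPECS[part_no]
    return spec["columns"] * spec["rows"]

print(labels_per_sheet("J8651"))   # 65 labels per A4 sheet
print(labels_per_sheet("J8165"))   # 8 labels per A4 sheet
```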

Later, when I get the database ready (standing up a new VM to host it and writing the code), I can enter this information and get back on top of my inventory once again.

Dec 28 2020
 

So, a while back I tore apart an old Logitech wireless headset with the intention of using its bits to make a wireless USB audio interface. I was undecided whether the headset circuitry would “live” in a new headset, or whether it’d be a separate unit to which I could attach any headset.

I ended up doing the latter. I found through Mouser a suitable enclosure for the original circuitry and have fitted it with cable glands and sockets for the charger input (which now sports a standard barrel jack) and a DIN-5 connector for the earpiece/microphone connections.

The first thing to do was to get rid of that proprietary power connector. The two outer contacts are the +6V and 0V pins, shown here in orange and white/orange coloured cable respectively. I used a blob of hot-melt glue to secure it so I didn’t rip pads off.

Replacing the power connector. +6V is orange, 0V is orange/white.

The socket is “illuminated” by a LED on the PCB. Maybe I’ll look at some sort of light-pipe arrangement to bring that outside, we’ll see.

The other end just got wired to a plain barrel jack. A future improvement might be to put in a 6V DC-DC converter, allowing me to plug in any old 12V source, but for now, this’ll do. I just have to remember to watch which lead I grab. Whilst I was there, I also put in a cable gland for the audio interface connection.

Power socket and audio connections mounted in case.

One challenge with the board design is that there is not one antenna, but two, plus some rather lumpy tantalum capacitors near the second antenna. I suspect the two antennas are for handling polarisation, which will shift as you move your head and as the signal propagates. Either way, they meant the PCB wouldn’t sit “flat”. No problem, I had some old cardboard boxes which provided the solution:

PCB spacer, with cut-out for high-clearance parts.

The cardboard is a good option since it’s readily available and won’t attenuate the 2.4GHz signal much. It was also easy to work with.

I haven’t exposed the three push-buttons on that side of the PCB at this stage. I suppose drilling a hole and making a small “poker” to hit the buttons isn’t out of the question. This isn’t much different to what Logitech’s original case did. I’ll tackle that later. I need a similar solution for the slide-switch used for power.

One issue I faced was wrangling the now over-length FFC that linked the two sides. Previously, this spanned the headband, but now it only needed to reach a few centimetres at most. Eyeballing the original cable, I found this short replacement. I’ll have to figure out how to mount that floating PCB somehow, but at least it’s a clean solution.

Replacement FFC.

At this point, it was a case of finishing wiring everything up. I haven’t tried any audio as yet; that will come in time. It still powers up and sees the transceiver, so there’s still “life” in this.

Powering up post-surgery.

I plugged it into its charger and let it run for a while just to top up the LiPo cell.

Charging for the first time since mounting.

One thing I’m not happy with is the angle the battery is sitting at, since it’s just a bit wider than the space between the mounting posts. I might try shaving some material off the posts to see if I can get the battery to sit “flat”. I only need about 1mm, which should still allow enough clearance for the screwdriver and screw to pass the cell safely.

The polarity of the speakers is a guess on my part. Neither end seemed to be grounded, hopefully the drivers don’t mind being “common-ed”, otherwise I might need to cram some small isolation transformers in there.

Dec 16 2020
 

Well, this month has been a funny one. When we moved to the NBN back in March, we went from a 500GB-a-month quota to 100GB a month, with a link speed of 50Mbps.

That seemed, at the time, like a reasonable compromise, since much of the time my typical usage has been around 60~70GB a month. There are no Netflix subscriptions here, but my father does hit YouTube rather hard, and I lately have been downloading music (legally) from time to time.

This year has also seen me working from home, doing a lot of Slack and Zoom calls. Zoom in particular is pricey quota-wise, since everyone insists on running webcams. Despite this, the extra Internet use has been manageable. A couple of times we got to around 90GB, sailing close to the 100GB mark, but never over. This is what it looked like last month:

November’s Internet quota usage

This month, that changed:

Internet usage this month

Now, the start of the month data got missed because of a glitch between collectd and the Internode quota monitoring script I have. Two of the spikes can be attributed to:

  • the arrival of a Windows 10-based laptop doing its out-of-box updates (~4GB)
  • my desktop doing its 3-monthly OS updates (~5GB)

That isn’t enough to account for why things have nearly doubled though. A few possibilities came to mind:

  • a web-based script going haywire in a browser (this has happened, and cost me dearly, before)
  • genuine local user Internet activity increases
  • website traffic increases
  • server or workstation compromise

Looking over the netflow data

Now, last time I had this happen, I did two things:

  • I set up collectd/influxdb/Grafana to be able to monitor my Internet usage and quota
  • I set up nfcapd on the border router to monitor my usage

This is pretty easy to set up in OpenBSD, and well worth doing.

I keep about 30 days’ worth of netflow data on the border router. So naturally, I haul that back to my workstation and run nfdump over it to see what jumps out.

Looking through the list of “flows”, one target identified was a development machine hosted at Vultr… checking the IP address revealed it was one of the WideSky test instances my workplace uses: about 5GB of HTTP requests and about 4GB of VPN traffic — admittedly, the couple of WideSky hubs I have here have the logging settings cranked high.

That though doesn’t explain it. The bulk of the traffic was scattered amongst a number of hosts. I didn’t see it until I tried aggregating it by /16 subnet:

RC=0 stuartl@rikishi /tmp $ nfdump -R /tmp/nfcapd -A srcip,dstip -o long6 -O bytes 'net 114.119.0.0/16'  
Date first seen          Duration Proto                             Src IP Addr:Port                                 Dst IP Addr:Port     Flags Tos  Packets    Bytes Flows
2020-11-27 23:11:30.000 1630599.000 0                             150.101.176.226:0     ->                         114.119.146.185:0     ........   0    4.7 M    6.8 G  2535
2020-11-22 13:02:41.000 2099541.000 0                             150.101.176.226:0     ->                         114.119.133.234:0     ........   0    4.3 M    6.1 G  2376
2020-11-18 14:38:42.000 2439079.000 0                             150.101.176.226:0     ->                         114.119.140.107:0     ........   0    3.8 M    5.4 G  2418
2020-11-20 10:43:58.000 2280070.000 0                             150.101.176.226:0     ->                          114.119.141.52:0     ........   0    3.7 M    5.3 G  2421
2020-11-21 22:34:35.000 2151244.000 0                             150.101.176.226:0     ->                         114.119.159.109:0     ........   0    3.4 M    4.9 G  2446
2020-11-24 00:11:52.000 1972657.000 0                             150.101.176.226:0     ->                          114.119.136.13:0     ........   0    3.4 M    4.8 G  2399
2020-11-25 04:24:32.000 1870854.000 0                             150.101.176.226:0     ->                         114.119.136.215:0     ........   0    3.3 M    4.8 G  2473
2020-11-24 15:49:55.000 1916848.000 0                             150.101.176.226:0     ->                           114.119.151.0:0     ........   0    3.0 M    4.4 G  2435
2020-11-27 20:15:43.000 1641316.000 0                             150.101.176.226:0     ->                         114.119.129.181:0     ........   0    2.6 M    3.7 G  2426
2020-11-27 21:38:37.000 1636635.000 0                             150.101.176.226:0     ->                          114.119.159.16:0     ........   0    2.5 M    3.6 G  2419
2020-11-27 23:11:30.000 1630599.000 0                             114.119.146.185:0     ->                         150.101.176.226:0     ........   0    4.1 M  175.9 M  2535
…
2020-11-19 22:02:04.000     0.000 0                             150.101.176.226:0     ->                         114.119.138.111:0     ........   0        3      132     1
2020-11-25 03:37:11.000     0.000 0                             150.101.176.226:0     ->                          114.119.152.27:0     ........   0        3      132     1
2020-12-06 19:59:49.000     0.000 0                             150.101.176.226:0     ->                         114.119.151.153:0     ........   0        3      132     1
2020-11-22 08:23:11.000     0.000 0                             150.101.176.226:0     ->                          114.119.130.23:0     ........   0        3      132     1
2020-11-25 15:43:47.000     0.000 0                             150.101.176.226:0     ->                         114.119.128.219:0     ........   0        3      132     1
2020-11-24 09:05:13.000     0.000 0                             150.101.176.226:0     ->                          114.119.140.85:0     ........   0        3      132     1
Summary: total flows: 56059, total bytes: 51.7 G, total packets: 65.0 M, avg bps: 150213, avg pps: 23, avg bpp: 794
Time window: 2020-11-13 11:01:52 - 2020-12-16 20:19:41
Total flows processed: 39077053, Blocks skipped: 0, Bytes read: 2698309352
Sys: 3.744s flows/second: 10436251.9 Wall: 15.108s flows/second: 2586482.6 

51.7GB in a month!!! Drilling further, I noted it was mostly targeted at TCP ports 80 and 443, and UDP port 53. Web traffic, in other words. Reverse look-up on a randomly selected IP showed the reverse pointer petalbot-xxx-xxx-xxx-xxx.aspiegel.com, and indeed, in server logs for various sites I host, I saw PetalBot in the user agent.

Plucking some petals off PetalBot

So, I needed to put the brakes on this somehow. I’m fine with them indexing my site, just they should have some consideration and restraint about how quickly they do so.

Thus, I amended pf.conf:

# Rate-limited "friends"
ratelimit_dst4="{ 114.119.0.0/16 }"
#ratelimit_dst6="{ }"

# Traffic shaping queues
queue root on $external  bandwidth 25M max 25M
queue slow parent root   bandwidth 256K max 512K
queue bulk parent root   bandwidth 25M default

# …

# Rate-limit certain targets
pass out on egress proto { tcp, udp, icmp } from any to $ratelimit_dst4 modulate state (pflow) set queue slow
#pass out on egress proto { tcp, udp, icmp6 } from any to $ratelimit_dst6 modulate state (pflow) set queue slow

So, the first queue line defines the root queue on my external interface, and caps the upload bandwidth at 25Mbps (next month, I will be dropping my speed to 25Mbps in favour of an “unlimited” quota).

Then, I define a queue which is restricted to 256kbps (peak 512kbps), and direct all traffic going to a specific list of networks into that queue. PetalBot should now see a mere 512kbps at most from this end, which should severely crimp how quickly it can guzzle my quota, whilst still permitting it to index my site.

Yesterday, PetalBot chewed through 8GB… let’s see what it does tomorrow.

Dec 13 2020
 

So, in the last 12 months or so, I’ve grown my music collection in a big way. Basically, over the Christmas – New Year break, I was stuck at home, coughing and spluttering due to the bushfire smoke in the area (and yes, I realise it was nowhere near as bad in Brisbane as it was in other parts of the country).

I spent a lot of time listening to the radio, and one of the local radio stations was doing a “25 years in 25 days” feature, covering many iconic tracks from the past 25 years. Now, I’ve always been a big music listener. Admittedly, I’m very much a music luddite, with the vast majority of my music spanning 1965~1995… with some spill-over as far back as 1955 and as far forward as 2005 (maybe slightly further).

Trouble is, I’m not overly familiar with the names, and the moment I walk into a music shop, I’m like the hungry patron walking into a food court: I want to eat something, but what? My mind goes blank as I’m bombarded with all kinds of possibilities.

So when this count-down appeared on the radio, naturally, I found myself looking up the play list, and I came away with a long “shopping list” of songs I’d look for. Since then, a decent amount has been obtained as CDs from the likes of Amazon and Sanity… however, for some songs, I found it was easiest to obtain them as a digital download in FLAC format.

Now, for me, my music is a long-term investment. An investment that transcends changes in media formats. I do agree with ensuring that the “creators” of these works are suitably compensated for their efforts, but I do not agree with paying for the same thing multiple times.

A few people have had to perform in a studio (or on stage), someone’s had to collect the recordings, mix them, work with the creators to assemble those into an album, work with other creative people to come up with cover art, marketing… all that costs money, and I’m happy to contribute to that. The rest is simply an act of duplication: and yes, that has a cost, but it’s minimal and highly automated compared to the process of creating the initial work in the first place.

To me, the physical media represents one “license”, to perform that work, in private, on one device. Even if I make a few million copies myself, so long as I only play one of those copies at a time, I am keeping in the spirit of that license.

Thus, I work on the principle of keeping an “archival” copy, from which I can derive working copies that get day-to-day playback. The day-to-day copy will be in some lossy format for convenience.

A decade ago that was MP3, but due to licensing issues that became awkward, so I switched over to Ogg/Vorbis, which also reduced the storage requirements by 40% whilst not having much audible impact on the sound quality (if anything, it improved). Since I also had to ditch the illegally downloaded MP3s in the process, that also had a “cleaning” effect: I insisted from then on that I have a “license” for each song, whether that be wax cylinder, tape reel, 8-track, cassette tape, vinyl record, CD, whatever.

This year saw the first time I returned to music downloads, but this time downloading legally purchased FLAC files. This leads to an interesting problem: how do you store these files in a manner that will last?

Audio archiving and CDs

I’m far from the first person with this problem, and the problem isn’t specific to audio. The archiving business is big money, and sometimes it does go wrong, whether it be old media being re-purposed (e.g. old tapes of “The Goon Show” being re-recorded with other material by the BBC), destruction (e.g. the Universal Studios fire), or just old-fashioned media degradation.

The procedure for film-based media (whether it be optical film, or magnetic media) usually involves temperature and humidity control, along with periodic inspection. Time-consuming, expensive, error prone.

CDs are reasonably resilient, particularly proper audio CDs made to the Red Book audio disc standard. In the CD-DA standard, uncompressed PCM audio is Reed-Solomon encoded to achieve forward error correction of the PCM data. Thus, if a minor surface defect develops on the media, there is hopefully enough data intact to recover the audio samples and play on as if nothing had happened.

The fact that one can take a disc purchased decades ago, and still play it, is testament to this design feature.

I’m not sure what features exist in DVDs along the same lines. While there is the “video object” container format, the purpose of this seems to be more about copyright protection than about resiliency of the content.

Much of the above applies to pressed media. Recordable media (CD-Rs) sadly aren’t as resilient. In particular, the quality of blanks varies, with some able to withstand years of abuse, and others degrading after 18 months. Notably, the dye fades, and so you start to experience data loss beginning at the edge of the disc.

Pressed media works great for the stuff I’ve purchased on CD. Vinyl records, if looked after, will also age well, although it’d be nice to have a CD back-up in case my record player packs it in. However, this presents a problem for my digital downloads.

At the moment, my strategy is to download the files to a directory, save a copy of the email receipt with them, place my GPG public key along-side, take SHA-256 hashes of all of the files, then digitally sign the hashes. I then place a copy on an old 1TB HDD, and burn a copy to CD-R or DVD-R. This will get me by for the next few years, but I’ve been “burned” by recordable media failing, and HDDs are not infallible either.
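The hashing step is easily scripted. Here’s a sketch in Python (file layout and function names are illustrative, not my actual script) that writes a SHA256SUMS-style manifest over a download directory, ready for detach-signing:

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=65536):
    # Hash a file in chunks so large FLACs don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(directory, manifest_name="SHA256SUMS"):
    # Hash everything in the directory tree and write a sha256sum-format
    # manifest alongside the files (skipping the manifest itself).
    directory = Path(directory)
    lines = []
    for path in sorted(directory.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            lines.append(f"{sha256_file(path)}  {path.relative_to(directory)}")
    manifest = directory / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

The output follows the `sha256sum` format, so `sha256sum -c SHA256SUMS` can verify the archive years later, and `gpg --detach-sign SHA256SUMS` covers the authenticity side.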

Getting discs pressed only makes sense when you need thousands of copies. I just need one or two. So I need some media that will last the distance, but can be produced in small quantities at home from readily available blanks.

Archiving formats

So, there are a few options out there for archival storage. Let’s consider a few:

Magnetic tape

Professional outfits seem to work on tape storage. Magnetic media, with all the overheads that implies. The newest drive in the house is a DDS-2 DAT drive, the media for which has not been produced in years, so that’s a lame duck. LTO is the new kid on the block, and LTO-6 drives are pricey!

Magneto-Optical

MO drives are another option from the past… we do have a 5¼” SCSI MO drive sitting in the cupboard, which takes 2GB cartridges, but where do you get the media from? Moreover, what do I do when this unit croaks (if it hasn’t already)?

Flash

Flash media sounds tempting, but then one must remember how flash works. It’s a capacitor on the gate of a MOSFET, storing a charge. The dielectric material around this capacitor has a finite resistance, which will cause “leakage” of the charge, meaning over time, your data “rots” much like it does on magnetic media. No one is quite sure what the retention truly is. NOR flash is better for endurance than NAND, but if it’s a recent device with more than about 32MB of storage, it’ll likely be NAND.

PROM

I did consider whether PROMs could be used for this, the idea being you’d work out what you wanted to store, burn a PROM with the data as ISO9660, then package it up with a small MCU that presents it as CD-ROM. The concept could work since it worked great for game consoles from the 80s. In practice they don’t make PROMs big enough. Best I can do is about 1 floppy’s worth: maybe 8 seconds of audio.

Hard drives

HDDs are an option, and for now that’s half my present interim solution. I have a 1TB drive formatted UDF which I store my downloads on. The drive is one of the old object storage drives from the server cluster after I upgraded to 2TB drives. So not a long-term solution. I am presently also recovering data from an old 500GB drive (PATA!), and observing what age does to these disks when they’re not exercised frequently. In short, I can’t rely on this alone.

CDs, DVDs and Bluray

So, we’re back to optical media. All three of these are available as blank record-able media, and even Bluray drives can read CDs. (Unlike LTO: where an LTO-$X drive might be backward compatible with LTO-$(X-2) but no further.)

There are blanks out there that are designed for archival use, notably M-Disc DVD media, which is allegedly capable of lasting 1000 years.

I don’t plan to wait that long to see if their claims stack up.

All of these formats normally use the same file systems, either ISO-9660 or UDF. Neither of these file systems offers any kind of forward error correction of data, so if the dye fades, or the disc gets scratched, you can potentially lose data.

Right now, my other mechanism is to use CDs and DVDs, burned with the same material I put on the aforementioned 1TB HDD. The optical media is formatted ISO-9660 with Joliet and Rock-Ridge extensions. It works for now, but I know from hard experience that CD-Rs and DVD-Rs aren’t forever. Question is, can they be improved?

File system thoughts

Obviously, genuinely better quality media will help in this archiving endeavour, but the thought is: can I improve the odds? Can I sacrifice some storage capacity to achieve data resilience?

Audio CDs, as I mentioned, use Reed-Solomon encoding. Specifically, Cross-Interleaved Reed-Solomon encoding. ISO-9660 is a file system that supports extensions on the base standard.

I mentioned two before, Rock-Ridge and Joliet. On top of Rock-Ridge, there’s also zisofs, which adds transparent decompression to a Rock-Ridge file system. What if I could make an RS-encoded copy of each file’s blocks, and place those copies around the disc surface, so that if the original file was unreadable, we could turn to the forward-error-corrected copy?

There is some precedent in such a proposal. In Usenet, the “parchive” format was popularised as a way of adding FEC to files distributed on Usenet. That at least has the concept of what I’m wishing to achieve.
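To make the idea concrete, here’s the simplest possible flavour of it: not Reed-Solomon, but a single XOR parity block over a set of equal-sized data blocks, parchive-style, which can rebuild any one lost block. A real scheme would use RS so that multiple losses survive.

```python
def make_parity(blocks):
    # XOR all the data blocks together into one parity block.
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks_with_one_missing, parity):
    # XOR of the parity with every surviving block yields the missing one.
    missing = bytearray(parity)
    for block in blocks_with_one_missing:
        if block is not None:
            for i, b in enumerate(block):
                missing[i] ^= b
    return bytes(missing)

data = [b"ABCD", b"EFGH", b"IJKL"]
p = make_parity(data)
damaged = [data[0], None, data[2]]   # pretend block 1 was unreadable
assert recover(damaged, p) == b"EFGH"
```

The cost is one extra block per group; RS generalises this, spending k parity blocks to survive any k losses, which is why it keeps turning up in CD-DA, parchive and zfec alike.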

The other area of research is how can I make the ISO-9660 filesystem metadata more resilient. No good the files surviving if the filesystem metadata that records where they are is dead.

Video DVDs are often dual UDF/ISO-9660 file systems, the so-called “UDF Bridge” format. Thus, it must be possible for a foreign file system to live amongst the blocks of an ISO-9660 file system. Conceptually, if we could take a copy of the ISO-9660 filesystem metadata, FEC-encode those blocks, and map them around the disc, we could make the file system resilient too.

FEC algorithms are another consideration. RS is a tempting prospect, not least because it is already proven in CD audio.

zfec used in Tahoe-LAFS is another option, as is Golay, and many others. They’ll need to be assessed on their merits.

Anyway, there are some ideas… I’ll ponder further details in another post.

Nov 04 2020
 

So, today I was doing some work for my workplace, when a critical 4G modem-router dropped offline, cutting me off. Figuring it was a temporary issue, I thought I’d resume this headset rebuild.

The replacement battery had turned up a few weeks back, a beefy 1000mAh 3.7V LiPo pack intended for Raspberry Pi PiJuice battery supplies. This pack was chosen because it has a similar arrangement to the original Logitech battery: single cell with a 10k thermistor. As a bonus, the pack appears to be at least triple the original cell’s capacity. I just had to wire it up.

The new battery being charged by the headset circuitry

The connectors are not compatible: they are physically different sizes, and the thermistor and 0V pins are swapped. Black and red follow the same conventions on both packs, that is, red for +3.7V and black for 0V, but the thermistor connects to the blue wire on the old pack and the yellow wire on the new pack.

I’m in two minds whether to embed the electronics directly into the earmuffs, or to just wire the microphone/speaker connections to a DIN-5 connector to make it compatible with my other headsets.

For now at least, the battery is charging, not getting hot, and the circuitry detects the base transceiver when it is plugged into a laptop USB port, so the circuitry is still “functional”.

Aug 082020
 

So, lately I’ve been working from home, which means, amongst other things, playing music as loud as I like, not getting distracted by co-workers, and having the kettle a short walk from my workspace.

Eventually though, I will have to return to the office. When that happens, I’ll need to be able to “drown out” the background noise. Music is good for this, but not everyone shares my tastes; I am a bit of a music luddite.

Years ago (when the ink was drying on my foundation license) I purchased a Logitech Wireless headset. Model A-00006 (yes, that is quite old now). This headset worked, but it did have two flaws:

  1. the audio isolation wasn’t great, so they tended to “leak” sound
  2. they had the dreaded asymmetric audio sample rate problem with JACK.

Now, for home use I bought a much nicer set which solves issue (2) and isn’t too bad with (1), but I’d like to keep that one at home. In any case, (2) isn’t a problem at work, since I don’t generally need the audio routing there that I use at home.

This headset sat in a box for some years, and over time, the headband and earpads have fallen to bits. The electronics are still good. What if I bought a pair of earmuffs and stuffed the old headset guts inside those? That’d solve issue (1) nicely.

Getting inside

These things didn’t open up without a fight. Where the speakers are concerned, you will permanently break the housing. The two sides are joined by an 8-wire ribbon cable; the majority of the electronics is in the right-hand side, while the battery makes up most of the guts of the left.

You’ll need to destroy the headband to liberate the ribbon cable, and you’ll need to destroy the speakers’ housing to get at the screws behind.

I now have the un-housed headset guts sitting on the table, with the original charger plugged in charging the very flat battery, which is a single-cell 3.7V LiPo pouch cell; no idea what capacity it is, I doubt it’s more than 200mAh.

Charging

I plugged everything back together and tried the headset out. It still works, although instead of indicating a solid amber LED for charging, it was showing a slow blink.

fccid.io has a copy of the original documentation online, as well as photos of the guts here. Neither discusses what a “slow blink” means, which had me concerned that maybe the battery had been left too long and was no longer safe to charge.

Battery voltage whilst charging sits around 4.2V, which sounds right for a 3.7V cell. The LED eventually stops blinking and goes solid, but then the other LED turns red. Disconnecting the battery reveals 0V across its pins.

So I might be up for a new battery. The PiJuice batteries look to be a similar arrangement (single cell with thermistor pin) and may be a good upgrade anyway.

Next steps

I prefer the microphone on the right-hand side, so that’s one thing I’ll be looking at changing. The ribbon cable connects using small FPC connectors, so I’m thinking I might see if I can de-solder those and put a beefier 2.54mm KK-style connector in their place. This will require soldering some wire-wrap wire up to the pins, but the advantage is it’s a much easier connector to work with.

The break-out board on the left side is very simple: no components other than a momentary switch for detecting the microphone boom position, and pads to which the left speaker, microphone, mute LED and battery connect. I’ll still put the battery in the left side, so there’ll still be 5 wires running across the headband. Doing this should also make it easier to interface a new battery.

I will also have to bring the buttons and switch out to the outside of the earcup, so I’ll probably use KK connectors for those too. The power switch is a through-hole part, so that should be easy.

I’ll probably replace the proprietary power connector with a barrel jack too. Not sure if these will charge from 5V, the original charger has a 6V output.

I think once I’ve got more hacker-friendly connectors onto this, I should be able to look at readying the new home for the electronics.

Jun 212020
 

So, for a decade now, I’ve been looking at a way of un-tethering myself from VoIP and radio applications. Headsets are great, but it does mean you’re chained to whatever it’s plugged into by your head.

Two solutions exist for this: get a headset with a wireless interface, or get a portable device to plug the headset into.

The devices I’d like to use are a combination of analogue and computer-based devices. Some have Bluetooth… I did buy the BU-1 module for my Yaesu VX-8DR years ago, and still have it installed in the FTM-350AR, but found it was largely a gimmick: it didn’t work with the majority of Bluetooth headsets on the market, and was buggy with the ones it did work with.

Years ago, I hit upon a solution using a wireless USB headset and a desktop PC connected to the radio. Then, the headset was one of the original Logitech wireless USB headsets, which I still have and still works… although it is in need of a re-build as the head-band is falling to pieces and the leatherette covering on the earpieces has long perished.

One bugbear I had with it, though, is that the microphone would only do 16kHz sample rates. At the time I did that set-up, I was running a dual-Pentium III workstation with a PCI sound card that was fairly frequency-agile, so it could work to the limitations of the headset; newer sound cards, however, are generally locked to 48kHz, and maybe support 44.1kHz if you’re lucky.

I later bought a Logitech G930 headset, but found it had the same problem. It also sounded a bit “flat”: being a “surround sound” headset, it compromises on audio output. This, and the fact that JACK just would not talk to it any higher than 16kHz, meant it was relegated to VoIP only. I’ve been using VoIP a lot thanks to China’s little gift, even doing phone patches using JACK, Twinkle and Zoom for people who don’t have Internet access.

I looked into options. One option I considered was the AstroGaming A50 Gen 4. These are a pricey option, but one nice feature is they do have an analogue interface, so in theory, it could wire direct to my old radios. That said, I couldn’t find any documentation on the sample rates supported, so I asked.

… crickets chirping …

After hearing nothing, I decided they really didn’t want my business, and there were things that made me uneasy about sinking $500 on that set. In the end I stumbled on these.

Reading the specs, the microphone frequency range was 30Hz to 20kHz… that meant for them to not be falsely advertising, they had to be sampling at at least 40kHz. 44.1 or 48kHz was fine by me. These are pricey too, but not as bad as the A50s, retailing at AU$340.

I took the plunge:

RC=0 stuartl@rikishi ~ $ cat /proc/asound/card1/stream0 
Unknown manufacturer ATH-G1WL at usb-0000:00:14.0-4, full speed : USB Audio

Playback:
  Status: Running
    Interface = 1
    Altset = 1
    Packet Size = 200
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 1
    Altset 1
    Format: S16_LE
    Channels: 2
    Endpoint: 1 OUT (ADAPTIVE)
    Rates: 48000
    Bits: 16

Capture:
  Status: Running
    Interface = 2
    Altset = 1
    Packet Size = 100
    Momentary freq = 48000 Hz (0x30.0000)
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 2 IN (ASYNC)
    Rates: 48000
    Bits: 16

Okay, so these are locked to 48kHz… that works for me. Oddly enough, I used to get a lot of xruns in jackd with the Logitech sets… I don’t get any using this set, even at triple the sample rate. This set is very well behaved with jackd. Allegedly the headset is “broadcast”-grade.

Well, I’m not sure about that, but it performs a lot better than the Logitech ones did. AudioTechnica have had plenty of skin in the audio game for decades (the cartridge on my father’s turntable, an old Rotel from the late 70s, was manufactured by them). So it’s possible, but it’s also possible it’s marketing BS.

The audio quality is decent though. I’ve only used it for VoIP applications so far; people have noticed the microphone is more “bassy”.

The big difference from my end is notifications from Slack and my music listening. Previously, since I was locked to 16kHz on the headset, it was no good listening to music there: I got basically AM radio quality. So I used the on-board sound card for that with PulseAudio. Slack though (which my workplace uses) refuses to send notification sounds there, so when I didn’t have the headset on, I missed hearing notifications.

Now, I have the music going through the headset with the notification sounds. I miss nothing. PulseAudio also had a habit of “glitching”, momentary drop-outs in audio. This is gone when using JACK.

My latency is just 64msec. I can’t quite ditch PulseAudio, as without it, tools like Zoom won’t see the JACK virtual source/sink… this seems to be a limitation of the QtMultimedia back-end being used: it doesn’t list virtual sound interfaces and doesn’t let people put in arbitrary ALSA device strings (QAudioDeviceInfo just provides a list).

At the moment though, I don’t need to route between VoIP applications (other than Twinkle, which talks to ALSA direct), so I can live with it staying there for now.

Jun 082020
 

Well, it’s been a while since we’ve had a cat in the house. Our establishment has been home to many pets over the years, mostly dogs and cats.

Our last was a domestic moggy named Emma, whom we got as a kitten back in 1990 and who passed away in 2008, a few months short of her 18th birthday. A good innings for a feline!

Recently, my maternal grandmother passed away. She had been living alone since my grandfather had to move to a nursing home, and was making a red-hot go of it, but a bleed on the brain eventually caused her demise. She had been looking after a cat there, a 5-year-old Russian Blue / domestic short-hair cross named Sam.

Just before my grandmother’s passing, I had raised concerns about his welfare, since no one was in the house looking after him. Apparently he was getting occasional visits to re-fill bowls and take out litter trays, but it wasn’t ideal. There was a definite concern that he could be “forgotten”.

My two uncles on that side both have pets that Sam likely would not get along with. My mother has twice as many cats as she’s theoretically allowed. My sister’s husband dislikes cats. Most of my cousins are in rental accommodation. I was one of the few in the family that could take on a cat.

It’s not ideal for us, because we do go away for WICEN events occasionally, and the people we’d have left Emma with have now either passed away or are in palliative care. So we’ll have to figure something out when the time comes. But at least here he’s got human attention, he’s got food, he’s got a clean litter tray, and a bigger house to run around in.

For now, he’s in my name for welfare purposes… the estate isn’t worked out yet (there’s a 6-month delay in case someone “contests” the will). I can transfer ownership of him to the person officially inheriting him, but smart money is that’ll be me anyway.

Of course one fun thing is that thanks to a little gift from China, I’m working from home. Right now, the task at hand is developing a Modbus driver, so my “workbench”^W^Wthe dining room table has my laptop, and an industrial PC that’s pretending to be a Modbus/RTU device.

Sam has discovered jumping on the keyboard gets instant attention:

Sam, trying a, errm, paw, at JavaScript
The “JavaScript” code

Yeah, why am I reminded of the FVWM Cats page?

Jun 012020
 

Brisbane Area WICEN Group (Inc) lately has been caught up in this whole COVID-19 situation, unable to meet face-to-face for business meetings. Like a lot of groups, we’ve had to turn to doing things online.

Initially, Cisco WebEx was trialled; however, this had significant compatibility issues, most notably under Linux, where it just plain didn’t work. Zoom, however, has proven fairly simple to operate and seems to work, so we’ve been using that for a number of “social” meetings and at least one business meeting so far.

A challenge we have, though, is that one of our members does not have a computer or smart-phone. Mobile telephony is unreliable in his area (Kelvin Grove), so ye olde PSTN is the most reliable service. For him to attend meetings, we need some way of patching that PSTN line into the meeting.

The first step is to get something you can patch to. In my case, that’s a soft-phone and a SIP VoIP service. I used Twinkle to provide that link; you could also use others like baresip, Linphone or anything else of your choosing. This connects to your sound card at one end and a Voice Service Provider at the other; in my case, my Asterisk server through Internode NodePhone.

The problem is though, while you can certainly make a call outbound whilst in a conference, the person on the phone won’t be able to hear the conference, nor will the conference attendees be able to hear the person on the phone.

Enter JACK

JACK is an audio routing framework for Unix-like operating systems that allows audio to be routed between applications. It is geared towards multimedia production and professional audio, but since there’s a plug-in in the ALSA framework, it is very handy for linking audio between applications that would otherwise be incompatible.

For this to work, one application has to work either directly with JACK, or via the ALSA plug-in. Many support, and will use, an alternate framework called PulseAudio. Conference applications like Zoom and Jitsi almost universally rely on this as their sound card interface on Linux.

PulseAudio unfortunately is not able to route audio with the same flexibility, but it can route audio to JACK. In particular, JACKv2 and its jackdbus is the path of least resistance. Once JACK starts, PulseAudio detects its presence, and loads a module that connects PulseAudio as a client of JACK.

A limitation with this is PulseAudio will pre-mix all audio streams it receives from its clients into one single monolithic (stereo) feed before presenting that to JACK. I haven’t figured out a work-around for this, but thankfully for this use case, it doesn’t matter. For our purposes, we have just one PulseAudio application: Zoom (or Jitsi), and so long as we keep it that way, things will work.

Software tools

  • jack2: The audio routing daemon.
  • qjackctl: This is a front-end for controlling JACK. It is optional, but if you’re not familiar with JACK, it’s the path of least resistance. It allows you to configure, start and stop JACK, and to control patch-bay configuration.
  • SIP Client, in my case, Twinkle.
  • ALSA JACK Plug-in, part of alsa-plugins.
  • PulseAudio JACK plug-in, part of PulseAudio.

Setting up the JACK ALSA plug-in

To expose JACK to ALSA applications, you’ll need to configure your ${HOME}/.asoundrc file. Now, if your SIP client happens to support JACK natively, you can skip this step, just set it up to talk to JACK and you’re set.

Otherwise, have a look at guides such as this one from the ArchLinux team.

I have the following in my .asoundrc:

pcm.!default {
        type plug
        slave { pcm "jack" }
}

pcm.jack {
        type jack
        playback_ports {
                0 system:playback_1
                1 system:playback_2
        }
        capture_ports {
                0 system:capture_1
                1 system:capture_1
        }
}

The first part sets my default ALSA device to jack, then the second block defines what jack is. You could possibly skip the first block, in which case your SIP client will need to be told to use jack (or maybe plug:jack) as the ALSA audio device for input/output.

Configuring qjackctl

At this point, to test this we need a JACK audio server running, so start qjackctl. You’ll see a window like this:

qjackctl in operation

This shows it actually running, most likely for you this will not be the case. Over on the right you’ll see Setup… — click that, and you’ll get something like this:

Parameters screen

The first tab is the parameters screen. Here, you’ll want to point this at the audio device your speakers/microphone are connected to.

The sample rate may be limited by your audio device. In my experience, JACK hates devices that can’t do the same sample rate for input and output.

My audio device is a Logitech G930 wireless USB headset, and it definitely has this limitation: it can play audio right up to 48kHz, but will only do a meagre 16kHz on capture. JACK thus limits me to both directions running 16kHz. If your device can do 48kHz, that’d be better if you intend to use it for tasks other than audio conferencing. (If your device is also wireless, I’d be interested in knowing where you got it!)

JACK literature seems to recommend 3 periods/buffer for USB devices. The rest is a matter of experiment. 1024 samples/period seems to work fine on my hardware most of the time. Your mileage may vary. Good setups may get away with less, which will decrease latency (mine is 192ms… good enough for me).

The other tab has more settings:

Advanced settings

The things I’ve changed here are:

  • Force 16-bit: since my audio device cannot do anything but 16-bit linear PCM, I force 16-bit mode (rather than the default of 32-bit mode)
  • Channels I/O: output is stereo but input is mono, so I set 1 channel in, two channels out.

Once all is set, Apply then OK.

Now, on qjackctl itself, click the “Start” button. It should report that it has started; you don’t need to click any play buttons to make it work from here. You may notice that PulseAudio has detected the JACK server and will now connect to it. If you click “Graph”, you’ll see something like this:

qjackctl‘s Graph window

This is the thing you’ll use in qjackctl the most. Here, you can see the “system” boxes represent your audio device, and “PulseAudio JACK Sink”/”PulseAudio JACK Source” represent everything that’s connected to PulseAudio.

You should be able to play sound in PulseAudio, and direct applications there to use the JACK virtual sound card. pavucontrol (normally shipped with PulseAudio) may be handy for moving things onto the JACK virtual device.

Configuring your telephony client

I’ll use Twinkle as the example here. In the preferences, look for a section called Audio. You should see this:

Twinkle audio settings

Here, I’ve set my ringing device to pulse to have that ring PulseAudio. This allows me to direct the audio to my laptop’s on-board sound card so I can hear the phone ring without the headset on.

Since jack was made my default device, I can leave the others as “Default Device”. Otherwise, you’d specify jack or plug:jack as the audio device. This should be set on both Speaker and Microphone settings.

Click OK once you’re done.

Configuring Zoom

I’ll use Zoom here, but the process is similar for Jitsi. In the settings, look for the Audio section.

Zoom audio settings

Set both Speaker and Microphone to JACK (sink and source respectively). Use the “Test Speaker” function to ensure it’s all working.

The patch up

Now, it doesn’t matter whether you call first, then join the meeting, or vice versa. You can even have the PSTN caller call you. The thing is, you want to establish a link to both your PSTN caller and your conference.

The assumption is that you now have a session active in both programs: you’re hearing both the PSTN caller and the conference in your headset, and when you speak, both groups hear you. To let them hear each other, do this:

Go to qjackctl‘s patch bay. You’ll see PulseAudio is there, but you’ll also see the instance of the ALSA plug-in connected to JACK. That’s your telephony client. Both will be connected to the system boxes. You need to draw new links between those two new boxes, and the PulseAudio boxes like this:

qjackctl patching Twinkle to Zoom

Here, Zoom is represented by the PulseAudio boxes (since it is using PulseAudio to talk to JACK), and Twinkle is represented by the boxes named alsa-jack… (tip: the number is the PID of the ALSA application if you’re not sure).

Once you draw the connections, the parties should be able to hear each-other. You’ll need to monitor this dialogue from time to time: if either of PulseAudio or the phone client disconnect from JACK momentarily, the connections will need to be re-made. Twinkle will do this if you do a three-way conference, then one person hangs up.

Anyway, that’s the basics covered. There’s more that can be done, for example, recording the audio, or piping audio from something else (e.g. a media player) is just a case of directing it either at JACK directly or via the ALSA plug-in, and drawing connections where you need them.

May 262020
 

Lately, I’ve been socially distancing at home, and so a few projects have been considered that otherwise wouldn’t ordinarily get a look-in on account of lack of time.

One of these has been setting up a Raspberry Pi with a DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TLV320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone: a 144/430MHz→2.4GHz cross-band link.

That’s still on the cards, but another use case came up: digital modes. It’d be real nice to interface this over WiFi to a stronger machine, essentially sound-card sharing over the network. For this, Mumble would not do; I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, is asking for trouble with digital modes. PulseAudio works fine for audio (speech, music, etc). It will make assumptions though about the nature of that audio. The problem is we’re not dealing with “audio” as such, we’re dealing with modem tones. Human ears cannot detect phase easily, data modems can and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit, because although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss, but it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and just aims to do just that one thing. Digging into the code, present versions use OPUS, older versions use CELT. The use of RTP seemed promising though, it actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic content type (e.g. 96), and if this were a SIP session, there’d be a RTPMAP sent via Session Description Protocol to say content type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems, like that used by GQRX, just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. Using UDP side-steps this, if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.
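That fire-and-forget UDP approach is only a few lines of code. A minimal sketch (the address and function names are my own, illustrative only):

```python
import socket


def open_receiver(addr):
    """Bind a UDP socket; datagrams that never arrive are simply gone."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(addr)
    return sock


def send_packet(payload, addr):
    """Fire-and-forget: no retransmission, so a lost packet can't stall the stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

The receiver just loops on `recvfrom()`, playing whatever turns up; a dropped packet means a momentary glitch rather than growing latency.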

I also want some flexibility in audio streams: it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (Red Book CD audio). In some cases 16kHz would be fine, or even 8kHz 16-bit linear PCM; 44.1k works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp, like RTP, so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format

Timestamp

The timestamp field in RTP is basically measured in ticks of some clock of known frequency; e.g. for PCMU it is an 8kHz clock. It starts at some value, then increments monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo, data will be streaming at 192kB/s (1.536Mbps). We can tolerate wrap-around, but a 16-bit counter at that data rate would overflow every ~341ms, which, whilst not unworkable, is getting tight. Better to use a 32-bit counter, which extends that overflow to over 6 hours.
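Wrap-around handling is just modular arithmetic; as long as differences between timestamps are taken modulo 2³², the receiver gets the right answer even across an overflow. A sketch (function names are mine):

```python
MOD = 1 << 32  # 32-bit timestamp space


def ts_advance(ts, ticks):
    """Advance a timestamp, wrapping at 32 bits."""
    return (ts + ticks) % MOD


def ts_delta(newer, older):
    """Wrap-safe difference: correct even when `newer` has wrapped past zero."""
    return (newer - older) % MOD
```

This is the same trick RTP (and TCP sequence numbers) rely on, and it’s why wrap-around is tolerable in the first place.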

Sample rate encoding

We can either support an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports. That’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a harmonic of 8kHz or 11.025kHz, so we devise a scheme consisting of a “base” rate and multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multiplier plus one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates, and as we go up, the steps get bigger, so let’s borrow 3 of the multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0
B S S S M M M M

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

Examples:
  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00010010b (0x12): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)

Other settings

I primarily want to consider linear PCM types. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we could ignore it. So we could just encode the sample width as a power-of-two byte count: 0 would be 8-bits; 1 would be 16-bits; 2 would be 32-bits and 3 would be 64-bits. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but they might be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).
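The examples above imply a power-of-two byte count for the sample width. Assuming the channel count occupies the top nibble and the size code the bottom two bits (the bit positions are my guess, nothing above pins them down), a sketch:

```python
def encode_format(bytes_per_sample, channels):
    """Pack the format byte: channels-1 in the top nibble, log2(bytes) in the low two bits.

    bytes_per_sample must be 1, 2, 4 or 8; channels must be 1..16.
    """
    size_code = bytes_per_sample.bit_length() - 1  # 1->0, 2->1, 4->2, 8->3
    return ((channels - 1) & 0xF) << 4 | (size_code & 0x3)


def decode_format(byte):
    """Inverse of encode_format: return (bytes_per_sample, channels)."""
    channels = ((byte >> 4) & 0xF) + 1
    bytes_per_sample = 1 << (byte & 0x3)
    return bytes_per_sample, channels
```

The two reserved bits (bits 2 and 3 in this layout) stay zero for now, leaving room for the “compressed format” flag mentioned above.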

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.
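Putting the pieces together gives a neatly aligned 8-byte header: a 16-bit magic, the 32-bit timestamp, one rate byte and one format byte. The magic value and field order here are mine, not anything settled above:

```python
import struct

MAGIC = 0xA0D1  # arbitrary placeholder identifier, purely illustrative

# big-endian: magic (H), timestamp (I), rate byte (B), format byte (B) = 8 bytes
HEADER = struct.Struct(">HIBB")


def pack_packet(timestamp, rate_byte, fmt_byte, samples):
    """Prefix raw sample data with the 8-byte header."""
    return HEADER.pack(MAGIC, timestamp & 0xFFFFFFFF, rate_byte, fmt_byte) + samples


def unpack_packet(packet):
    """Split a packet into (timestamp, rate byte, format byte, payload)."""
    magic, ts, rate, fmt = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not one of our packets")
    return ts, rate, fmt, packet[HEADER.size:]
```

In-band control messages could then simply use a different magic value, multiplexed on the same socket.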