May 262020

Lately, I’ve been socially distancing a home and so there’s been a few projects that have been considered that otherwise wouldn’t ordinarily get a look in on a count of lack-of-time.

One of these has been setting up a Raspberry Pi with DRAWS board for use on the bicycle as a radio interface. The DRAWS interface is basically a sound card, RTC, GPS and UART interface for radio interfacing applications. It is built around the TI TMS320AIC3204.

Right now, I’m still waiting for the case to put it in, even though the PCB itself arrived months ago. Consequently it has not seen action on the bike yet. It has gotten some use though at home, primarily as an OpenThread border router for 3 WideSky hubs.

My original idea was to interface it to Mumble, a VoIP server for in-game chat. The idea being that, on events like the Yarraman to Wulkuraka bike ride, I’d fire up the phone, connect it to an AP run by the Raspberry Pi on the bike, and plug my headset into the phone:144/430MHz→2.4GHz cross-band.

That’s still on the cards, but another use case came up: digital. It’d be real nice to interface this over WiFi to a stronger machine for digital modes. Sound card over network sharing. For this, Mumble would not do, I need a lossless audio transport.

Audio streaming options

For audio streaming, I know of 3 options:

  • PulseAudio network streaming
  • netjack
  • trx

PulseAudio I’ve found can be hit-and-miss on the Raspberry Pi, and IMO, is asking for trouble with digital modes. PulseAudio works fine for audio (speech, music, etc). It will make assumptions though about the nature of that audio. The problem is we’re not dealing with “audio” as such, we’re dealing with modem tones. Human ears cannot detect phase easily, data modems can and regularly do. So PA is likely to do things like re-sample the audio to synchronise the two stations, possibly use lossy codecs like OPUS or CELT, and make other changes which will mess with the signal in unpredictable ways.

netjack is another possibility, but like PulseAudio, is geared towards low-latency audio streaming. From what I’ve read, later versions use OPUS, which is a no-no for digital modes. Within a workstation, JACK sounds like a close fit, because although it is geared to audio, its use in professional audio means it’s less likely to make decisions that would incur loss, but it is a finicky beast to get working at times, so it’s a question mark there.

trx was a third option. It uses RTP to stream audio over a network, and just aims to do just that one thing. Digging into the code, present versions use OPUS, older versions use CELT. The use of RTP seemed promising though, it actually uses oRTP from the Linphone project, and last weekend I had a fiddle to see if I could swap out OPUS for linear PCM. oRTP is not that well documented, and I came away frustrated, wondering why the receiver was ignoring the messages being sent by the sender.

It’s worth noting that trx probably isn’t a good example of a streaming application using oRTP. It advertises the stream as G711u, but then sends OPUS data. What it should be doing is sending it as a dynamic content type (e.g. 96), and if this were a SIP session, there’d be a RTPMAP sent via Session Description Protocol to say content type 96 was OPUS.

I looked around for other RTP libraries to see if there was something “simpler” or better documented. I drew a blank. I then had a look at the RTP/RTCP specs themselves published by the IETF. I came to the conclusion that RTP was trying to solve a much more complicated use case than mine. My audio stream won’t traverse anything more sophisticated than a WiFi AP or an Ethernet switch. There’s potential for packet loss due to interference or weak signal propagation between WiFi nodes, but latency is likely to remain pretty consistent and out-of-order handling should be almost a non-issue.

Another gripe I had with RTP is its almost non-consideration of linear PCM. PCMA and PCMU exist, 16-bit linear PCM at 44.1kHz sampling exists (woohoo, CD quality), but how about 48kHz? Nope. You have to use SDP for that.

Custom protocol ideas

With this in mind, my own custom protocol looks like the simplest path forward. Some simple systems that used by GQRX just encapsulate raw audio in UDP messages, fire them at some destination and hope for the best. Some people use TCP, with reasonable results.

My concern with TCP is that if packets get dropped, it’ll try re-sending them, increasing latency and never quite catching up. Using UDP side-steps this, if a packet is lost, it is forgotten about, so things will break up, then recover. Probably a better strategy for what I’m after.

I also want some flexibility in audio streams, it’d be nice to be able to switch sample rates, bit depths, channels, etc. RTP gets close with its L16/44100/2 format (the Philips Red-book standard audio format). In some cases, 16kHz would be fine, or even 8kHz 16-bit linear PCM. 44.1k works, but is wasteful. So a header is needed on packets to at least describe what format is being sent. Since we’re adding a header, we might as well set aside a few bytes for a timestamp like RTP so we can maintain synchronisation.

So with that, we wind up with these fields:

  • Timestamp
  • Sample rate
  • Number of channels
  • Sample format


The timestamp field in RTP is basically measured in ticks of some clock of known frequency, e.g. for PCMU it is a 8kHz clock. It starts with some value, then increments up monotonically. Simple enough concept. If we make this frequency the sample rate of the audio stream, I think that will be good enough.

At 48kHz 16-bit stereo; data will be streaming at 192kbps. We can tolerate wrap-around, and at this data rate, we’d see a 16-bit counter overflow every ~341ms, which whilst not unworkable, is getting tight. Better to use a 32-bit counter for this, which would extend that overflow to over 6 hours.

Sample rate encoding

We can either support an integer field, or we can encode the rate somehow. An integer field would need a range up to 768k to support every rate ALSA supports. That’s another 32-bit integer. Or, we can be a bit clever: nearly every sample rate in common use is a harmonic of 8kHz or 11.025kHz, so we devise a scheme consisting of a “base” rate and multiplier. 48kHz? That’s 8kHz×6. 44.1kHz? That’s 11.025kHz×4.

If we restrict ourselves to those two base rates, we can support standard rates from 8kHz through to 1.4MHz by allocating a single bit to select 8kHz/11.025kHz and 7 bits for the multiplier: the selected sample rate is the base rate multiplied by the multipler incremented by one. We’re unlikely to use every single 8kHz step though. Wikipedia lists some common rates and as we go up, the steps get bigger, so let’s borrow 3 multiplier bits for a left-shift amount.

7 6 5 4 3 2 1 0

B = Base rate: (0) 8000 Hz, (1) 11025 Hz
S = Shift amount
M = Multiplier - 1

Rate = (Base << S) * (M + 1)

  00000000b (0x00): 8kHz
  00010000b (0x10): 16kHz
  10100000b (0xa0): 44.1kHz
  00100000b (0x20): 48kHz
  01010010b (0x52): 768kHz (ALSA limit)
  11111111b (0xff): 22.5792MHz (yes, insane)

Other settings

I primarily want to consider linear PCM types. Technically that includes unsigned PCM, but since that’s losslessly transcodable to signed PCM, we could ignore it. So we could just encode the number of bytes needed for a single channel sample, minus one. Thus 0 would be 8-bits; 1 would be 16-bits; 2 would be 32-bits and 3 would be 64-bits. That needs just two bits. For future-proofing, I’d probably earmark two extra bits; reserved for now, but might be used to indicate “compressed” (and possibly lossy) formats.

The remaining 4 bits could specify a number of channels, again minus 1 (mono would be 0, stereo 1, etc up to 16).

Packet type

For the sake of alignment, I might include a 16-bit identifier field so the packet can be recognised as being this custom audio format, and to allow multiplexing of in-band control messages, but I think the concept is there.

May 122020

So, the other day I pondered about whether BlueTrace could be ported to an older device, or somehow re-implemented so it would be compatible with older phones.

The Australian Government has released their version of TraceTogether, COVIDSafe, which is available for newer devices on the Google and Apple application repositories. It suffers a number of technical issues, one glaring one being that even on devices it theoretically supports, it doesn’t work properly unless you have it running in the foreground and your phone unlocked!

Well, there’s a fail right there! Lots of people, actually need to be able to lock their phones. (e.g. a condition of their employment, preventing pocket dials, saving battery life, etc…)

My phone, will never run COVIDSafe, as provided. Even compiling it for Android 4.1 won’t be enough, it uses Bluetooth Low Energy, which is a Bluetooth 4.0 feature. However, the government did one thing right, they have published the source code. A quick fish-eye over the diff against TraceTogether, suggests the changes are largely superficial.

Interestingly, although the original code is GPLv3, our government has decided to supply their own license. I’m not sure how legal that is. Others have questioned this too.

So, maybe I can run it after all? All I need is a device that can do BLE. That then “phones home” somehow, to retrieve tokens or upload data. Newer phones (almost anything Android-based) usually can do WiFi hotspot, which would work fine with a ESP32.

Older phones don’t have WiFi at all, but many can still provide an Internet connection over a Bluetooth link, likely via the LAN Access Profile. I think this would mean my “token” would need to negotiate HTTPS itself. Not fun on a MCU, but I suspect someone has possibly done it already on ESP32.

Nordic platforms are another option if we go the pure Bluetooth route. I have two nRF52840-DK boards kicking around here, bought for OpenThread development, but not yet in use. A nicety is these do have a holder for a CR2032 cell, so can operate battery-powered.

Either way, I think it important that the chosen platform be:

  1. easily available through usual channels
  2. cheap
  3. hackable, so the devices can be re-purposed after this COVID-19 nonsense blows over

A first step might be to see if COVIDSafe can be cleaved in two… with the BLE part running on a ESP32 or nRF52840, and the HTTPS part running on my Android phone. Also useful, would be some sort of staging server so I can test my code without exposing things. Not sure if there is such a beast publicly available that we can all make use of.

Guess that’ll be the next bit to look at.

May 032020

This afternoon, I was pondering about how I might do text-to-speech, but still have the result sound somewhat natural. For what use case? Well, two that come to mind…

The first being for doing “strapper call” announcements at horse endurance rides. A horse endurance ride is where competitors and their horses traverse a long (sometimes as long as 320km) trail through a wilderness area. Usually these rides (particularly the long ones) are broken up into separate stages or “legs”.

Upon arrival back at base, the competitor has a limited amount of time to get the horse’s vital signs into acceptable ranges before they must present to the vet. If the horse has a too-high temperature, or their horse’s heart rate is too high, they are “vetted out”.

When the competitor reaches the final check-point, ideally you want to let that competitor’s support team know they’re on their way back to base so they can be there to meet the competitor and begin their work with the horse.

Historically, this was done over a PA system, however this isn’t always possible for the people at base to achieve. So having an automated mechanism to do this would be great. In recent times, Brisbane WICEN has been developing a public display that people can see real-time results on, and this also doubles as a strapper-call display.

Getting the information to that display is something of a work-in-progress, but it’s recognised that if you miss the message popping up on the display, there’s no repeat. A better solution would be to “read out” the message. Then you don’t have to be watching the screen, you can go about your business. This could be done over a PA system, or at one location there’s an extensive WiFi network there, so streaming via Icecast is possible.

But how do you get the text into speech?

Enter flite

flite is a minimalist speech synthesizer from the Festival project. Out of the box it includes 3 voices, mostly male American voices. (I think the rms one might be Richard M. Stallman, but I could be wrong on that!) There’s a couple of demos there that can be run direct from the command line.

So, for the sake of argument, let’s try something simple, I’ll use the slt voice (a US female voice) and just get the program to read out what might otherwise be read out during a horse ride event:

$ flite_cmu_us_slt -t 'strapper call for the 160 kilometer event competitor numbers 123 and 234' slt-strapper-nopunctuation-digits.wav

Not bad, but not that great either. Specifically, the speech is probably a little quick. The question is, how do you control this? Turns out there’s a bit of hidden functionality.

There is an option marked -ssml which tells flite to interpret the text as SSML. However, if you try it, you may find it does little to improve matters, I don’t think flite actually implements much of it.

Things are improved if we spell everything out. So if you instead replace the digits with words, you do get a better result:

$ flite_cmu_us_slt -t 'strapper call for the one hundred and sixty kilometer event competitor number one two three and two three four' slt-strapper-nopunctuation-words.wav

Definitely better. It could use some pauses. Now, we don’t have very fine-grained control over those pauses, but we can introduce some punctuation to have some control nonetheless.

$ flite_cmu_us_slt -t 'strapper call.  for the one hundred and sixty kilometer event.  competitor number one two three and two three four' slt-strapper-punctuation.wav

Much better. Of course it still sounds somewhat robotic though. I’m not sure how to adjust the cadence on the whole, but presumably we can just feed the text in piece-wise, render those to individual .wav files, then stitch them together with the pauses we want.

How about other changes though? If you look at flite --help, there is feature options which can control the synthesis. There’s no real documentation on what these do, what I’ve found so far was found by grep-ing through the flite source code. Tip: do a grep for feat_set_, and you’ll see a whole heap.

Controlling pitch

There’s two parameters for the pitch… int_f0_target_mean controls the “centre” frequency of the speech in Hertz, and int_f0_target_stddev controls the deviation. For the slt voice, …mean seems to sit around 160Hz and the deviation is about 20Hz.

So we can say, set the frequency to 90Hz and get a lower tone:

$ flite_cmu_us_slt --setf int_f0_target_mean=90 -t 'strapper call' slt-strapper-mean-90.wav

… or 200Hz for a higher one:

$ flite_cmu_us_slt --setf int_f0_target_mean=200 -t 'strapper call' slt-strapper-mean-200.wav

… or we can change the variance:

$ flite_cmu_us_slt --setf int_f0_target_stddev=0.0 -t 'strapper call' slt-strapper-stddev-0.wav
$ flite_cmu_us_slt --setf int_f0_target_stddev=70.0 -t 'strapper call' slt-strapper-stddev-70.wav

We can’t change these values during a block of speech, but presumably we can cut up the text we want to render, render each piece at the frequency/variance we want, then stitch those together.

Controlling rate

So I mentioned we can control the rate, somewhat coarsely using usual punctuation devices. We can also change the rate overall by setting duration_stretch. This basically is a control of how “long” we want to stretch out the pronunciation of words.

$ flite_cmu_us_slt --setf duration_stretch=0.5 -t 'strapper call' slt-strapper-stretch-05.wav
$ flite_cmu_us_slt --setf duration_stretch=0.7 -t 'strapper call' slt-strapper-stretch-07.wav
$ flite_cmu_us_slt --setf duration_stretch=1.0 -t 'strapper call' slt-strapper-stretch-10.wav
$ flite_cmu_us_slt --setf duration_stretch=1.3 -t 'strapper call' slt-strapper-stretch-13.wav
$ flite_cmu_us_slt --setf duration_stretch=2.0 -t 'strapper call' slt-strapper-stretch-20.wav

Putting it together

So it looks as if all the pieces are there, we just need to stitch them together.

RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.2 --setf int_f0_target_stddev=50.0 --setf int_f0_target_mean=180.0 -t 'strapper call' slt-strapper-call.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.1 --setf int_f0_target_stddev=30.0 --setf int_f0_target_mean=180.0 -t 'for the, one hundred, and sixty kilometer event' slt-160km-event.wav
RC=0 stuartl@rikishi /tmp $ flite_cmu_us_slt --setf duration_stretch=1.4 --setf int_f0_target_stddev=40.0 --setf int_f0_target_mean=180.0 -t 'competitors, one two three, and, two three four' slt-competitors.wav
Above files stitched together in Audacity

Here, I manually imported all three files into Audacity, arranged them, then exported the result, but there’s no reason why the same could not be achieved by a program, I’m just inserting pauses after all.

There are tools for manipulating RIFF waveform files in most languages, and generating silence is not rocket science. The voice itself could be fine-tuned, but that’s simply a matter of tweaking settings. Generating the text is basically a look-up table feeding into snprintf (or its equivalent in your programming language of choice).

It’d be nice to implement a wrapper around flite that took the full SSML or JSML text and rendered it out as speech, but this gets pretty close without writing much code at all. Definitely worth continuing with.

Apr 192020

COVID-SARS-2 is a nasty condition caused by COVID-19 that has seen many a person’s life cut short. The COVID-19 virus which originated from Wuhan, China has one particularly insidious trait: it can be spread by asymptomatic people. That is, you do not have to be suffering symptoms to be an infectious carrier of the condition.

As frustrating as isolation has been, it’s really our only viable solution to preventing this infectious condition from spreading like wildfire until we get a vaccine that will finally knock it on the head.

One solution that has been proposed has been to use contract tracing applications which rely on Bluetooth messaging to detect when an infected person comes into contact with others. Singapore developed the TraceTogether application. The Australian Government look like they might be adopting this application, our deputy CMO even suggesting it’d be made compulsory (before the PM poured water on that plan).

Now, the Android version of this, requires Android 5.1. My phone runs 4.1: I cannot run this application. Not everybody is in the habit of using Bluetooth, or even carries a phone. This got me thinking: can this be implemented in a stand-alone device?

The guts of this application is a protocol called BlueTrace which is described in this whitepaper. Reference implementations exist for Android and iOS.

I’ll have to look at the nitty-gritty of it, but essentially it looks like a stand-alone implementation on a ESP32 module maybe a doable proposition. The protocol basically works like this:

  • Clients register using some contact details (e.g. a telephone number) to a server, which then issues back a “user ID” (randomised).
  • The server then uses this to generate “temporary IDs” which are constructed by concatenating the “User ID” and token life-time start/finish timestamps together, encrypting that with the secret key, then appending the IV and an authentication token. This BLOB is then Base64-encoded.
  • The client pulls down batches of these temporary IDs (forward-dated) to use for when it has no Internet connection available.
  • Clients, then exchange these temporary IDs using BLE messaging.

This, looks doable in an ESP32 module. The ESP32 could be loaded up with tokens by a workstation. You then go about your daily business, carrying this device with you. When you get home, you plug the device into your workstation, and it uploads the “temporary IDs” it saw.

I’ll have to dig out my ESP32 module, but this looks like a doable proposition.

Apr 112020

So, for years… decades even, our telephone service has been via the Public Switched Telephone Network. Originally intended for just voice traffic, this later became our Internet connection, using dial-up modems, then using ADSL.

The number itself gained two digits in its life-time: originally 6 digits (late 70s/early 80s), it gained a 0 at some point, then in 1996 a 3 was prepended to all Brisbane numbers.

So yeah, we’ve had our phone a long time, and the underlying technology has remained largely the same for that time period. Even the handset is the same one from all those years ago. Telecom Australia used to pay for it as a “priority service” back then as my father was working for them at the time.

By September, this will change. The National Broadband Network is in our suburb, and a little while back I migrated the ADSL2+ connection over to HFC. After a brief hiccup getting OpenBSD talking to it, we were away. It’s been pretty stable so far. Stable enough now that I haven’t had the ADSL modem connected in weeks.

At the time of migration, I could have migrated the telephone too to NBN phone, however there’s a snag: how do you access NBN phone? The answer is the ISP sends you a pre-configured ATA+Router+WiFi AP (and sometimes there’s an ADSL modem in there too). As I had decided to use my own router, I didn’t have such a device.

I asked Internode about the SIP details, and the situation is this: when they provision a NBN connection, there’s an automated process that configures their VoIP service then stores the credentials and other settings in a ACS server. They have no visibility of these credentials at all. Ordinarily, if I had purchased hardware, they would have provisioned that with credentials that would allow it to authenticate over the TR069 protocol. So without spending $200 on a device I wasn’t going to use, I wasn’t getting these credentials.

There’s another option though: VoIP. The NBN phone is actually a VoIP service, so fundamentally nothing much changes. Keeping the services separate though does give me flexibility in the future. There’s a list of VoIP providers as long as my arm that I can go with.

VoIP is notoriously difficult to set up though, I didn’t want to mess up our only incoming telephone service, so I decided to migrate the NBN phone number over to Internode’s NodePhone VoIP service which would allow me to experiment.


So, first thing to consider is what my needs are. We have 4 devices that are plugged into the telephone line at present:

  • Telecom Australia Touchfone 200 wired telephone
  • Telstra 9200a cordless telephone base-station
  • Epson WF-7510 printer/scanner/fax
  • Maestro Jetstream 56kbps modem

Now, we don’t send many faxes (maybe two a year), and the last time that modem was used, it was to dial into a weighbridge system in Rockhampton after my workplace had moved to VoIP.

That weighbridge system was originally built in 1995 on top of computers running SCO OpenServer 5 and using SCO UUCP over dial-up lines running at 1200 baud (some of the nodes were at remote sites with dodgy phone lines). In 2013 the core servers were upgraded with new hardware and Ubuntu 12.04LTS, but the UUCP links remained as the SCO boxes at sites were slowly replaced. mgetty would answer the phone, and certain user accounts would use uucico as the “shell”. For admin purposes, we could log in, and that would give us a BASH prompt.

Any time we had a support issue at work, muggins would be the one to literally “dial in” to that site. While it’s been years since I’ve needed to touch it (and I think now there’s a VPN to that site), I wanted to retain dial-up capability if needed.

As for the faxes… the stuff we’re doing can be done over email. Getting the modem and the fax machine working is a stretch goal.

99% of the traffic will be voice traffic. I’m not sure if it’s possible to use double-adaptors with an ATA, and so for hardware I opted for the Grandstream HT814. These are available domestically (e.g. MyITHub) for under AU$130 and looked to be pretty good bang-per-buck. I just wasn’t sure how well it’d get along with the old T200.

As a contingency plan, I also ordered an IP phone, a Grandstream GXP1615 which is also available locally. That way if the T200 gave me problems, I’d just wire up Ethernet to the old point where the T200 was and put the GXP1615 there.

Since I’ve got two SIP devices, I need to be able to control incoming and outgoing call flows. So that means running a soft-PBX somewhere. Asterisk is the obvious choice here, being a free-software VoIP package with lots of flexibility. OpenBSD 6.6 ships with Asterisk 16.6.2.

Opening the account

Opening the account is simple enough. As I was porting the NBN phone number over, I just needed some details off my last Internode bill. Later when I go to do the house phone some time next financial year, I’ll be after a Telstra bill (assuming Telstra Wholesale have sorted out their CoVID-19 issues by then).

I looked at the hardware options that Internode offered… and again, it was basically the same as what they offer for their Internet service.

One thing I note is that while Internode has been a great ISP (I switched to them in 2012), their credentials as a VSP are still developing somewhat.

Initial connection

Provisioning was much less smooth than my initial ADSL connection (which was practically seamless) or the subsequent move to HFC NBN. The first bump in the road was when they emailed me:

Subject: Internode: Your Internode NodePhone VoIP service xxxxxxxxxx can’t make/receive calls yet [xxxxxxxxx]

Following activation of your NBN service, we ran some tests to make sure things were running smoothly.

We weren’t able to detect that your NodePhone VoIP service (phone number xxxxxxxxxx) has been set up successfully.

Initial contact regarding the telephone service…

Okay, fair enough, on the date I received that email I had only just received the ATA that I had purchased (through another supplier). Evidently they thought I had the hardware already. I found some time and quickly cobbled together a set-up:

First test set-up: siproxd on the border router to the HT814

I did some quick research, rather than doing NAT, I figured siproxd was closer to my eventual goals, so I installed that on the border router as there were fewer knobs and dials to deal with. I configured the HT814 with the settings as best I understood them from Internode’s guides with one difference: I set the Proxy field to my border router’s internal IP address.

This failed with Internode’s server giving me a 404: Not Found response when my HT814 sent its REGISTER request. I took some captures using tshark and reported this back to their helpdesk. I disconnected the ATA since there was no sense in banging on their front door every 20 seconds: it wasn’t working.

They got back to me a few days later and told me I had the right user name, and that my number still wasn’t registered. Plugging the ATA in, same response, 404: Not Found. In exasperation, I tried some changes:

  • Different settings on the HT814 (too many to list)
  • Trying with Twinkle on my laptop connected to the DMZ, both through via siproxd and also shutting down siproxd and setting up NAT.
  • Installing Asterisk on the border router (which was in the plans) and trying to set that up.

All three hit the same problem, 404: Not Found. I figured either I had entered the same wrong details 3 times, or it was definitely their end. So I reported that, unplugged the ATA and waited. This was on the 27th March.

On the 31st March, I get an email from Internode Provisioning to say they would be porting the number soon. After receiving this email, REGISTER worked, I was getting 200: OK in reply. Funnily enough, there was no challenge to credentials, it just saw my end and said “OK, you’re in”.

Calling outbound after this worked: the call trace showed the Internode end challenging the ATA for credentials, but once supplied, it connected the call, all worked. Inbound calls were a different matter though, a recorded (female) voice announced that the number was invalid. We were half-way there.

The next day, that message changed, it now was a recorded (male) voice announcing the number was unavailable, and offering to leave a voice-mail message. Progress. I had not changed my end, I still had T200 → HT814 → siproxd on the border router as my set-up. I was not seeing any incoming traffic being blocked by pf, although I was seeing people playing with sipvicious. I decided to firewall off my SIP ports, only exposing them to Internode’s SIP server.

Later on the help-desk got back to me. Despite their server telling me 200: OK when I sent a REGISTER, the number was still “not registered”. Confusing!

On a whim, I ripped out siproxd again and set up NAT on the border router. BINGO! We registered, and incoming calls worked. Internode use Broadsoft’s BroadWorks platform for their NodePhone VoIP service, and something about the operation of siproxd confused it. It then sent a misleading response leading my devices to think they were registered when they were not!

At this point, it was time to do two things: uninstall siproxd, and start reading up on Asterisk.

Configuring Asterisk

Asterisk is regarded as the “swiss army knife” of VoIP. It can be configured to do a lot. Officially, it is supported on Linux i386 and AMD64 platforms, but there are builds of it for platforms such as the Raspberry Pi.

OpenBSD also build and ship it in their ports. I wanted to avoid the pain of NAT, and I also wanted to avoid the telephone service going off-line if my cluster went haywire. So installation of this on the border router was a no-brainer. Yes, it’s adding a bit more attack surface to that box, but with appropriate firewalling rules, this could be managed.

As for configuration, there were two SIP channel drivers I could use, either the old chan_sip method, or the newer res_pjsip method. I had seen guides that discussed the older method (e.g. OCAU, WP), however in the back of my mind is the question: “how long do Digium plan to keep supporting two methods?” Thus from the outset I decided to use res_pjsip.

Audio CODECs

This is a pretty big aspect of configuration and should not be skimped on. You’ll find even if you buy all your equipment from one supplier, there are differences in what audio CODECs are supported. For instance the HT814 supports the OPUS CODEC (something I’d like to take advantage of eventually) but the GXP1615 does not. Some CODECs require patent licenses (e.g. SILK, SIREN7, G.722.1, G.722.2/AMR-WB), some did require patent licenses but are now “free” (G.729, G.723.1) and some are open-source (OPUS, Speex).

Some also go by multiple names. G.711a is also called PCMA and G.711u is PCMU. Pretty much everything supports these, and depending on the country you’re in, one will be preferred over the other. In my case, G.711a is the preferred option.

Your voice provider is a factor here too. Some only support particular CODECs, some will allow any CODEC. NodePhone VoIP allegedly will allow you to use whatever you like, but calls into and out of their network to non-VoIP targets may be restricted to narrow-band CODECs.

Internode-specific gotcha: G.729

One gotcha I stumbled on the hard way was that sometimes SIP implementations do not play by the rules.

Specifically, I found once I got Asterisk installed, incoming calls would be immediately hung-up on the moment I answered. Mobile or PSTN didn’t matter. I fired up tshark again and dug into the problem. The only clue I had was this error message:

[Apr  5 16:27:40] WARNING[-1][C-00000001] channel.c: Unable to find a codec translation path: (g729) -> (alaw)

Even if I specified only use G.711a (alaw), somehow I’d still get that message. I asked about it on the Asterisk forum. I was seeing this pattern in my SIP traffic:

|Time     | ${PROVIDER}                           |
|         |                   | ${ASTERISK_BOX}   |                   
|0.000000 |         INVITE SDP (g711A g7          |SIP INVITE From: <sip:${MYPSTNNUM}@${PROVIDER}29;user=phone> To:"${PROVIDER_NAME}"<sip:${MYSIPNUM}@${PROVIDER_DOMAIN}> Call-ID:BW061858648050420-1965318496@${PROVIDER}   CSeq:612303213
|         |(5060)   ------------------>  (5060)   |
|0.003443 |         100 Trying|                   |SIP Status 100 Trying
|         |(5060)   <------------------  (5060)   |
|0.025416 |         180 Ringing                   |SIP Status 180 Ringing
|         |(5060)   <------------------  (5060)   |
|0.954015 |         200 OK SDP (g711A g7          |SIP Status 200 OK
|         |(5060)   <------------------  (5060)   |
|0.986287 |         RTP (g711A)                   |RTP, 6 packets. Duration: 0.099s SSRC: 0x4F53507
|         |(27642)  <------------------  (18024)  |
|1.092729 |         RTP (g729)                    |RTP, 3 packets. Duration: 0.041s SSRC: 0xCD74F85
|         |(27642)  ------------------>  (18024)  |
|1.097523 |         ACK       |                   |SIP Request INVITE ACK 200 CSeq:612303213
|         |(5060)   ------------------>  (5060)   |
|1.099552 |         BYE       |                   |SIP Request BYE CSeq:29850
|         |(5060)   <------------------  (5060)   |
|1.149521 |         200 OK    |                   |SIP Status 200 OK
|         |(5060)   ------------------>  (5060)   |

I was reliably informed that this was definitely against the rules. So another help-desk email informing them of the problem. If you’re an Internode NodePhone VoIP customer, and you see the above behaviour, these are your options:

  1. Purchase and Install Digium’s G.729 CODEC: only an option if you are running Asterisk on a i386 or AMD64-based Linux machine.
  2. If, like me, you’re running Asterisk on something else, or you despise proprietary software, there is an open-source G.729 CODEC. On OpenBSD, install the asterisk-g729 package.
  3. Asterisk have added a work-around in later versions of their code. Patches exist for version 17, 16 and 13. It is also included in 13.32.0, 16.9.0 and 17.3.0. OpenBSD 6.7 will likely ship with a version of Asterisk that includes this work-around.
  4. You can also complain to their help-desk. We can work-around their problem, but really, it’s their end that’s doing the wrong thing, they should fix it.

Dial Plans

The other big thing to consider is your dial-plan layout. Every SIP endpoint is assigned a context which is used in extensions.conf to determine what is meant when a particular sequence of digits is entered or how a call should be routed.

The pattern syntax is documented on their Pattern Matching page. Notably, extensions are “literal” if they do not begin with an underscore (_) and a X matches any numeric digit. So an example:

exten => 123456,1,DoSomething()
exten => 123987,1,DoSomethingDifferent()
exten => _123XXX,1,DoSomethingElse()

The non-pattern extensions take precedence. So the number 123456 would exactly match the first extension listed above and would call DoSomething(). The number 123987 would exactly match the second entry and would call DoSomethingDifferent(). Any other 6-digit number beginning with 123 would trigger DoSomethingElse().

Avoiding expensive mistakes

Before doing anything, it’s worth reading Asterisk’s page on Dial plan Security. You do NOT want someone to call your PBX, then dial outbound to expensive international numbers and run up a big phone bill! In particular, you want to keep the default context as lean as possible! It’s fine to put all your internal extensions in default, but anything that dials outside, should be in a separate context for internal extensions.

Dialling more than one phone at a time

It is possible to have a dial-plan entry ring multiple phones in parallel by separating the endpoints with & symbols, but don’t put spaces around the & characters! So Dial(PJSIP/ep1&PJSIP/ep2&PJSIP/ep3), not Dial(PJSIP/ep1 & PJSIP…). The most obvious use case for this is when someone rings the home number and you want all phones to ring.

Incoming calls in the dial plan

When a call comes in from outside, the number dialled by the outside party (i.e. your number) appears as the extension dialled.

A simple option is to just ring everyone… so if your phone number was, say 0735359696, you’d create a context in your extensions.conf like this:

; Incoming calls
exten => 0735359696,1,Dial(PJSIP/ep1&PJSIP/ep2&…)

… then in pjsip.conf, assign that context to the endpoint:

context = incoming
; … etc

Outgoing calls

Usually you want to be able to ring outbound too. Some guides suggested the pattern _X! can be used to “match all” but I couldn’t get this to work.

Fax over IP

I mentioned that we had a fax machine we wanted to continue using. Whilst we don’t use it every day, it’s nice to know it is there if we want to use it.

Likewise with the dial-up modem. Even if I needed to do reduced-speed for it to work, it’d be better than nothing for situations where I need to use dial-up. It’ll never connect to the Internet again, but that doesn’t make it useless.

There are two ways to do Fax over IP. One is to use G.711u and hope for the best. The other is to use T.38, where the ATA basically “spoofs” the fax modem at the remote end, decodes the symbols being sent back to digital data, encodes that in UDP packets and transmits those over the Internet. The other end then reverses the process.

The good news is there are diagnostic tools out there for testing purposes. You don’t (yet) have to know of someone who has a working fax. Here in Australia, there’s FOLDS-B.

I’d recommend if you’ve got multiple fax-capable devices though, you plug two of them into separate ports on one or more ATAs and try faxing from one extension to the other. I tried calling FOLDS-B directly at first with no luck, then tried setting up Asterisk with an extension that calls SendFax to send a TIFF image, but had no luck — handshaking would be cut short after a second.

Using two internal endpoints saved a lot of phone calls externally and allowed me to “prove” the ATA could work for this.

Once you’ve got things working locally, you’re ready to hit the outside world. You’ll need a fairly complex image to send as it needs to be transmitting for ~70 seconds. I tried a few pages (e.g. this one) on the flatbed scanner of the WF-7510 without much luck… the fax would barely muster 20 seconds of data for them even at “fine” levels.

I had better luck using the 56kbps modem and a copy of efax-gtk. I took RCA’s test card image off Wikipedia as a SVG, loaded that up into The Gimp, rendering the SVG at 600dpi, extending the image size to A4-paper sized, rotated it to portrait mode, then converted it to 1-bit-per-pixel monochrome with patterned dithering and saved it as an uncompressed TIFF.

I then passed this through tiff2ps which gave me this file:

Sending that at 14400bps took 72 seconds. Perfect! Then the reply came back. Fax reception is very much a work-in-progress, so rather than receive on the VoIP line, I decided to use the PSTN to receive the reports. This was accomplished by specifying my PSTN phone number in the fax identity (FOLDS-B evidently doesn’t check this against caller ID).

I tried again, and this time, received the report. FOLDS-B got a garbled mess apparently. The suggestion was to set the speed to 4800bps. In efax this is accomplished by setting the modem capabilities:

       The capabilities of the local hardware and software can be set using a string of 8 digits separated by commas:



       vr  (vertical resolution) =
                0 for 98 lines per inch
                1 for 196 lpi

       br  (bit rate) =
                0 for 2400 bps
                1 for 4800
                2 for 7200
                3 for 9600
                4 for 12000 (V.17)
                5 for 14400 (V.17)

So in this case, I should set the capabilities to 1,1,0,2,0,0,0,0. I tried this, but then found the call just didn’t negotiate either. On a whim again I tried 7200bps (that’s 1,2,0,2,0,0,0,0). Eureka! I got this:

A pretty much “perfect” FOLDS-B report at 7200bps.

The transmission level and SNR are pretty much spot-on. Emboldened by this, I tried again faxing the same image (which I had printed out) from the WF-7510. I chose “Photo” quality and scanned it on the flat-bed. This didn’t work so well — evidently some detail was missed and it only transmitted for 60 seconds so I tried again, and faxed something else for the second sheet.

FOLDS-B wasn’t happy about the second page, but at least it had something to go on. The fax modem in the WF-7510 doesn’t appear to have a speed control other than turning V.34 mode (33.6kbps) on/off. It appears it tried sending at 14400bps. FOLDS-B tells a sorry tale:

An ugly 14400bps transmission

Now it is possible to get near perfect results at 14400bps through an ATA that supports T.38. Maybe a longer transmission on the first page might have helped, but fair to say the speed is not helping matters.

Transmit level is very quiet, and I can’t see a way to adjust this on the WF-7510. Nor can I force it to V.29 (7200bps) mode.

There’s allegedly a “reference” test sheet you can get by “receive polling” 019725112. When I ring this number (with a fax or telephone) I get a recorded message saying the number is not valid. If anyone knows what the correct number is, or how to get a copy of this reference sheet, it’d be greatly appreciated.

So yes, Fax over IP is doable… a lot of pissing about with ATA settings… and you better hope the fax machine itself has lots of knobs and dials. Hylafax + t38modem will probably crack this nut. Then again, so will email.

Configuration Settings

Asterisk Configuration

So whilst my set-up is still a work-in-progress, I figured I’d post what I have here.

Most of these are defaults shipped with OpenBSD 6.6. Specifically of interest would be pjsip.conf and extensions.conf.

pf firewall rules

You’re going to want to pierce your firewall appropriately to expose yourself just enough for your VoIP service to work.

# Define some interfaces

# SIP addresses
sip_provider=  #
sip_endpoints="{, }"

# Allow SIP from router to SIP devices/softphones and vice versa
pass in on $internal proto udp from $sip_endpoints to self port { 5060, 10000:30000 }
pass out on $internal proto udp from self port { 5060, 10000:30000 } to $sip_endpoints

# Allow SIP from Internode to self and vice versa.  This could be
# tightened up a lot further.  Maybe try some calls both ways and log
# the traffic to see which specific ports.  "All UDP ports" is in line
# with Internode recommendations:
pass in on egress inet proto udp from $sip_provider to self
pass out on egress inet proto udp from self to $sip_provider

Grandstream HT814 settings

Some notable settings first before I dump full screenshots…

Ring Cadence Settings

A special thank-you to @scottsip on the Grandstream forums for pointing me to this document from the ITU (Page 4 has the settings for Australia). Also worth mentioning is this Whirlpool forum thread and this review/teardown of the Grandstream HT802. The ones marked (?) I have no idea what they’re for.

  • System Ring Cadence: c=400/200-400/2000;
  • Dial Tone: f1=400@-12,f2=425@-12;
  • Busy Tone: f1=425,c=38/38;
  • Reorder Tone: f1=480,f2=620,c=25/25; (?)
  • Confirmation Tone: f1=350@-11,f2=440@-11,c=100/100-100/100-100/100; (?)
  • Call Waiting Tone: f1=425,c=20/20-20/440;
  • Prompt Tone: f1=350@-17,f2=440@-17,c=0/0; (?)
  • Conference Party Hangup Tone: f1=425@-15,c=600/600;

Notable fax-related settings

I changed lots of these trying to get something working, so if you set this and still don’t get any joy, check the screenshots below.

  • Fax Mode: T.38
  • Re-INVITE After Fax Tone Detected: Enabled
  • Jitter Buffer Type: Fixed
  • Jitter Buffer Length: Low (note, I can get away with this because it’s Ethernet LAN from ATA to Asterisk box)
  • Gain:
    • TX: 0dB
    • RX: 0dB
  • Disable Line Echo Canceller: Yes
  • Disable Network Echo Suppressor: Yes


Grandstream GXP1615 Settings

These are much the same as the HT814 above. The cadence settings use a slightly different syntax (they don’t have the @nn parts) and there are more tone settings. Again, the ones marked with (?) have an unknown purpose.

  • System Ringtone: c=400/200-400/2000;
  • Dial Tone: f1=400,f2=425;
  • Second Dial Tone: f1=450,f2=425; (?)
  • Message Waiting: f1=525;
  • Ring Back Tone: f1=1209,f2=852,c=20/20-20/200;
  • Call-Waiting Tone: f1=425,c=20/20-20/440;
    • Gain: Low
  • Busy Tone: f1=425,c=38/38;
  • Reorder Tone: f1=480,f2=620,c=25/25; (?)
GXP1615 ringtone settings

The steps forward

Things I need to figure out with this system:

  • Feature Codes: these are used to signal events like “transferring calls”, “park”, and other features you might want to initiate in the PBX. They are normally signalled using DTMF tone sequences, however this can give problems if your tones clash with some IVR you’re interacting with. I’m currently researching this area.
  • OPUS support: I mentioned the HT814 supports it, and as it’s an open CODEC I’d like to support it. There is this project which provides OPUS support to Asterisk. I’ll probably look at writing an OpenBSD port for it.
  • Voice mail and IVR… I’m thinking something along the lines of “Dial the year of birth of the person you wish to reach”… then that can ring the relevant phones or offer to leave a message.
  • I’d like to test wideband audio at some point. Work seems to have G.722 on their phones (or maybe its OPUS?), I should see if wideband pass-through in fact works.
Mar 022020

This is just a short note… today we finally made the jump across to the NBN. We’re running hybrid-fibre coax here in The Gap, and right now the HFC NTD is running off mains power (getting that onto solar will be a future project).

I already had my OpenBSD 6.6 router running through the ADSL terminating the PPPoE link. The router itself is a PC Engines APU2, which features 3 Ethernet ports. em0 faces my DMZ and internal network. em1 is presently plugged into the ADSL modem/router, and em2 was spare.

Thus, my interface configuration looked like this:

# /etc/hostname.em0
inet6 alias 2001:44b8:21ac:70f9::fe 64
# … and a stack of !route commands for the internal subnets

# /etc/hostname.em1

# /etc/hostname.pppoe0
# pppoedev em1: ADSL
# pppoedev vlan2: NBN
inet NONE \
        pppoedev em1 authproto chap \
        authname \
        authkey mypassword up

… and of course /etc/pf.conf was configured with appropriate rules for my network. For the NBN, I read up that VLAN #2 was required, so I set up the following:

# /etc/hostname.em2  

# /etc/hostname.vlan2                                                                                                
vnetid 2
parent em2

I then changed /etc/hostname.pppoe0 to point to vlan2 instead of em1. When the NBN NTD got installed, I tried this out… no dice, there was PADI frames being sent, but nada, nothing.

Digging around, I needed to set the transmit priority, so I amended /etc/hostname.vlan2:

# /etc/hostname.vlan2                                                                                                
vnetid 2
parent em2
txprio 1    # ← ADD THIS

Bingo! I was now seeing PPPoE traffic. However I wasn’t out of the woods, nothing behind the router was able to get to the Internet. Turns out pf needs to be told what transmit priority to use. I amended my /etc/pf.conf:

# Scrub incoming traffic
match in all scrub (no-df)

# Set pppoe0 priority
match out on $external set prio 1  # ← ADD THIS

# Block all traffic by default (paranoia)
block log all
#block all

That was sufficient to get traffic working. I’m now getting the following out of SpeedTest.

~25Mbps down / ~18Mbps up on Internode HFC NBN

The link is theoretically supposed to be 50Mbps… but whatever. The primary concern is that it didn’t suddenly drop to 0Mbps when the plug got pulled in September. I’ll check again when things are “quieter” (it’ll be peak periods now), but as far as I’m concerned, this is a matter of ensuring continued service.

It already outperforms the ADSL2+ link (which was about 15Mbps / 2Mbps). Next stop will be to port the old telephone number over, but that can wait another day!

Nov 282019

This is more of a brain dump for yours truly, since as a day job, I’m dealing with OpenThread a lot, and so there’s lots of little tricks out there for building it on various platforms and configurations. The following is not so much a how-to guide, but a quick brain dump of different things I’ve learned over the past year or so of messing with OpenThread.

Verbose builds

To get a print out of every invocation of the toolchain with all flags, specify V=1 on the call to make:

$ make -f examples/Makefile-${PLATFORM} …${ARGS}… V=1

Running one step at a time

To disable the parallel builds when debugging the build system, append BuildJobs=1 to your make call:

$ make -f examples/Makefile-${PLATFORM} …${ARGS}… BuildJobs=1

Building a border router NCP image

General case


For TI CC2538

# Normal CC2538 (e.g. Zolertia Firefly)
$ make -f examples/Makefile-cc2538 BORDER_AGENT=1 BORDER_ROUTER=1 COMMISSIONER=1 UDP_FORWARD=1

# CC2538 + CC2592
$ make -f examples/Makefile-cc2538 BORDER_AGENT=1 BORDER_ROUTER=1 COMMISSIONER=1 UDP_FORWARD=1 CC2592=1

If you want to run the latter image on a WideSky Hub, you’ll need to edit examples/platform/cc2538/uart.c and comment out line 117 before compiling as it uses a simple diode for 5V to 3.3V level shifting, and this requires the internal pull-up to be enabled:

115     // rx pin
117     HWREG(IOC_PA0_OVER)      = IOC_OVERRIDE_DIS; // ← comment out this to allow UART RX to work

For Nordic nRF52840

# Nordic development board (PCA10056) via J2 (near battery)
$ make -f examples/Makefile-nrf52840 BORDER_AGENT=1 BORDER_ROUTER=1 COMMISSIONER=1 UDP_FORWARD=1

# Ditto, but instead using J3 (near middle of bottom edge)
$ make -f examples/Makefile-nrf52840 BORDER_AGENT=1 BORDER_ROUTER=1 COMMISSIONER=1 UDP_FORWARD=1 USB=1

# Nordic dongle (PCA10059)

I’m working on what needs to be done for the Fanstel BT840X… watch this space.

Building certification test images

For CC2538

$ make -f examples/Makefile-cc2538 BORDER_ROUTER=1 COMMISSIONER=1 DHCP6_CLIENT=1 JOINER=1

Running CI tests outside of Travis CI

You will need:

  • Python 3.5 or later. 3.6 recommended.
  • pycryptodome (not pycrypto: if you get an AttributeError referencing AES.MODE_CCM, that’s why!
  • enum34 (for now… I suspect this will disappear once the Python 2.7 requirement is dropped for Android tools)
  • ipaddress
  • pexpect

The test suites work by running the POSIX ot-cli-${TYPE} (where ${TYPE} is ftd or mtd).

Running tests

From the root of the OpenThread tree:

$ make -f examples/Makefile-posix check

Making tests

The majority of the tests lurk under tests/scripts/thread-cert. Don’t be fooled by the name, lots of non-Thread tests live there.

The file wraps pexpect up in an object with methods for calling the various CLI commands or waiting for things to be printed to the console.

The tests themselves are written using using the standard Python unittest framework, with setUp creating a few Node objects ( and tearDown cleaning them up. The network is simulated.

The test files must be executable, and call unittest.main() after checking if the script is called directly.

Logs during the test runs

During the tests, the logs are stashed in build/${CHOST}/tests/scripts/thread-cert and will be named ${SCRIPTNAME}.log. (e.g Also present is the .pcap file (Packet dump).

Logs after the tests

The test suite will report the logs are at tests/scripts/thread-cert/test-suite.log. This is relative to the build directory… so look in build/${CHOST}/tests/scripts/thread-cert/test-suite.log.

make pretty with clang 8.0

Officially this isn’t supported, but you can “fool” OpenThread’s build system into using clang 8.0 anyway:

$ cat ~/bin/clang-format-6.0 

if [ "$1" == "--version" ]; then
        echo "clang-format version 6.0"
        clang-format "$@"

Put that file in your ${PATH}, make it executable, then OpenThread will think you’ve got clang-format version 6 installed. This appears to work without ill effect, so maybe a future release of OpenThread will support it.

May 252019

So recently I was musing about how I might go about expanding the storage on the cluster. This was largely driven by the fact that I was about 80% full, and thus needed to increase capacity somehow.

I also was noting that the 5400RPM HDDs (HGST HTS541010A9E680), now with a bit of load, were starting to show signs of not keeping up. The cases I have can take two 2.5″ SATA HDDs, one spot is occupied by a boot drive (120GB SSD) and the other a HDD.

A few weeks ago, I had a node fail. That really did send the cluster into a spin, since due to space constraints, things weren’t as “redundant” as I would have liked, and with one disk down, I/O throughput which was already rivalling Microsoft Azure levels of slow, really took a bad downward turn.

I hastily bought two NUCs, which I’m working towards deploying… with those I also bought two 120GB M.2 SSDs (for boot drives) and two 2TB HDDs (WD Blues).

It was at that point I noticed that some of the working drives were giving off the odd read error which was throwing Ceph off, causing “inconsistent” placement groups. At that point, I decided I’d actually deploy one of the new drives (the old drive was connected to another node so I had nothing to lose), and I’ll probably deploy the other shortly. The WD Blue 2TB drives are also 5400RPM, but unlike the 1TB Hitachis I was using before, have 128MB of cache vs just 8MB.

That should boost the read performance just a little bit. We’ll see how they go. I figured this isn’t mutually exclusive to the plans of external storage upgrades, I can still buy and mod external enclosures like I planned, but perhaps with a bit more breathing room, the immediate need has passed.

I’ve since ordered another 3 of these drives, two will replace the existing 1TB drives, and a third will go back in the NUC I stole a 2TB drive from.

Thinking about the problem more, one big issue is that I don’t have room inside the case for 3 2.5″ HDDs, and the motherboards I have do not feature mSATA or M.2 SATA. I might cram a PCIe SSD in, but those are pricey.

The 120GB SSD is only there as a boot drive. If I could move that off to some other medium, I could possibly move to a bigger SSD in place of the 120GB SSD, maybe a ~500GB unit. These are reasonably priced. The issue is then where to put the OS.

An unattractive option is to shove a USB stick in and boot off that. There’s no internal USB ports, but there are two front USB ports in the case I could rig up to an internal header so they’re not sticking out like a sore thumb(-drive) begging to be broken off by a side-wards slap. The flash memory in these is usually the cheapest variety, so maybe if I went this route, I’d buy two: one for the root FS, the other for swap/logs.

The other option is a Disk-on-Module. The motherboards provide the necessary DC power connector for running these things, and there’s a chance I could cram one in there. They’re pricey, but not as bad as going NVMe SSDs, and there’s a greater chance of success squeezing this in.

Right now I’ve just bought a replacement motherboard and some RAM for it… this time the 16-core model, and it takes full-size DIMMs. It’ll go back in as a compute node with 32GB RAM (I can take it all the way to 256GB if I want to). Coupled with that and a purchase of some HDDs, I think I’ll let the bank account cool off before I go splurging more. 🙂

May 142019

Well, it had to happen some day, but I was hoping it’d be a few more years off… I’ve had the first node failure on the cluster.

One of my storage nodes decided to keel over this morning, some time between 5 and 8AM… sending the cluster into utter chaos. I tried power cycling the host a few times before finally yanking it from the DIN rail and trying it on the bench supply. After about 10 minutes of pulling SO-DIMMs and general mucking around trying to coax it to POST, I pulled the HDD out, put that in an external dock and connected that to one of the other storage nodes. After all, it was approaching 9AM and I needed to get to work!

A quick bit of work with ceph-bluestore-tool and I had the OSD mounted and running again. The cluster is moaning that it’s lost a monitor daemon… but it’s still got the other two so provided that I can keep O’Toole away (Murphy has already visited), I should be fine for now.

This evening I took a closer look, tried the RAM I had in different slots, even with the RAM removed, there’s no signs of life out of the host itself: I should get beep codes with no RAM installed. I ran my multimeter across the various power rails I could get at: the 5V and 12V rails look fine. The IPMI BMC works, but that’s about as much as I get. I guess once the board is replaced, I might take a closer look at that BMC, see how hackable it is.

I’ve bought a couple of spare nodes which will probably find themselves pressed into monitor node duty, two Intel NUC7I5BNHs have been ordered, and I’ll pick these up later in the week. Basically one is to temporarily replace the downed node until such time as I can procure a more suitable motherboard, and the other is a spare.

I have a M.2 SATA SSD I can drop in along with some DDR4 RAM I bought by mistake, and of course the HDD for that node is sitting in the dock. The NUCs are perfectly fine running between 10.8V right up to 19V — verified on a NUC6CAYS, so no 12V regulator is needed.

The only down-side with these units is the single Ethernet port, however I think this will be fine for monitor node duty, and two additional nodes should mean the storage cluster becomes more resilient.

The likely long-term plan may be an upgrade of one of the compute nodes. For ~$1600, I can get a A2SDi-16C-HLN4F, which sports 16 cores and takes full-size DDR4 DIMMs. I can then rotate the board out of that into the downed node.

The full-size DIMMS are much more readily available in ECC format, so that should make long-term support of this cluster much easier as the supplies of the SO-DIMMs are quickly drying up.

This probably means I should pull my finger out and actually do some of the maintenance I had been planning but put off… largely due to a lack of time. It’s just typical that everything has to happen when you are least free to deal with it.

Jan 282019

My cloud computing cluster like all cloud computing clusters of course needs a storage back-end. There were a number of options I could have chosen, but the one I went with in the end was Ceph, and so far, it’s ran pretty well.

Lately though, I was starting to get some odd crashes out of ceph-osd. I was running release 10.2.3, which is quite dated now, this is one of the earlier Jewel releases. Adding to the fun, I’m running btrfs as my filesystem on the OS and the OSD, and I’m running it all on Gentoo. On top of this, my monitor nodes are my OSDs as well.

Not exactly a “supported” configuration, never mind the hacks done at hardware level.

There was also a nagging issue about too many placement groups in the Ceph cluster. When I first established the cluster, I christened it by dragging a few of my lxc containers off the old server and making them VMs in the cluster. This was done using libvirt and virt-manager. These got thrown into a storage pool called transitional-inst, with a VLAN set aside for the VMs to use. When I threw OpenNebula on, I created another Ceph pool, one for its images. The configuration of these lead to the “too many placement groups” warning, which until now, I just ignored.

This weekend was a long weekend, for controversial reasons… and so I thought I’ll take a snapshot of all my VMs, download those snapshots to a HDD as raw images, then see if I can fix these issues, and migrate to Ceph Luminous (v12.2.10) at the same time.

Backing up

I was going to be doing some nasty things to the cluster, so I thought the first thing to do was to back up all images. This was done by using rbd snap create pool/image@date to create a snapshot of an image, then rbd export pool/image@date /path/to/storage/pool-image.img before blowing away the snapshot with rbd snap rm pool/image@date.

This was done for all images on the Ceph cluster, stashing them on a 4TB hard drive I had bought for the purpose.

Getting things ready

My cluster is actually set up as a distcc cluster, with Apache HTTP server instances sharing out distfiles and binary package repositories, so if I build packages on one, I can have the others fetch the binary packages that it built. I started with a node, and got it to update all packages except Ceph. Made sure everything was up-to-date.

Then, I ran emerge -B =ceph-10.2.10-r2. This was the first step in my migration, I’d move to the absolute latest Jewel release available in Gentoo. Once it built, I told all three storage nodes to install it (emerge -g =ceph-10.2.10-r2). This was followed up by a re-start of the mon daemons on each node (one at a time), then the mds daemons, finally the osd daemons.

Resolving the “too many placement groups” warning

To resolve this, I first researched the problem. An Internet search lead me to this Stack Overflow post. In it, it was suggested the problem could be alleviated by making a new pool with the correct settings, then copying the images over to it and blowing away the old one.

As it happens, I had an easier solution… move the “transitional” images to OpenNebula. I created empty data blocks in OpenNebula for the three images, then used qemu-img convert -p /path/to/image.img rbd:pool/image to upload the images.

It was then a case of creating a virtual machine template to boot them. I put them in a VLAN with the other servers, and when each one booted, edited the configuration with the new TCP/IP settings.

Once all those were moved across, I blew away the old VMs and the old pool. The warning disappeared, and I was left with a HEALTH_OK message out of Ceph.

The Luminous moment

At this point I was ready to try migrating. I had a good read of the instructions beforehand. They seemed simple enough. I prepared as I did before by updating everything on the system except Ceph, then, telling Portage to build a binary package of Ceph itself.

Then I deployed the binary to the three nodes.

First step was to re-start the monitors… this went smoothly, I just did a /etc/init.d/ceph-mon.${HOST} restart on each one individually, and after a brief moment, quorum was re-established. I then deployed a manager daemon to each one — basically I just “copied” my monitor symbolic link, changing mon to mgr, added it to OpenRC’s list, then started them. No problems.

The OSDs though were still running the Jewel release.

I proceeded as before, trying a re-start of the first OSD. After a while it hadn’t come back…

2019-01-27 14:42:59.745860 7f28fac06e00 -1 filestore(/var/lib/ceph/osd/ceph-0) _detect_fs(1197): deprecated btrfs support is not ena

Ohh bugger, so no btrfs support. This is where the fun began. At this point I was a bit flustered and thought I’d have to either migrate these nodes to XFS, or to BlueStore. So immediately I started looking at the BlueStore migration documentation, as I did not want to risk re-starting the other two OSDs and losing access to my data!

A hasty BlueStore migration

So, I started this by doing the ceph osd set out 0 to start my now downed OSD 0 on the path of migration. The fact it was already down didn’t click with me. I then tried running ceph osd safe-to-destroy 0, only to be told Error EINVAL: (22) Invalid argument.

Uhh ohh, this isn’t good. I waited a bit, but also part of me said: there should be a copy of everything on this node, on at least one of the other two nodes. I had configured it to maintain at least two copies of everything, so even if this node went up in smoke, the data should be recoverable.

With great trepidation, I continued and tried destroying the OSD, then creating a BlueStore one in its place… only to have the ceph-volume command blow up. It couldn’t find the keyring, then when I got that sorted out, it was failing to talk to systemd, then when I found the --no-systemd argument, it still failed because of LVM. I therefore realised I needed two things:

  1. I needed the bootstrap-osd keyring that ceph-deploy normally creates.
  2. The lvmetad daemon must be running.

For (1), this is taken care of with the following commands:

# ceph auth add client.bootstrap-osd --cap mon 'profile bootstrap-osd
# mkdir /var/lib/ceph/bootstrap-osd
# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

As for (2), install sys-fs/lvm and add lvmetad to your start-up services. Also add lvm, as you’ll want that at boot. (I learned this later.)

After doing that, the following command worked:

ceph-volume lvm create --bluestore --data /dev/sdb \
--osd-id 0 --no-systemd

The --no-systemd is important on Gentoo with OpenRC as there is no systemctl binary. Once I did that, I found I could start my OSD again. Data recovery began at once. The data recovery was an overnight effort — it took with my hardware until 3PM today to migrate all the placement groups over to the newly re-formatted OSD.

Migrating the other nodes

For now, they still run btrfs. In my “ohh crap” state, I didn’t see the little hint given:

2019-01-27 14:40:55.147888 7f8feb7a2e00 -1 *** experimental feature 'btrfs' is not enabled ***
This feature is marked as experimental, which means it
 - is untested
 - is unsupported
 - may corrupt your data
 - may break your cluster is an unrecoverable fashion
To enable this feature, add this to your ceph.conf:
  enable experimental unrecoverable data corrupting features = btrfs

2019-01-27 14:40:55.147901 7f8feb7a2e00 -1 filestore(/var/lib/ceph/osd/ceph-0) _detect_fs(1197): deprecated btrfs support is not enabled
2019-01-27 14:40:55.147906 7f8feb7a2e00 -1 filestore(/var/lib/ceph/osd/ceph-0) mount(1523): error in _detect_fs: (1) Operation not permitted
2019-01-27 14:40:55.147926 7f8feb7a2e00 -1 osd.0 0 OSD:init: unable to mount object store

Not feeling like a 24-hour wait, I did as it told me:

osd pool default size = 2  # Write an object n times.
osd pool default min size = 1 # Allow writing n copy in a degraded state.
osd pool default pg num = 128
osd pool default pgp num = 128
osd crush chooseleaf type = 1
osd max backfills = 10

# Allow btrfs to work:
enable experimental unrecoverable data corrupting features = btrfs

Now, my other OSDs re-started successfully, and I could finally finish off by restarting the metadata daemons and completing the migration. I’m now left with two OSDs with BTRFS and one with BlueStore.

For now, I’ll leave it that way, next week end, I might migrate a second node to BlueStore.

The reboot test

I needed to ensure the nodes would come back without my intervention. So starting with the two BTRFS nodes, I rebooted each one individually. The OSD on that node first went offline, then the monitor, finally the cluster noticed the metadata and manager services had gone. Then, upon successful boot, the services returned.

So far so good. Now the BlueStore node.

First reboot, my OSD didn’t come back. On investigation, I saw the following logs:

2019-01-28 16:25:59.312369 7fd58d4f0e00 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or
2019-01-28 16:26:14.865883 7fe92f942e00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or
2019-01-28 16:26:30.419863 7fd4fa026e00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory

/var/lib/ceph/osd/ceph-0 was completely empty! Bugger, do I have to endure those 24 hours again? As it happened, no. I don’t know how the files in that directory disappeared, I did observe a tmpfs pseudovolume mounted at that directory earlier when trying to create the OSD … maybe that didn’t get unmounted before OSD creation, anyway, the files were gone.

A bit of digging revealed a ceph-bluestore-tool utility, with options like repair. At first I tried to wing it using that, but no dice. Then looking at the man page I noticed the sub-command prime-osd-dir. BINGO.

At first I threw the raw device at it, but as it happens, ceph-volume had deployed LVM to the raw disk, then put BlueStore on top of that. Starting lvm got the volume group recognised, so I added that to my boot-up services (see why I mentioned it earlier). It had created a sym-link to the LVM volume in /dev/ceph-${UUID1}/osd-block-${UUID2}.

No idea where the two UUIDs came from, but I tried this:

# ceph-bluestore-tool prime-osd-dir \
    --dev /dev/ceph-d62d0d95-2e13-4c59-834d-03a87b88c85e/osd-block-62b4be3e-3935-4d51-ab5c-dde077f99ea3 \
    --path /var/lib/ceph/osd/ceph-0

That populated the directory with files, so I tried again starting the OSD.

2019-01-28 16:59:23.680039 7fd93fcbee00 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2019-01-28 16:59:23.680082 7fd93fcbee00 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory
2019-01-28 16:59:39.229888 7f4a585b4e00 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2019-01-28 16:59:39.229918 7f4a585b4e00 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory

Ah ha, chown -R ceph:ceph /var/lib/ceph/osd/ceph-0, and all sprang to life. The OSD came up.

Testing the fixes, a second re-boot

Since the OSD now was starting, and working, I did a second re-boot test, only to have history partially repeat itself.

The files were still there this time, but it was failing with a permissions error opening the block device. Sure enough, it was now owned by root.

Changed the permissions, and the OSD came up.

Fixing this was a job for udev:

cat /etc/udev/rules.d/99ceph.rules
SUBSYSTEM=="block", KERNEL=="sda7", OWNER="ceph", GROUP="ceph", MODE="0600"
SUBSYSTEM=="block", ENV{DM_VG_NAME}=="ceph-*", OWNER="ceph", GROUP="ceph", MODE="0600"

The first line is left-over from when /dev/sda7 was my journal. Not sure what I’ll do with this partition now, I’ll think of something (maybe Docker). The second line tells udev to change the permissions on the volume group that Ceph created.

Having done this, I rebooted again. This time, all worked. The OSD came up without my intervention.


So, the pitfalls I ran across in my Jewel-Luminous migration on Gentoo.

btrfs OSDs

I had btrfs volumes for my OSDs, which are now frowned upon and considered experimental. It isn’t necessary to migrate to BlueStore or XFS straight away, but for the OSDs to boot, you will need the following line in your /etc/ceph/ceph.conf before restarting your OSDs:

enable experimental unrecoverable data corrupting features = btrfs

ceph-volume expects the bootstrap-osd key.

To use ceph-volume, it for some reason expects to see the bootstrap-osd key in a hard-coded location. It won’t work with the default admin key.

This bootstrap key can be generated as follows:

# ceph auth add client.bootstrap-osd --cap mon 'profile bootstrap-osd
# mkdir /var/lib/ceph/bootstrap-osd
# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

Before creating a BlueStore OSD, make sure lvmetad and lvm are started (and set to start at boot)

You can get away with just lvmetad for the initial creation, but you’ll want lvm running at boot anyway to ensure all the logical volume groups get started at boot before Ceph goes looking for them.

So before attempting OSD creation, ensure LVM is installed, and set to start at boot.

ceph-osd runs as the ceph user

So your udev rules need to reflect that. Luckily, ceph-volume seems to prefer creating LVM volume groups named ceph-${UUID}. I don’t know what decides the UUID value, but thankfully udev supports globbing. The following udev rule (put it in /etc/udev/rules.d/99ceph.rules or wherever seems appropriate) will keep permissions in check:

SUBSYSTEM=="block", ENV{DM_VG_NAME}=="ceph-*", OWNER="ceph", GROUP="ceph", MODE="0600"

(The above should be all on one line.)

Before rebooting a BlueStore node, back up your OSD data directories

Shouldn’t be strictly necessary, but now I’ve been bitten, I’m going to be taking extra care of that data directory on my other two nodes when I migrate them. I don’t fancy playing around with ceph-bluestore-tool frantically trying to get an OSD back up again.