Apr 112020
 

So, for years… decades even, our telephone service has been via the Public Switched Telephone Network. Originally intended for just voice traffic, this later became our Internet connection, using dial-up modems, then using ADSL.

The number itself gained two digits in its life-time: originally 6 digits (late 70s/early 80s), it gained a 0 at some point, then in 1996 a 3 was prepended to all Brisbane numbers.

So yeah, we’ve had our phone a long time, and the underlying technology has remained largely the same for that time period. Even the handset is the same one from all those years ago. Telecom Australia used to pay for it as a “priority service” back then as my father was working for them at the time.

By September, this will change. The National Broadband Network is in our suburb, and a little while back I migrated the ADSL2+ connection over to HFC. After a brief hiccup getting OpenBSD talking to it, we were away. It’s been pretty stable so far. Stable enough now that I haven’t had the ADSL modem connected in weeks.

At the time of migration, I could have migrated the telephone too to NBN phone, however there’s a snag: how do you access NBN phone? The answer is the ISP sends you a pre-configured ATA+Router+WiFi AP (and sometimes there’s an ADSL modem in there too). As I had decided to use my own router, I didn’t have such a device.

I asked Internode about the SIP details, and the situation is this: when they provision a NBN connection, there’s an automated process that configures their VoIP service then stores the credentials and other settings in a ACS server. They have no visibility of these credentials at all. Ordinarily, if I had purchased hardware, they would have provisioned that with credentials that would allow it to authenticate over the TR069 protocol. So without spending $200 on a device I wasn’t going to use, I wasn’t getting these credentials.

There’s another option though: VoIP. The NBN phone is actually a VoIP service, so fundamentally nothing much changes. Keeping the services separate though does give me flexibility in the future. There’s a list of VoIP providers as long as my arm that I can go with.

VoIP is notoriously difficult to set up though, I didn’t want to mess up our only incoming telephone service, so I decided to migrate the NBN phone number over to Internode’s NodePhone VoIP service which would allow me to experiment.

Requirements

So, first thing to consider is what my needs are. We have 4 devices that are plugged into the telephone line at present:

  • Telecom Australia Touchfone 200 wired telephone
  • Telstra 9200a cordless telephone base-station
  • Epson WF-7510 printer/scanner/fax
  • Maestro Jetstream 56kbps modem

Now, we don’t send many faxes (maybe two a year), and the last time that modem was used, it was to dial into a weighbridge system in Rockhampton after my workplace had moved to VoIP.

That weighbridge system was originally built in 1995 on top of computers running SCO OpenServer 5 and using SCO UUCP over dial-up lines running at 1200 baud (some of the nodes were at remote sites with dodgy phone lines). In 2013 the core servers were upgraded with new hardware and Ubuntu 12.04LTS, but the UUCP links remained as the SCO boxes at sites were slowly replaced. mgetty would answer the phone, and certain user accounts would use uucico as the “shell”. For admin purposes, we could log in, and that would give us a BASH prompt.

Any time we had a support issue at work, muggins would be the one to literally “dial in” to that site. While it’s been years since I’ve needed to touch it (and I think now there’s a VPN to that site), I wanted to retain dial-up capability if needed.

As for the faxes… the stuff we’re doing can be done over email. Getting the modem and the fax machine working is a stretch goal.

99% of the traffic will be voice traffic. I’m not sure if it’s possible to use double-adaptors with an ATA, and so for hardware I opted for the Grandstream HT814. These are available domestically (e.g. MyITHub) for under AU$130 and looked to be pretty good bang-per-buck. I just wasn’t sure how well it’d get along with the old T200.

As a contingency plan, I also ordered an IP phone, a Grandstream GXP1615 which is also available locally. That way if the T200 gave me problems, I’d just wire up Ethernet to the old point where the T200 was and put the GXP1615 there.

Since I’ve got two SIP devices, I need to be able to control incoming and outgoing call flows. So that means running a soft-PBX somewhere. Asterisk is the obvious choice here, being a free-software VoIP package with lots of flexibility. OpenBSD 6.6 ships with Asterisk 16.6.2.

Opening the account

Opening the account is simple enough. As I was porting the NBN phone number over, I just needed some details off my last Internode bill. Later when I go to do the house phone some time next financial year, I’ll be after a Telstra bill (assuming Telstra Wholesale have sorted out their CoVID-19 issues by then).

I looked at the hardware options that Internode offered… and again, it was basically the same as what they offer for their Internet service.

One thing I note is that while Internode has been a great ISP (I switched to them in 2012), their credentials as a VSP are still developing somewhat.

Initial connection

Provisioning was much less smooth than my initial ADSL connection (which was practically seamless) or the subsequent move to HFC NBN. The first bump in the road was when they emailed me:

Subject: Internode: Your Internode NodePhone VoIP service xxxxxxxxxx can’t make/receive calls yet [xxxxxxxxx]

Following activation of your NBN service, we ran some tests to make sure things were running smoothly.

We weren’t able to detect that your NodePhone VoIP service (phone number xxxxxxxxxx) has been set up successfully.

Initial contact regarding the telephone service…

Okay, fair enough, on the date I received that email I had only just received the ATA that I had purchased (through another supplier). Evidently they thought I had the hardware already. I found some time and quickly cobbled together a set-up:

First test set-up: siproxd on the border router to the HT814

I did some quick research, rather than doing NAT, I figured siproxd was closer to my eventual goals, so I installed that on the border router as there were fewer knobs and dials to deal with. I configured the HT814 with the settings as best I understood them from Internode’s guides with one difference: I set the Proxy field to my border router’s internal IP address.

This failed with Internode’s server giving me a 404: Not Found response when my HT814 sent its REGISTER request. I took some captures using tshark and reported this back to their helpdesk. I disconnected the ATA since there was no sense in banging on their front door every 20 seconds: it wasn’t working.

They got back to me a few days later and told me I had the right user name, and that my number still wasn’t registered. Plugging the ATA in, same response, 404: Not Found. In exasperation, I tried some changes:

  • Different settings on the HT814 (too many to list)
  • Trying with Twinkle on my laptop connected to the DMZ, both through via siproxd and also shutting down siproxd and setting up NAT.
  • Installing Asterisk on the border router (which was in the plans) and trying to set that up.

All three hit the same problem, 404: Not Found. I figured either I had entered the same wrong details 3 times, or it was definitely their end. So I reported that, unplugged the ATA and waited. This was on the 27th March.

On the 31st March, I get an email from Internode Provisioning to say they would be porting the number soon. After receiving this email, REGISTER worked, I was getting 200: OK in reply. Funnily enough, there was no challenge to credentials, it just saw my end and said “OK, you’re in”.

Calling outbound after this worked: the call trace showed the Internode end challenging the ATA for credentials, but once supplied, it connected the call, all worked. Inbound calls were a different matter though, a recorded (female) voice announced that the number was invalid. We were half-way there.

The next day, that message changed, it now was a recorded (male) voice announcing the number was unavailable, and offering to leave a voice-mail message. Progress. I had not changed my end, I still had T200 → HT814 → siproxd on the border router as my set-up. I was not seeing any incoming traffic being blocked by pf, although I was seeing people playing with sipvicious. I decided to firewall off my SIP ports, only exposing them to Internode’s SIP server.

Later on the help-desk got back to me. Despite their server telling me 200: OK when I sent a REGISTER, the number was still “not registered”. Confusing!

On a whim, I ripped out siproxd again and set up NAT on the border router. BINGO! We registered, and incoming calls worked. Internode use Broadsoft’s BroadWorks platform for their NodePhone VoIP service, and something about the operation of siproxd confused it. It then sent a misleading response leading my devices to think they were registered when they were not!

At this point, it was time to do two things: uninstall siproxd, and start reading up on Asterisk.

Configuring Asterisk

Asterisk is regarded as the “swiss army knife” of VoIP. It can be configured to do a lot. Officially, it is supported on Linux i386 and AMD64 platforms, but there are builds of it for platforms such as the Raspberry Pi.

OpenBSD also build and ship it in their ports. I wanted to avoid the pain of NAT, and I also wanted to avoid the telephone service going off-line if my cluster went haywire. So installation of this on the border router was a no-brainer. Yes, it’s adding a bit more attack surface to that box, but with appropriate firewalling rules, this could be managed.

As for configuration, there were two SIP channel drivers I could use, either the old chan_sip method, or the newer res_pjsip method. I had seen guides that discussed the older method (e.g. OCAU, WP), however in the back of my mind is the question: “how long do Digium plan to keep supporting two methods?” Thus from the outset I decided to use res_pjsip.

Audio CODECs

This is a pretty big aspect of configuration and should not be skimped on. You’ll find even if you buy all your equipment from one supplier, there are differences in what audio CODECs are supported. For instance the HT814 supports the OPUS CODEC (something I’d like to take advantage of eventually) but the GXP1615 does not. Some CODECs require patent licenses (e.g. SILK, SIREN7, G.722.1, G.722.2/AMR-WB), some did require patent licenses but are now “free” (G.729, G.723.1) and some are open-source (OPUS, Speex).

Some also go by multiple names. G.711a is also called PCMA and G.711u is PCMU. Pretty much everything supports these, and depending on the country you’re in, one will be preferred over the other. In my case, G.711a is the preferred option.

Your voice provider is a factor here too. Some only support particular CODECs, some will allow any CODEC. NodePhone VoIP allegedly will allow you to use whatever you like, but calls into and out of their network to non-VoIP targets may be restricted to narrow-band CODECs.

Internode-specific gotcha: G.729

One gotcha I stumbled on the hard way was that sometimes SIP implementations do not play by the rules.

Specifically, I found once I got Asterisk installed, incoming calls would be immediately hung-up on the moment I answered. Mobile or PSTN didn’t matter. I fired up tshark again and dug into the problem. The only clue I had was this error message:

[Apr  5 16:27:40] WARNING[-1][C-00000001] channel.c: Unable to find a codec translation path: (g729) -> (alaw)

Even if I specified only use G.711a (alaw), somehow I’d still get that message. I asked about it on the Asterisk forum. I was seeing this pattern in my SIP traffic:

|Time     | ${PROVIDER}                           |
|         |                   | ${ASTERISK_BOX}   |                   
|0.000000 |         INVITE SDP (g711A g7          |SIP INVITE From: <sip:${MYPSTNNUM}@${PROVIDER}29;user=phone> To:"${PROVIDER_NAME}"<sip:${MYSIPNUM}@${PROVIDER_DOMAIN}> Call-ID:BW061858648050420-1965318496@${PROVIDER}   CSeq:612303213
|         |(5060)   ------------------>  (5060)   |
|0.003443 |         100 Trying|                   |SIP Status 100 Trying
|         |(5060)   <------------------  (5060)   |
|0.025416 |         180 Ringing                   |SIP Status 180 Ringing
|         |(5060)   <------------------  (5060)   |
|0.954015 |         200 OK SDP (g711A g7          |SIP Status 200 OK
|         |(5060)   <------------------  (5060)   |
|0.986287 |         RTP (g711A)                   |RTP, 6 packets. Duration: 0.099s SSRC: 0x4F53507
|         |(27642)  <------------------  (18024)  |
|1.092729 |         RTP (g729)                    |RTP, 3 packets. Duration: 0.041s SSRC: 0xCD74F85
|         |(27642)  ------------------>  (18024)  |
|1.097523 |         ACK       |                   |SIP Request INVITE ACK 200 CSeq:612303213
|         |(5060)   ------------------>  (5060)   |
|1.099552 |         BYE       |                   |SIP Request BYE CSeq:29850
|         |(5060)   <------------------  (5060)   |
|1.149521 |         200 OK    |                   |SIP Status 200 OK
|         |(5060)   ------------------>  (5060)   |

I was reliably informed that this was definitely against the rules. So another help-desk email informing them of the problem. If you’re an Internode NodePhone VoIP customer, and you see the above behaviour, these are your options:

  1. Purchase and Install Digium’s G.729 CODEC: only an option if you are running Asterisk on a i386 or AMD64-based Linux machine.
  2. If, like me, you’re running Asterisk on something else, or you despise proprietary software, there is an open-source G.729 CODEC. On OpenBSD, install the asterisk-g729 package.
  3. Asterisk have added a work-around in later versions of their code. Patches exist for version 17, 16 and 13. It is also included in 13.32.0, 16.9.0 and 17.3.0. OpenBSD 6.7 will likely ship with a version of Asterisk that includes this work-around.
  4. You can also complain to their help-desk. We can work-around their problem, but really, it’s their end that’s doing the wrong thing, they should fix it.

Dial Plans

The other big thing to consider is your dial-plan layout. Every SIP endpoint is assigned a context which is used in extensions.conf to determine what is meant when a particular sequence of digits is entered or how a call should be routed.

The pattern syntax is documented on their Pattern Matching page. Notably, extensions are “literal” if they do not begin with an underscore (_) and a X matches any numeric digit. So an example:

exten => 123456,1,DoSomething()
exten => 123987,1,DoSomethingDifferent()
exten => _123XXX,1,DoSomethingElse()

The non-pattern extensions take precedence. So the number 123456 would exactly match the first extension listed above and would call DoSomething(). The number 123987 would exactly match the second entry and would call DoSomethingDifferent(). Any other 6-digit number beginning with 123 would trigger DoSomethingElse().

Avoiding expensive mistakes

Before doing anything, it’s worth reading Asterisk’s page on Dial plan Security. You do NOT want someone to call your PBX, then dial outbound to expensive international numbers and run up a big phone bill! In particular, you want to keep the default context as lean as possible! It’s fine to put all your internal extensions in default, but anything that dials outside, should be in a separate context for internal extensions.

Dialling more than one phone at a time

It is possible to have a dial-plan entry ring multiple phones in parallel by separating the endpoints with & symbols, but don’t put spaces around the & characters! So Dial(PJSIP/ep1&PJSIP/ep2&PJSIP/ep3), not Dial(PJSIP/ep1 & PJSIP…). The most obvious use case for this is when someone rings the home number and you want all phones to ring.

Incoming calls in the dial plan

When a call comes in from outside, the number dialled by the outside party (i.e. your number) appears as the extension dialled.

A simple option is to just ring everyone… so if your phone number was, say 0735359696, you’d create a context in your extensions.conf like this:

; Incoming calls
[incoming]
exten => 0735359696,1,Dial(PJSIP/ep1&PJSIP/ep2&…)

… then in pjsip.conf, assign that context to the endpoint:

[Provider-endpoint]
type=endpoint
context = incoming
; … etc

Outgoing calls

Usually you want to be able to ring outbound too. Some guides suggested the pattern _X! can be used to “match all” but I couldn’t get this to work.

Fax over IP

I mentioned that we had a fax machine we wanted to continue using. Whilst we don’t use it every day, it’s nice to know it is there if we want to use it.

Likewise with the dial-up modem. Even if I needed to do reduced-speed for it to work, it’d be better than nothing for situations where I need to use dial-up. It’ll never connect to the Internet again, but that doesn’t make it useless.

There are two ways to do Fax over IP. One is to use G.711u and hope for the best. The other is to use T.38, where the ATA basically “spoofs” the fax modem at the remote end, decodes the symbols being sent back to digital data, encodes that in UDP packets and transmits those over the Internet. The other end then reverses the process.

The good news is there are diagnostic tools out there for testing purposes. You don’t (yet) have to know of someone who has a working fax. Here in Australia, there’s FOLDS-B.

I’d recommend if you’ve got multiple fax-capable devices though, you plug two of them into separate ports on one or more ATAs and try faxing from one extension to the other. I tried calling FOLDS-B directly at first with no luck, then tried setting up Asterisk with an extension that calls SendFax to send a TIFF image, but had no luck — handshaking would be cut short after a second.

Using two internal endpoints saved a lot of phone calls externally and allowed me to “prove” the ATA could work for this.

Once you’ve got things working locally, you’re ready to hit the outside world. You’ll need a fairly complex image to send as it needs to be transmitting for ~70 seconds. I tried a few pages (e.g. this one) on the flatbed scanner of the WF-7510 without much luck… the fax would barely muster 20 seconds of data for them even at “fine” levels.

I had better luck using the 56kbps modem and a copy of efax-gtk. I took RCA’s test card image off Wikipedia as a SVG, loaded that up into The Gimp, rendering the SVG at 600dpi, extending the image size to A4-paper sized, rotated it to portrait mode, then converted it to 1-bit-per-pixel monochrome with patterned dithering and saved it as an uncompressed TIFF.

I then passed this through tiff2ps which gave me this file:

Sending that at 14400bps took 72 seconds. Perfect! Then the reply came back. Fax reception is very much a work-in-progress, so rather than receive on the VoIP line, I decided to use the PSTN to receive the reports. This was accomplished by specifying my PSTN phone number in the fax identity (FOLDS-B evidently doesn’t check this against caller ID).

I tried again, and this time, received the report. FOLDS-B got a garbled mess apparently. The suggestion was to set the speed to 4800bps. In efax this is accomplished by setting the modem capabilities:

CAPABILITIES
       The capabilities of the local hardware and software can be set using a string of 8 digits separated by commas:

       vr,br,wd,ln,df,ec,bf,st

       where:

       vr  (vertical resolution) =
                0 for 98 lines per inch
                1 for 196 lpi

       br  (bit rate) =
                0 for 2400 bps
                1 for 4800
                2 for 7200
                3 for 9600
                4 for 12000 (V.17)
                5 for 14400 (V.17)

So in this case, I should set the capabilities to 1,1,0,2,0,0,0,0. I tried this, but then found the call just didn’t negotiate either. On a whim again I tried 7200bps (that’s 1,2,0,2,0,0,0,0). Eureka! I got this:

A pretty much “perfect” FOLDS-B report at 7200bps.

The transmission level and SNR are pretty much spot-on. Emboldened by this, I tried again faxing the same image (which I had printed out) from the WF-7510. I chose “Photo” quality and scanned it on the flat-bed. This didn’t work so well — evidently some detail was missed and it only transmitted for 60 seconds so I tried again, and faxed something else for the second sheet.

FOLDS-B wasn’t happy about the second page, but at least it had something to go on. The fax modem in the WF-7510 doesn’t appear to have a speed control other than turning V.34 mode (33.6kbps) on/off. It appears it tried sending at 14400bps. FOLDS-B tells a sorry tale:

An ugly 14400bps transmission

Now it is possible to get near perfect results at 14400bps through an ATA that supports T.38. Maybe a longer transmission on the first page might have helped, but fair to say the speed is not helping matters.

Transmit level is very quiet, and I can’t see a way to adjust this on the WF-7510. Nor can I force it to V.29 (7200bps) mode.

There’s allegedly a “reference” test sheet you can get by “receive polling” 019725112. When I ring this number (with a fax or telephone) I get a recorded message saying the number is not valid. If anyone knows what the correct number is, or how to get a copy of this reference sheet, it’d be greatly appreciated.

So yes, Fax over IP is doable… a lot of pissing about with ATA settings… and you better hope the fax machine itself has lots of knobs and dials. Hylafax + t38modem will probably crack this nut. Then again, so will email.

Configuration Settings

Asterisk Configuration

So whilst my set-up is still a work-in-progress, I figured I’d post what I have here.

Most of these are defaults shipped with OpenBSD 6.6. Specifically of interest would be pjsip.conf and extensions.conf.

pf firewall rules

You’re going to want to pierce your firewall appropriately to expose yourself just enough for your VoIP service to work.

# Define some interfaces
internal=em0

# SIP addresses
sip_provider=203.2.134.1  # sip.internode.on.net
sip_endpoints="{ 10.0.0.3, 10.0.0.4 }"

# Allow SIP from router to SIP devices/softphones and vice versa
pass in on $internal proto udp from $sip_endpoints to self port { 5060, 10000:30000 }
pass out on $internal proto udp from self port { 5060, 10000:30000 } to $sip_endpoints

# Allow SIP from Internode to self and vice versa.  This could be
# tightened up a lot further.  Maybe try some calls both ways and log
# the traffic to see which specific ports.  "All UDP ports" is in line
# with Internode recommendations:
# https://www.internode.on.net/support/faq/phone_and_voip/nodephone/troubleshooting_nodephone/#Do_I_have_to_open_up_any_ports_i
pass in on egress inet proto udp from $sip_provider to self
pass out on egress inet proto udp from self to $sip_provider

Grandstream HT814 settings

Some notable settings first before I dump full screenshots…

Ring Cadence Settings

A special thank-you to @scottsip on the Grandstream forums for pointing me to this document from the ITU (Page 4 has the settings for Australia). Also worth mentioning is this Whirlpool forum thread and this review/teardown of the Grandstream HT802. The ones marked (?) I have no idea what they’re for.

  • System Ring Cadence: c=400/200-400/2000;
  • Dial Tone: f1=400@-12,f2=425@-12;
  • Busy Tone: f1=425,c=38/38;
  • Reorder Tone: f1=480,f2=620,c=25/25; (?)
  • Confirmation Tone: f1=350@-11,f2=440@-11,c=100/100-100/100-100/100; (?)
  • Call Waiting Tone: f1=425,c=20/20-20/440;
  • Prompt Tone: f1=350@-17,f2=440@-17,c=0/0; (?)
  • Conference Party Hangup Tone: f1=425@-15,c=600/600;

Notable fax-related settings

I changed lots of these trying to get something working, so if you set this and still don’t get any joy, check the screenshots below.

  • Fax Mode: T.38
  • Re-INVITE After Fax Tone Detected: Enabled
  • Jitter Buffer Type: Fixed
  • Jitter Buffer Length: Low (note, I can get away with this because it’s Ethernet LAN from ATA to Asterisk box)
  • Gain:
    • TX: 0dB
    • RX: 0dB
  • Disable Line Echo Canceller: Yes
  • Disable Network Echo Suppressor: Yes

Screenshots

Grandstream GXP1615 Settings

These are much the same as the HT814 above. The cadence settings use a slightly different syntax (they don’t have the @nn parts) and there are more tone settings. Again, the ones marked with (?) have an unknown purpose.

  • System Ringtone: c=400/200-400/2000;
  • Dial Tone: f1=400,f2=425;
  • Second Dial Tone: f1=450,f2=425; (?)
  • Message Waiting: f1=525;
  • Ring Back Tone: f1=1209,f2=852,c=20/20-20/200;
  • Call-Waiting Tone: f1=425,c=20/20-20/440;
    • Gain: Low
  • Busy Tone: f1=425,c=38/38;
  • Reorder Tone: f1=480,f2=620,c=25/25; (?)
GXP1615 ringtone settings

The steps forward

Things I need to figure out with this system:

  • Feature Codes: these are used to signal events like “transferring calls”, “park”, and other features you might want to initiate in the PBX. They are normally signalled using DTMF tone sequences, however this can give problems if your tones clash with some IVR you’re interacting with. I’m currently researching this area.
  • OPUS support: I mentioned the HT814 supports it, and as it’s an open CODEC I’d like to support it. There is this project which provides OPUS support to Asterisk. I’ll probably look at writing an OpenBSD port for it.
  • Voice mail and IVR… I’m thinking something along the lines of “Dial the year of birth of the person you wish to reach”… then that can ring the relevant phones or offer to leave a message.
  • I’d like to test wideband audio at some point. Work seems to have G.722 on their phones (or maybe its OPUS?), I should see if wideband pass-through in fact works.
Mar 022020
 

This is just a short note… today we finally made the jump across to the NBN. We’re running hybrid-fibre coax here in The Gap, and right now the HFC NTD is running off mains power (getting that onto solar will be a future project).

I already had my OpenBSD 6.6 router running through the ADSL terminating the PPPoE link. The router itself is a PC Engines APU2, which features 3 Ethernet ports. em0 faces my DMZ and internal network. em1 is presently plugged into the ADSL modem/router, and em2 was spare.

Thus, my interface configuration looked like this:

# /etc/hostname.em0
inet 10.20.50.254 255.255.255.0
inet6 alias 2001:44b8:21ac:70f9::fe 64
# … and a stack of !route commands for the internal subnets

# /etc/hostname.em1
up

# /etc/hostname.pppoe0
# pppoedev em1: ADSL
# pppoedev vlan2: NBN
inet 0.0.0.0 255.255.255.255 NONE \
        pppoedev em1 authproto chap \
        authname user@example.com \
        authkey mypassword up

… and of course /etc/pf.conf was configured with appropriate rules for my network. For the NBN, I read up that VLAN #2 was required, so I set up the following:

# /etc/hostname.em2  
up

# /etc/hostname.vlan2                                                                                                
vnetid 2
parent em2
up 

I then changed /etc/hostname.pppoe0 to point to vlan2 instead of em1. When the NBN NTD got installed, I tried this out… no dice, there was PADI frames being sent, but nada, nothing.

Digging around, I needed to set the transmit priority, so I amended /etc/hostname.vlan2:

# /etc/hostname.vlan2                                                                                                
vnetid 2
parent em2
txprio 1    # ← ADD THIS
up 

Bingo! I was now seeing PPPoE traffic. However I wasn’t out of the woods, nothing behind the router was able to get to the Internet. Turns out pf needs to be told what transmit priority to use. I amended my /etc/pf.conf:

# Scrub incoming traffic
match in all scrub (no-df)

# Set pppoe0 priority
match out on $external set prio 1  # ← ADD THIS

# Block all traffic by default (paranoia)
block log all
#block all

That was sufficient to get traffic working. I’m now getting the following out of SpeedTest.

~25Mbps down / ~18Mbps up on Internode HFC NBN

The link is theoretically supposed to be 50Mbps… but whatever. The primary concern is that it didn’t suddenly drop to 0Mbps when the plug got pulled in September. I’ll check again when things are “quieter” (it’ll be peak periods now), but as far as I’m concerned, this is a matter of ensuring continued service.

It already outperforms the ADSL2+ link (which was about 15Mbps / 2Mbps). Next stop will be to port the old telephone number over, but that can wait another day!

Jun 242018
 

For years up until last month, I ran a public NTP server which was on pool.ntp.org.  This mostly sat at Stratum II and was synchronised from random stratum I servers.

Last month, I had to knock that on the head because of a sudden spike in Internet traffic.  Through tcpdump on the border router, I identified the first culprit: NTP client traffic.  For whatever reason, I was getting a lot of traffic.  Dumping packets to a file over a 15 minute period and analysing showed about 90% of the traffic was NTP.

For about 3 weeks straight, I had an effectively constant 4~5Mbps incoming stream … a pelican flew into my inbox, carrying a AU$300 bill for the nearly 800GB consumed in May.  (My plan was originally 250GB.  Many thanks for Internode there: they capped the overusage charge at AU$300 and provided an instant upgrade of the plan to 500GB/month.)

The other culprit I tracked down to HackChat’s websockets, which was a big contributor to June’s data usage.  (Thank-you OpenBSD pflow and nfdump!)

So right now, I’m looking at whether I re-join the NTP server pool.  I have the monitoring in place to keep track of data usage now.  For sure the experience has cost me over $400, but that’s still cheaper than university studies … and they didn’t teach me this stuff anyway!

One thought is that I could do away with any external NTP server altogether by using local sources to synchronise time.  With a long-wire antenna, I could sync to WWV.  Usually I can pick it up on 10MHz or 15MHz, but it’s weak, and I’d have to rig up the receiver, the antenna and a suitable decoder.

Then there’s the problem of lightning strikes, I already have a Yaesu FT-897D in the junk box thanks to Thor’s past efforts.

The more practical approach would appear to be using GPS time sync.  You can do it with any NMEA-compatible module, but you get far better results using one that supports PPS.  Some even have a 10PPS option, but I’m not sure if kpps supports that.

As it happens, I could have got a GPS module in the TS-7670.  Here’s where it is on the schematic:

and here are the connections to its UART (bottom two):

and the PPS pin (LCD_D16):

The footprints are there on the board, and I can buy the module.  No problems.  Buying the TS-7670 with the GPS option would have meant buying the top-of-the-line “development” model which would have included WiFi (not needed), an extra 128MB RAM (nice to have, but not essential) and an extra CAN port (not needed).

The other option is to go something else, such as this module.  Either way, I need to watch logic levels, Freescale like their 1.8V, although it looks like the I/O pins can be switched between 1.8V and 3.3V from the GPIO registers (chapter 9 of the datasheet).

Doing this, would allow me to run Stratum 1 within the network, and perhaps to IPv6 NTP clients.  IPv4 clients might get Stratum 2, we’ll see how I feel about it later.

May 192018
 

So before going on the trip, I noticed the router I was using would occasionally drop off the network.  The switch still reported the link as being up, but the router would not respond to pings from the internal network.  If I SSHed into it from outside the network, and tried pinging internal IPs, it failed to ping them.

Something was up.  After much debugging (and some arguments about upgrades), it was decided that the hardware was flakey.  In that discussion, it was recommended that I have a look at PC Engines’ APU2 single board computer.

This is the only x86 computer I have seen with schematics and CoreBoot out-of-the-box, and it happens there’s a local supplier of them.  For sure, this machine is overkill for the job, but it ticks nearly all the boxes.

The only one it didn’t tick was being able to run directly from the battery.  As it happens, the unit only draws about 1.5A, and so a LM1085-12 LDO which can be sourced locally did the trick.  I basically put 100µF capacitors on the input and output, bolted it to a small heatsink and threw it all into a salvaged case.

After hooking it up to a bench supply (disconnected from the APU2) and winding the voltage right up to the PSUs maximum, and observing that the voltage stayed at 12V, I decided to hook it up and see how it went.  I plugged in my null modem cable, and sure enough, I was staring at CoreBoot.

I PXE-booted OpenBSD 6.3 and installed that onto the SD card, this was fairly painless and before long, the machine was booting on its own. I copied across the configuration settings from the old one, set up sniproxy, and I was in business, it was time to issue a `shutdown -p now` to both machines and for them to swap places.

Of course, a nicety of this box is there’s three Ethernet ports, so room for a move to another Internet connection, such as the HFC we’re supposed to be getting in this part of Brisbane (sadly, no thin pieces of glass for us), so in theory, I can run both in parallel and migrate between them.

Sep 132017
 

I have a virtual machine that I set up as a secondary DNS server which runs OpenBSD 6.1.  Today logging into it, I noticed system messages were piling up in /var/mail because I hadn’t configured the mail server to deliver those messages.  Setting up OpenSMTPD was no trouble, but then I had the old mail (thankfully not much) that was still to be delivered.

There are a couple of solutions out there, written in Perl, Python and PHP (urgh!).  I don’t have Python on this box, and the Perl one didn’t seem to work with the mailbox.  So I cooked up my own:

#!/bin/sh

for file in "$@"; do
        grep -n '^From ' ${file} | {
                prev=1
                while read line; do
                        cur=$( echo "${line}" | cut -f 1 -d: )
                        if [ "${prev}" != "${cur}" ]; then
                                sed -ne "${prev},$(( ${cur} - 1 )) p" ${file} > ${prev}.eml
                        fi
                        prev=${cur}
                done
        }
done

If there’s a line in your email body starting with “From “, it may get confused, but it was good enough for the messages that OpenBSD’s daemons send me. I was then able to pipe these individually into sendmail -t to send them on their way.

Aug 202017
 

OpenNebula is running now… I ended up re-loading my VM with Ubuntu Linux and throwing OpenNebula on that.  That works… and I can debug the issue with Gentoo later.

I still have to figure out corosync/heartbeat for two VMs, the one running OpenNebula, and the core router.  For now, the VMs are only set up to run on one node, but I can configure them on the other too… it’s then a matter of configuring libvirt to not start the instances at boot, and setting up the Linux-HA tools to figure out which node gets to fire up which VM.

The VM hosts are still running Gentoo however, and so far I’ve managed to get them to behave with OpenNebula.  A big part was disabling the authentication in libvirt, otherwise polkit generally made a mess of things from OpenNebula’s point of view.

That, and firewalld had to be told to open up ports for VNC/spice… I allocated 5900-6900… I doubt I’ll have that many VMs.

Last weekend I replaced the border router… previously this was a function of my aging web server, but now I have an ex-RAAF-base Advantech UNO-1150G industrial PC which is performing the routing function.  I tried to set it up with Gentoo, and while it worked, I found it wasn’t particularly stable due to limited memory (it only has 256MB RAM).  In the end, I managed to get OpenBSD 6.1/i386 running sweetly, so for now, it’s staying that way.

While the AMD Geode LX800 is no speed demon, a nice feature of this machine is it’s happy with any voltage between 9 and 32V.

The border router was also given the responsibility of managing the domain: I did this by installing ISC BIND9 from ports and copying across the config from Linux.  This seemed to be working, and so I left it.  Big mistake, turns out bind9 didn’t think it was authoritative, and so refused to handle AXFRs with my slaves.

I was using two different slave DNS providers, puck.nether.net and Roller Network, both at the time of subscription being freebies.  Turns out, when your DNS goes offline, puck.nether.net responds by disabling your domain then emailing you about it.  I received that email Friday morning… and so I wound up in a mad rush trying to figure out why BIND9 didn’t consider itself authoritative.

Since I was in a rush, I decided to tell the border router to just port-forward to the old server, which got things going until I could look into it properly.  It took a bit of tinkering with pf.conf, but eventually got that going, and the crisis was averted.  Re-enabling the domains on puck.nether.net worked, and they stayed enabled.

It was at that time I discovered that Roller Network had decided to make their slave DNS a paid offering.  Fair enough, these things do cost money… At first I thought, well, I’ll just pay for an account with them, until I realised their personal plans were US$5/month.  My workplace uses Vultr for hosting instances of their WideSky platform for customers… and aside from the odd hiccup, they’ve been fine.  US$5/month VPS which can run almost anything trumps US$5/month that only does secondary DNS, so out came the debit card for a new instance in their Sydney data centre.

Later I might use it to act as a caching front-end and as a secondary mail exchanger… but for now, it’s a DIY secondary DNS.  I used their ISO library to install an OpenBSD 6.1 server, and managed to nut out nsd to act as a secondary name server.

Getting that going this morning, I was able to figure out my DNS woes on the border router and got that running, so after removing the port forward entries, I was able to trigger my secondary DNS at Vultr to re-transfer the domain and debug it until I got it working.

With most of the physical stuff worked out, it was time to turn my attention to getting virtual instances working.  Up until now, everything running on the VM was through hand-crafted VMs using libvirt directly.  This is painful and tedious… but for whatever reason, OpenNebula was not successfully deploying VMs.  It’d get part way, then barf trying to set up 802.1Q network interfaces.

In the end, I knew OpenNebula worked fine with bridges that were already defined… but I didn’t want to have to hand-configure each VLAN… so I turned to another automation tool in my toolkit… Ansible:

- hosts: compute
  tasks:
  - name: Configure networking
    template: src=compute-net.j2 dest=/etc/conf.d/net
# …
- hosts: compute
  tasks:
# …
  - name: Add symbolic links (instance VLAN interfaces)
    file: src=net.lo dest=/etc/init.d/net.bond0.{{item}} state=link
    with_sequence: start=128 end=193
  - name: Add symbolic links (instance VLAN bridges)
    file: src=net.lo dest=/etc/init.d/net.vlan{{item}} state=link
    with_sequence: start=128 end=193
# …
  - name: Make services start at boot (instance VLAN bridges)
    command: rc-update add net.vlan{{item}} default
    with_sequence: start=128 end=193 

That’s a snippet of the playbook… and it basically creates symbolic links from Gentoo’s net.lo for all the VLAN ports and bridges, then sets them up to start at boot.

In the compute-net.j2 file referenced above, I put in the following to enumerate all the configuration bits.

# Instance VLANs
{% for vlan in range(128,193) %}
config_vlan{{vlan}}="null"
config_bond0_{{vlan}}="null"
rc_net_vlan{{vlan}}_need="net.bond0.{{vlan}}"
{% endfor %}
# …
vlans_bond0="5 8 10{% for vlan in range(128,193) %} {{vlan}} {% endfor %}248 249 250 251 252"
vlans_bond1="253"
# …
# Instance VLANs
{% for vlan in range(128,193) %}
bridge_vlan{{vlan}}="bond0.{{vlan}}"
{% endfor %} 

The start and end ranges are a little off, but it saved a lot of work.

This naturally took a while for OpenRC to bring up… but it worked. Going back to OpenNebula, I told it what bridges to use, and before long I had my first instance… an OpenBSD router to link my personal VLAN to the DMZ.

I spent a bit of time re-working my routing tables after that… in fact, my network is getting big enough now I have to write some details down.  I spent a few hours documenting the effort:

That’s page 1 of about 15… yes my hand is sore… but at least now should I get run over by a bus, others have a fighting chance doing anything with the network without my technical input.