Jun 24 2018

For years, up until last month, I ran a public NTP server that was part of pool.ntp.org.  It mostly sat at Stratum 2 and was synchronised from random Stratum 1 servers.

Last month, I had to knock that on the head because of a sudden spike in Internet traffic.  Through tcpdump on the border router, I identified the first culprit: NTP client traffic.  For whatever reason, I was getting a lot of traffic.  Dumping packets to a file over a 15 minute period and analysing showed about 90% of the traffic was NTP.
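One rough way to get that sort of breakdown from a saved capture is sketched below.  It assumes the capture is read back as text with `tcpdump -n`, and that lines follow the usual "time IP src.port > dst.port:" layout.  The `ntp_share` name is made up for illustration, and it counts packets rather than bytes, so treat the result as an approximation.

```shell
# ntp_share: read `tcpdump -n` text output on stdin and print the
# percentage of IP packets with port 123 (NTP) on either end.
# Hypothetical helper; counts packets, not bytes.
ntp_share() {
    awk '
        # IPv4 endpoints look like a.b.c.d.port; IPv6 ones like addr.port
        function is_ntp(addr, v6) {
            if (v6) return (addr ~ /\.123$/)
            return (addr ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.123$/)
        }
        $2 == "IP" || $2 == "IP6" {
            total++
            dst = $5
            sub(/:$/, "", dst)          # strip the trailing colon
            if (is_ntp($3, $2 == "IP6") || is_ntp(dst, $2 == "IP6")) ntp++
        }
        END { if (total) printf "%d\n", (100 * ntp) / total }
    '
}
```

With a capture written by `tcpdump -w`, something like `tcpdump -nr capture.pcap | ntp_share` would print a single percentage (`capture.pcap` being a placeholder file name).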

For about 3 weeks straight, I had an effectively constant 4~5Mbps incoming stream … a pelican flew into my inbox, carrying an AU$300 bill for the nearly 800GB consumed in May.  (My plan was originally 250GB.  Many thanks to Internode there: they capped the overusage charge at AU$300 and provided an instant upgrade of the plan to 500GB/month.)
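As a back-of-envelope check on those figures, the sketch below converts a constant bit rate into a monthly total.  It assumes decimal units (1 Mbps = 1,000,000 bits/s, 1 GB = 10^9 bytes), a perfectly steady stream, and a shell with 64-bit arithmetic.  The `transfer_gb` name is hypothetical.

```shell
# transfer_gb RATE_MBPS DAYS: decimal gigabytes moved by a constant
# stream of RATE_MBPS megabits per second sustained for DAYS days.
transfer_gb() {
    bytes_per_sec=$(( $1 * 1000000 / 8 ))
    echo $(( bytes_per_sec * 86400 * $2 / 1000000000 ))
}

transfer_gb 4 21    # three weeks at 4 Mbps: prints 907
transfer_gb 5 21    # three weeks at 5 Mbps: prints 1134
```

Three full weeks at 4 Mbps would already come to about 900GB, so a total of nearly 800GB suggests the average rate or the duration was a little under those round numbers.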

The other culprit I tracked down to HackChat’s websockets, which was a big contributor to June’s data usage.  (Thank-you OpenBSD pflow and nfdump!)

So right now, I’m looking at whether I re-join the NTP server pool.  I have the monitoring in place to keep track of data usage now.  For sure the experience has cost me over $400, but that’s still cheaper than university studies … and they didn’t teach me this stuff anyway!

One thought is that I could do away with any external NTP server altogether by using local sources to synchronise time.  With a long-wire antenna, I could sync to WWV.  Usually I can pick it up on 10MHz or 15MHz, but it’s weak, and I’d have to rig up the receiver, the antenna and a suitable decoder.

Then there’s the problem of lightning strikes: I already have a Yaesu FT-897D in the junk box thanks to Thor’s past efforts.

The more practical approach would appear to be using GPS time sync.  You can do it with any NMEA-compatible module, but you get far better results using one that supports PPS.  Some even have a 10PPS option, but I’m not sure if kpps supports that.
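To show what the NMEA side alone provides, here is a minimal sketch that pulls the UTC time and date out of an RMC sentence.  The `nmea_rmc_time` name is hypothetical, the sample sentence is a commonly quoted textbook one rather than output from any particular module, and the checksum is not verified.

```shell
# nmea_rmc_time SENTENCE: print the UTC time and date carried by an
# RMC sentence.  Field 2 is hhmmss(.sss), field 3 is the fix status
# and field 10 is ddmmyy.  Prints nothing unless the status is "A".
nmea_rmc_time() {
    printf '%s\n' "$1" | awk -F, '
        $1 ~ /RMC$/ && $3 == "A" {
            printf "%s:%s:%s UTC %s/%s/%s\n",
                substr($2, 1, 2), substr($2, 3, 2), substr($2, 5, 2),
                substr($10, 1, 2), substr($10, 3, 2), substr($10, 5, 2)
        }'
}

nmea_rmc_time '$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A'
# prints: 12:35:19 UTC 23/03/94
```

The sentence only says which second it is, and it arrives over a serial link with variable delay.  The PPS pulse is what marks the exact start of that second, which is why a module with PPS gives far better results.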

As it happens, I could have got a GPS module in the TS-7670.  Here’s where it is on the schematic:

and here are the connections to its UART (bottom two):

and the PPS pin (LCD_D16):

The footprints are there on the board, and I can buy the module.  No problems.  Buying the TS-7670 with the GPS option would have meant buying the top-of-the-line “development” model which would have included WiFi (not needed), an extra 128MB RAM (nice to have, but not essential) and an extra CAN port (not needed).

The other option is to go with something else, such as this module.  Either way, I need to watch logic levels: Freescale like their 1.8V, although it looks like the I/O pins can be switched between 1.8V and 3.3V from the GPIO registers (chapter 9 of the datasheet).

Doing this would allow me to run a Stratum 1 server within the network, and perhaps serve it to IPv6 NTP clients.  IPv4 clients might get Stratum 2; we’ll see how I feel about it later.

May 19 2018

So before going on the trip, I noticed the router I was using would occasionally drop off the network.  The switch still reported the link as being up, but the router would not respond to pings from the internal network.  If I SSHed into it from outside the network, and tried pinging internal IPs, it failed to ping them.

Something was up.  After much debugging (and some arguments about upgrades), it was decided that the hardware was flakey.  In that discussion, it was recommended that I have a look at PC Engines’ APU2 single board computer.

This is the only x86 computer I have seen with schematics and CoreBoot out-of-the-box, and it happens there’s a local supplier of them.  For sure, this machine is overkill for the job, but it ticks nearly all the boxes.

The only one it didn’t tick was being able to run directly from the battery.  As it happens, the unit only draws about 1.5A, and so a LM1085-12 LDO which can be sourced locally did the trick.  I basically put 100µF capacitors on the input and output, bolted it to a small heatsink and threw it all into a salvaged case.

After hooking it up to a bench supply (disconnected from the APU2), winding the voltage right up to the PSU’s maximum, and observing that the output stayed at 12V, I decided to hook it up and see how it went.  I plugged in my null modem cable, and sure enough, I was staring at CoreBoot.

I PXE-booted OpenBSD 6.3 and installed that onto the SD card.  This was fairly painless, and before long the machine was booting on its own.  I copied across the configuration settings from the old one, set up sniproxy, and I was in business.  It was time to issue a `shutdown -p now` to both machines and for them to swap places.

Of course, a nicety of this box is that there are three Ethernet ports, so there’s room for a move to another Internet connection, such as the HFC we’re supposed to be getting in this part of Brisbane (sadly, no thin pieces of glass for us).  In theory, I can run both in parallel and migrate between them.

Sep 13 2017

I have a virtual machine that I set up as a secondary DNS server which runs OpenBSD 6.1.  Today logging into it, I noticed system messages were piling up in /var/mail because I hadn’t configured the mail server to deliver those messages.  Setting up OpenSMTPD was no trouble, but then I had the old mail (thankfully not much) that was still to be delivered.

There are a couple of solutions out there, written in Perl, Python and PHP (urgh!).  I don’t have Python on this box, and the Perl one didn’t seem to work with the mailbox.  So I cooked up my own:


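for file in "$@"; do
        grep -n '^From ' "${file}" | {
                prev=
                while read -r line; do
                        cur=$( echo "${line}" | cut -f 1 -d: )
                        if [ -n "${prev}" ] && [ "${prev}" != "${cur}" ]; then
                                sed -ne "${prev},$(( ${cur} - 1 )) p" "${file}" > "${prev}.eml"
                        fi
                        prev=${cur}     # reconstructed from here on: the listing was cut off
                done
                # The last message runs to the end of the file
                [ -n "${prev}" ] && sed -ne "${prev},\$ p" "${file}" > "${prev}.eml"
        }
done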

If there’s a line in your email body starting with “From “, it may get confused, but it was good enough for the messages that OpenBSD’s daemons send me. I was then able to pipe these individually into sendmail -t to send them on their way.
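That last step could be wrapped in a small loop, sketched below.  The `deliver_all` name and the `MAILER` override are made up for illustration.  By default it runs `sendmail -t` on each file, and setting `MAILER=cat` gives a dry run that only prints what would be sent.

```shell
# deliver_all [DIR]: feed every *.eml file in DIR (default: current
# directory) to the delivery command, one message at a time.
# MAILER is a hypothetical override; it defaults to "sendmail -t".
deliver_all() {
    for eml in "${1:-.}"/*.eml; do
        [ -f "${eml}" ] || continue     # the glob matched nothing
        ${MAILER:-sendmail -t} < "${eml}" || return 1
    done
}
```

Running `MAILER=cat deliver_all .` first is a cheap way to eyeball the split messages before anything is actually queued.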

Aug 20 2017

OpenNebula is running now… I ended up re-loading my VM with Ubuntu Linux and throwing OpenNebula on that.  That works… and I can debug the issue with Gentoo later.

I still have to figure out corosync/heartbeat for two VMs, the one running OpenNebula, and the core router.  For now, the VMs are only set up to run on one node, but I can configure them on the other too… it’s then a matter of configuring libvirt to not start the instances at boot, and setting up the Linux-HA tools to figure out which node gets to fire up which VM.

The VM hosts are still running Gentoo however, and so far I’ve managed to get them to behave with OpenNebula.  A big part was disabling the authentication in libvirt, otherwise polkit generally made a mess of things from OpenNebula’s point of view.

That, and firewalld had to be told to open up ports for VNC/spice… I allocated 5900-6900… I doubt I’ll have that many VMs.

Last weekend I replaced the border router… previously this was a function of my aging web server, but now I have an ex-RAAF-base Advantech UNO-1150G industrial PC which is performing the routing function.  I tried to set it up with Gentoo, and while it worked, I found it wasn’t particularly stable due to limited memory (it only has 256MB RAM).  In the end, I managed to get OpenBSD 6.1/i386 running sweetly, so for now, it’s staying that way.

While the AMD Geode LX800 is no speed demon, a nice feature of this machine is it’s happy with any voltage between 9 and 32V.

The border router was also given the responsibility of managing the domain: I did this by installing ISC BIND9 from ports and copying across the config from Linux.  This seemed to be working, and so I left it.  Big mistake, turns out bind9 didn’t think it was authoritative, and so refused to handle AXFRs with my slaves.

I was using two different slave DNS providers, puck.nether.net and Roller Network, both at the time of subscription being freebies.  Turns out, when your DNS goes offline, puck.nether.net responds by disabling your domain then emailing you about it.  I received that email Friday morning… and so I wound up in a mad rush trying to figure out why BIND9 didn’t consider itself authoritative.

Since I was in a rush, I decided to tell the border router to just port-forward to the old server, which got things going until I could look into it properly.  It took a bit of tinkering with pf.conf, but I eventually got that going, and the crisis was averted.  Re-enabling the domains on puck.nether.net worked, and they stayed enabled.

It was at that time I discovered that Roller Network had decided to make their slave DNS a paid offering.  Fair enough, these things do cost money… At first I thought, well, I’ll just pay for an account with them, until I realised their personal plans were US$5/month.  My workplace uses Vultr for hosting instances of their WideSky platform for customers… and aside from the odd hiccup, they’ve been fine.  US$5/month VPS which can run almost anything trumps US$5/month that only does secondary DNS, so out came the debit card for a new instance in their Sydney data centre.

Later I might use it to act as a caching front-end and as a secondary mail exchanger… but for now, it’s a DIY secondary DNS.  I used their ISO library to install an OpenBSD 6.1 server, and managed to nut out nsd to act as a secondary name server.

Getting that going this morning, I was able to figure out my DNS woes on the border router and got that running, so after removing the port forward entries, I was able to trigger my secondary DNS at Vultr to re-transfer the domain and debug it until I got it working.

With most of the physical stuff worked out, it was time to turn my attention to getting virtual instances working.  Up until now, everything running on the VM hosts was through hand-crafted VMs using libvirt directly.  This is painful and tedious… but for whatever reason, OpenNebula was not successfully deploying VMs.  It’d get part way, then barf trying to set up 802.1Q network interfaces.

In the end, I knew OpenNebula worked fine with bridges that were already defined… but I didn’t want to have to hand-configure each VLAN… so I turned to another automation tool in my toolkit… Ansible:

- hosts: compute
  tasks:
  - name: Configure networking
    template: src=compute-net.j2 dest=/etc/conf.d/net
# …
- hosts: compute
  tasks:
# …
  - name: Add symbolic links (instance VLAN interfaces)
    file: src=net.lo dest=/etc/init.d/net.bond0.{{item}} state=link
    with_sequence: start=128 end=193
  - name: Add symbolic links (instance VLAN bridges)
    file: src=net.lo dest=/etc/init.d/net.vlan{{item}} state=link
    with_sequence: start=128 end=193
# …
  - name: Make services start at boot (instance VLAN bridges)
    command: rc-update add net.vlan{{item}} default
    with_sequence: start=128 end=193

That’s a snippet of the playbook… and it basically creates symbolic links from Gentoo’s net.lo for all the VLAN ports and bridges, then sets them up to start at boot.

In the compute-net.j2 file referenced above, I put in the following to enumerate all the configuration bits.

# Instance VLANs
{% for vlan in range(128,193) %}
{% endfor %}
# …
vlans_bond0="5 8 10{% for vlan in range(128,193) %} {{vlan}} {% endfor %}248 249 250 251 252"
# …
# Instance VLANs
{% for vlan in range(128,193) %}
{% endfor %} 

The start and end ranges are a little off (Ansible’s with_sequence includes the end value 193, whereas Jinja’s range(128,193) stops at 192), but it saved a lot of work.
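To see how far off the two ranges are, here is a quick sketch in plain shell rather than Ansible or Jinja2.  The `vlan_ids` helper is hypothetical; it lists VLAN IDs inclusively, the way with_sequence does, so the Jinja2 case is modelled by stopping at 192.

```shell
# vlan_ids FIRST LAST: print every VLAN ID from FIRST to LAST inclusive.
vlan_ids() {
    vlan=$1
    while [ "${vlan}" -le "$2" ]; do
        printf '%s\n' "${vlan}"
        vlan=$(( vlan + 1 ))
    done
}

# with_sequence start=128 end=193 covers 128..193
echo "playbook: $(vlan_ids 128 193 | wc -l | tr -d ' ') VLANs"
# Jinja2 range(128,193) covers 128..192
echo "template: $(vlan_ids 128 192 | wc -l | tr -d ' ') VLANs"
```

So the playbook links and enables init scripts for 66 VLANs while the template only configures 65, leaving VLAN 193 with a symlink but no configuration.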

This naturally took a while for OpenRC to bring up… but it worked. Going back to OpenNebula, I told it what bridges to use, and before long I had my first instance… an OpenBSD router to link my personal VLAN to the DMZ.

I spent a bit of time re-working my routing tables after that… in fact, my network is getting big enough now I have to write some details down.  I spent a few hours documenting the effort:

That’s page 1 of about 15… yes my hand is sore… but at least now should I get run over by a bus, others have a fighting chance doing anything with the network without my technical input.