Mar 12, 2018

Well, in my last post I discussed getting OpenADK to build a dev environment on the TS-7670.  I had gotten Gentoo’s Portage installed, and started building packages.

The original plan was to build everything into /tmp/seed, but that requires that all the dependencies are present in the chroot.  They aren’t.  In the end, I decided to go the ill-advised route of compiling Gentoo over the top of OpenADK.

This is an ugly way to do things, but so far it is bearing fruit.  Initially there were some hiccups, and I had to restore some binaries from my OpenADK build tree.  When Gentoo installed python-exec, it broke Portage; I found I had to unpack a Python 2.7 binary I had built earlier, then use that to re-install Portage.  After that, I could continue.

Right now, it’s grinding away at gcc, which was my nemesis from the beginning.  This time though, it successfully built xgcc and xg++, which means it has compiled itself using the OpenADK-supplied gcc, and is now building itself using its self-built binaries.  I think it does two or three passes at this.

If it gets through this, there are about 65 packages to go after that, mostly small ones.  I should be able to do a ROOT=/tmp/seed emerge -ek @system, then tar up /tmp/seed and emerge catalyst.  I have some wrapper scripts around Catalyst that I developed back when I was responsible for doing the MIPS stages.  These have been tweaked to do musl builds, and were used to produce these x86 stages.  The same will work for ARMv5.
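For the record, the flow above boils down to something like this (the tarball name and the catalyst package atom are illustrative, not taken from my actual scripts):

```shell
# ROOT=/tmp/seed emerge -ek @system
# tar -C /tmp/seed -cJpf /var/tmp/seed-armv5-musl.tar.xz .
# emerge dev-util/catalyst
```

The -k flag is the important part: it installs from the binary packages built earlier rather than recompiling everything a second time.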

It might be another week of grinding away, but we should get there. 🙂

Feb 26, 2018

So, after a longish wait… my laptop finally coughed up an image with a C/C++ compiler and almost all the bits necessary to make Gentoo Portage tick.

Almost everything… wget built, but it segfaults on start-up.  No matter, curl works.  We do have an issue though: Portage no longer supports customising the downloader the way it used to (it used to be a couple of settings in make.conf), or at least I couldn’t see how to do it.

Thankfully, I know shell scripts, and can make my own wget using the working curl:

bash-4.4# cat > /usr/bin/wget
#!/bin/sh
while [ $# -gt 0 ]; do
    case "$1" in
        -O) OUT="$2"; shift ;;
        -t|-T) shift ;;
        --passive-ftp) : ;;
        *) break ;;
    esac
    shift
done
set -ex
curl --progress-bar -o "${OUT}" "$1"
bash-4.4# chmod +x /usr/bin/wget

Okay, it’s a little (a lot) braindead, but it beats downloading the lot by hand!
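For what it’s worth, the option-parsing loop can be sanity-checked on its own.  Here’s the same logic wrapped in a function (the file name and URL are made up):

```shell
#!/bin/sh
# Same parsing loop as the wget shim, but wrapped in a function that
# prints the output file and the URL it would hand to curl, instead
# of actually downloading anything.
parse_wget_args() {
    OUT=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -O) OUT="$2"; shift ;;
            -t|-T) shift ;;
            --passive-ftp) : ;;
            *) break ;;
        esac
        shift
    done
    echo "OUT=${OUT} URL=$1"
}

# Typical invocation Portage would make:
parse_wget_args -t 3 -T 60 --passive-ftp -O /tmp/file.tar.gz http://example.com/file.tar.gz
```

That prints `OUT=/tmp/file.tar.gz URL=http://example.com/file.tar.gz`, which is all the shim needs to feed curl.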

I was able to get Gentoo installed by hand using these instructions.  I have an old 1TB HDD plugged into a USB dock, formatted with a 10GB swap partition and the rest btrfs.  Sure, it’s only USB 2.0, but I’d sooner just put up with some CPU overhead than wear out my eMMC.

Next step: ROOT=/tmp/seed emerge -ev @system

Jan 29, 2018

So, fun and games with the TS-7670.

At present, I have it up and running:

root@ts7670:~# uname -a
Linux ts7670 4.14.15-vrt-ts7670-00031-g1a006273f907-dirty #2 Sun Jan 28 20:21:08 EST 2018 armv5tejl GNU/Linux

That’s booted up into Debian Stretch right now.  debootstrap did its deed a few days ago on the eMMC, and I was able to boot up this new image.  Today I built a new kernel, and tweaked U-Boot to boot from eMMC.

Thus now the unit can boot without any MicroSD cards fitted.

There’s a lot of bit rot to address.  U-Boot was forked from some time in 2014.  I had a crack at rebasing the code onto current U-Boot, but there’s a lot of clean-up work to do just to get it to compile.  Even the kernel needed some fixes to get the newer devicetree sources to build.

As for getting Gentoo working… I have a cross-compiling toolchain that works.  With it, I’ve been able to compile about 99% of a seed stage needed for catalyst.  The 1% that eludes me is GCC (compiled to run on ARMv5).  GCC 4.9.4 will try to build, but fails near the end… anything newer will barf, complaining that my C++ compiler is not working.  Utter bollocks: both the AMD64 and ARM toolchains have working C++ compilers; it’s just looking for a binary called “g++” rather than being specific about which one.  I suspect it wants the AMD64 g++, but if I symlink that to /usr/bin/g++, it throws in the ARM CFLAGS, and the AMD64 g++ barfs on those.

I’ve explored other options.  I can compile GCC by hand without C++ support, and this works, but you can’t build modern GCC without a C++ compiler … and people wonder why I don’t like C++ on embedded!
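The C-only build that worked was along these lines; the exact configure flags aren’t recorded here, so treat this as a plausible sketch rather than the actual invocation:

```shell
# tar xf gcc-4.9.4.tar.bz2 && cd gcc-4.9.4
# ./configure --prefix=/usr --enable-languages=c --disable-multilib
# make && make install
```

It’s `--enable-languages=c++` (or leaving the default) that drags in the requirement for a host C++ compiler on modern releases.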

buildroot was my next thought, but as it happens, they’ve stripped out the ability to compile a native GCC on the target.

crosstool-ng is the next logical choice, but I’ll have to fiddle with settings to get the compiler to build.

I’ve also had OpenADK suggested, which may be worth a look.  Other options are OpenEmbedded/Yocto, and Cross Linux from Scratch.  I think for the latter, cross is what I’ll get, this stuff can be infuriatingly difficult.

Jan 24, 2018

So, I now have my little battery monitoring computer.  Shipping wound up being a little more than I was expecting… about US$80… but never mind.  It’s here, arrived safely:

>> TS-BOOTROM - built Jan 26 2017 12:29:21
>> Copyright (c) 2013, Technologic Systems
LLCLLLLLLLFLCLLJUncompressing Linux... done, booting the kernel.
/ts/fastboot file present.  Booting to initramfs instead
Booted from eMMC in 3.15s
Initramfs Web Interface: http://ts7670-498476.local
Total RAM: 128MB
# exit
INIT: version 2.88 booting
[info] Using makefile-style concurrent boot in runlevel S.
[ ok ] Starting the hotplug events dispatcher: udevd.
[ ok ] Synthesizing the initial hotplug events...done.
[ ok ] Waiting for /dev to be fully populated...done.
[ ok ] Activating swap...done.
[....] Checking root file system...fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
/dev/mmcblk2p2: clean, 48540/117600 files, 282972/469760 blocks
[ ok ] Cleaning up temporary files... /tmp /lib/init/rw.
ts7670-498476 login: root
Linux ts7670-498476 #1 PREEMPT Mon Nov 27 11:05:10 PST 2017 armv5tejl
TS Root Image 2017-11-27

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.


The on-board 2GB eMMC has a version of Debian Wheezy on it.  That’ll be going very soon.  For now, all I’ve done is pop the cover, shove an 8GB MicroSD card into one of the on-board slots, temporarily wire a 12V power brick to the unit, hook a USB cable into the console port (/dev/ttyAMA0 is wired up to an on-board CP2103 USB-serial chip) and verify that it is alive.

Next step will be to bootstrap Gentoo.  I could use standard ARMv5 stages, or I can build my own, which I might do.  I’ve done this before for mips64el n64 using glibc.  Modern glibc is a goliath on a machine with 128MB RAM though, so I’ll be looking at either µClibc/µClibc-ng or musl… most likely the latter.

That said, 20 years ago, we had the same computing power in a desktop. 🙂

I have a few options for interfacing to the power meters…

  • I²C, SPI, a number of GPIOs and a spare UART on a 2.54mm header inside the case.
  • Another spare UART on the footprint for the GPS module (which my unit does not have)
  • Two RS-232 serial ports with RTS/CTS control lines, exposed via RJ-45 jacks
  • Two CANbus ports on a single RJ-45 jack
  • RS-485 on a port marked “Modbus”

In theory, I could just skip the LPC810s and hook this up directly to the INA219Bs.  I’d have to double-check what the TTL voltage is… Freescale love their 1.8V logic… but shifting that up to 3.3V or 5V is not hard.  The run is a little longer than I’m comfortable running I²C over, though.

The LPC810s don’t feature CANbus, so I think my original plan of doing Modbus is going to be the winner.  I can either do a single-ended UART using a resistor/diode in parallel to link RX and TX to the one UART line, or use RS-485.

I’m leaning towards the latter, if I decide to buy a little mains energy meter to monitor power, I can use the same RS-485 link to poll that.  I have some RS-485 transceivers coming for that.

For now though, I’ll at least get Debian Stretch going… this should not be difficult, as I’ll just use the images I’ve built for work to get things going.  I’m downloading a Jessie image now:

root@ts7670-498476:~# curl | xzcat | dd of=/dev/mmcblk0 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  113M    0  544k    0     0   114k      0  0:16:48  0:00:04  0:16:44  116k

Once that is done, I can reboot, re-format the eMMC and get debootstrap going.  I might even publish an updated image while I’m at it.
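The re-format and debootstrap step looks something like the following, assuming the eMMC root partition is /dev/mmcblk2p2 as in the boot log above (armel being the Debian port that covers ARMv5):

```shell
# mkfs.ext4 /dev/mmcblk2p2
# mount /dev/mmcblk2p2 /mnt
# debootstrap --arch=armel stretch /mnt http://deb.debian.org/debian
```

After that it’s the usual chroot in, set a root password, sort out fstab and the serial console, then point U-Boot at the new root.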

Jan 23, 2018

So, last time I was trying to get Gentoo’s portage to cross-build gcc so that I’d have a C/C++ compiler in my ARMv5 musl environment.

It is literally the last piece of the puzzle.  Once compiled, that is the last step I need before I can throw the shiny new environment onto an ARMv5 VM (or real ARMv5 CPU), do an emerge -e world on it then tar the lot up and throw it at Catalyst.

Building an entire OS on a 454MHz ARMv5 machine with 128MB RAM does not faze me one bit… I used to do it regularly on a (Gateway-branded) Cobalt Qube II server appliance, which sports a 250MHz QED RM5231 and 128MB RAM.  The other compile workhorse I used in those days was an SGI O2; 300MHz RM5200, again 128MB RAM.

Yes, Linux and its userland has bulked up a bit in the last 10 years, but not so much so that a build on these is impossible.

Certainly, native building is easier than cross-compiling.  Cross-compilers have always been a voodoo art for me.  Getting one that will build a Linux kernel or U-Boot, usually isn’t too hard… but get userland involved and it gets complex.  Throw in C++ and complexity skyrockets!

I’m taking OpenADK for a spin now, and in concept, it’s exactly what I remember buildroot used to be.  It’s a tool for generating a fully fledged embedded Linux system with a wide package selection including development tools.  I also find that you have to hold your tongue just right to get stuff to compile.

Selecting a generic arm926ej-s, it succeeded in building an x86-64-hosted cross-toolchain once, but then silently refused to build anything else.  I told it instead to build for a Versatile PB with an arm926ej-s CPU… it failed to build the cross-toolchain, even though it’s pretty much the exact same target.

A make cleandirs later, and it happily started building everything, but then it hiccupped on permissions; so, against my better judgement, I’m running it now with sudo, and things are progressing.  With some luck, I should end up with something that gives me a working native gcc/g++ for musl on ARMv5.

Jan 17, 2018

I’ve taken the plunge and gotten a TS-7670 ordered in a DIN-rail mount for monitoring the battery.  Not sure what the shipping will be from Arizona to here, but I somehow doubt I’m up for more than AU$300 for this thing.  The unit itself will cost AU$250.

Some will argue that a Raspberry Pi or BeagleBone would be cheaper, and that would be correct, however by the time you’ve added a DIN-rail mount case, an RS-485 control board and a 12V to 5V step-down power converter, you’d be around that figure anyway.  Plus, the Raspberry Pi doesn’t give you schematics.  The BeagleBone does, but is also a more sophisticated beast.

The plan is I’ll spin a version of Gentoo Linux on it… possibly using the musl C library to keep memory usage down as I’ve gone the base model with 128MB RAM.  I’ll re-spin the kernel and U-Boot patches I have for the latest release.

The unit will look after two functions:

  • Access to the IPMI/L2 management network
  • Polling of the two DC power meters (still to be fully designed) via Modbus

It can report to a VM running on one of the hosts.  I believe collectd has the necessary bits and pieces to do this.  Failing that, I’ve written code before that polls Modbus… I write such code for a day job.
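collectd does ship a modbus plugin, so the polling side could look something like this.  The register numbers, device node and instance names below are pure placeholders, since the DC meters are still to be designed (and the plugin’s serial/RTU support needs a reasonably recent collectd):

```
LoadPlugin modbus
<Plugin modbus>
  <Data "battery_voltage">
    RegisterBase 0
    RegisterType Uint16
    Type voltage
    Instance "battery"
  </Data>
  <Host "dc-meter">
    Device "/dev/ttyAPP1"
    Baudrate 9600
    Interval 10
    <Slave 1>
      Instance "meter1"
      Collect "battery_voltage"
    </Slave>
  </Host>
</Plugin>
```

From there, collectd’s network plugin can ship the readings off to the VM doing the collection.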

Aug 20, 2017

OpenNebula is running now… I ended up re-loading my VM with Ubuntu Linux and throwing OpenNebula on that.  That works… and I can debug the issue with Gentoo later.

I still have to figure out corosync/heartbeat for two VMs, the one running OpenNebula, and the core router.  For now, the VMs are only set up to run on one node, but I can configure them on the other too… it’s then a matter of configuring libvirt to not start the instances at boot, and setting up the Linux-HA tools to figure out which node gets to fire up which VM.

The VM hosts are still running Gentoo however, and so far I’ve managed to get them to behave with OpenNebula.  A big part was disabling the authentication in libvirt, otherwise polkit generally made a mess of things from OpenNebula’s point of view.

That, and firewalld had to be told to open up ports for VNC/spice… I allocated 5900-6900… I doubt I’ll have that many VMs.

Last weekend I replaced the border router… previously this was a function of my aging web server, but now I have an ex-RAAF-base Advantech UNO-1150G industrial PC which is performing the routing function.  I tried to set it up with Gentoo, and while it worked, I found it wasn’t particularly stable due to limited memory (it only has 256MB RAM).  In the end, I managed to get OpenBSD 6.1/i386 running sweetly, so for now, it’s staying that way.

While the AMD Geode LX800 is no speed demon, a nice feature of this machine is it’s happy with any voltage between 9 and 32V.

The border router was also given the responsibility of managing the domain: I did this by installing ISC BIND9 from ports and copying across the config from Linux.  This seemed to be working, and so I left it.  Big mistake: it turns out BIND9 didn’t think it was authoritative, and so refused to handle AXFRs with my slaves.

I was using two different slave DNS providers, Roller Network and one other, both of which were freebies at the time I subscribed.  Turns out, when your DNS goes offline, the other provider responds by disabling your domain, then emailing you about it.  I received that email Friday morning… and so I wound up in a mad rush trying to figure out why BIND9 didn’t consider itself authoritative.

Since I was in a rush, I decided to tell the border router to just port-forward to the old server, which got things going until I could look into it properly.  It took a bit of tinkering with pf.conf, but eventually I got that going, and the crisis was averted.  Re-enabling the domains then worked, and they stayed enabled.

It was at that time I discovered that Roller Network had decided to make their slave DNS a paid offering.  Fair enough, these things do cost money… At first I thought, well, I’ll just pay for an account with them, until I realised their personal plans were US$5/month.  My workplace uses Vultr for hosting instances of their WideSky platform for customers… and aside from the odd hiccup, they’ve been fine.  US$5/month VPS which can run almost anything trumps US$5/month that only does secondary DNS, so out came the debit card for a new instance in their Sydney data centre.

Later I might use it to act as a caching front-end and as a secondary mail exchanger… but for now, it’s a DIY secondary DNS.  I used their ISO library to install an OpenBSD 6.1 server, and managed to nut out nsd to act as a secondary name server.
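The nsd side of a secondary zone is pleasantly terse.  A sketch, with a placeholder zone name and master address:

```
zone:
    name: "example.com"
    zonefile: "example.com.zone"
    allow-notify: 192.0.2.1 NOKEY
    request-xfr: 192.0.2.1 NOKEY
```

`allow-notify` lets the master’s NOTIFY messages trigger a refresh, and `request-xfr` tells nsd where to pull the zone from.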

Getting that going this morning, I was able to figure out my DNS woes on the border router and got that running, so after removing the port forward entries, I was able to trigger my secondary DNS at Vultr to re-transfer the domain and debug it until I got it working.

With most of the physical stuff worked out, it was time to turn my attention to getting virtual instances working.  Up until now, everything running on the VM hosts was through hand-crafted VMs using libvirt directly.  This is painful and tedious… and for whatever reason, OpenNebula was not successfully deploying VMs.  It’d get part way, then barf trying to set up 802.1Q network interfaces.

In the end, I knew OpenNebula worked fine with bridges that were already defined… but I didn’t want to have to hand-configure each VLAN… so I turned to another automation tool in my toolkit… Ansible:

- hosts: compute
  tasks:
    - name: Configure networking
      template: src=compute-net.j2 dest=/etc/conf.d/net
# …
- hosts: compute
# …
  tasks:
    - name: Add symbolic links (instance VLAN interfaces)
      file: src=net.lo dest=/etc/init.d/net.bond0.{{item}} state=link
      with_sequence: start=128 end=193
    - name: Add symbolic links (instance VLAN bridges)
      file: src=net.lo dest=/etc/init.d/net.vlan{{item}} state=link
      with_sequence: start=128 end=193
# …
    - name: Make services start at boot (instance VLAN bridges)
      command: rc-update add net.vlan{{item}} default
      with_sequence: start=128 end=193

That’s a snippet of the playbook… and it basically creates symbolic links from Gentoo’s net.lo for all the VLAN ports and bridges, then sets them up to start at boot.

In the compute-net.j2 file referenced above, I put in the following to enumerate all the configuration bits.

# Instance VLANs
{% for vlan in range(128,193) %}
{% endfor %}
# …
vlans_bond0="5 8 10{% for vlan in range(128,193) %} {{vlan}} {% endfor %}248 249 250 251 252"
# …
# Instance VLANs
{% for vlan in range(128,193) %}
{% endfor %} 

The start and end ranges are a little off, but it saved a lot of work.
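Incidentally, the list the Jinja loop expands to can be reproduced in the shell with seq, which is handy for eyeballing the rendered file (range(128,193) covers 128 through 192 inclusive):

```shell
#!/bin/sh
# Expand the same VLAN list as the Jinja loop, for comparison with
# the rendered /etc/conf.d/net: static VLANs 5, 8, 10, the instance
# range 128-192, then 248-252.
vlans_bond0="5 8 10 $(seq 128 192 | tr '\n' ' ')248 249 250 251 252"
echo "$vlans_bond0"
```

That gives 73 VLANs in total hanging off bond0.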

This naturally took a while for OpenRC to bring up… but it worked. Going back to OpenNebula, I told it what bridges to use, and before long I had my first instance… an OpenBSD router to link my personal VLAN to the DMZ.

I spent a bit of time re-working my routing tables after that… in fact, my network is getting big enough now I have to write some details down.  I spent a few hours documenting the effort:

That’s page 1 of about 15… yes my hand is sore… but at least now should I get run over by a bus, others have a fighting chance doing anything with the network without my technical input.

Jul 29, 2017

So, I had a go at getting OpenNebula actually running on my little VM.  Earlier I had installed it to the /opt directory by hand, and today, I tried launching it from that directory.

To get the initial user set up, you have to create ${ONE_LOCATION}/.one/one_auth (in my case, /opt/opennebula/.one/one_auth) containing the username and password of your initial user, separated by a colon.  The idea is that this file is used to create the user initially; you then change the password once you’re successfully logged in.
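In concrete terms, it’s just this.  The username and password here are throw-away placeholders, and I’m defaulting ONE_LOCATION to a scratch directory so it can be tried without root; on the real box it’s /opt/opennebula:

```shell
#!/bin/sh
# Create the one_auth file OpenNebula reads to seed its initial user.
# ONE_LOCATION and the "oneadmin:changeme" credentials are
# placeholders; change the password after first login.
ONE_LOCATION="${ONE_LOCATION:-/tmp/one-demo}"
mkdir -p "${ONE_LOCATION}/.one"
printf '%s:%s\n' "oneadmin" "changeme" > "${ONE_LOCATION}/.one/one_auth"
chmod 600 "${ONE_LOCATION}/.one/one_auth"
```

On the real install you’d also chown the file to the opennebula user so oned can read it.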

That got me a little further, but then it still failed… it turns out it doesn’t like some options specified by default in the configuration file.  I commented out some options, and that got me a little further again.  oned started, but then went into la-la land: accepting connections, but refusing to answer queries from other tools, leaving them to time out.

I’ve since managed to package OpenNebula into a Gentoo Ebuild, which I have now published in a dedicated overlay.  I was able to automate a lot of the install process this way, but I was still no closer.

On a hunch, I tried installing the same ebuild on my laptop.  Bingo… in mere moments, I was staring at the OpenNebula Sunstone UI in my browser; it was working.  The difference?  My laptop is running Gentoo with the standard glibc C library, not musl.  OpenNebula compiled just fine on musl, but perhaps differences in how musl does threads or some such (musl takes a hard-line POSIX approach) is causing a deadlock.

So, I’m rebuilding the VM using glibc now.  We shall see where that gets us.  At least now I have the install process automated. 🙂

Jul 23, 2017

So, the front-end for OpenNebula will be a VM that migrates between the two compute nodes in an HA arrangement.  Likewise with the core router and border router, although I am also tossing up trying again with the little Advantech UNO-1150G I have laying around.

For now, I’ve not yet set up the HA part; I’ll come to that.  There are guides for using libvirt with corosync/heartbeat; most also call up DRBD as the block device for the VM, but we will not be doing this, as our block device (Ceph’s Rados Block Device) is already redundant.

To host OpenNebula, I’ll use Gentoo with musl-libc since that’ll shrink the footprint down just a little bit.  We’ll run it on a MariaDB back-end.

Since we’re using musl, you’ll want to install layman and the musl overlay, as not all packages build against musl out-of-the-box.  Also install gentoolkit, as you’ll need to set USE flags, and euse makes this easy:

# emerge layman
# layman -L
# layman -a musl
# emerge gentoolkit

Now that some basic packages are installed, we need to install OpenNebula’s prerequisites.  Among these is xmlrpc-c.  BUT, what they don’t tell you is that it needs support for Abyss, and the scons build system they use will just give you a cryptic error saying it couldn’t find xmlrpc.  The answer is not, as suggested, to specify the path to xmlrpc-c-config (which happens to be in ${PATH} anyway), as that will net the same result and break things later when you fix the real issue.  Instead, enable the abyss USE flag:

# euse -p dev-util/xmlrpc-c -E abyss

Now we can build the dependencies… this isn’t a full list, but includes everything that Gentoo ships in repositories, the remaining Ruby gems will have to be installed separately.

# emerge --ask dev-lang/ruby dev-db/sqlite dev-db/mariadb \
dev-ruby/sqlite3 dev-libs/xmlrpc-c dev-util/scons \
dev-ruby/json dev-ruby/sinatra dev-ruby/uuidtools \
dev-ruby/curb dev-ruby/nokogiri

With that done, create a user account for OpenNebula:

# useradd -d /opt/opennebula -m -r opennebula

Now you’re set to build OpenNebula itself:

# tar -xzvf opennebula-5.4.0.tar.gz
# cd opennebula-5.4.0
# scons mysql=yes

That’ll run for a bit, but should succeed. At the end:

# ./install -d /opt/opennebula -u opennebula -g opennebula

That’s about where I’m at now… the link in the README for further documentation is broken; here is where they keep their current documentation.

Jul 23, 2017

So, having got some instances going… I thought I’d better sort out the networking issues properly.  While it was working, I wanted to do a few things:

  1. Bring a dedicated link down from my room into the rack directly for redundancy
  2. Define some more VLANs
  3. Sort out the intermittent faults being reported by Ceph

I decided to tackle (1) first.  I have two 8-port Cisco SG-200 switches linked via a length of Cat5E that snakes its way from our study, through the ceiling cavity then comes up through a small hole in the floor of my room, near where two brush-tail possums call home.

I drilled a new hole next to where the existing cable entered, then came the fun of trying to feed the new cable alongside the old one.  The first attempt had the cable nearly coil itself just inside the cavity.  I tried to make a tool to grab the end of it, but it was well and truly out of reach.  I ended up getting the job done by taping the cable to a section of fibreglass tubing, feeding that in, taping another section of tubing to that, feeding that in, etc… but then I ran out of tubing.

Luckily, a rummage around turned up some rigid plastic that I was able to tape to the tubing, and that got me within half a metre of my target.  Brilliant… except I forgot to put a leader cable through for next time, didn’t I?

So more rummaging around for a length of suitable nylon rope, tape the rope to the Cat5E, haul the Cat5E out, then grab another length of rope and tape that to the end and use the nylon rope to haul everything back in.

The rope should be handy for when I come to install the solar panels.

I had one 16-way patch panel, so I wound up terminating the rack end with that, and just putting an RJ-45 plug on the end in my room and plugging that directly into the switch.  So on the shopping list will be some RJ-45 wall jacks.

The cable tester tells me I possibly have brown and white-brown swapped, but never mind; I’ll be re-terminating it properly when I get the parts, and that pair isn’t used anyway.

The upshot: I now have a nice 1Gbps ring loop between the two SG-200s and the LGS326 in the rack.  No animals were harmed in the formation of this ring, although two possums were mildly inconvenienced.  (I call that payback for the times they’ve held the Marsupial Olympics at 2AM when I’m trying to sleep!)

Having gotten the physical layer sorted out, I was able to introduce the upstairs SG-200 to the new switch, then remove the single-port LAG I had defined on the downstairs SG-200.  A bit more tinkering, and I had a nice redundant set-up: setting my laptop to ping one of the instances in the cluster over WiFi, I could unplug my upstairs trunk, wait a few seconds, plug it back in, wait some more, unplug the downstairs trunk, wait some more again, then plug it back in again, and not lose a single ICMP packet.

I moved my two switches and my AP over to the new management VLAN I had set up, alongside the IPMI interfaces on the nodes.  The SG-200s were easy: aside from them insisting on one port being configured with a PVID equal to the management VLAN (I guess they want to ensure you don’t get locked out), it all went smoothly.

The AP though, a Cisco WAP4410N… not so easy.  In their wisdom, and unlike the SG-200s, the management VLAN settings page is separate from the IP interface page, so you can’t change both at the same time.  I wound up changing the VLAN, only to find I had locked myself out of it.  Much swearing at the cantankerous AP ensued, and much wondering how someone could overlook such a fundamental requirement!  That, and the switch the AP plugs into helpfully didn’t add the management VLAN to the right port like I asked of it.

Once that was sorted out, I was able to configure an IP on the old subnet and move the AP across.

That just left dealing with the intermittent issues with Ceph.  My original intention with the cluster was to use 802.3ad, so each node would have two 2Gbps links.  Except the LGS326-AU only supports 4 LAGs, and for me to do this, I’d need 10!

Thankfully, the bonding support in the Linux kernel has several other modes available.  Switching from 802.3ad to balance-tlb resolved the issue.

slaves_bond0="enp0s20f0 enp0s20f1"
slaves_bond1="enp0s20f2 enp0s20f3"
rc_net_bond0_need="net.enp0s20f0 net.enp0s20f1"
rc_net_bond1_need="net.enp0s20f2 net.enp0s20f3"
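The mode itself is set the same way in /etc/conf.d/net; netifrc passes per-interface bonding options through by name, so (assuming I’m remembering the option names correctly) something like:

```
mode_bond0="balance-tlb"
miimon_bond0="100"
mode_bond1="balance-tlb"
miimon_bond1="100"
```

miimon enables link monitoring, which balance-tlb relies on to shuffle traffic away from a dead slave.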

I am now setting up a core router instance (with OpenBSD 6.1) and an OpenNebula instance (with Gentoo AMD64/musl libc).