linux

Demise of classic hardware

So, this year I had a new-year’s resolution of sorts… when we first started this “work from home” journey due to China’s “gift”, I just temporarily set up on the dinner table, which was of course, meant to be another few months.

Well, nearly 2 years later, we’re still working from home, and work has expanded to the point that a move to the office, on any permanent basis, is pretty much impossible now unless the business moves to a bigger building. With this in mind, I decided I’d clear off the dinner table, and clean up my room sufficiently to set up my workstation in there.

That meant re-arranging some things, but for the most part, I had the space already. So some stuff I wasn’t using got thrown into boxes to be moved into the garage. My CD collection similarly got moved into the garage (I have it on the computer, but need to retain the physical discs as they represent my “personal use license”), and lo and behold, I could set up my workstation.

The new workspace

One of my colleagues spotted the Indy and commented about the old classic SGI logo. Some might notice there’s also an O2 lurking in the shadows. Those who have known me for a while, will remember I did help maintain a Linux distribution for these classic machines, among others, and had a reasonable collection of my own:

My Indy, O2 and Indigo2 R10000
The Octane, booting up

These machines were all eBay purchases, as is the Sun monitor pictured (it came with the Octane). Sadly, fast forward a number of years, and these machines are mostly door stops and paperweights now.

The Octane’s and Indigo2’s demises

The Octane died when I tried to clean it out with a vacuum cleaner, without realising the effect of static electricity generated by the vacuum cleaner itself. I might add mine was a particularly old unit: it had a 175MHz R10000 CPU, and I remember the Linux kernel didn’t recognise the power management circuitry in the PSU without me manually patching it.

The Indigo2 mysteriously stopped working without any clear reason why, I’ve never actually tried to diagnose the issue.

That left the Indy and the O2 as working machines. I haven’t fired them up in a long time until today. I figured, what the hell, do they still run?

Trying the Indy and O2 out

Plug in the Indy, hit the power button… nothing. Dead as a doornail. Okay, put it aside… what about the O2?

I plug it in, shuffle it closer to the monitor so I can connect it. ‘Lo and behold:

The O2 lives!

Of course, the machine was set up to use a serial console as its primary interface, and boot-up was running very slow.

Booting up… very slowly…

It sat there like that for a while, figuring the action was happening on a serial port, I went to go get my null modem cable, only to find a log-in prompt by the time I got back.

Next was remembering what password I was using when I last ran this machine. We had the OpenSSL heartbleed vulnerability happen since then, and at about that time, I revoked all OpenPGP keys and changed all passwords, so it isn’t what I use today. I couldn’t get in as root, but my regular user account worked, and I was able to change the root password via sudo.

Remembering my old log-in credentials, from 22 years ago it seems

The machine soon crashed after that. I tried rebooting, this time I tweaked some PROM settings (and yes, I was rusty remembering how to do it) to be able to see what was going. (I had the null modem cable in hand, but didn’t feel like trying to blindly find the serial port at the back of my desktop.)

Changing PROM settings
The subsequent boot, and crash

Evidently, I had a dud disk. This did not surprise me in the slightest. I also noticed the PSU fan was not spinning, possibly seized after years of non-use.

Okay, there were two disks installed in this machine, both 80-pin SCA SCSI drives. Which one was it? I took a punt and tried the one furtherest away from the I/O ports.

Success, she boots now

I managed to reset the root password, before the machine powered itself off (possibly because of overheating). I suspect the machine will need the dust blown out of it (safely! — not using the method that killed the Octane!), and the HDDs will need replacements. The guilty culprit was this one (which I guessed correctly first go):

a 4GB HDD was a big drive back in 1998!

The computer I’m typing this on has a HDD that stores 1000 of these drives. Today, there are modern alternatives, such as SCSI2SD that could get this machine running fully if needed. The tricky bit would be handling the 80-pin hot-swap interface. There’d be some hardware hacking needed to connect the two, but AU$145 plus an adaptor seems like a safer bet than trying some random used HDD.

So, replacement for the HDDs, a clean-out, and possibly a new fan or two, and that machine will be back to “working” state. Of course the Linux landscape has moved on since then, Debian no longer support the MIPS4 ISA that the RM5200 CPU understands, Gentoo still could work on this though, and maybe OpenBSD still support this too. In short, this machine is enough of a “go-er” that it should not be sent to land-fill… yet.

Turning my attention back to the Indy

So the Indy was my first SGI machine. Bought to better understand the MIPS processor architecture, and perhaps gain enough understanding to try and breathe life into a Cobalt Qube II server appliance (remember those?), it did teach me a lot about Linux and how things vary between platforms.

I figured I might as well pop the cover and see if there’s anything “obviously” wrong. The procedure I was rusty on, but I recalled there was a little catch on the back of the case that needed to be release before the cover slid off. So I lugged the 20″ CRT off the top of the machine, pulled the non-functioning Indy out, and put it on the table to inspect further.

Upon trying to pop the cover (gently!), the top of the case just exploded. Two pieces of the top cover go flying, and the RF shield parts company with the cover’s underside.

The RF shield parted company with the underside of the lid

I was left with a handful of small plastic fragments that were the heat-set posts holding the RF shield to the inside of the lid.

Some of the fragments that once held the RF shield in place

Clearly, the plastic has become brittle over the years. These machines were released in 1993, I think this might be a 1994-model as it has a slightly upgraded R4600 CPU in it.

As to the machine itself, I had a quick sticky-beak, there didn’t seem to be any immediately obvious things, but to be honest, I didn’t do a very thorough check. Maybe there’s some corrosion under the motherboard I didn’t spot, maybe it’s just a blown fuse in the PSU, who knows?

The inside of the Indy

This particular machine had 256MB RAM (a lot for its day), 8-bit Newport XL graphics, the “Indy Presenter” LCD interface (somewhere, we have the 15″ monitor it came with — sadly the connecting cable has some damaged conductors), and the HDD is a 9.1GB HDD I added some time back.

Where to now?

I was hanging on to these machines with the thinking that someone who was interested in experimenting with RISC machines might want them — find them a new home rather than sending them to landfill. I guess that’s still an option for the O2, as it still boots: so long as its remaining HDD doesn’t die it’ll be fine.

For the others, there’s the possibility of combining bits to make a functional frankenmachine from lots of parts. The Indy will need a new PROM battery if someone does manage to breathe life into it.

The Octane had two SCSI interfaces, one of which was dead — a problem that was known-of before I even acquired it. The PROM would bitch and moan about the dead SCSI interface for a good two minutes before giving up and dumping you in the boot menu. Press 1, and it’d hand over to arcload, which would boot a Linux kernel from a disk on the working controller. Linux would see the dead SCSI controller, and proceed to ignore it, booting just fine as if nothing had happened.

The Indigo2 R10000 was always the red-hedded step child: an artefact of the machine’s design. The IP22 design (Indy and Indigo2) was never designed with the intent of being used with a R10000 CPU, and the speculative execution features played havoc with caching on this design. The Octane worked fine because it was designed from the outset to run this CPU. The O2 could be made to work because of a quirk of its hardware design, but the Indigo2 was not so flexible, so kernel-space code had to hack around the problem in software.

I guess I’d still like to see the machines go to a good home, but no idea who that’d be in this day and age. Somewhere, I have a partial disc set of Irix 6.5, and there’s also a 20″ SGI GDM5410 monitor (not the Sun monitor pictured above) that, at last check, did still work.

It’ll be a sad day when these go to the tip.

6LoWHAM: Digging into LinBPQ

So today I was meant to be helping re-build a deck, but that got postponed to next weekend.  Thus, I had an extra free day I wasn’t counting on.

I wound up looking at LinBPQ in detail, to see if I can get it to run.  I downloaded the sources, and sure enough, they do compile on my x86-64 laptop, but does it work?  Not a chance.  Starts parsing the configuration file, then boompa, SEGFAULT.

I run the binary through gdb, and see this:

GNU gdb (Gentoo 8.1 p1) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/stuartl/projects/6lowham/linbpq/linbpq...done.
(gdb) r
Starting program: /home/stuartl/projects/6lowham/linbpq/linbpq 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
G8BPQ AX25 Packet Switch System Version 6.0.17.1 November 2018
Copyright � 2001-2018 John Wiseman G8BPQ
Current Directory is /var/lib/linbpq

Configuration file Preprocessor.
Using Configuration file /var/lib/linbpq/bpq32.cfg
Conversion (probably) successful


Program received signal SIGSEGV, Segmentation fault.
0x00005555555f8a7b in Start () at cMain.c:1190
1190                    *(ptr3++) = *(ptr2++);
(gdb) bt full
#0  0x00005555555f8a7b in Start () at cMain.c:1190
        cfg = 0x555555b91c40
        ptr1 = 0x555555ba60c0
        PORT = 0x5555558f6aa0 
        FULLPORT = 0x558f7928
        NEXTPORT = 0x5555558f6de0 <DATAAREA+832>
        EXTPORT = 0x7ffff6eb7953 <_IO_file_overflow+291>
        APPL = 0x5555558f49e0 
        ROUTE = 0x559085e8
        DEST = 0x870b07e2ddd5f300
        CMD = 0x5555558d79e0 
        PortSlot = 2
        ptr2 = 0x555555ba6849 "K4MSL Test station \r"
        ptr3 = 0x55912549 
        ptr4 = 0x5555558d7183 <COMMANDS+1667> "         \003"
        CWPTR = 0x5555558f6b18 <DATAAREA+120>
        i = 0
        n = 119
        int3 = 1435466024
#1  0x000055555563e35c in main (argc=1, argv=0x7fffffffe518) at LinBPQ.c:598
        i = 1
        user = 0x0
        conn = 0x7ffff7ffa298
        STAT = {st_dev = 140737354131120, st_ino = 140737488347784, st_nlink = 140737488347780, st_mode = 4160741648, 
          st_uid = 32767, st_gid = 4143745959, __pad0 = 32767, st_rdev = 140737488348192, st_size = 140737488347784, 
          st_blksize = 1700966438, st_blocks = 26577600, st_atim = {tv_sec = 140737354113688, tv_nsec = 140737488348000}, 
          st_mtim = {tv_sec = 140737354113448, tv_nsec = 140737488347780}, st_ctim = {tv_sec = 140737488347984, 
            tv_nsec = 140737354131160}, __glibc_reserved = {1, 4150715120, 0}}
        PORTVEC = 0x7ffff7ffe6b0

Ookay then… so invalid pointers, what fun!  More to the point, have a close look at the underlined addresses… I’m beginning to understand why it was called BPQ32.

The culprit for this wound up being little gems like this:

			//	Round to word boundary (for ARM5 etc)

			int3 = (int)ptr3;
			int3 += 3;
			int3 &= 0xfffffffc;
			ptr3 = (UCHAR *)int3;

There were a few other instances of this, and variations on the theme too, but one way or the other, linbpq basically assumes that all pointers are 32-bits, and so are ints.

Four hours later, I finally had something that started, but there are probably lots of landmines for anyone running the binary to inadvertently stomp on.  The code is pointer-arithmetic city!  Much of the time, code is casting pointers to unsigned int, or back again.  If I submitted code like that at work, they’d have me hauled ’round the back of the building and shot!

I’m left wondering if it’s worth getting to understand, or should I shove it in a VM, write some code based on my understanding of the protocols, do some integration testing with it, then abandon LinBPQ for something I can have confidence in.

The use and re-use of certain variables makes me wonder if the code is actually a port from the DOS-based BPQCode which was likely written in 8086 assembler.  This would make a lot of sense as to why I’m seeing the sorts of software coding patterns I’m seeing in that code.  The logic seems to have been ported to C just enough to get it to compile and work like the assembly version.

Reasonable enough… but there’s a lot of technical debt there still waiting to be paid back.  On paper, there’s a lot of benefit in using LinBPQ as the back-end, and I am thankful that John Wiseman made the decision to release the code under the GPLv3 so that I can at least investigate the possibility of using that code here.

I’ve thrown what I’ve got up on Github for now, and there’s a Gentoo overlay for installing it.  Add the overlay and run emerge linbpq, and you should find yourself with an installation of LinBPQ that just needs some OpenRC scripts and some work with an editor on /var/lib/linbpq/bpq32.cfg to get going.

If I get further on the code front, I might look at some init scripts, both OpenRC and systemd ones, then I can produce a few Debian binaries so you can run apt-get install linbpq on your Raspberry Pi and have a packet station going quickly.

Rescuing a btrfs partition from a failed “move”

Today it seems, the IT gremlins have been out to get me.  At my work I have a desktop computer (personal hardware) consisting of a Rysen 7 1700, 16GB RAM, a 240GB Intel M.2 SATA SSD (540 series) and a 4TB Western Digital HDD.

The machine has been, pretty reliable, not rock-solid, in particular, compiling gcc sometimes segfaulted for reasons unknown (the RAM checks out okay according to memtest86), but for what I was doing, it mostly ran fine.  I put up with the minor niggles with the view of solving those another day.  Today though, I come in and find X has crashed.

Okay, no big deal, re-start the display manager, except that crashed too.

Hmm, okay, log in under my regular user account and try startx:  No dice, there’s no room on /.

Ahh, that might explain a few things, we clean up some log files, truncate a 500MB file, manage to free up 50GB (!).

The machine dual-boots two OSes: Debian 9 and Gentoo.  It’s been running the latter for about 12 months now, I used Debian 9 to get things rolling so I could use the machine at work (did try Ubuntu 16.04, but it didn’t like my machine), and later, used that to get Gentoo running before switching over.  So there was a 40GB partition on the SSD that had a year-old install of Debian that wasn’t being used.  I figured I’d ditch it, and re-locate my Gentoo partition to occupy that space.

So I pull out an Ubuntu 18.04 disc, boot that up, and get gparted going.  It’s happily copying, until WHAM, I was hit with an I/O error:

Failed re-location of partition (click to enlarge)

Clicking any of the three buttons resulted in the same message.  Brilliant.  I had just copied over the first 15GB of the partition, so the Debian install would be hosed (I was deleting it anyway), but my Gentoo root partition should still be there intact at its old location.  Of course the partition table was updated, so no rolling back there.  At this point, I couldn’t do anything with the SSD, it had completely stalled, and I just had to cut my losses and kill gparted.

I managed to make some room on the 4TB drive shuffling some partitions around so I could install Ubuntu 18.04 there.  My /home partition was btrfs on the 4TB drive (first partition), the rest of that drive was LVM.  I just shrank my /home down by 40GB and slipped it in there.  The boot-loader didn’t install (no EFI partition), but who cares, I know enough of grub to boot from the DVD and bootstrap the machine that way.  At first it wouldn’t boot because in their wisdom, they created the root partition with a @ subvolume.  I worked around that by making the @ subvolume the default.

Then there was momentary panic when the /home partition I had specified lacked my usual files.  Turned out, they had created a @home subvolume on my existing /home partition.  Why? Who knows?  Debian/Ubuntu seem to do strange things with btrfs which do nothing but complicate matters and I do not understand the reasoning.  Editing /etc/fstab to remove the subvolume argument for /home and re-booting fixed that.

I set up a LVM volume that would receive a DD dump of the mangled partition to see what could be saved.  GNU’s ddrescue managed to recover most of the raw partition, and so now I just had to find where the start was.  If I had the output of fdisk -l before I started, I’d be right, but I didn’t have that foresight.  (Maybe if I had just formatted a LVM volume and DD’d the root fs before involving gparted?  Never mind!)

I figured there’d be some kind of magic bytes I could “grep” for.  Something that would tell me “BTRFS was here”.  Sure enough, the information is stashed in the superblock.  At 0x00010040 from the start of the partition, I should see the magic bytes 5f 42 47 52 66 53 5f 4d.  I just needed to grep for these.  To speed things up I made an educated guess on the start-location.  The screenshot says the old partition was about 37.25GB in size, so that was a hint to maybe try skipping that bit and see what could be seen.

Sure enough, I found what looked to be the superblock:

root@vk4msl-ws:~# dd if=/dev/mapper/scratch-rootbackup skip=38100 count=200 bs=1M | hexdump -C | grep '5f 42 48 52 66 53 5f 4d'
02e10040  5f 42 48 52 66 53 5f 4d  9d 30 0d 02 00 00 00 00  |_BHRfS_M.0......|
06e00040  5f 42 48 52 66 53 5f 4d  9d 30 0d 02 00 00 00 00  |_BHRfS_M.0......|
200+0 records in
200+0 records out

Some other probes seem to confirm this, my quarry seemed to start 38146MB into the now-merged partition.  I start copying that to a new LVM volume with the hope of being able to mount it:

root@vk4msl-ws:~# dd if=/dev/mapper/scratch-rootbackup of=/dev/mapper/scratch-gentoo--root bs=1M skip=38146

Whilst waiting for this to complete, I double-checked my findings, by inspecting the other fields. From the screenshot, I know my filesystem UUID was 6513-682e-7182-4474-89e6-c0d1c71866ad. Looking at the superblock, sure enough I see that listed:

root@vk4msl-ws:~# dd if=/dev/scratch/gentoo-root bs=$(( 0x10000 )) skip=1 count=1 | hexdump -C
1+0 records in
1+0 records out
00000000  5f f9 98 90 00 00 00 00  00 00 00 00 00 00 00 00  |_...............|
65536 bytes (66 kB, 64 KiB) copied, 0.000116268 s, 564 MB/s
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  65 13 68 2e 71 82 44 74  89 e6 c0 d1 c7 19 66 ad  |e.h.q.Dt......f.|
00000030  00 00 01 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000040  5f 42 48 52 66 53 5f 4d  9d 30 0d 02 00 00 00 00  |_BHRfS_M.0......|
00000050  00 00 32 da 32 00 00 00  00 00 02 00 00 00 00 00  |..2.2...........|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Looks promising! After an agonising wait, the dd finishes. I can check the filesystem:

root@vk4msl-ws:~# btrfsck /dev/scratch/gentoo-root 
Checking filesystem on /dev/scratch/gentoo-root
UUID: 6513682e-7182-4474-89e6-c0d1c71966ad
checking extents
checking free space cache
block group 111690121216 has wrong amount of free space
failed to load free space cache for block group 111690121216
block group 161082245120 has wrong amount of free space
failed to load free space cache for block group 161082245120
checking fs roots
checking csums
checking root refs
found 107544387643 bytes used, no error found
total csum bytes: 99132872
total tree bytes: 6008504320
total fs tree bytes: 5592694784
total extent tree bytes: 271663104
btree space waste bytes: 1142962475
file data blocks allocated: 195274670080
 referenced 162067775488

Okay, it complained that the free space was wrong (which I’ll blame on gparted prematurely growing the partition), but the data is there!  This is confirmed by mounting the volume and doing a ls:

root@vk4msl-ws:~# mount /dev/scratch/gentoo-root /mnt/
root@vk4msl-ws:~# ls /mnt/ -l
total 4
drwxr-xr-x 1 root root 1020 Oct  7 14:13 bin
drwxr-xr-x 1 root root   18 Jul 21  2017 boot
drwxr-xr-x 1 root root   16 May 28 10:29 dbus-1
drwxr-xr-x 1 root root 1686 May 31  2017 dev
drwxr-xr-x 1 root root 3620 Oct 19 18:53 etc
drwxr-xr-x 1 root root    0 Jul 14  2017 home
lrwxrwxrwx 1 root root    5 Sep 17 09:20 lib -> lib64
drwxr-xr-x 1 root root 1156 Oct  7 13:59 lib32
drwxr-xr-x 1 root root 4926 Oct 13 05:13 lib64
drwxr-xr-x 1 root root   70 Oct 19 11:52 media
drwxr-xr-x 1 root root   28 Apr 23 13:18 mnt
drwxr-xr-x 1 root root  336 Oct  9 07:27 opt
drwxr-xr-x 1 root root    0 May 31  2017 proc
drwx------ 1 root root  390 Oct 22 06:07 root
drwxr-xr-x 1 root root   10 Jul  6  2017 run
drwxr-xr-x 1 root root 4170 Oct  9 07:57 sbin
drwxr-xr-x 1 root root   10 May 31  2017 sys
drwxrwxrwt 1 root root 6140 Oct 22 06:07 tmp
drwxr-xr-x 1 root root  304 Oct 19 18:20 usr
drwxr-xr-x 1 root root  142 May 17 12:36 var
root@vk4msl-ws:~# cat /mnt/etc/gentoo-release 
Gentoo Base System release 2.4.1

Yes, I’ll be backing this up properly RIGHT NOW. But, my data is back, and I’ll be storing this little data recovery technique for next time.

The real lesson here is:

  1. KEEP RELIABLE BACKUPS! You never know when something will fail.
  2. Catch the copy process before it starts overwriting your source data! If there’s no overlap between the old and new locations, you’re fine, but if there is and it starts overwriting the start of your original volume, it’s all over red rover! You might be lucky with a superblock back-up, but don’t bet on it!
  3. Make note of the filesystem type and its approximate location. The fact that I knew roughly where to look, and what sort of filesystem I was looking for meant I could look for magic bytes that say “I’m a BTRFS filesystem”. The magic bytes for EXT4, XFS, etc will differ, but the same concepts are there, you just have to look up the documentation on how your particular filesystem structures its data.

Solar Cluster: Kernel driver now up on Github

So, I’m happy enough with the driver now that I’ll collapse down the commits and throw it up onto the Github repository.  I might take another look at kernel 4.18, but for now, you’ll find them on the ts7670-4.14.67 branch.

Two things I observe about this voltage monitor:

  1. The voltage output is not what you’d call, accurate.  I think it’s only a 10-bit ADC, which is still plenty good enough for this application, but the reading I think is “high” by about 50mV.
  2. There’s significant noise on the reading, with noticeable quantisation steps.

Owing to these, and to thwart the possibility of using this data in side-channel attacks using power analysis, I’ve put a 40-sample moving-average filter on the “public” data.

Never the less, it’s a handy party trick, and not one I expected these devices to be able to do.  My workplace manages a big fleet of these single-board computers in the residential towers at Barangaroo where they spend all day polling Modbus and M-Bus meters.  In the event we’re at all suspicious about DC power supplies though, it’s a simple matter to load this kernel tree (they already run U-Boot) and configure collectd (which is also installed).

I also tried briefly switching off the mains power to see that I was indeed reading the battery voltage and not just a random number that looked like the voltage.  That yielded an interesting effect:

You can see where I switched the mains supply off, and back on again.  From about 8:19PM the battery voltage predictably fell until about 8:28PM where it was at around 12.6V.

Then it did something strange, it rose about 100mV before settling at 12.7V.  I suspect if I kept it off all night it’d steadily decrease: the sun has long set.  I’ve turned the mains charger back on now, as you can see by the step-rise shortly after 8:44PM.

The bands on the above chart are the alert zones.  I’ll get an email if the battery voltage strays outside of that safe region of 12-14.6V.  Below 12V, and I run the risk of deep-cycling the batteries.  Above 14.6V, and I’ll cook them!

The IPMI BMCs on the nodes already sent me angry emails when the battery got low, so in that sense, Grafana duplicates that, but does so with pretty charts.  The BMCs don’t see when the battery gets too high though, for the simple matter that what they see is regulated by LDOs.

Solar Cluster: Getting the battery voltage into Grafana

I’ve succeeded in getting a working battery monitor kernel module. This is basically taking the application note by Technologic Systems and spinning that into a power supply class driver that reports the voltage via sysfs.

As it happens, the battery module in collectd does not see this as a “battery”, something I’ll look at later. For now the exec plug-in works well enough. This feeds through eventually to an InfluxDB database with Grafana sitting on top.

https://netmon.longlandclan.id.au/d/IyZP-V2mk/battery-voltage?orgId=1

Solar Cluster: Kernel driver debugging

So, I successfully last night, parted the core bits out of ts_wdt.c and make ts-mcu-core.c.  This is a multi-function device, and serves to provide a shared channel for the two drivers that’ll use it.

Tonight, I took a stab at writing the PSU part of it.  Suffice to say, I’ve got work to do:

[  158.712960] Unable to handle kernel NULL pointer dereference at virtual address 00000005
[  158.721328] pgd = c3854000
[  158.724089] [00000005] *pgd=4384f831, *pte=00000000, *ppte=00000000
[  158.730629] Internal error: Oops: 1 [#3] ARM
[  158.734947] Modules linked in: 8021q garp mrp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables flexcan can_dev
[  158.755812] CPU: 0 PID: 2059 Comm: cat Tainted: G      D         4.14.67-vrt-ts7670+ #3
[  158.763840] Hardware name: Freescale MXS (Device Tree)
[  158.769008] task: c68f3a20 task.stack: c3846000
[  158.773598] PC is at ts_mcu_transfer+0x1c/0x48
[  158.778073] LR is at 0x3
[  158.780630] pc : []    lr : [<00000003>]    psr: 60000013
[  158.786918] sp : c3847e44  ip : 00000000  fp : 014000c0
[  158.792165] r10: c5035000  r9 : c5305900  r8 : c777b428
[  158.797412] r7 : c0a7fa80  r6 : c777b400  r5 : c5035000  r4 : c3847e6c
[  158.803961] r3 : c3847e58  r2 : 00000001  r1 : c3847e4c  r0 : c07b8c68
[  158.810512] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  158.817671] Control: 0005317f  Table: 43854000  DAC: 00000051
[  158.823440] Process cat (pid: 2059, stack limit = 0xc3846190)
[  158.829212] Stack: (0xc3847e44 to 0xc3848000)
[  158.833611] 7e40:          c05bd150 00000001 00010000 00000004 c3847e48 00000150 c5035000
[  158.841833] 7e60: c777b420 c05bcaa4 c4f70c60 c777e070 c0a7fa80 c05bca20 00000fff c07ad4b0
[  158.850051] 7e80: c777b428 c04e46dc c4f52980 00001000 00000fff c01b1994 c4f52980 c4f70c60
[  158.858268] 7ea0: c3847ec8 ffffe000 00000000 c3847f88 00000001 c0164eac c3847fb0 c4f529b0
[  158.866487] 7ec0: 00020000 b6e3d000 00000000 00000000 c4f73f70 00000800 00000000 c01b0f60
[  158.874703] 7ee0: 00020000 c4f70c60 ffffe000 c3847f88 00000000 00000000 00000000 c013eb84
[  158.882918] 7f00: 000291ac 00000000 00000000 c0009344 00000077 b6e3c000 00000022 00000022
[  158.891135] 7f20: c686bdc0 c0117838 000b6e3c c3847f80 00022000 c686be14 b6e3c000 00000000
[  158.899354] 7f40: 00000000 00022000 b6e3d000 00020000 c4f70c60 ffffe000 c3847f88 c013ed0c
[  158.907571] 7f60: 00000022 00000000 000b6e3c c4f70c60 c4f70c60 b6e3d000 00020000 c000a9e4
[  158.915786] 7f80: c3846000 c013f2d8 00000000 00000000 00000000 00000000 00000000 00000000
[  158.924002] 7fa0: 00000003 c000a820 00000000 00000000 00000003 b6e3d000 00020000 00000000
[  158.932217] 7fc0: 00000000 00000000 00000000 00000003 00020000 00000000 00000001 00000000
[  158.940434] 7fe0: be8a62c0 be8a62ac b6eb77c4 b6eb6b9c 60000010 00000003 00000000 00000000
[  158.948691] [] (ts_mcu_transfer) from [] (ts_psu_get_prop+0x38/0xb0)
[  158.956847] [] (ts_psu_get_prop) from [] (power_supply_show_property+0x84/0x220)
[  158.966036] [] (power_supply_show_property) from [] (dev_attr_show+0x1c/0x48)
[  158.974974] [] (dev_attr_show) from [] (sysfs_kf_seq_show+0x84/0xf0)
[  158.983129] [] (sysfs_kf_seq_show) from [] (seq_read+0xcc/0x4f4)
[  158.990930] [] (seq_read) from [] (__vfs_read+0x1c/0x11c)
[  158.998117] [] (__vfs_read) from [] (vfs_read+0x88/0x158)
[  159.005304] [] (vfs_read) from [] (SyS_read+0x3c/0x90)
[  159.012232] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x28)
[  159.019766] Code: e52de004 e281300c e590e004 e25cc001 (e1dee0b2) 
[  159.026278] ---[ end trace 2807dc313991fd87 ]---

The good news is the machine didn’t crash.c

Solar Cluster: Battery voltage kernel module

So, I had a brief look after getting kernel 4.18.5 booting… sure enough the problem was I had forgotten the watchdog, although I did see btrfs trigger a deadlock warning, so I may not be out of the woods yet.  I’ve posted the relevant kernel output to the linux-btrfs list.

Anyway, as it happens, that watchdog driver looks like it’ll need some re-factoring as a multi-function device.  At the moment, ts-wdt.c claims it based on this binding.

If I try to add a second driver, they’ll clash, and I expect the same if I try to access it via userspace.  So the sensible thing to do here, is to add a ts-companion.c MFD driver here, then re-factor ts-wdt.c to use it.  From there, I can write a ts-psu.c module which will go right here.

I think I’ll definitely be digging into those older sources to remind myself how that all worked.

Solar Cluster: Battery monitor PC now running Gentoo

So, after some argument, and a bit of sitting on a concrete floor with the netbook, I managed to get Gentoo loaded onto the TS-7670.  Right now it’s running off the MicroSD card, I’ll get things right, then shift it across to eMMC.

ts7670 ~ # emerge --info
Portage 2.3.40 (python 3.5.5-final-0, default/linux/musl/arm/armv7a, gcc-6.4.0, musl-1.1.19, 4.14.15-vrt-ts7670-00031-g1a006273f907-dirty armv5tejl)
=================================================================
System uname: Linux-4.14.15-vrt-ts7670-00031-g1a006273f907-dirty-armv5tejl-ARM926EJ-S_rev_5_-v5l-with-gentoo-2.4.1
KiB Mem:      111532 total,     13136 free
KiB Swap:    4194300 total,   4191228 free
Timestamp of repository gentoo: Fri, 17 Aug 2018 16:45:01 +0000
Head commit of repository gentoo: 563622899f514c21f5b7808cb50f6e88dbd7d7de
sh bash 4.4_p12
ld GNU ld (Gentoo 2.30 p2) 2.30.0
app-shells/bash:          4.4_p12::gentoo
dev-lang/perl:            5.24.3-r1::gentoo
dev-lang/python:          2.7.14-r1::gentoo, 3.5.5::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.4.1-r2::gentoo
sys-apps/openrc:          0.34.11::gentoo
sys-apps/sandbox:         2.13::musl
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.15.1-r2::gentoo
sys-devel/binutils:       2.30-r2::gentoo
sys-devel/gcc:            6.4.0-r1::musl
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1::gentoo
sys-kernel/linux-headers: 4.13::musl (virtual/os-headers)
sys-libs/musl:            1.1.19::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://virtatomos.longlandclan.id.au/gentoo-portage
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-extra-opts: 
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24

ACCEPT_KEYWORDS="arm"
ACCEPT_LICENSE="* -@EULA"
CBUILD="arm-unknown-linux-musleabi"
CFLAGS="-Os -pipe -march=armv5te -mtune=arm926ej-s -mfloat-abi=soft"
CHOST="arm-unknown-linux-musleabi"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-Os -pipe -march=armv5te -mtune=arm926ej-s -mfloat-abi=soft"
DISTDIR="/home/portage/distfiles"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard"
GENTOO_MIRRORS=" http://virtatomos.longlandclan.id.au/portage http://mirror.internode.on.net/pub/gentoo http://ftp.swin.edu.au/gentoo http://mirror.aarnet.edu.au/pub/gentoo"
INSTALL_MASK="charset.alias"
LANG="en_AU.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="arm bindist cli crypt cxx dri fortran iconv ipv6 modules ncurses nls nptl openmp pam pcre readline seccomp ssl tcpd unicode xattr zlib" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon plan sheets stage words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="musl" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-0" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby23" USERLAND="GNU" VIDEO_CARDS="dummy fbdev v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

I still have to update the kernel.  I actually did get kernel 4.18 to boot, but I forgot to add in support for the watchdog, so U-Boot tickled it, then the watchdog got hungry and kicked the reset half way through the boot sequence.

Rolling back to my older 4.14 kernel works.  I’ll try again with 4.18.5 in a moment.  Failing that, I have also brought the 4.14 patches up to 4.14.69 which is the latest LTS release of the kernel.

I’ve started looking at the power supply sysfs device class, with a view to exposing the supply voltage via sysfs.  The thinking here is that collectd supports reading this via the “battery” module (and realistically, it is a battery that is being measured: two 105Ah AGMs).

Worst case is I do something a little proprietary and deal with it in user space.  I’ll have to dig up the Linux kernel tree I did for Jacques Electronics all those years ago, as that had some examples of interfacing sysfs to a Cypress PSOC device that was acting as an I²C slave.  Rather than using an off-the-shelf solution, they programmed up a MCU that did power management, touchscreen sensing, keypad sensing, RGB LED control and others, all in one chip.  (Fun to try and interface that to the Linux kernel.)

Technologic Systems appear to have done something similar.  The device ID 0x78 implies a 10-bit device, but I think they’re just squatting on that 7-bit address.  They hail 0x78 then read out 4 bytes, which the last two bytes are the supply voltage ADC readings.  They do their own byte swapping before scaling the value to get mV.

Solar Cluster: Gentoo Linux MUSL for ARMv5 finally built

It’s taken several months and had a few false starts, but at long last I have some stage tarballs for Gentoo Linux MUSL for ARMv5 processors.  I’m not the only one wanting such a port, even looking for my earlier thread on the matter, I stumbled on this post.  (Google translate is hopeless with Russian, but I can get the gist of what’s being said.)

This was natively built on the TS-7670 using an external hard drive connected over USB 2.0 as both swap and the chroot.  It took two passes to clean everything up and get something that’s in a “release-able” state.

I think my next steps now will be:

  • Build an updated kernel … I might see if I can expose that I²C register via a sysfs file or something that collectd can pick up whilst I’m at it.  I have the kernel sources and bootloader sources.
  • Prepare the 32GB MicroSD card I bought a few weeks back with the needed partitions and load Gentoo onto that.
  • Install the MicroSD card and boot off it.
  • Back up the eMMC
  • Re-format the eMMC and copy the MicroSD card to it.

It’s supposed to be wet this weekend, so it sounds like a good project for indoors.

Progress reporting in Gentoo

I have a bad habit where it comes to updating systems, I tend to do it less frequently than I should, and that can sometimes snowball like it has for my mail server.  Even if it’s a fresh install, sometimes there’s a large number of packages that need installing.

Now Portage does report where it’s up to, but often that has long scrolled past the buffer on your terminal.  You can look at /var/log/emerge.log for this information, but sometimes it’s nice to just see a percentage progress and a pseudo graphical representation.

With this in mind, I cooked up a little script which just tails /var/log/emerge.log and displays a progress bar along with the last message reported. The script is quite short:

#!/bin/bash

shopt -s checkwinsize

stdbuf -o L tail -n 0 -F /var/log/emerge.log | while read line; do
	changed=0
	eval $( echo ${line} | \
		sed -ne '/[0-9]\+ of [0-9]\+/ { s:^.*(\([0-9]\+\) of \([0-9]\+\)).*$:done=\1 total=\2 changed=1:; p; }' )

	if [ "${changed}" = 1 ]; then
		case "${line}" in
			*"::: completed emerge"*)
				;;
			*)
				done=$(( ${done} - 1 ))
				;;
		esac

		percent=$(( ( ${done}*100 ) / ${total} ))
		width=$(( ${COLUMNS:-80} - 8 ))
		progress=$(( ( ${done}*${width} ) / ${total} ))
		remain=$(( ${width} - ${progress} ))

		progressbar="$( for n in $( seq 1 ${progress} ); do echo -n '#'; done )"
		remainbar="$( for n in $( seq 1 ${remain} ); do echo -n ':'; done )"

		printf '\033[2A\033[2K%s\n\033[2K\033[1G[\033[1m%s\033[0m%s] \033[1m%3d%%\033[0m\n' \
			"${line:0:${COLUMNS:-80}}" "$progressbar" "$remainbar" "$percent"
	else
		printf '\033[2A\033[2K%s\n\n' "${line:0:${COLUMNS:-80}}"
	fi

	if echo "${line}" | grep -q '*** terminating.'; then
		exit
	fi
done

What’s it look like?

It works well with GNU Screen as seen above.