Oct 22 2018
 

Today, it seems, the IT gremlins have been out to get me.  At my work I have a desktop computer (personal hardware) consisting of a Ryzen 7 1700, 16GB RAM, a 240GB Intel M.2 SATA SSD (540 series) and a 4TB Western Digital HDD.

The machine has been pretty reliable, though not rock-solid: in particular, compiling gcc sometimes segfaulted for reasons unknown (the RAM checks out okay according to memtest86), but for what I was doing, it mostly ran fine.  I put up with the minor niggles with the view of solving those another day.  Today though, I come in and find X has crashed.

Okay, no big deal, re-start the display manager, except that crashed too.

Hmm, okay, log in under my regular user account and try startx:  No dice, there’s no room on /.

Ahh, that might explain a few things, we clean up some log files, truncate a 500MB file, manage to free up 50GB (!).

The machine dual-boots two OSes: Debian 9 and Gentoo.  It’s been running the latter for about 12 months now.  I used Debian 9 to get things rolling so I could use the machine at work (did try Ubuntu 16.04, but it didn’t like my machine), and later used that to get Gentoo running before switching over.  So there was a 40GB partition on the SSD that had a year-old install of Debian that wasn’t being used.  I figured I’d ditch it, and re-locate my Gentoo partition to occupy that space.

So I pull out an Ubuntu 18.04 disc, boot that up, and get gparted going.  It’s happily copying, until WHAM, I was hit with an I/O error:

(Screenshot: failed re-location of partition)

Clicking any of the three buttons resulted in the same message.  Brilliant.  I had just copied over the first 15GB of the partition, so the Debian install would be hosed (I was deleting it anyway), but my Gentoo root partition should still be there intact at its old location.  Of course the partition table was updated, so no rolling back there.  At this point, I couldn’t do anything with the SSD, it had completely stalled, and I just had to cut my losses and kill gparted.

I managed to make some room on the 4TB drive shuffling some partitions around so I could install Ubuntu 18.04 there.  My /home partition was btrfs on the 4TB drive (first partition), the rest of that drive was LVM.  I just shrank my /home down by 40GB and slipped it in there.  The boot-loader didn’t install (no EFI partition), but who cares, I know enough of grub to boot from the DVD and bootstrap the machine that way.  At first it wouldn’t boot because in their wisdom, they created the root partition with a @ subvolume.  I worked around that by making the @ subvolume the default.

Then there was momentary panic when the /home partition I had specified lacked my usual files.  Turned out, they had created a @home subvolume on my existing /home partition.  Why? Who knows?  Debian/Ubuntu seem to do strange things with btrfs which do nothing but complicate matters and I do not understand the reasoning.  Editing /etc/fstab to remove the subvolume argument for /home and re-booting fixed that.

I set up an LVM volume that would receive a DD dump of the mangled partition to see what could be saved.  GNU’s ddrescue managed to recover most of the raw partition, and so now I just had to find where the start was.  If I had the output of fdisk -l before I started, I’d be right, but I didn’t have that foresight.  (Maybe if I had just formatted an LVM volume and DD’d the root fs before involving gparted?  Never mind!)

I figured there’d be some kind of magic bytes I could “grep” for.  Something that would tell me “BTRFS was here”.  Sure enough, the information is stashed in the superblock.  At 0x00010040 from the start of the partition, I should see the magic bytes 5f 42 48 52 66 53 5f 4d (“_BHRfS_M” in ASCII).  I just needed to grep for these.  To speed things up I made an educated guess on the start-location.  The screenshot says the old partition was about 37.25GB in size, so that was a hint to maybe try skipping that bit and see what could be seen.

Sure enough, I found what looked to be the superblock:

root@vk4msl-ws:~# dd if=/dev/mapper/scratch-rootbackup skip=38100 count=200 bs=1M | hexdump -C | grep '5f 42 48 52 66 53 5f 4d'
02e10040  5f 42 48 52 66 53 5f 4d  9d 30 0d 02 00 00 00 00  |_BHRfS_M.0......|
06e00040  5f 42 48 52 66 53 5f 4d  9d 30 0d 02 00 00 00 00  |_BHRfS_M.0......|
200+0 records in
200+0 records out
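
The hexdump-and-grep trick works here because the magic sits at a 16-byte-aligned offset, so all eight bytes land together in one row of the hexdump -C output.  As a rough alternative, here is a minimal Python sketch that scans a binary stream for the same bytes wherever they fall.  The function name find_magic and the in-memory test image are my own illustrations, not part of any btrfs tooling.

```python
import io

BTRFS_MAGIC = b"_BHRfS_M"   # 5f 42 48 52 66 53 5f 4d
MAGIC_OFFSET = 0x10040      # where the magic sits relative to the filesystem start


def find_magic(stream, magic=BTRFS_MAGIC, chunk_size=1 << 20):
    """Yield the offset (from where reading began) of every copy of `magic`."""
    overlap = len(magic) - 1
    tail = b""              # last few bytes of the previous read
    base = 0                # stream offset of the first byte of `tail`
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(magic)
        while pos != -1:
            yield base + pos
            pos = buf.find(magic, pos + 1)
        # keep enough bytes to catch a magic that straddles two reads
        tail = buf[-overlap:] if overlap else b""
        base += len(buf) - len(tail)


# Hypothetical test image: a "filesystem" starting 64 KiB into a blank blob.
fs_start = 64 * 1024
image = bytearray(256 * 1024)
image[fs_start + MAGIC_OFFSET:fs_start + MAGIC_OFFSET + len(BTRFS_MAGIC)] = BTRFS_MAGIC

hits = list(find_magic(io.BytesIO(bytes(image)), chunk_size=4096))
candidates = [hit - MAGIC_OFFSET for hit in hits]   # possible filesystem starts
print(hits, candidates)
```

On a real volume you would pass a file object opened with open(path, "rb") (after seeking to wherever you want to start looking) instead of the BytesIO.  Every hit is only a candidate: the mirror superblocks carry the same magic, so the other fields still need checking.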

Some other probes seem to confirm this, my quarry seemed to start 38146MB into the now-merged partition.  I start copying that to a new LVM volume with the hope of being able to mount it:

root@vk4msl-ws:~# dd if=/dev/mapper/scratch-rootbackup of=/dev/mapper/scratch-gentoo--root bs=1M skip=38146
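
For the record, the 38146 figure falls straight out of the two hexdump hits.  The arithmetic below is a small sketch using only the numbers from the transcript; the one outside fact it leans on is that btrfs keeps its first superblock mirror 64MiB into the filesystem, which is what the second hit turns out to be.

```python
MIB = 1 << 20
MAGIC_IN_SUPERBLOCK = 0x40
PRIMARY_SUPERBLOCK = 0x10000        # 64 KiB into the filesystem
FIRST_MIRROR = 64 * MIB             # first superblock mirror

skip_mib = 38100                    # dd skip= value, with bs=1M
hits = [0x02E10040, 0x06E00040]     # offsets printed by hexdump

# Offset of the filesystem start inside the 200 MiB window that dd read.
window_start = hits[0] - PRIMARY_SUPERBLOCK - MAGIC_IN_SUPERBLOCK
start_mib, remainder = divmod(window_start, MIB)
fs_start_mib = skip_mib + start_mib

# If the first hit really is the primary superblock, the mirror's magic
# should appear exactly here:
expected_mirror_hit = window_start + FIRST_MIRROR + MAGIC_IN_SUPERBLOCK

print(fs_start_mib, remainder, hits[1] == expected_mirror_hit)
# prints: 38146 0 True
```

The zero remainder says the old partition started on a whole-MiB boundary, which is what you would expect from a partitioning tool that aligns to 1MiB.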

Whilst waiting for this to complete, I double-checked my findings by inspecting the other fields. From the screenshot, I know my filesystem UUID was 6513682e-7182-4474-89e6-c0d1c71966ad. Looking at the superblock, sure enough I see that listed:

root@vk4msl-ws:~# dd if=/dev/scratch/gentoo-root bs=$(( 0x10000 )) skip=1 count=1 | hexdump -C
1+0 records in
1+0 records out
00000000  5f f9 98 90 00 00 00 00  00 00 00 00 00 00 00 00  |_...............|
65536 bytes (66 kB, 64 KiB) copied, 0.000116268 s, 564 MB/s
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  65 13 68 2e 71 82 44 74  89 e6 c0 d1 c7 19 66 ad  |e.h.q.Dt......f.|
00000030  00 00 01 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000040  5f 42 48 52 66 53 5f 4d  9d 30 0d 02 00 00 00 00  |_BHRfS_M.0......|
00000050  00 00 32 da 32 00 00 00  00 00 02 00 00 00 00 00  |..2.2...........|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
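
Reading the fields by eye works, but the same check can be scripted.  The sketch below decodes the first 0x50 bytes of the superblock as they appear in the dump above: the checksum area at 0x00, the filesystem UUID at 0x20, the superblock's own byte address at 0x30, a flags word at 0x38, the magic at 0x40 and the generation at 0x48.  The function name parse_superblock_head is just something I made up for illustration.

```python
import struct
import uuid

BTRFS_MAGIC = b"_BHRfS_M"


def parse_superblock_head(block):
    """Decode the leading fields of a btrfs superblock (needs >= 0x50 bytes)."""
    fsid = uuid.UUID(bytes=bytes(block[0x20:0x30]))
    bytenr, flags = struct.unpack_from("<QQ", block, 0x30)
    magic = bytes(block[0x40:0x48])
    (generation,) = struct.unpack_from("<Q", block, 0x48)
    return {
        "fsid": str(fsid),
        "bytenr": bytenr,
        "flags": flags,
        "magic": magic,
        "generation": generation,
    }


# The bytes from the hexdump above, rows 0x00 to 0x4f.
head = bytes.fromhex(
    "5ff99890" + "00" * 28                  # 0x00: checksum area
    + "6513682e7182447489e6c0d1c71966ad"    # 0x20: filesystem UUID
    + "0000010000000000"                    # 0x30: bytenr (0x10000)
    + "0100000000000000"                    # 0x38: flags
    + "5f42485266535f4d"                    # 0x40: magic
    + "9d300d0200000000"                    # 0x48: generation
)

info = parse_superblock_head(head)
print(info["fsid"])     # 6513682e-7182-4474-89e6-c0d1c71966ad
print(info["magic"] == BTRFS_MAGIC, hex(info["bytenr"]))
```

A handy sanity check is that bytenr reads 0x10000: the primary superblock records its own position, so a mirror copy found elsewhere would report a different value.  To run this against a real volume, open it in binary mode, seek to 0x10000 and read 0x50 bytes.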

Looks promising! After an agonising wait, the dd finishes. I can check the filesystem:

root@vk4msl-ws:~# btrfsck /dev/scratch/gentoo-root 
Checking filesystem on /dev/scratch/gentoo-root
UUID: 6513682e-7182-4474-89e6-c0d1c71966ad
checking extents
checking free space cache
block group 111690121216 has wrong amount of free space
failed to load free space cache for block group 111690121216
block group 161082245120 has wrong amount of free space
failed to load free space cache for block group 161082245120
checking fs roots
checking csums
checking root refs
found 107544387643 bytes used, no error found
total csum bytes: 99132872
total tree bytes: 6008504320
total fs tree bytes: 5592694784
total extent tree bytes: 271663104
btree space waste bytes: 1142962475
file data blocks allocated: 195274670080
 referenced 162067775488

Okay, it complained that the free space was wrong (which I’ll blame on gparted prematurely growing the partition), but the data is there!  This is confirmed by mounting the volume and doing an ls:

root@vk4msl-ws:~# mount /dev/scratch/gentoo-root /mnt/
root@vk4msl-ws:~# ls /mnt/ -l
total 4
drwxr-xr-x 1 root root 1020 Oct  7 14:13 bin
drwxr-xr-x 1 root root   18 Jul 21  2017 boot
drwxr-xr-x 1 root root   16 May 28 10:29 dbus-1
drwxr-xr-x 1 root root 1686 May 31  2017 dev
drwxr-xr-x 1 root root 3620 Oct 19 18:53 etc
drwxr-xr-x 1 root root    0 Jul 14  2017 home
lrwxrwxrwx 1 root root    5 Sep 17 09:20 lib -> lib64
drwxr-xr-x 1 root root 1156 Oct  7 13:59 lib32
drwxr-xr-x 1 root root 4926 Oct 13 05:13 lib64
drwxr-xr-x 1 root root   70 Oct 19 11:52 media
drwxr-xr-x 1 root root   28 Apr 23 13:18 mnt
drwxr-xr-x 1 root root  336 Oct  9 07:27 opt
drwxr-xr-x 1 root root    0 May 31  2017 proc
drwx------ 1 root root  390 Oct 22 06:07 root
drwxr-xr-x 1 root root   10 Jul  6  2017 run
drwxr-xr-x 1 root root 4170 Oct  9 07:57 sbin
drwxr-xr-x 1 root root   10 May 31  2017 sys
drwxrwxrwt 1 root root 6140 Oct 22 06:07 tmp
drwxr-xr-x 1 root root  304 Oct 19 18:20 usr
drwxr-xr-x 1 root root  142 May 17 12:36 var
root@vk4msl-ws:~# cat /mnt/etc/gentoo-release 
Gentoo Base System release 2.4.1

Yes, I’ll be backing this up properly RIGHT NOW. But, my data is back, and I’ll be storing this little data recovery technique for next time.

The real lessons here are:

  1. KEEP RELIABLE BACKUPS! You never know when something will fail.
  2. Catch the copy process before it starts overwriting your source data! If there’s no overlap between the old and new locations, you’re fine, but if there is and it starts overwriting the start of your original volume, it’s all over red rover! You might be lucky with a superblock back-up, but don’t bet on it!
  3. Make note of the filesystem type and its approximate location. The fact that I knew roughly where to look, and what sort of filesystem I was looking for meant I could look for magic bytes that say “I’m a BTRFS filesystem”. The magic bytes for EXT4, XFS, etc will differ, but the same concepts are there, you just have to look up the documentation on how your particular filesystem structures its data.
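
As a starting point for that last lesson, here is a small sketch of a signature table for three common Linux filesystems.  The offsets are relative to the start of the filesystem: ext2/3/4 keeps the two bytes 53 ef at 0x438, XFS starts with “XFSB” at offset 0, and BTRFS is as described above.  The identify helper and the blank test images are hypothetical, and anything beyond these three you should look up in that filesystem’s own documentation.

```python
# name -> (offset from start of filesystem, magic bytes)
SIGNATURES = {
    "btrfs": (0x10040, b"_BHRfS_M"),
    "ext2/3/4": (0x438, b"\x53\xef"),   # 0xEF53 stored little-endian
    "xfs": (0x0, b"XFSB"),
}


def identify(image, start=0):
    """Return the names whose magic appears if a filesystem began at `start`."""
    matches = []
    for name, (offset, magic) in SIGNATURES.items():
        begin = start + offset
        if image[begin:begin + len(magic)] == magic:
            matches.append(name)
    return matches


# Hypothetical test images: blank blobs with one signature planted in each.
ext_image = bytearray(4096)
ext_image[0x438:0x43A] = b"\x53\xef"

xfs_image = bytearray(4096)
xfs_image[0:4] = b"XFSB"

print(identify(bytes(ext_image)), identify(bytes(xfs_image)))
```

Note the ext magic is only two bytes long, so blindly scanning a whole disk for it will throw up plenty of false positives.  It is far more useful as a check on a candidate start offset than as a search pattern.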
Aug 25 2018
 

So, after some argument, and a bit of sitting on a concrete floor with the netbook, I managed to get Gentoo loaded onto the TS-7670.  Right now it’s running off the MicroSD card, I’ll get things right, then shift it across to eMMC.

ts7670 ~ # emerge --info
Portage 2.3.40 (python 3.5.5-final-0, default/linux/musl/arm/armv7a, gcc-6.4.0, musl-1.1.19, 4.14.15-vrt-ts7670-00031-g1a006273f907-dirty armv5tejl)
=================================================================
System uname: Linux-4.14.15-vrt-ts7670-00031-g1a006273f907-dirty-armv5tejl-ARM926EJ-S_rev_5_-v5l-with-gentoo-2.4.1
KiB Mem:      111532 total,     13136 free
KiB Swap:    4194300 total,   4191228 free
Timestamp of repository gentoo: Fri, 17 Aug 2018 16:45:01 +0000
Head commit of repository gentoo: 563622899f514c21f5b7808cb50f6e88dbd7d7de
sh bash 4.4_p12
ld GNU ld (Gentoo 2.30 p2) 2.30.0
app-shells/bash:          4.4_p12::gentoo
dev-lang/perl:            5.24.3-r1::gentoo
dev-lang/python:          2.7.14-r1::gentoo, 3.5.5::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.4.1-r2::gentoo
sys-apps/openrc:          0.34.11::gentoo
sys-apps/sandbox:         2.13::musl
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.15.1-r2::gentoo
sys-devel/binutils:       2.30-r2::gentoo
sys-devel/gcc:            6.4.0-r1::musl
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1::gentoo
sys-kernel/linux-headers: 4.13::musl (virtual/os-headers)
sys-libs/musl:            1.1.19::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://virtatomos.longlandclan.id.au/gentoo-portage
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-extra-opts: 
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24

ACCEPT_KEYWORDS="arm"
ACCEPT_LICENSE="* -@EULA"
CBUILD="arm-unknown-linux-musleabi"
CFLAGS="-Os -pipe -march=armv5te -mtune=arm926ej-s -mfloat-abi=soft"
CHOST="arm-unknown-linux-musleabi"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-Os -pipe -march=armv5te -mtune=arm926ej-s -mfloat-abi=soft"
DISTDIR="/home/portage/distfiles"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard"
GENTOO_MIRRORS=" http://virtatomos.longlandclan.id.au/portage http://mirror.internode.on.net/pub/gentoo http://ftp.swin.edu.au/gentoo http://mirror.aarnet.edu.au/pub/gentoo"
INSTALL_MASK="charset.alias"
LANG="en_AU.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="arm bindist cli crypt cxx dri fortran iconv ipv6 modules ncurses nls nptl openmp pam pcre readline seccomp ssl tcpd unicode xattr zlib" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon plan sheets stage words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="musl" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-0" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby23" USERLAND="GNU" VIDEO_CARDS="dummy fbdev v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

I still have to update the kernel.  I actually did get kernel 4.18 to boot, but I forgot to add in support for the watchdog, so U-Boot tickled it, then the watchdog got hungry and kicked the reset half way through the boot sequence.

Rolling back to my older 4.14 kernel works.  I’ll try again with 4.18.5 in a moment.  Failing that, I have also brought the 4.14 patches up to 4.14.69 which is the latest LTS release of the kernel.

I’ve started looking at the power supply sysfs device class, with a view to exposing the supply voltage via sysfs.  The thinking here is that collectd supports reading this via the “battery” module (and realistically, it is a battery that is being measured: two 105Ah AGMs).

Worst case is I do something a little proprietary and deal with it in user space.  I’ll have to dig up the Linux kernel tree I did for Jacques Electronics all those years ago, as that had some examples of interfacing sysfs to a Cypress PSOC device that was acting as an I²C slave.  Rather than using an off-the-shelf solution, they programmed up a MCU that did power management, touchscreen sensing, keypad sensing, RGB LED control and others, all in one chip.  (Fun to try and interface that to the Linux kernel.)

Technologic Systems appear to have done something similar.  The device ID 0x78 implies a 10-bit device, but I think they’re just squatting on that 7-bit address.  They hail 0x78 then read out 4 bytes, of which the last two are the supply voltage ADC reading.  They do their own byte swapping before scaling the value to get mV.
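
To make that concrete, here is a rough user-space sketch of the decode step only.  Big assumptions: I am guessing the two ADC bytes arrive most-significant-byte first (which would explain the byte swap on a little-endian CPU), and the scale factor is left as a parameter because I have not reproduced the real one here.  The function names and the sample bytes are made up for illustration.

```python
import struct


def decode_supply_adc(raw):
    """Pull the 16-bit ADC reading out of the last two of the four bytes.

    ASSUMPTION: big-endian on the wire; flip ">H" to "<H" if that is wrong.
    """
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes from the device")
    (adc,) = struct.unpack_from(">H", raw, 2)
    return adc


def adc_to_millivolts(adc, numerator, denominator):
    """Scale a raw reading to mV.  The ratio is board-specific and must be
    supplied by the caller; no real value is assumed here."""
    return adc * numerator // denominator


# Made-up sample bytes and a made-up 3/2 scale, purely to exercise the code.
sample = bytes([0x00, 0x00, 0x12, 0x34])
adc = decode_supply_adc(sample)
print(hex(adc), adc_to_millivolts(1000, 3, 2))
```

Whichever way this ends up being exposed (sysfs or something in user space), the decode itself should stay about this small.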

Aug 22 2018
 

It’s taken several months and had a few false starts, but at long last I have some stage tarballs for Gentoo Linux MUSL for ARMv5 processors.  I’m not the only one wanting such a port: even while looking for my earlier thread on the matter, I stumbled on this post.  (Google translate is hopeless with Russian, but I can get the gist of what’s being said.)

This was natively built on the TS-7670 using an external hard drive connected over USB 2.0 as both swap and the chroot.  It took two passes to clean everything up and get something that’s in a “release-able” state.

I think my next steps now will be:

  • Build an updated kernel … I might see if I can expose that I²C register via a sysfs file or something that collectd can pick up whilst I’m at it.  I have the kernel sources and bootloader sources.
  • Prepare the 32GB MicroSD card I bought a few weeks back with the needed partitions and load Gentoo onto that.
  • Install the MicroSD card and boot off it.
  • Back up the eMMC
  • Re-format the eMMC and copy the MicroSD card to it.

It’s supposed to be wet this weekend, so it sounds like a good project for indoors.

May 05 2018
 

So, at long last, I finally saw this in my chroot’s /var/log/emerge.log:

1524887925: Started emerge on: Apr 28, 2018 03:58:45
1524887926:  *** emerge --oneshot sys-devel/gcc::musl
1524888211:  >>> emerge (1 of 1) sys-devel/gcc-7.3.0 to /
1524888212:  === (1 of 1) Cleaning (sys-devel/gcc-7.3.0::/root/musl/sys-devel/gcc/gcc-7.3.0.ebuild)
1524888307:  === (1 of 1) Compiling/Packaging (sys-devel/gcc-7.3.0::/root/musl/sys-devel/gcc/gcc-7.3.0.ebuild)
1525472690:  === (1 of 1) Merging (sys-devel/gcc-7.3.0::/root/musl/sys-devel/gcc/gcc-7.3.0.ebuild)
1525472838:  >>> AUTOCLEAN: sys-devel/gcc:7.3.0
1525473358:  === (1 of 1) Post-Build Cleaning (sys-devel/gcc-7.3.0::/root/musl/sys-devel/gcc/gcc-7.3.0.ebuild)
1525473358:  ::: completed emerge (1 of 1) sys-devel/gcc-7.3.0 to /
1525473360:  *** Finished. Cleaning up...
1525473373:  *** exiting successfully.

That’s 6 days, 18 hours and 32 minutes, of solid compiling. BUT WE GOT THERE!
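
That figure comes from the Unix timestamps at the start of each log line, measured from the “>>> emerge (1 of 1)” entry to the “::: completed emerge” entry.  A quick sketch of the sum (the elapsed helper is just for illustration):

```python
def elapsed(start_ts, end_ts):
    """Split a span between two Unix timestamps into (days, hours, minutes, seconds)."""
    days, rest = divmod(end_ts - start_ts, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


merge_started = 1524888211      # ">>> emerge (1 of 1) sys-devel/gcc-7.3.0 to /"
merge_completed = 1525473358    # "::: completed emerge (1 of 1) ..."

print(elapsed(merge_started, merge_completed))
# prints: (6, 18, 32, 27)
```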

What’s left? This:

Calculating dependencies... done!
[ebuild     U  ] sys-libs/musl-1.1.19 [1.1.18]
[binary   R    ] sys-libs/zlib-1.2.11-r1
[binary   R    ] app-arch/xz-utils-5.2.3
[ebuild     U  ] sys-libs/ncurses-6.1-r2 [6.0-r1]
[binary   R    ] sys-libs/readline-7.0_p3
[binary   R    ] virtual/libintl-0-r2
[binary   R    ] dev-lang/python-exec-2.4.5
[binary   R    ] virtual/libiconv-0-r2
[binary   R    ] sys-apps/gentoo-functions-0.12
[binary   R    ] dev-libs/libpcre-8.41-r1
[binary   R    ] sys-apps/sed-4.2.2
[binary   R    ] app-arch/bzip2-1.0.6-r8
[binary   R    ] dev-libs/gmp-6.1.2
[binary   R    ] app-shells/bash-4.4_p12
[binary   R    ] sys-apps/file-5.32
[binary   R    ] sys-devel/gnuconfig-20170101
[binary   R    ] dev-libs/mpfr-3.1.6
[binary   R    ] app-misc/c_rehash-1.7-r1
[binary   R    ] app-misc/mime-types-9
[binary   R    ] app-arch/tar-1.29-r3
[binary   R    ] app-arch/gzip-1.8
[binary   R    ] dev-libs/mpc-1.0.3
[binary   R    ] sys-devel/gcc-config-1.8-r1
[binary   R    ] app-misc/editor-wrapper-4
[binary   R    ] sys-apps/less-529
[binary   R    ] sys-apps/debianutils-4.8.3
[binary   R    ] net-libs/libmnl-1.0.4
[binary   R    ] sys-libs/libseccomp-2.3.2
[binary   R    ] dev-libs/popt-1.16-r2
[binary   R    ] sys-libs/e2fsprogs-libs-1.43.6
[binary   R    ] sys-devel/binutils-config-5-r4
[binary   R    ] dev-libs/libffi-3.2.1
[binary   R    ] virtual/libffi-3.0.13-r1
[binary   R    ] sys-apps/sysvinit-2.88-r9
[binary   R    ] sys-apps/opentmpfiles-0.1.3
[binary   R    ] virtual/tmpfiles-0
[binary   R    ] app-text/manpager-1
[binary   R    ] sys-libs/cracklib-2.9.6-r1
[binary   R    ] sys-apps/install-xattr-0.5
[binary   R    ] app-editors/nano-2.8.7
[binary   R    ] app-portage/elt-patches-20170815
[binary   R    ] sys-devel/m4-1.4.17
[binary   R    ] app-arch/unzip-6.0_p21-r2
[binary   R    ] sys-devel/autoconf-wrapper-13
[binary   R    ] sys-devel/bison-3.0.4-r1
[binary   R    ] sys-devel/flex-2.6.4-r1
[binary   R    ] dev-libs/libltdl-2.4.6
[binary   R    ] sys-devel/automake-wrapper-10
[binary   R    ] app-text/sgml-common-0.6.3-r6
[binary   R    ] dev-libs/libgpg-error-1.27-r1
[ebuild  N     ] dev-lang/perl-5.24.3-r1  USE="-berkdb -debug -doc -gdbm -ithreads"
[ebuild  N     ] sys-kernel/linux-headers-4.13  USE="-headers-only"
[ebuild  N     ] virtual/perl-Data-Dumper-2.160.0-r1
[ebuild  N     ] virtual/perl-Test-Harness-3.360.100_rc-r3
[ebuild  N     ] perl-core/File-Temp-0.230.400-r1
[ebuild  N     ] virtual/perl-File-Temp-0.230.400-r5
[ebuild  N     ] perl-core/File-Path-2.130.0
[ebuild  N     ] virtual/perl-File-Path-2.130.0
[binary   R    ] virtual/os-headers-0
[ebuild  N     ] sys-devel/autoconf-2.69-r4  USE="-emacs"
[ebuild  N     ] sys-apps/attr-2.4.47-r2  USE="-nls -static-libs"
[ebuild   R    ] sys-apps/coreutils-8.28-r1
[ebuild     U  ] app-admin/eselect-1.4.12 [1.4.8]
[ebuild     U  ] app-eselect/eselect-python-20171204 [20160516]
[ebuild     U  ] sys-devel/patch-2.7.6-r1 [2.7.5]
[ebuild  N     ] sys-apps/shadow-4.5  USE="cracklib xattr -acl -audit -nls -pam (-selinux) -skey"
[binary   R    ] virtual/shadow-0
[ebuild  N     ] virtual/perl-ExtUtils-MakeMaker-7.100.200_rc-r4
[ebuild  N     ] sys-libs/libcap-2.24-r2  USE="-pam -static-libs"
[ebuild  N     ] dev-perl/Text-Unidecode-1.270.0
[ebuild  N     ] dev-perl/libintl-perl-1.240.0-r2
[ebuild  N     ] sys-apps/help2man-1.47.4  USE="-nls"
[ebuild  N     ] sys-devel/automake-1.15.1-r2  USE="{-test}"
[ebuild  N     ] sys-devel/libtool-2.4.6-r3  USE="-vanilla"
[ebuild  N     ] dev-libs/expat-2.2.5  USE="unicode -examples -static-libs"
[ebuild   R    ] sys-process/psmisc-22.21-r3
[ebuild  N     ] sys-libs/gdbm-1.13-r2  USE="readline -berkdb -exporter -nls -static-libs"
[ebuild  N     ] sys-apps/groff-1.22.2  USE="-X -examples" L10N="-ja"
[ebuild  N     ] dev-libs/libelf-0.8.13-r2  USE="-debug -nls"
[ebuild  N     ] virtual/libelf-2
[ebuild  N     ] dev-libs/libgcrypt-1.8.1  USE="-doc -static-libs"
[ebuild  N     ] dev-perl/XML-Parser-2.440.0
[ebuild  N     ] virtual/perl-File-Spec-3.630.100_rc-r4
[ebuild  N     ] dev-perl/Unicode-EastAsianWidth-1.330.0-r1
[ebuild  N     ] sys-apps/texinfo-6.3  USE="-nls -static"
[ebuild  N     ] dev-libs/iniparser-3.1-r1  USE="-doc -examples -static-libs"
[ebuild  N     ] app-portage/portage-utils-0.64  USE="-nls -static"
[ebuild  N     ] dev-libs/openssl-1.0.2o  USE="asm sslv3 tls-heartbeat zlib -bindist -gmp -kerberos -rfc3779 -sctp -sslv2 -static-libs {-test} -vanilla"
[binary  N     ] dev-lang/python-2.7.14-r1  USE="ipv6 ncurses readline ssl (threads) (wide-unicode) xml (-berkdb) -build -doc -examples -gdbm -hardened -libressl -sqlite -tk -wininst"
[binary  N     ] sys-apps/openrc-0.34.11  USE="ncurses netifrc unicode -audit -debug -newnet -pam (-prefix) (-selinux) -static-libs" 
[ebuild  N     ] net-misc/netifrc-0.5.1
[binary   R    ] sys-apps/grep-3.0
[binary   R    ] sys-apps/findutils-4.6.0-r1
[binary   R    ] sys-apps/kbd-2.0.4
[ebuild  N     ] sys-apps/busybox-1.28.0  USE="ipv6 static -debug -livecd -make-symlinks -math -mdev -pam -savedconfig (-selinux) -sep-usr -syslog (-systemd)"
[binary   R    ] virtual/service-manager-0
[binary   R    ] sys-devel/binutils-2.29.1-r1
[ebuild  N     ] sys-apps/net-tools-1.60_p20161110235919  USE="arp hostname ipv6 -nis -nls -plipconfig (-selinux) -slattach -static" 
[binary   R    ] sys-apps/gawk-4.1.4
[binary   R    ] virtual/editor-0
[binary   R    ] sys-devel/make-4.2.1
[binary   R    ] sys-process/procps-3.3.12-r1
[binary   R    ] virtual/dev-manager-0-r1
[binary   R    ] sys-apps/which-2.21
[ebuild  N     ] net-misc/iputils-20171016_pre  USE="arping filecaps ipv6 openssl ssl -SECURITY_HAZARD -caps -clockdiff -doc -gcrypt (-idn) -libressl -nettle -rarpd -rdisc -static -tftpd -tracepath -traceroute"
[binary   R    ] virtual/pager-0
[binary   R    ] sys-apps/diffutils-3.5
[binary   R    ] sys-apps/baselayout-2.4.1-r2
[binary   R    ] virtual/libc-1
[binary   R   ~] sys-devel/gcc-7.3.0
[binary   R    ] virtual/pkgconfig-0-r1
[ebuild  N     ] dev-lang/python-3.5.5  USE="ipv6 ncurses readline ssl (threads) xml -build -examples -gdbm -hardened -libressl -sqlite {-test} -tk -wininst"
[ebuild  N     ] app-misc/ca-certificates-20170717.3.36.1  USE="-cacert -insecure_certs"
[ebuild  N     ] sys-apps/util-linux-2.30.2-r1  USE="cramfs ncurses readline suid unicode -build -caps -fdformat -kill -nls -pam -python (-selinux) -slang -static-libs (-systemd) {-test} -tty-helpers -udev" PYTHON_SINGLE_TARGET="python3_5 -python2_7 -python3_4 -python3_6" PYTHON_TARGETS="python2_7 python3_5 -python3_4 -python3_6"
[ebuild     U  ] app-misc/pax-utils-1.2.3 [1.1.7]
[ebuild     U  ] sys-apps/sandbox-2.13 [2.10-r4]
[ebuild     U  ] net-misc/rsync-3.1.3 [3.1.2-r2]
[ebuild  N     ] net-firewall/iptables-1.6.1-r3  USE="ipv6 -conntrack -netlink -nftables -pcap -static-libs"
[ebuild     U  ] dev-libs/libpipeline-1.4.2 [1.4.0]
[ebuild  N     ] sys-apps/man-db-2.7.6.1-r2  USE="gdbm manpager zlib -berkdb -nls (-selinux) -static-libs"
[ebuild     U  ] sys-apps/kmod-24 [23] PYTHON_TARGETS="-python3_6%"
[ebuild  N     ] dev-python/pyblake2-1.1.0  PYTHON_TARGETS="python2_7 python3_5 (-pypy) -python3_4 -python3_6"
[ebuild  N     ] net-misc/openssh-7.5_p1-r4  USE="hpn pie ssl -X -X509 -audit -bindist -debug -kerberos -ldap -ldns -libedit -libressl -livecd -pam -sctp (-selinux) -skey -ssh1 -static {-test}"
[ebuild  N     ] dev-util/gtk-doc-am-1.25-r1
[ebuild  N     ] dev-libs/libxml2-2.9.7  USE="ipv6 readline -debug -examples -icu -lzma -python -static-libs {-test}" PYTHON_TARGETS="python2_7 python3_5 -python3_4 -python3_6"
[ebuild  N     ] sys-devel/gettext-0.19.8.1  USE="cxx ncurses openmp -acl -cvs -doc -emacs -git -java (-nls) -static-libs"
[ebuild  N     ] app-text/build-docbook-catalog-1.19.1
[ebuild  N     ] dev-libs/libxslt-1.1.30-r2  USE="crypt -debug -examples -python -static-libs" PYTHON_TARGETS="python2_7"
[ebuild  N     ] app-text/docbook-xsl-stylesheets-1.79.1-r2  USE="-ruby"
[ebuild  N     ] app-text/docbook-xml-dtd-4.1.2-r6
[ebuild  N     ] dev-util/intltool-0.51.0-r2
[ebuild  N     ] dev-libs/glib-2.52.3  USE="mime xattr -dbus -debug (-fam) (-selinux) -static-libs -systemtap {-test} -utils" PYTHON_TARGETS="python2_7"
[ebuild  N     ] x11-misc/shared-mime-info-1.9  USE="{-test}"
[ebuild  N     ] dev-python/setuptools-36.7.2  USE="{-test}" PYTHON_TARGETS="python2_7 python3_5 (-pypy) (-pypy3) -python3_4 -python3_6"
[ebuild  N     ] dev-python/certifi-2017.4.17  PYTHON_TARGETS="python2_7 python3_5 (-pypy) (-pypy3) -python3_4 -python3_6"
[ebuild  N     ] dev-python/pyxattr-0.5.5  USE="-doc {-test}" PYTHON_TARGETS="python2_7 python3_5 (-pypy) -python3_4"
[ebuild  N     ] sys-apps/portage-2.3.24-r1  USE="(ipc) native-extensions xattr -build -doc -epydoc -gentoo-dev (-rsync-verify) (-selinux)" PYTHON_TARGETS="python2_7 python3_5 (-pypy) -python3_4 -python3_6"
[ebuild  N     ] app-admin/perl-cleaner-2.25
[binary   R    ] virtual/man-0-r1
[binary   R    ] virtual/modutils-0
[ebuild  N     ] sys-fs/e2fsprogs-1.43.6  USE="-fuse (-nls) -static-libs"
[ebuild     U  ] virtual/package-manager-1 [0]
[ebuild  N     ] sys-apps/iproute2-4.14.1-r2  USE="iptables ipv6 -atm -berkdb -minimal (-selinux)"
[binary   R    ] virtual/ssh-0
[ebuild  N     ] net-misc/wget-1.19.1-r2  USE="ipv6 pcre ssl zlib -debug -gnutls -idn -libressl -nls -ntlm -static {-test} -uuid"
[ebuild   R    ] dev-util/pkgconfig-0.29.2  USE="-internal-glib*"

!!! The following binary packages have been ignored due to non matching USE:

    =dev-util/pkgconfig-0.29.2 internal-glib
    =sys-apps/attr-2.4.47-r2 nls
    =sys-apps/man-db-2.7.6.1-r2 nls
    =dev-libs/libelf-0.8.13-r2 nls
    =sys-apps/shadow-4.5 -linguas_cs -linguas_da -linguas_de -linguas_es -linguas_fi -linguas_fr -linguas_hu -linguas_id -linguas_it -linguas_ja -linguas_ko -linguas_pl -linguas_pt_BR -linguas_ru -linguas_sv -linguas_tr -linguas_zh_CN -linguas_zh_TW nls

NOTE: The --binpkg-respect-use=n option will prevent emerge
      from ignoring these binary packages if possible.
      Using --binpkg-respect-use=y will silence this warning.

I think that’s broken the back of the job.  Of course when I come to running Catalyst, I’ll have to do it all over again, but at least now the environment is clean.

Apr 28 2018
 

So, a few weeks ago I installed a new battery charger, and tweaked it so that the solar did most of the leg work during the day, and the charger kept the batteries topped up at night.

I also discussed the addition of a new industrial PC to perform routing and system monitoring functions… which was to run Gentoo Linux/musl. For now, that little PC is still running Debian Stretch, but for 45 days, it was rock solid. The addition of this box, and taking on the role of router to the management network meant I could finally achieve one of my long-term goals for the project: decommissioning the old server.

The old server is still set up with all my data and software… but now the back-up cron job calls /sbin/poweroff when it’s done, and the BIOS is set to wake the machine up in the evening ready to receive a back-up late at night.

In its place, a virtual machine clone of the box handles my email and all the old functions of that server. This was all done just prior to my father and I leaving for a 3 week holiday in the Snowy Mountains.

I did have a couple of hiccups with Ceph OSDs crashing … but basically re-starting the daemons (done remotely whilst travelling through Cowra) got everything back up. A bit of placement group cleaning, and everything was back online again. I had another similar hiccup coming out of Maitland, but once again, re-starting the daemons fixed it. No idea why it crashed, that’s something I’ll have to investigate.

Other than that, the cluster itself has run well.

One thing that did momentarily kill the industrial PC though: I wandered down to the rack with a small bus-powered 2.5″ HDD with the intent of re-starting my Gentoo builds. This HDD had the same content as the 3.5″ HDD I had plugged in before. I figured being bus powered, I would not be dependent on mains, and it could just chug away to its heart’s content.

No such luck, the moment I plugged that drive in, the little machine took great umbrage at the spinning rust now vacuuming the electrons away from its core functions, and shut down abruptly. I’ve now brought my 3.5″ drive and dock down, plugged that into the wall, and have my builds resuming. If power goes off, hopefully the machine handles the loss of swap gracefully. If it does crash, the watchdog will take care of it.

Thus, I have the little TS-7670 first attempting a build of gcc, to see how we go. Fingers crossed our power should remain up. There was at least one outage in the time we were away, but hopefully we should get through this next build!

The next step I think should be to add some control of the mains charger to allow the batteries to be boosted to full charge overnight. The thinking is a simple diode-OR arrangement. Many comparators such as the LM393 have an open-collector output, which gives us this for free.

The theory is this.

The battery bank powers a simple circuit which runs off a 5V regulator. That regulator powers a dual comparator IC and provides a reference voltage. The comparator draws bugger all power, so I’m happy to use a linear PSU here. It’s mainly there as a voltage reference.

Precision isn’t really the aim here, so adjustable pots will make life easier.

The voltages from the battery bank and the solar panel are fed through voltage dividers to bring the voltages down to below 5V, then those voltages are individually fed into separate pots that control the hysteresis. I can adjust all points of the system.

The idea is that should the batteries get too low, or the sun go down, one or the other (or both) comparators will go low and pull down on R2. If the batteries are high and the sun is up, nothing pulls on R2 so the REMOTE+ pin on the HEP-600C-12 is allowed to float to +5V, turning off the mains charger.
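
The logic is easy enough to model in software before picking up the soldering iron.  This is a behavioural sketch only: the class and function names are mine, the threshold voltages are made-up placeholders rather than values I have settled on, and it works on the raw battery and panel voltages rather than the divided-down ones the real comparators would see.

```python
class HysteresisComparator:
    """Open-collector comparator with hysteresis.

    Pulls the shared line low once the input drops to `low`, and only lets
    go again once the input has risen to `high`.
    """

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.pulling_low = True     # assume "pull low" at power-up

    def update(self, volts):
        if volts >= self.high:
            self.pulling_low = False
        elif volts <= self.low:
            self.pulling_low = True
        return self.pulling_low


def charger_enabled(battery_cmp, solar_cmp, v_batt, v_solar):
    """Either comparator pulling the line low keeps the mains charger on;
    with both released, REMOTE+ floats high and the charger turns off."""
    battery_low = battery_cmp.update(v_batt)
    sun_down = solar_cmp.update(v_solar)
    return battery_low or sun_down


# Hypothetical thresholds, in volts at the battery and panel terminals.
battery = HysteresisComparator(low=12.0, high=12.8)
solar = HysteresisComparator(low=13.0, high=15.0)

readings = [(12.5, 0.0), (13.0, 18.0), (12.5, 18.0), (11.9, 18.0), (13.0, 5.0)]
states = [charger_enabled(battery, solar, vb, vs) for vb, vs in readings]
print(states)
# prints: [True, False, False, True, True]
```

The third reading shows the hysteresis doing its job: the battery has sagged back to 12.5V, but because it has not reached the lower threshold, the charger stays off rather than chattering.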

The advantage of this is there’s no programming of a microcontroller, it’s just analogue electronics. The LM393s are pretty hardy things, the datasheet says they’ll run at up to 36V and can accept a maximum input voltage of VCC-1.5V; so if I run at 5V, 3.5V is my recommended maximum. The adjustment pots should let me set a threshold voltage that avoids going above this.
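
As a quick feasibility check on the divider, the sketch below works out what ratio keeps the battery sense voltage under that 3.5V limit when the bank sits at the charger’s 14.4V factory setting.  The resistor values are hypothetical round numbers chosen only to show the sum, not parts I have picked.

```python
def divider_out(v_in, r_top, r_bottom):
    """Output of an unloaded resistive divider."""
    return v_in * r_bottom / (r_top + r_bottom)


VCC = 5.0
V_INPUT_MAX = VCC - 1.5             # comparator input limit: 3.5 V
V_BATT_MAX = 14.4                   # charger's factory float setting

max_ratio = V_INPUT_MAX / V_BATT_MAX    # largest usable divider ratio

# Hypothetical resistor pair: 33k on top, 10k to ground.
R_TOP, R_BOTTOM = 33_000, 10_000
v_sense = divider_out(V_BATT_MAX, R_TOP, R_BOTTOM)

print(round(max_ratio, 3), round(v_sense, 2), v_sense < V_INPUT_MAX)
# prints: 0.243 3.35 True
```

The solar side needs the same treatment with the panel’s open-circuit voltage in place of 14.4V, since that will be the higher of the two.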

I mainly need 5V for the HEP-600C-12, and for providing that stable known voltage reference. The LM78C05 should be fine for this.

Once I’ve done that, I should be able to wind that charger back up to its factory setting of 14.4V, which will mean that overnight the batteries will be charged back to full charge.

Mar 17 2018
 

Last night, I got home, having made a detour on my way into work past Jaycar Woolloongabba to replace the faulty PSU.
It was a pretty open-and-shut case, we took it out of the box, plugged it in, and sure enough, no fan.  After the saleswoman asked the advice of a co-worker, it was confirmed that the fan should be running.
It took some digging, but they found a replacement, and so it was boxed up (in the box I supplied, they didn’t have one), and I walked out the door with PSU No. 3.
I had to go straight to work, so took the PSU with me, and that evening, I loaded it into the top box to transport home on the bicycle.
I get home, and it’s first thing on my mind.  I unlock the top box, get it out, and still decked out in my cycling gear, helmet and all (needed the headlight to see down the back of the rack anyway), I get to work.
I put the ring lugs on, plug it into the wall socket and flick the switch.
Nothing.
Toggle the switch on the front, still nothing.
Tried the other socket on the outlet, unplugging the load, still nothing.  Did the 10km trip from Milton to The Gap kill it?
Frustrated, I figure I’ll switch a light on.  Funny… no lights.
I wander into the study… sure enough, the router, modem and switch are dead as doornails.  Wander out to the MDB outside, saw the main breaker was still on, and tried hitting the test button.  Nothing.
I wander back inside, switching the bike helmet for my old hard hat, since it looks as if I’ll need the headlight a bit longer, then take a sticky beak down the road to see if anyone else is facing the same issue.
Sure enough, I look down the street, everyone’s out.
So there goes my second attempt at bootstrapping Gentoo, and my old server’s uptime.
The power did return about an hour or so later.  The PSU was fine; you just don’t think of the mains being out as the cause of your problems.
I’ll re-start my build, but I’m not going to lose another build to failing power.  Nope, had enough of that for a joke.
I could have rigged up a UPS to the TS-7670, but I already have one, and it’s in the very rack where it’ll get installed anyway.  Thus, no time like the present to install it.
I’ll have to configure the switch to present the right VLANs to the TS-7670, but once I do that, it’ll be able to take over the role of routing between the management VLAN and the main network.
I didn’t want to do this in a VM because that means exposing the hosts and the VMs to the management VLAN, meaning anyone who managed to compromise a host would have direct access to the BMCs on the other nodes.
This is not a network with high bandwidth demands, and so the TS-7670 with its 100Mbps Ethernet (built into the SoC; not via USB) is an ideal machine for this task.
Having done this, all that’s left to do is to create a 2GB dual-core VM which will receive the contents of the old server, then that server can be shut down, after 8 years of good service.  I’ll keep it around for storing the on-site backups, but now I can keep it asleep and just wake it up with Wake-on-LAN when I want to make a back-up.
This should make a dint in our electricity bill!
Other changes…

  • Looks like we’ll be upgrading the solar with the addition of another 120W panel.
  • I will be hooking my other network switches, the ADSL router and ADSL modem up to the battery bank on the cluster; I’ve just got to get some suitable cable for doing so.
  • I have no faith in this third PSU, so already, I have a MeanWell HEP-600C coming.  We’ll wire up a suicide lead to it, and that can replace the Powertech MP-3089 + Redarc BCDC1225, as the MeanWell has a remote on/off feature I can use to control it.
Mar 15 2018
 

Perhaps literally… it has bitten the dust.  Although I wouldn’t call its installed location dusty.  Once again, the fan in the mains power supply has carked it.

Long-term followers of this project may remember that the last PSU failed the same way.

The reason has me miffed.  All I did with the replacement was take the PSU out of its box, loosen the two nuts for the terminals, slip the ring lugs for my power lead over the terminals, refit the nuts, plug it in and turn it on.

While it is running 24×7, there is nothing in the documentation to say this PSU can’t run that way.  This is what the installation looks like.

If it were dusty, I’d expect to be seeing hardware failures in my nodes.

This PSU is barely 4 months old, and earlier this week, the fan started making noises and requiring percussive maintenance to get started. Tonight, it failed. Completely. No taps on the case will convince it to go.

Now, I need to keep things running until the weekend. I need it to run without burning the house down.

Many moons ago, my father bought a 12V fan for the caravan. Cheap and nasty. It has a slider switch to select between two speeds; “fast” and “slow”, which would be better named “scream like a banshee” and “scream slightly less like a banshee”. The speed reduction is achieved by passing current through a 10W resistor, and achieves maybe a 2% reduction in motor RPM. As you can gather, it proved to be a rather unwelcome room mate, and has seen its last day in the caravan.

This fan, given it runs off 12V, has proven quite handy with the cluster. I’ve got my SB-50 “load” socket hanging out the front of the cluster. A little adaptor to bring that out to a cigarette lighter socket, and I can run it off the cluster batteries. When a build job has gotten a node hot and bothered, sitting this down the bottom of the cluster and aiming it at a node has cooled things down well.

Tonight, it has another task … to try and suck the hot air out of the PSU.

That’s the offending power supply.  A PowerTech MP-3089.  It powers the RedARC BCDC-1225 right above it.  And you can see my kludge around the cooling problem.  Not great, but it should hold for the next 24 hours.

Tomorrow, I think we’ll call past Aspley and pick up another replacement.  I’m leery of another now, but I literally have no choice … I need it now.  Sadly, >250W 12V switchmode PSUs are somewhat rare beasts here in Brisbane.  Altronics don’t sell them that big.  The grinning glasses are no more, and I’m not risking it with the Xantrex charger again.

Long term, I’m already looking at the MeanWell SP-480-12.  This is a PSU module, and will need its own case and mains wiring… but I have no faith in the MP-3089 to not fail and cremate my home of 34 years.

The nice feature of the SP-480-12 is that it does have a remote +12V power-off feature.  Presumably I can drive this with a comparator/output MOSFET, so that when the battery voltage drops below some critical threshold, it kicks in, and when it rises above a high set-point, it drops out.  Simple control, with no MCU involved.  I don’t see a reason to get more fancy than that on the control side, anything more is a liability.

In other news, my gcc build on the TS-7670 failed … so much for the wait.  We’ll try another version and see how we go.

Mar 13 2018
 

So the house got momentarily power-cycled this morning… I’m at work, minding my own business, next thing the access point emails me this:

Mar 13 09:04:23 Syslogd start up

Now, it only does that for two reasons.  Either someone told it to reboot (not I), or it got hard reset.  Sure enough, log into the old server, and it’s reporting an uptime of 15 minutes.  I get home this evening, and clocks all around are on the blink … literally.

The cluster, of course, is still going.  Power outage?  What power outage?

I did consider wiring the ADSL modem, router, study switch, and the TS-7670 up to the cluster’s power rails, but haven’t gotten around to doing that.  Alas, I’m not quite there yet.

In any case, even if the TS-7670 had been powered from the solar, I’d still have temporarily lost the build, as the HDD dock I have the hard drive sitting in is mains powered.  It also doesn’t remember its state after a power cycle.  I’d have re-started the build from work, but the HDD remained off when the power came back on.

Never mind.  The downside is now I get to re-start a multi-day build.  The good news, though, is that knowing the ebuild file that Portage picked out for compiling gcc, I can resume where it left off.  In this case, it’s using an ebuild from the musl overlay: /root/musl/sys-devel/gcc/gcc-6.4.0-r1.ebuild.

ebuild /root/musl/sys-devel/gcc/gcc-6.4.0-r1.ebuild package will preserve the current working tree and will resume where it was, hopefully without incident.  I’ll be left with a .tbz2, which will be picked up when I run emerge --keep-going -ekv @system.
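Put together, the resume sequence looks something like the sketch below. The resume_gcc_build function and the DRY_RUN switch are my own additions for illustration (setting DRY_RUN just prints the commands rather than running them); the two commands themselves are the ones described above.

```shell
#!/bin/sh
# resume_gcc_build EBUILD_PATH
# Resumes the interrupted build from the existing working tree, then
# carries on with the @system rebuild, which can use the resulting
# binary package.
# Hypothetical helper: set DRY_RUN to any non-empty value to print the
# commands instead of executing them.
resume_gcc_build() {
    run=${DRY_RUN:+echo}
    $run ebuild "$1" package &&
        $run emerge --keep-going -ekv @system
}
```

Calling it with DRY_RUN=1 and the ebuild path above is a cheap way to check the command lines before committing to another multi-day grind.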

Mar 12 2018
 

Well, in my last post I discussed getting OpenADK to build a dev environment on the TS-7670.  I had gotten Gentoo’s Portage installed, and started building packages.

The original plan was to build everything into /tmp/seed, but that requires that all the dependencies are present in the chroot.  They aren’t.  In the end, I decided to go the ill-advised route of compiling Gentoo over the top of OpenADK.

This is an ugly way to do things, but so far it is bearing fruit.  Initially there were some hiccups, and I had to restore some binaries from my OpenADK build tree.  When Gentoo installed python-exec, that broke Portage, and I found I had to unpack a Python 2.7 binary I had built earlier, then use that to re-install Portage.  I could then continue.

Right now, it’s grinding away at gcc, which was my nemesis from the beginning.  This time though, it successfully built xgcc and xg++, which means it has compiled itself using the OpenADK-supplied gcc, and is now building itself using its self-built binaries.  I think it does two or three passes at this.

If it gets through this, there’s about 65 packages to go after that.  Mostly small ones.  I should be able to do a ROOT=/tmp/seed emerge -ek @system then tar up /tmp/seed and emerge catalyst.  I have some wrapper scripts around Catalyst that I developed back when I was responsible for doing the MIPS stages.  These have been tweaked to do musl builds, and were used to produce these x86 stages.  The same will work for ARMv5.

It might be another week of grinding away, but we should get there. 🙂

Feb 26 2018
 

So, after a longish wait… my laptop finally coughed up an image with a C/C++ compiler and almost all the bits necessary to make Gentoo Portage tick.

Almost everything… wget built, but it segfaults on start-up.  No matter, it seems curl works.  We do have an issue though: Portage no longer supports customising the downloader like it used to, or at least I couldn’t see how to do it; it used to be a setting in make.conf.

Thankfully, I know shell scripts, and can make my own wget using the working curl:

bash-4.4# cat > /usr/bin/wget
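#!/bin/bash
# Minimal wget stand-in: maps the handful of wget options handled
# below onto a single curl call.

OUT=
while [ $# -gt 0 ]; do
    case "$1" in
        -O) OUT="$2"; shift;;   # output file: keep the path, skip its argument
        -t) shift;;             # retry count: ignored, skip its argument
        -T) shift;;             # timeout: ignored, skip its argument
        --passive-ftp) : ;;     # ignored, takes no argument
        *) break ;;             # anything else is treated as the URL
    esac
    shift
done

# Echo the curl command and bail out if it fails.
set -ex
curl --progress-bar -o "${OUT}" "$1"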

Okay, it’s a little (a lot) braindead, but it beats downloading the lot by hand!

I was able to get Gentoo installed by hand using these instructions.  I have an old 1TB HDD plugged into a USB dock, formatted with a 10GB swap partition and the rest btrfs.  Sure, it’s only USB 2.0, but I’d sooner just put up with some CPU overhead than wear out my eMMC.

Next step; ROOT=/tmp/seed emerge -ev system