Solar Cluster: Thank goodness for good monitoring

A few months back now, I had the misfortune of overshooting my Internet quota, and winding up with a AU$380 bill for the month (and that was capped… in truth it was more like AU$3000).  In fact, it happened a couple of times until I finally nailed down the cause.

Part of it was NTP traffic (seems lots of cowboys write SNTP clients now and point them at …); some was the Spambot Hunter Project and related activity.  In short, I invested some money into upping the quota, and some time into better monitoring.

I wanted to do the monitoring anyway to keep an eye on operations, as well as things like the solar panel voltages, etc.  Since I got it in place, I’ve been able to get much faster notifications of when things go awry.  Much sooner than the 120% quota usage alarm that Internode sends you.

I’m glad I did that now: last night I left a few tabs open on the site.  I noticed this evening they were still trying to load something and got suspicious… then I saw this:

Double checking, sure enough, something on one of those pages made Chromium get its knickers into a twist, and chew through all that data.

It took me a bit of tinkering to get the right query to extract the above chart.  Essentially there was a sustained 1.5MB/sec download for over 21 hours which would account for the 113.1GB that Internode recorded.
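
As a quick sanity check on those figures, here is the arithmetic as a small Python sketch.  It assumes decimal units (1GB = 1000MB) and takes 21 hours as a lower bound, so the result is approximate:

```python
# Back-of-envelope check of the figures quoted above.
RATE_MB_PER_S = 1.5   # sustained download rate read off the chart
DURATION_H = 21       # "over 21 hours", so treat this as a lower bound


def transferred_gb(rate_mb_per_s: float, hours: float) -> float:
    """Total data moved, in decimal gigabytes (assumes 1 GB = 1000 MB)."""
    return rate_mb_per_s * hours * 3600 / 1000


print(f"{transferred_gb(RATE_MB_PER_S, DURATION_H):.1f} GB")  # prints "113.4 GB"
```

That lands within a fraction of a gigabyte of the 113.1GB figure, which is about as close as an eyeballed rate and duration can be expected to get.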

It’s a bit co-incidental that the usage dropped the moment I re-started Chromium.  Not sure why it was continually re-loading pages, but never mind.

The above data is collected using a combination of collectd and InfluxDB, with Grafana doing the dashboarding and alarms, and a small Perl script pulling the usage data off Internode’s API.

Solar Cluster: Kernel driver now up on Github

So, I’m happy enough with the driver now that I’ll collapse down the commits and throw it up onto the Github repository.  I might take another look at kernel 4.18, but for now, you’ll find them on the ts7670-4.14.67 branch.

Two things I observe about this voltage monitor:

  1. The voltage output is not what you’d call accurate.  I think it’s only a 10-bit ADC, which is still plenty good enough for this application, but I think the reading is “high” by about 50mV.
  2. There’s significant noise on the reading, with noticeable quantisation steps.

Owing to these, and to thwart the possibility of using this data in side-channel attacks using power analysis, I’ve put a 40-sample moving-average filter on the “public” data.
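
For illustration, a moving-average filter of this kind can be sketched in a few lines of Python.  This is not the code actually running here; the class name and the running-total approach are assumptions made for the example, and only the 40-sample window comes from the text above:

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` samples."""

    def __init__(self, window: int = 40):
        self._samples = deque(maxlen=window)
        self._total = 0.0

    def update(self, value: float) -> float:
        """Add a sample and return the average of the current window."""
        if len(self._samples) == self._samples.maxlen:
            # Window is full: the oldest sample is about to be dropped.
            self._total -= self._samples[0]
        self._samples.append(value)
        self._total += value
        return self._total / len(self._samples)


# Usage: feed raw readings in, publish the smoothed value.
smoother = MovingAverage(window=40)
for raw_volts in (12.61, 12.66, 12.58, 12.63):
    smoothed = smoother.update(raw_volts)
```

Until the window fills, the average is taken over however many samples have arrived, so the first few outputs follow the raw data more closely than later ones.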

Nevertheless, it’s a handy party trick, and not one I expected these devices to be able to do.  My workplace manages a big fleet of these single-board computers in the residential towers at Barangaroo where they spend all day polling Modbus and M-Bus meters.  In the event we’re at all suspicious about DC power supplies though, it’s a simple matter to load this kernel tree (they already run U-Boot) and configure collectd (which is also installed).

I also tried briefly switching off the mains power to see that I was indeed reading the battery voltage and not just a random number that looked like the voltage.  That yielded an interesting effect:

You can see where I switched the mains supply off, and back on again.  From about 8:19PM the battery voltage predictably fell until about 8:28PM where it was at around 12.6V.

Then it did something strange: it rose about 100mV before settling at 12.7V.  I suspect if I kept it off all night it’d steadily decrease: the sun has long set.  I’ve turned the mains charger back on now, as you can see by the step-rise shortly after 8:44PM.

The bands on the above chart are the alert zones.  I’ll get an email if the battery voltage strays outside of that safe region of 12-14.6V.  Below 12V, and I run the risk of deep-cycling the batteries.  Above 14.6V, and I’ll cook them!
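
The alerting itself is done by Grafana, but the rule boils down to a simple band check.  A minimal Python sketch of that logic, with function and label names made up for the example:

```python
LOW_LIMIT_V = 12.0    # below this, risk of deep-cycling the batteries
HIGH_LIMIT_V = 14.6   # above this, risk of overcharging them


def battery_state(volts: float) -> str:
    """Classify a battery voltage against the 12-14.6V safe band."""
    if volts < LOW_LIMIT_V:
        return "low"
    if volts > HIGH_LIMIT_V:
        return "high"
    return "ok"


for reading in (11.8, 12.7, 14.9):
    print(reading, battery_state(reading))
```

Readings exactly on a limit are treated as safe here; whether the real alert rule is inclusive or exclusive at the edges is not something the text above specifies.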

The IPMI BMCs on the nodes already sent me angry emails when the battery got low, so in that sense, Grafana duplicates that, but does so with pretty charts.  The BMCs don’t see when the battery gets too high though, for the simple reason that what they see is regulated by LDOs.

Solar Cluster: Getting the battery voltage into Grafana

I’ve succeeded in getting a working battery monitor kernel module. This is basically taking the application note by Technologic Systems and spinning that into a power supply class driver that reports the voltage via sysfs.

As it happens, the battery module in collectd does not see this as a “battery”, something I’ll look at later. For now the exec plug-in works well enough. This feeds through eventually to an InfluxDB database with Grafana sitting on top.
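
A script for the exec plug-in only has to print PUTVAL lines on standard output.  The sketch below shows roughly what that could look like in Python; it is not the script in use here.  The sysfs device name "battery" is hypothetical (the real name depends on what the driver registers), the microvolt unit is the usual power supply class convention for voltage_now rather than something confirmed for this driver, and the "exec-battery/voltage" identifier is just one reasonable choice:

```python
#!/usr/bin/env python3
import os
import sys
import time

# Hypothetical device name; the real one depends on what the driver registers.
SYSFS_PATH = "/sys/class/power_supply/battery/voltage_now"


def read_volts(path: str = SYSFS_PATH) -> float:
    """Read voltage_now (assumed to be in microvolts) and return volts."""
    with open(path) as f:
        return int(f.read().strip()) / 1e6


def putval_line(host: str, interval: str, volts: float) -> str:
    """Format one value in collectd's plain-text PUTVAL syntax."""
    return f'PUTVAL "{host}/exec-battery/voltage" interval={interval} N:{volts:.3f}'


def main() -> None:
    # The exec plug-in passes these two variables to the scripts it launches.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = os.environ.get("COLLECTD_INTERVAL", "10")
    while True:
        try:
            volts = read_volts()
        except (OSError, ValueError) as err:
            print(f"cannot read {SYSFS_PATH}: {err}", file=sys.stderr)
            break
        print(putval_line(host, interval, volts), flush=True)
        time.sleep(float(interval))


if __name__ == "__main__":
    main()
```

The explicit flush matters: collectd reads the script's output through a pipe, so buffered output would otherwise arrive late or in bursts.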

Solar Cluster: Battery monitor PC now running Gentoo

So, after some argument, and a bit of sitting on a concrete floor with the netbook, I managed to get Gentoo loaded onto the TS-7670.  Right now it’s running off the MicroSD card, I’ll get things right, then shift it across to eMMC.

ts7670 ~ # emerge --info
Portage 2.3.40 (python 3.5.5-final-0, default/linux/musl/arm/armv7a, gcc-6.4.0, musl-1.1.19, 4.14.15-vrt-ts7670-00031-g1a006273f907-dirty armv5tejl)
System uname: Linux-4.14.15-vrt-ts7670-00031-g1a006273f907-dirty-armv5tejl-ARM926EJ-S_rev_5_-v5l-with-gentoo-2.4.1
KiB Mem:      111532 total,     13136 free
KiB Swap:    4194300 total,   4191228 free
Timestamp of repository gentoo: Fri, 17 Aug 2018 16:45:01 +0000
Head commit of repository gentoo: 563622899f514c21f5b7808cb50f6e88dbd7d7de
sh bash 4.4_p12
ld GNU ld (Gentoo 2.30 p2) 2.30.0
app-shells/bash:          4.4_p12::gentoo
dev-lang/perl:            5.24.3-r1::gentoo
dev-lang/python:          2.7.14-r1::gentoo, 3.5.5::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.4.1-r2::gentoo
sys-apps/openrc:          0.34.11::gentoo
sys-apps/sandbox:         2.13::musl
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.15.1-r2::gentoo
sys-devel/binutils:       2.30-r2::gentoo
sys-devel/gcc:            6.4.0-r1::musl
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1::gentoo
sys-kernel/linux-headers: 4.13::musl (virtual/os-headers)
sys-libs/musl:            1.1.19::gentoo

    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24

CFLAGS="-Os -pipe -march=armv5te -mtune=arm926ej-s -mfloat-abi=soft"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-Os -pipe -march=armv5te -mtune=arm926ej-s -mfloat-abi=soft"
FCFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
USE="arm bindist cli crypt cxx dri fortran iconv ipv6 modules ncurses nls nptl openmp pam pcre readline seccomp ssl tcpd unicode xattr zlib" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon plan sheets stage words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="musl" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-0" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby23" USERLAND="GNU" VIDEO_CARDS="dummy fbdev v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"

I still have to update the kernel.  I actually did get kernel 4.18 to boot, but I forgot to add in support for the watchdog, so U-Boot tickled it, then the watchdog got hungry and kicked the reset half way through the boot sequence.

Rolling back to my older 4.14 kernel works.  I’ll try again with 4.18.5 in a moment.  Failing that, I have also brought the 4.14 patches up to 4.14.69 which is the latest LTS release of the kernel.

I’ve started looking at the power supply sysfs device class, with a view to exposing the supply voltage via sysfs.  The thinking here is that collectd supports reading this via the “battery” module (and realistically, it is a battery that is being measured: two 105Ah AGMs).

Worst case is I do something a little proprietary and deal with it in user space.  I’ll have to dig up the Linux kernel tree I did for Jacques Electronics all those years ago, as that had some examples of interfacing sysfs to a Cypress PSOC device that was acting as an I²C slave.  Rather than using an off-the-shelf solution, they programmed up a MCU that did power management, touchscreen sensing, keypad sensing, RGB LED control and others, all in one chip.  (Fun to try and interface that to the Linux kernel.)

Technologic Systems appear to have done something similar.  The device ID 0x78 implies a 10-bit device, but I think they’re just squatting on that 7-bit address.  They hail 0x78 then read out 4 bytes, of which the last two are the supply voltage ADC reading.  They do their own byte swapping before scaling the value to get mV.
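
To make the decoding step concrete, here is a hedged Python sketch of pulling the ADC reading out of such a 4-byte reply.  The byte order is an assumption (big-endian on the wire, guessed from the byte swapping), the scale factor is deliberately left as a parameter because the real value is not given here, and the sample reply bytes are invented:

```python
def supply_adc_counts(reply: bytes, byteorder: str = "big") -> int:
    """Extract the supply-voltage ADC reading from the last two bytes.

    `byteorder` is an assumption; flip it to "little" if the device
    turns out to send the low byte first.
    """
    if len(reply) != 4:
        raise ValueError("expected a 4-byte reply")
    return int.from_bytes(reply[2:4], byteorder)


def counts_to_millivolts(counts: int, mv_per_count: float) -> float:
    """Scale raw ADC counts to millivolts; the factor is board-specific."""
    return counts * mv_per_count


# Invented example reply: first two bytes ignored, last two = 0x012C.
example = bytes([0x00, 0x00, 0x01, 0x2C])
print(supply_adc_counts(example))  # prints 300
```

With a 10-bit ADC the reading should never exceed 1023, which makes a handy check that the byte order guess is right.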

Solar Cluster: Measuring the battery voltage

So, I was just updating the project details for this project, and I happened to see this blog post about reading the DC voltage input on the TS-7670v2.

I haven’t yet gotten around to finishing the power meters that I was building which would otherwise be reading these values directly, but they were basically going to connect via Modbus to the TS-7670v2 anyway.  One of its roles, aside from routing between the physical management network (IPMI and switch console access), was to monitor the battery.

I will have to explore this.  Collectd doesn’t have a general-purpose I²C module, but it does have one for barometer modules, so with a bit of work, I could make one to measure the voltage input which would tell me what the battery is doing.

Solar Cluster: Battery monitor computer ordered

I’ve taken the plunge and gotten a TS-7670 ordered in a DIN-rail mount for monitoring the battery.  Not sure what the shipping will be from Arizona to here, but I somehow doubt I’m up for more than AU$300 for this thing.  The unit itself will cost AU$250.

Some will argue that a Raspberry Pi or BeagleBone would be cheaper, and that would be correct, however by the time you’ve added a DIN-rail mount case, an RS-485 control board and a 12V to 5V step-down power converter, you’d be around that figure anyway.  Plus, the Raspberry Pi doesn’t give you schematics.  The BeagleBone does, but is also a more sophisticated beast.

The plan is I’ll spin a version of Gentoo Linux on it… possibly using the musl C library to keep memory usage down as I’ve gone the base model with 128MB RAM.  I’ll re-spin the kernel and U-Boot patches I have for the latest release.

There will be two functions looked after:

  • Access to the IPMI/L2 management network
  • Polling of the two DC power meters (still to be fully designed) via Modbus

It can report to a VM running on one of the hosts.  I believe collectd has the necessary bits and pieces to do this.  Failing that, I’ve written code before that polls Modbus… I write such code for a day job.
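
For a flavour of what polling over Modbus involves at the wire level, here is a small Python sketch that builds a Modbus RTU "read holding registers" request, CRC included.  The unit address and register numbers are placeholders, since the power meters (and therefore their register maps) are still to be designed:

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU (init 0xFFFF, reflected poly 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_request(unit: int, address: int, count: int) -> bytes:
    """Build an RTU frame for function 0x03 (read holding registers)."""
    body = struct.pack(">BBHH", unit, 0x03, address, count)
    # The CRC goes on the wire low byte first, unlike the other fields.
    return body + struct.pack("<H", crc16_modbus(body))


# Placeholder values: unit 1, one register starting at address 0.
frame = read_holding_registers_request(1, 0, 1)
print(frame.hex(" "))  # prints "01 03 00 00 00 01 84 0a"
```

In practice a library or collectd's own Modbus support would handle the framing; the sketch is only meant to show how little there is to a request.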