grafana

Solar Cluster: Thank goodness for good monitoring

A few months back now, I had the misfortune of overshooting my Internet quota, and winding up with a AU$380 bill for the month (and that was capped… in truth it was more like AU$3000).  In fact, it happened a couple of times until I finally nailed down the cause.

Part of it was NTP traffic (seems lots of cowboys write SNTP clients now and point them at pool.ntp.org), some was the #Hackaday.io Spambot Hunter Project and related activity.  In short, I invested some money into upping the quota, and some time into better monitoring.

I wanted to do the monitoring anyway to keep an eye on operations, as well as things like the solar panel voltages, etc.  Since I got it in place, I’ve been able to get much faster notifications of when things go awry.  Much sooner than the 120% quota usage alarm that Internode sends you.

I’m glad I did that now, last night I left a few tabs open on the Hackaday.io site.  I noticed this evening they were still trying to load something and got suspicious… then I saw this:

Double checking, sure enough, something on one of those pages made Chromium get its knickers into a twist, and chew through all that data.

It took me a bit of tinkering to get the right query to extract the above chart.  Essentially there was a sustained 1.5MB/sec download for over 21 hours which would account for the 113.1GB that Internode recorded.

It’s a bit co-incidental that the usage dropped the moment I re-started Chromium.  Not sure why it was continually re-loading pages, but never mind.

The above data is collected using a combination of collectd and InfluxDB, with Grafana doing the dashboarding and alarms, and a small Perl script pulling the usage data off Internode’s API.

Solar Cluster: Kernel driver now up on Github

So, I’m happy enough with the driver now that I’ll collapse down the commits and throw it up onto the Github repository.  I might take another look at kernel 4.18, but for now, you’ll find them on the ts7670-4.14.67 branch.

Two things I observe about this voltage monitor:

  1. The voltage output is not what you’d call, accurate.  I think it’s only a 10-bit ADC, which is still plenty good enough for this application, but the reading I think is “high” by about 50mV.
  2. There’s significant noise on the reading, with noticeable quantisation steps.

Owing to these, and to thwart the possibility of using this data in side-channel attacks using power analysis, I’ve put a 40-sample moving-average filter on the “public” data.

Never the less, it’s a handy party trick, and not one I expected these devices to be able to do.  My workplace manages a big fleet of these single-board computers in the residential towers at Barangaroo where they spend all day polling Modbus and M-Bus meters.  In the event we’re at all suspicious about DC power supplies though, it’s a simple matter to load this kernel tree (they already run U-Boot) and configure collectd (which is also installed).

I also tried briefly switching off the mains power to see that I was indeed reading the battery voltage and not just a random number that looked like the voltage.  That yielded an interesting effect:

You can see where I switched the mains supply off, and back on again.  From about 8:19PM the battery voltage predictably fell until about 8:28PM where it was at around 12.6V.

Then it did something strange, it rose about 100mV before settling at 12.7V.  I suspect if I kept it off all night it’d steadily decrease: the sun has long set.  I’ve turned the mains charger back on now, as you can see by the step-rise shortly after 8:44PM.

The bands on the above chart are the alert zones.  I’ll get an email if the battery voltage strays outside of that safe region of 12-14.6V.  Below 12V, and I run the risk of deep-cycling the batteries.  Above 14.6V, and I’ll cook them!

The IPMI BMCs on the nodes already sent me angry emails when the battery got low, so in that sense, Grafana duplicates that, but does so with pretty charts.  The BMCs don’t see when the battery gets too high though, for the simple matter that what they see is regulated by LDOs.

Solar Cluster: Getting the battery voltage into Grafana

I’ve succeeded in getting a working battery monitor kernel module. This is basically taking the application note by Technologic Systems and spinning that into a power supply class driver that reports the voltage via sysfs.

As it happens, the battery module in collectd does not see this as a “battery”, something I’ll look at later. For now the exec plug-in works well enough. This feeds through eventually to an InfluxDB database with Grafana sitting on top.

https://netmon.longlandclan.id.au/d/IyZP-V2mk/battery-voltage?orgId=1