kernel-hacking

Solar Cluster: Kernel driver now up on Github

So, I’m happy enough with the driver now that I’ll collapse down the commits and throw it up onto the Github repository.  I might take another look at kernel 4.18, but for now, you’ll find them on the ts7670-4.14.67 branch.

Two things I observe about this voltage monitor:

  1. The voltage output is not what you’d call, accurate.  I think it’s only a 10-bit ADC, which is still plenty good enough for this application, but the reading I think is “high” by about 50mV.
  2. There’s significant noise on the reading, with noticeable quantisation steps.

Owing to these, and to thwart the possibility of using this data in side-channel attacks using power analysis, I’ve put a 40-sample moving-average filter on the “public” data.

Never the less, it’s a handy party trick, and not one I expected these devices to be able to do.  My workplace manages a big fleet of these single-board computers in the residential towers at Barangaroo where they spend all day polling Modbus and M-Bus meters.  In the event we’re at all suspicious about DC power supplies though, it’s a simple matter to load this kernel tree (they already run U-Boot) and configure collectd (which is also installed).

I also tried briefly switching off the mains power to see that I was indeed reading the battery voltage and not just a random number that looked like the voltage.  That yielded an interesting effect:

You can see where I switched the mains supply off, and back on again.  From about 8:19PM the battery voltage predictably fell until about 8:28PM where it was at around 12.6V.

Then it did something strange, it rose about 100mV before settling at 12.7V.  I suspect if I kept it off all night it’d steadily decrease: the sun has long set.  I’ve turned the mains charger back on now, as you can see by the step-rise shortly after 8:44PM.

The bands on the above chart are the alert zones.  I’ll get an email if the battery voltage strays outside of that safe region of 12-14.6V.  Below 12V, and I run the risk of deep-cycling the batteries.  Above 14.6V, and I’ll cook them!

The IPMI BMCs on the nodes already sent me angry emails when the battery got low, so in that sense, Grafana duplicates that, but does so with pretty charts.  The BMCs don’t see when the battery gets too high though, for the simple matter that what they see is regulated by LDOs.

Solar Cluster: Getting the battery voltage into Grafana

I’ve succeeded in getting a working battery monitor kernel module. This is basically taking the application note by Technologic Systems and spinning that into a power supply class driver that reports the voltage via sysfs.

As it happens, the battery module in collectd does not see this as a “battery”, something I’ll look at later. For now the exec plug-in works well enough. This feeds through eventually to an InfluxDB database with Grafana sitting on top.

https://netmon.longlandclan.id.au/d/IyZP-V2mk/battery-voltage?orgId=1

Solar Cluster: Kernel driver debugging

So, I successfully last night, parted the core bits out of ts_wdt.c and make ts-mcu-core.c.  This is a multi-function device, and serves to provide a shared channel for the two drivers that’ll use it.

Tonight, I took a stab at writing the PSU part of it.  Suffice to say, I’ve got work to do:

[  158.712960] Unable to handle kernel NULL pointer dereference at virtual address 00000005
[  158.721328] pgd = c3854000
[  158.724089] [00000005] *pgd=4384f831, *pte=00000000, *ppte=00000000
[  158.730629] Internal error: Oops: 1 [#3] ARM
[  158.734947] Modules linked in: 8021q garp mrp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables flexcan can_dev
[  158.755812] CPU: 0 PID: 2059 Comm: cat Tainted: G      D         4.14.67-vrt-ts7670+ #3
[  158.763840] Hardware name: Freescale MXS (Device Tree)
[  158.769008] task: c68f3a20 task.stack: c3846000
[  158.773598] PC is at ts_mcu_transfer+0x1c/0x48
[  158.778073] LR is at 0x3
[  158.780630] pc : []    lr : [<00000003>]    psr: 60000013
[  158.786918] sp : c3847e44  ip : 00000000  fp : 014000c0
[  158.792165] r10: c5035000  r9 : c5305900  r8 : c777b428
[  158.797412] r7 : c0a7fa80  r6 : c777b400  r5 : c5035000  r4 : c3847e6c
[  158.803961] r3 : c3847e58  r2 : 00000001  r1 : c3847e4c  r0 : c07b8c68
[  158.810512] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  158.817671] Control: 0005317f  Table: 43854000  DAC: 00000051
[  158.823440] Process cat (pid: 2059, stack limit = 0xc3846190)
[  158.829212] Stack: (0xc3847e44 to 0xc3848000)
[  158.833611] 7e40:          c05bd150 00000001 00010000 00000004 c3847e48 00000150 c5035000
[  158.841833] 7e60: c777b420 c05bcaa4 c4f70c60 c777e070 c0a7fa80 c05bca20 00000fff c07ad4b0
[  158.850051] 7e80: c777b428 c04e46dc c4f52980 00001000 00000fff c01b1994 c4f52980 c4f70c60
[  158.858268] 7ea0: c3847ec8 ffffe000 00000000 c3847f88 00000001 c0164eac c3847fb0 c4f529b0
[  158.866487] 7ec0: 00020000 b6e3d000 00000000 00000000 c4f73f70 00000800 00000000 c01b0f60
[  158.874703] 7ee0: 00020000 c4f70c60 ffffe000 c3847f88 00000000 00000000 00000000 c013eb84
[  158.882918] 7f00: 000291ac 00000000 00000000 c0009344 00000077 b6e3c000 00000022 00000022
[  158.891135] 7f20: c686bdc0 c0117838 000b6e3c c3847f80 00022000 c686be14 b6e3c000 00000000
[  158.899354] 7f40: 00000000 00022000 b6e3d000 00020000 c4f70c60 ffffe000 c3847f88 c013ed0c
[  158.907571] 7f60: 00000022 00000000 000b6e3c c4f70c60 c4f70c60 b6e3d000 00020000 c000a9e4
[  158.915786] 7f80: c3846000 c013f2d8 00000000 00000000 00000000 00000000 00000000 00000000
[  158.924002] 7fa0: 00000003 c000a820 00000000 00000000 00000003 b6e3d000 00020000 00000000
[  158.932217] 7fc0: 00000000 00000000 00000000 00000003 00020000 00000000 00000001 00000000
[  158.940434] 7fe0: be8a62c0 be8a62ac b6eb77c4 b6eb6b9c 60000010 00000003 00000000 00000000
[  158.948691] [] (ts_mcu_transfer) from [] (ts_psu_get_prop+0x38/0xb0)
[  158.956847] [] (ts_psu_get_prop) from [] (power_supply_show_property+0x84/0x220)
[  158.966036] [] (power_supply_show_property) from [] (dev_attr_show+0x1c/0x48)
[  158.974974] [] (dev_attr_show) from [] (sysfs_kf_seq_show+0x84/0xf0)
[  158.983129] [] (sysfs_kf_seq_show) from [] (seq_read+0xcc/0x4f4)
[  158.990930] [] (seq_read) from [] (__vfs_read+0x1c/0x11c)
[  158.998117] [] (__vfs_read) from [] (vfs_read+0x88/0x158)
[  159.005304] [] (vfs_read) from [] (SyS_read+0x3c/0x90)
[  159.012232] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x28)
[  159.019766] Code: e52de004 e281300c e590e004 e25cc001 (e1dee0b2) 
[  159.026278] ---[ end trace 2807dc313991fd87 ]---

The good news is the machine didn’t crash.c

Solar Cluster: Battery voltage kernel module

So, I had a brief look after getting kernel 4.18.5 booting… sure enough the problem was I had forgotten the watchdog, although I did see btrfs trigger a deadlock warning, so I may not be out of the woods yet.  I’ve posted the relevant kernel output to the linux-btrfs list.

Anyway, as it happens, that watchdog driver looks like it’ll need some re-factoring as a multi-function device.  At the moment, ts-wdt.c claims it based on this binding.

If I try to add a second driver, they’ll clash, and I expect the same if I try to access it via userspace.  So the sensible thing to do here, is to add a ts-companion.c MFD driver here, then re-factor ts-wdt.c to use it.  From there, I can write a ts-psu.c module which will go right here.

I think I’ll definitely be digging into those older sources to remind myself how that all worked.

Solar Cluster: Battery monitor computer ordered

I’ve taken the plunge and gotten a TS-7670 ordered in a DIN-rail mount for monitoring the battery.  Not sure what the shipping will be from Arizona to here, but I somehow doubt I’m up for more than AU$300 for this thing.  The unit itself will cost AU$250.

Some will argue that a Raspberry Pi or BeagleBone would be cheaper, and that would be correct, however by the time you’ve added a DIN-rail mount case, an RS-485 control board and a 12V to 5V step-down power converter, you’d be around that figure anyway.  Plus, the Raspberry Pi doesn’t give you schematics.  The BeagleBone does, but is also a more sophisticated beast.

The plan is I’ll spin a version of Gentoo Linux on it… possibly using the musl C library to keep memory usage down as I’ve gone the base model with 128MB RAM.  I’ll re-spin the kernel and U-Boot patches I have for the latest release.

There will be two functions looked after:

  • Access to the IPMI/L2 management network
  • Polling of the two DC power meters (still to be fully designed) via Modbus

It can report to a VM running on one of the hosts.  I believe collectd has the necessary bits and pieces to do this.  Failing that, I’ve written code before that polls Modbus… I write such code for a day job.