Dec 16 2020

Well, this month has been a funny one. When we moved to the NBN back in March, we went from a 500GB-a-month quota to 100GB a month, with a link speed of 50Mbps.

That seemed like a reasonable compromise at the time, since my typical usage has been around 60~70GB a month. There are no Netflix subscriptions here, but my father does hit YouTube rather hard, and lately I have been downloading music (legally) from time to time.

This year has also seen me working from home, doing a lot of Slack and Zoom calls. Zoom in particular is pricey quota-wise, since everyone insists on running webcams. Despite this, the extra Internet use has been manageable: a couple of times we got to around 90GB, sailing close to the 100GB mark, but never over. This is what it looked like last month:

November’s Internet quota usage

This month, that changed:

Internet usage this month

Now, the start-of-month data got missed because of a glitch between collectd and the Internode quota monitoring script I have. Two of the spikes can be attributed to:

  • the arrival of a Windows 10-based laptop doing its out-of-box updates (~4GB)
  • my desktop doing its 3-monthly OS updates (~5GB)

That isn’t enough to account for why things have nearly doubled, though. A few possibilities came to mind:

  • a web-based script going haywire in a browser (this has happened before, and cost me dearly)
  • genuine local user Internet activity increases
  • website traffic increases
  • server or workstation compromise

Looking over the netflow data

Now, last time I had this happen, I did two things:

  • I set up collectd/InfluxDB/Grafana to monitor my Internet usage and quota
  • I set up nfcapd on the border router to monitor my usage

This is pretty easy to set up in OpenBSD, and well worth doing.
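The netflow side of it boils down to something like the following sketch (the addresses, port and paths are illustrative placeholders, not my actual configuration; I attach pflow per rule rather than globally, as the rules later in this post show):

# /etc/hostname.pflow0 -- a pflow(4) interface that exports flow
# records from the router (flowsrc) to the collector (flowdst)
flowsrc 192.0.2.1 flowdst 192.0.2.1:9995

# /etc/pf.conf -- have pf emit pflow records, either globally like
# this or per rule with a (pflow) state option
set state-defaults pflow

# install nfdump (which includes nfcapd) and start the collector,
# writing rotated capture files under /var/netflow
pkg_add nfdump
nfcapd -D -l /var/netflow -p 9995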

I keep about 30 days’ worth of netflow data on the border router. So naturally, I haul that back to my workstation and run nfdump over it to see what jumps out.
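Hauling it back and taking a first look goes something like this (the host name and paths here are illustrative):

# copy the rotated nfcapd capture files off the border router
scp -r router:/var/netflow /tmp/nfcapd

# top 20 destination addresses by bytes transferred
nfdump -R /tmp/nfcapd -s dstip/bytes -n 20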

Looking through the list of “flows”, one target identified was a development machine hosted at Vultr. Checking the IP address revealed it was one of the WideSky test instances my workplace uses: about 5GB of HTTP requests and about 4GB of VPN traffic (admittedly, the couple of WideSky hubs I have here have their logging settings cranked high).

That alone doesn’t explain it, though. The bulk of the traffic was scattered amongst a number of hosts, and I didn’t see the pattern until I tried aggregating the traffic by /16 subnet.
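The aggregation step was roughly this (reconstructed from memory; the exact invocation may have differed slightly):

# aggregate flows by destination /16 and order by bytes
nfdump -R /tmp/nfcapd -A dstip4/16 -O bytes | head -n 30

One /16 stood out, so I drilled into it: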

RC=0 stuartl@rikishi /tmp $ nfdump -R /tmp/nfcapd -A srcip,dstip -o long6 -O bytes 'net 114.119.0.0/16'  
Date first seen          Duration Proto                             Src IP Addr:Port                                 Dst IP Addr:Port     Flags Tos  Packets    Bytes Flows
2020-11-27 23:11:30.000 1630599.000 0                             150.101.176.226:0     ->                         114.119.146.185:0     ........   0    4.7 M    6.8 G  2535
2020-11-22 13:02:41.000 2099541.000 0                             150.101.176.226:0     ->                         114.119.133.234:0     ........   0    4.3 M    6.1 G  2376
2020-11-18 14:38:42.000 2439079.000 0                             150.101.176.226:0     ->                         114.119.140.107:0     ........   0    3.8 M    5.4 G  2418
2020-11-20 10:43:58.000 2280070.000 0                             150.101.176.226:0     ->                          114.119.141.52:0     ........   0    3.7 M    5.3 G  2421
2020-11-21 22:34:35.000 2151244.000 0                             150.101.176.226:0     ->                         114.119.159.109:0     ........   0    3.4 M    4.9 G  2446
2020-11-24 00:11:52.000 1972657.000 0                             150.101.176.226:0     ->                          114.119.136.13:0     ........   0    3.4 M    4.8 G  2399
2020-11-25 04:24:32.000 1870854.000 0                             150.101.176.226:0     ->                         114.119.136.215:0     ........   0    3.3 M    4.8 G  2473
2020-11-24 15:49:55.000 1916848.000 0                             150.101.176.226:0     ->                           114.119.151.0:0     ........   0    3.0 M    4.4 G  2435
2020-11-27 20:15:43.000 1641316.000 0                             150.101.176.226:0     ->                         114.119.129.181:0     ........   0    2.6 M    3.7 G  2426
2020-11-27 21:38:37.000 1636635.000 0                             150.101.176.226:0     ->                          114.119.159.16:0     ........   0    2.5 M    3.6 G  2419
2020-11-27 23:11:30.000 1630599.000 0                             114.119.146.185:0     ->                         150.101.176.226:0     ........   0    4.1 M  175.9 M  2535
…
2020-11-19 22:02:04.000     0.000 0                             150.101.176.226:0     ->                         114.119.138.111:0     ........   0        3      132     1
2020-11-25 03:37:11.000     0.000 0                             150.101.176.226:0     ->                          114.119.152.27:0     ........   0        3      132     1
2020-12-06 19:59:49.000     0.000 0                             150.101.176.226:0     ->                         114.119.151.153:0     ........   0        3      132     1
2020-11-22 08:23:11.000     0.000 0                             150.101.176.226:0     ->                          114.119.130.23:0     ........   0        3      132     1
2020-11-25 15:43:47.000     0.000 0                             150.101.176.226:0     ->                         114.119.128.219:0     ........   0        3      132     1
2020-11-24 09:05:13.000     0.000 0                             150.101.176.226:0     ->                          114.119.140.85:0     ........   0        3      132     1
Summary: total flows: 56059, total bytes: 51.7 G, total packets: 65.0 M, avg bps: 150213, avg pps: 23, avg bpp: 794
Time window: 2020-11-13 11:01:52 - 2020-12-16 20:19:41
Total flows processed: 39077053, Blocks skipped: 0, Bytes read: 2698309352
Sys: 3.744s flows/second: 10436251.9 Wall: 15.108s flows/second: 2586482.6 

51.7GB in a month!!! Drilling further, I noted it was mostly targeted at TCP ports 80 and 443, and UDP port 53. Web traffic, in other words. Reverse look-up on a randomly selected IP showed the reverse pointer petalbot-xxx-xxx-xxx-xxx.aspiegel.com, and indeed, in server logs for various sites I host, I saw PetalBot in the user agent.
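The drill-down and the reverse lookup were along these lines (using one of the addresses from the table above):

# break the 114.119.0.0/16 traffic down by protocol and destination port
nfdump -R /tmp/nfcapd -A proto,dstport -O bytes 'net 114.119.0.0/16'

# reverse lookup on one of the busier addresses
dig -x 114.119.146.185 +short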

Plucking some petals off PetalBot

So, I needed to put the brakes on this somehow. I’m fine with them indexing my sites; they just need to show some consideration and restraint in how quickly they do so.

Thus, I amended pf.conf:

# Rate-limited "friends"
ratelimit_dst4="{ 114.119.0.0/16 }"
#ratelimit_dst6="{ }"

# Traffic shaping queues
queue root on $external  bandwidth 25M max 25M
queue slow parent root   bandwidth 256K max 512K
queue bulk parent root   bandwidth 25M default

# …

# Rate-limit certain targets
pass out on egress proto { tcp, udp, icmp } from any to $ratelimit_dst4 modulate state (pflow) set queue slow
#pass out on egress proto { tcp, udp, icmp6 } from any to $ratelimit_dst6 modulate state (pflow) set queue slow

So, the first queue line defines the root queue on my external interface and caps the upload bandwidth at 25Mbps (next month, I will be dropping my link speed to 25Mbps in favour of an “unlimited” quota).

Then, I define a queue restricted to 256kbps (peaking at 512kbps), and direct all traffic going to the listed networks into that queue. PetalBot should now see at most 512kbps from this end, which should severely crimp how quickly it can guzzle my quota, whilst still permitting it to index my sites.
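For completeness, loading the amended ruleset and keeping an eye on the result is just the standard pfctl/systat routine:

# check the syntax, then load the amended ruleset
pfctl -nf /etc/pf.conf
pfctl -f /etc/pf.conf

# watch the queues to confirm traffic is landing in "slow"
systat queues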

Yesterday, PetalBot chewed through 8GB… let’s see what it does tomorrow.