Recently, I noticed my network monitoring was down… I hadn’t worried about it because I had other things to keep me busy, and thankfully, my network monitoring, whilst important, isn’t mission critical.
I took a look at it today. The symptom was an odd one: influxd was running and listening on the backup/RPC port 8088, but not on port 8086, the one used for queries.
Otherwise, it was generating logs as if it were online. What gives?
Tried some different settings: nothing… nada… zilch. Nothing would make it listen on port 8086.
Tried updating to 1.8 (from 1.1): still nothing.
Tried manually running it as root… sure enough, if I waited long enough, it came up on its own and began listening on port 8086. Hmmm, I wonder. I had a look at the init scripts:
```bash
#!/bin/bash -e

/usr/bin/influxd -config /etc/influxdb/influxdb.conf $INFLUXD_OPTS &
PID=$!
echo $PID > /var/lib/influxdb/influxd.pid

PROTOCOL="http"
BIND_ADDRESS=$(influxd config | grep -A5 "\[http\]" | grep '^  bind-address' | cut -d ' ' -f5 | tr -d '"')
HTTPS_ENABLED_FOUND=$(influxd config | grep "https-enabled = true" | cut -d ' ' -f5)
HTTPS_ENABLED=${HTTPS_ENABLED_FOUND:-"false"}
if [ $HTTPS_ENABLED = "true" ]; then
  HTTPS_CERT=$(influxd config | grep "https-certificate" | cut -d ' ' -f5 | tr -d '"')
  if [ ! -f "${HTTPS_CERT}" ]; then
    echo "${HTTPS_CERT} not found! Exiting..."
    exit 1
  fi
  echo "$HTTPS_CERT found"
  PROTOCOL="https"
fi
HOST=${BIND_ADDRESS%%:*}
HOST=${HOST:-"localhost"}
PORT=${BIND_ADDRESS##*:}

set +e
max_attempts=10
url="$PROTOCOL://$HOST:$PORT/health"
result=$(curl -k -s -o /dev/null $url -w %{http_code})
while [ "$result" != "200" ]; do
  sleep 1
  result=$(curl -k -s -o /dev/null $url -w %{http_code})
  max_attempts=$(($max_attempts-1))
  if [ $max_attempts -le 0 ]; then
    echo "Failed to reach influxdb $PROTOCOL endpoint at $url"
    exit 1
  fi
done
set -e
```
Ahh, right: start the server, check once a second to see whether it's up, and if it still isn't after 10 attempts, just abort and let systemd restart the whole shebang. Because turning the power on-off-on-off-on-off is going to make it go faster, right?
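In that loop, each failed probe costs one sleep before the counter is decremented, so the script's patience is roughly max_attempts × the sleep interval. A minimal sketch of that arithmetic (worst_case_wait is a hypothetical helper, not part of the InfluxDB package, and it ignores the time each curl call itself takes):

```bash
#!/bin/bash
# Hypothetical helper: approximate worst-case time (in seconds) the
# health-check loop waits before giving up. Each failed attempt costs
# one sleep; the duration of each curl call is ignored.
worst_case_wait() {
  local attempts=$1 interval=$2
  echo $((attempts * interval))
}

worst_case_wait 10 1   # values from the script: prints 10
```

With the script's values that is only about 10 seconds, which, judging by the manual run, was not long enough for this influxd to start answering on port 8086.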
I changed max_attempts to 360 and the sleep to 10, giving influxd up to about an hour to come up before the script gives up.
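Another way to get the same effect, rather than tuning a retry count and a sleep separately, is to express the wait as a deadline. This is a hypothetical sketch, not what the packaged script does: wait_until and influx_healthy are illustrative names, the probe reuses the script's curl call, and the deadline relies on bash's built-in SECONDS counter.

```bash
#!/bin/bash
# Hypothetical helper: run COMMAND repeatedly until it succeeds or until
# TIMEOUT seconds have elapsed, sleeping INTERVAL seconds between tries.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Returns 0 on success, 1 if the deadline passes first.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical probe: succeeds only when the given /health URL answers HTTP 200.
influx_healthy() {
  [ "$(curl -k -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}

# Example (inside the start script, after url has been computed):
#   if ! wait_until 3600 10 influx_healthy "$url"; then
#     echo "Failed to reach influxdb $PROTOCOL endpoint at $url"
#     exit 1
#   fi
```

The advantage of a deadline is that the total patience is stated directly in seconds, so changing the polling interval does not silently change how long the script waits.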
Having fixed this, I am now getting data back into my system.