cisco-wap4410n

Adventures with UniFi controllers and APs

We’ve had WiFi in one form or another for some years on this network. Originally it started with an interest in the Brisbane Mesh metropolitan area network which more-or-less imploded around 2006 or so. Back then, I think I had one of the few WiFi access points in The Gap. 2.4GHz was basically microwave ovens and not much else. The same is not true today.

WiFi networks in my local area, 2.4GHz isn’t as quiet as it once was.

Since then, the network has changed a bit: from a little D-Link 802.11b AP, we moved to a Prism54g WiFi card (that I still have) with hostapd, using OpenVPN to provide security. That got replaced by a Telstra-branded Netcomm WiFi router which I figured out supported WPA-Enterprise, so I went down the rabbit hole of setting up FreeRADIUS, and we ran that until a lightning strike blew it up. The next consumer AP that replaced it was a miserable failure, so it’s been business APs since then.

Initially a Cisco WAP4410N, which was a great little AP… worked reliably for years, but about 12 months ago I noticed it was dropping packets occasionally and getting a bit intermittent. Thinking that maybe the device was past its prime, I bought a replacement: a WAP150, which proved to be a bit disappointing. Range wasn’t as good as the WAP4410N’s, and I soon found myself moving the WAP150 downstairs to service the network there and re-instating the WAP4410N.

In particular, one feature I liked about the two Cisco units is they support 802.1Q VLANs, with the ability to assign a different WiFi SSID to each. The 4410N could do 4 SSIDs, the 150 8. This is a feature that consumer APs don’t do, and it is a handy feature here as it enables me to have a “work” LAN (with VPN to my workplace) and a “home” LAN which everybody else uses.

Years ago, our Internet usage was over a 512kbps/128kbps ADSL link, and it was mostly Internet browsing… so intermittent packet loss wasn’t a big deal… one AP did just fine. Now with the move to NBN, our telephone service is a VoIP service, and I’m finding that WiFi IP phones are very picky about APs. We have three IP phones and an ATA… the ATA (Grandstream HT814) is Ethernet of course, as is one of the IP phones (Grandstream GXP1615), but the other two IP phones are WiFi (Aristel Wi-Fi Genius X1+ and Grandstream WP810).

The Aristel device in particular was really choppy… and the first one sent out seemed to be DoA, with poor performance even when right beside the AP. A replacement was provided under RMA, and this one performed much better, but still suffered intermittent loss. The Grandstream WP810 generally worked, but there were noticeable dead spots in a few areas around the house.

The final straw with the existing pair of APs came at the last Brisbane WICEN meeting, conducted over Zoom… both APs seemed to suffer a problem where they started dropping packets and glitching badly. A power-cycle “fixes” the problem, but the issue returns after a week or two. Clearly, they were no longer up to snuff.

The replacements

APs

I procured the following replacements:

I went with the long-range one for upstairs since it’s in a high spot (sitting atop a stereo speaker on a top shelf in my room), so it would be able to “radiate” over a long distance to hopefully reach down the driveway and into the back yard.

The other one is to fill in dead spots downstairs, and since it’s going to be pretty much sitting at waist level, there’s no point in it being “long range”.

The devices I bought were purchased through mWave (here and here), as they had them in stock at the time.

Power injectors

These are 48V passive PoE devices… so to make them go, you need a separate power injector. The “standard” Ubiquiti power injector was out-of-stock, but I wanted these to work on 12V anyway, so I looked around for a suitable option. Core Electronics do have some step-up converters which work great for 24V devices, but the range available doesn’t quite reach 48V. I did find though that Telco Antennas sell these 48V PoE injectors. (They also sell the APs here and here, but were out-of-stock at the time of purchase.)

Admittedly, they’re 10/100Mbps only, which means you don’t quite get the full throughput out of the WiFi6 APs, but meh, it’s good enough… if the IP phones need more than 100Mbps, they’ll run up against the 25Mbps limit of the NBN link!
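The reasoning there is just a “slowest hop wins” calculation. A minimal Python sketch, using only the two figures quoted above (the function and key names are made up for illustration):

```python
def bottleneck_mbps(hop_speeds_mbps):
    """End-to-end throughput cannot exceed the slowest hop on the path."""
    return min(hop_speeds_mbps)


# Figures from the text above; the key names are illustrative only.
internet_path_mbps = {
    "poe_injector": 100,  # 10/100Mbps injector
    "nbn_link": 25,       # NBN link limit
}

limit = bottleneck_mbps(internet_path_mbps.values())
print(f"Internet-bound traffic tops out at {limit}Mbps")
```

For traffic that stays on the LAN the injector becomes the limiting hop instead, which is the trade-off being accepted here.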

Controller

These APs, unlike the Cisco devices they’re replacing (and everything else I’ve used prior), have no built-in management interface. Instead, they talk to a network controller device… normally the UniFi Cloud Key. I had a run-in with the first generation of these at the Stirling’s Crossing Endurance Centre. For a big network, the idea of a central device does make a lot of sense (that site has 5 UAP-AC-Ms and 3 8-port PoE switches), but for a two-AP network like mine it seemed overkill.

One thing I learned is that these things positively DO NOT like being power-cycled! Repeated power-cycling corrupts the database in very short order, and you find yourself restoring configurations from a back-up soon after. So I was squeamish about buying one of these. The second generation version has its own back-up battery, but reports suggest they can be just as unreliable. In any case, they were out of stock everywhere, and I didn’t want to spring the extra cash for the “plus” model (that has a HDD… not much use to me) or the Dream Machine router.

I did consider using a Raspberry Pi 3, in fact that was my original plan… I had one spare, and so started down the path of setting it up as a UniFi controller… however, ran into two road blocks:

  • UniFi controller at this time requires Java v8… Debian Bullseye ships with v11 minimum
  • UniFi controller needs MongoDB 3.4, which isn’t available on Debian Bullseye on ARM64

I could compile MongoDB, but Java is a whole other issue, and lots of people have complained loudly about this very limitation. If there was one big gripe I’ve got, this would be it.

I did some further research: Ubuntu 20.04 does offer a Java 8 runtime, and on AMD64, I can use existing binaries for MongoDB. I looked around and purchased this small-form-factor PC. Windows 10 went bye byes once I managed to hit F1 at the right point in the BIOS set-up, and Ubuntu 20.04 was PXE-loaded. I could then follow the standard instructions to install via APT. The controller seems to be working fine using OpenJDK JRE v8. I’d recommend this over the licensing quagmire that is using Oracle JRE.

Installation

With a controller, and all the requisite bits, things went smoothly. I found at first, the controller insisted on using 192.168.1.0/24 addresses to talk to the APs… so wound up setting that up in the netplan config. I later found that the UniFi controller won’t let you set a network subnet address unless you turn off Auto Scale Network.

Setting the network subnet is not possible until “Auto Scale Network” is disabled.

So maybe from here-on-in, new APs will appear in the correct subnet, but to be honest, it’s no big deal either way: unless an AP meets an untimely end, I shouldn’t need to buy new ones for a while!

Auto-negotiation quirks with Cisco switches

One oddity I noticed was the upstairs (U6 LR) AP was reluctant to communicate via Ethernet, instead funnelling its traffic via the downstairs AP. It’s handy that they can do that (it means I don’t necessarily need to worry about powering the upstairs switches in a power outage), but the AP should be able to use its Ethernet back-end.

The downstairs one was having no problems, and the set-up was similar: switch port → PoE injector → AP, via short cables. I tried a few different cables with no change. Logged into the switch and had a look, it was set to auto-negotiate, which was working fine downstairs. The downstairs switch is a Netgear GS748T, whereas the one upstairs is a Cisco SG200-08 (not the P version that does PoE).

I found I could log into the AP over SSH (you can provide your SSH key via the UniFi controller)… so I logged in as root and had a look around. They run Linux with kernel 4.4 (sadly tainted, due to ubnthal.ko and ubnt_common.ko), and a Busybox/musl environment on an ARM64 CPU. (Well, the U6 LRs are ARM64, the U6 Lites are MediaTek MT7621s… mipsel32r2 with kernel 5.4.0 and not tainted.) ip told me that eth0 was up, and that the AP’s IP address was assigned to br0 which was also up. brctl told me that eth0 was enslaved by br0. Curiously, /sys/class/net/eth0/carrier was reporting 1, which disagreed with what the switch was telling me.
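On the AP itself, with only Busybox to hand, this sort of poking around is done with cat. On a regular Linux host with Python 3, the same sysfs attributes can be gathered in one go. The following is a rough sketch rather than anything UniFi-specific: read_link_state is a name invented for this example, while carrier, operstate, speed and duplex are the standard attribute files under /sys/class/net/<interface>/.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Collect the kernel's view of an interface's link from sysfs.

    Attributes that are missing or unreadable come back as None; reading
    "speed" or "duplex" can fail outright while the link is down.
    """
    base = Path(sysfs_root) / iface

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None

    return {
        "carrier": read("carrier"),      # "1" when the kernel sees a link
        "operstate": read("operstate"),  # e.g. "up" or "down"
        "speed": read("speed"),          # Mbps, as a string
        "duplex": read("duplex"),        # "full" or "half"
    }


if __name__ == "__main__":
    # "eth0" is the interface name reported on the AP; adjust for other hosts.
    for key, value in read_link_state("eth0").items():
        print(f"{key}: {value}")
```

Comparing this output against what the switch reports for the same port is a quick way to spot the kind of disagreement described above.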

On a hunch, I tried turning off auto-negotiation, forcing instead 100Mbps full-duplex. Bingo, a link LED appeared. The topology showed the AP was now wired, not talking via downstairs.

Network topology shown in the UniFi Controller UI

Switched back to auto-negotiation, and the AP switched to being a wireless extender with the link LED disappearing from the switch. This may be a quirk of the PoE injectors I’m using, which do not handle 1Gbps, and maybe the switch hasn’t realised this because the AP otherwise “advertises” 1Gbps link capability. For now, I’m leaving that switch port locked at 100Mbps full-duplex. If you have problems with an AP showing up via Ethernet, this is a place worth checking.
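To make that hunch concrete, here is a deliberately simplified Python model of it. It is only a sketch of the suspected failure, not a description of what the hardware actually does: speeds are plain numbers in Mbps, duplex is ignored, and the function names are invented for this example. The idea is that both ends agree on the fastest speed they have in common, but the link only comes up if everything in between (here, the 10/100 injector) can carry that speed.

```python
def negotiate(advertised_a, advertised_b):
    """Pick the fastest speed both ends advertise, or None if none match."""
    common = set(advertised_a) & set(advertised_b)
    return max(common) if common else None


def link_comes_up(advertised_a, advertised_b, path_max_mbps):
    """True if the agreed speed fits through the slowest device in the path."""
    speed = negotiate(advertised_a, advertised_b)
    return speed is not None and speed <= path_max_mbps


AP = {10, 100, 1000}           # AP advertises up to 1Gbps
SWITCH_AUTO = {10, 100, 1000}  # switch port left on auto-negotiate
SWITCH_FORCED = {100}          # switch port locked at 100Mbps
INJECTOR_MAX = 100             # 10/100Mbps PoE injector

print(link_comes_up(AP, SWITCH_AUTO, INJECTOR_MAX))    # both ends settle on 1000
print(link_comes_up(AP, SWITCH_FORCED, INJECTOR_MAX))  # both ends settle on 100
```

In this model, the auto-negotiated case fails because neither end knows about the injector in the middle, which matches the behaviour seen on the Cisco switch port. It does not explain why the same arrangement works on the Netgear downstairs.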

Solar Cluster: Networking

So, having got some instances going… I thought I better sort out the networking issues proper.  While it was working, I wanted to do a few things:

  1. Bring a dedicated link down from my room into the rack directly for redundancy
  2. Define some more VLANs
  3. Sort out the intermittent faults being reported by Ceph

I decided to tackle (1) first.  I have two 8-port Cisco SG-200 switches linked via a length of Cat5E that snakes its way from our study, through the ceiling cavity then comes up through a small hole in the floor of my room, near where two brush-tail possums call home.

I drilled a new hole next to where the existing cable entered, then came the fun of trying to feed the new cable alongside the old one.  First attempt had the cable nearly coil itself just inside the cavity.  I tried to make a tool to grab the end of it, but it was well and truly out of reach.  I ended up getting the job done by taping the cable to a section of fibreglass tubing, feeding that in, taping another section of tubing to that, feeding that in, etc… but then I ran out of tubing.

Luckily, a rummage around, and I found some rigid plastic that I was able to tape to the tubing, and that got me within a half-metre of my target.  Brilliant, except I forgot to put a leader cable through for next time didn’t I?

So more rummaging around for a length of suitable nylon rope, tape the rope to the Cat5E, haul the Cat5E out, then grab another length of rope and tape that to the end and use the nylon rope to haul everything back in.

The rope should be handy for when I come to install the solar panels.

I had one 16-way patch panel, so wound up terminating the rack-end with that, and just putting a RJ-45 on the end in my room and plugging that directly into the switch.  So on the shopping list will be some RJ-45 wall jacks.

The cable tester tells me I possibly have brown and white-brown switched, but never mind, I’ll be re-terminating it properly when I get the parts, and that pair isn’t used anyway.

The upshot: I now have a nice 1Gbps ring loop between the two SG-200s and the LGS326 in the rack.  No animals were harmed in the formation of this ring, although two possums were mildly inconvenienced.  (I call that payback for the times they’ve held the Marsupial Olympics at 2AM when I’m trying to sleep!)

Having gotten the physical layer sorted out, I was able to introduce the upstairs SG-200 to the new switch, then remove the single-port LAG I had defined on the downstairs SG-200.  A bit more tinkering, and I had a nice redundant set-up: setting my laptop to ping one of the instances in the cluster over WiFi, I could unplug my upstairs trunk, wait a few seconds, plug it back in, wait some more, unplug the downstairs trunk, wait some more again, then plug it back in again, and not lose a single ICMP packet.
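If you’d rather not eyeball the ping output while crawling around unplugging cables, the summary line ping prints on exit can be checked by a script. This is a small Python sketch, assuming the summary format printed by the common Linux (iputils) and BSD ping tools; lost_packets is a name made up for this example, and the sample line is illustrative, not captured from the test described above.

```python
import re

# Matches both "N packets transmitted, M received" (iputils) and
# "N packets transmitted, M packets received" (BSD-style) summaries.
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def lost_packets(ping_output):
    """Return how many echo requests went unanswered, per ping's summary."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = map(int, match.groups())
    return sent - received


# Illustrative summary line only.
sample = "120 packets transmitted, 120 received, 0% packet loss, time 119160ms"
print(lost_packets(sample))
```

Anything other than zero after a round of unplugging and re-plugging trunks means the ring is not failing over cleanly.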

I moved my two switches and my AP over to the new management VLAN I had set up, alongside the IPMI interfaces on the nodes.  The SG-200s were easy: aside from them insisting on one port being configured with a PVID equal to the management VLAN (I guess they want to ensure you don’t get locked out), it all went smoothly.

The AP though, a Cisco WAP4410N… not so easy.  In their wisdom, and unlike the SG-200s, the management VLAN settings page is separate from the IP interface page, so you can’t change both at the same time.  I wound up changing the VLAN, only to find I had locked myself out of it.  Much swearing at the cantankerous AP and wondering how could someone overlook such a fundamental requirement!  That, and the switch where the AP plugs in, helpfully didn’t add the management VLAN to the right port like I asked of it.

Once that was sorted out, I was able to configure an IP on the old subnet and move the AP across.

That just left dealing with the intermittent issues with Ceph.  My original intention with the cluster was to use 802.3ad so each node had two 2Gbps links.  Except: the LGS326-AU only supports 4 LAGs.  For me to do this, I need 10!

Thankfully, the bonding support in the Linux kernel has several other options available.  Switching from 802.3ad to balance-tlb resolved the issue.

slaves_bond0="enp0s20f0 enp0s20f1"
slaves_bond1="enp0s20f2 enp0s20f3"
config_bond0="null"
config_bond1="null"
config_enp0s20f0="null"
config_enp0s20f1="null"
config_enp0s20f2="null"
config_enp0s20f3="null"
rc_net_bond0_need="net.enp0s20f0 net.enp0s20f1"
rc_net_bond1_need="net.enp0s20f2 net.enp0s20f3"
mode_bond0="balance-tlb"
mode_bond1="balance-tlb"
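Once the interfaces are up, it’s worth confirming the kernel agrees with the config. The bonding driver exposes the active mode and member interfaces under /sys/class/net/<bond>/bonding/, so a quick check can be sketched in Python as below. The helper names and the EXPECTED table are invented for this example (the table simply mirrors the settings above), and it assumes Python 3 is available on the node.

```python
from pathlib import Path


def read_bond(bond, sysfs_root="/sys/class/net"):
    """Return (mode_name, [slave interfaces]) for a bonding device."""
    bonding = Path(sysfs_root) / bond / "bonding"
    # The mode file holds the name followed by its number, e.g. "balance-tlb 5".
    mode = (bonding / "mode").read_text().split()[0]
    slaves = (bonding / "slaves").read_text().split()
    return mode, slaves


def bond_problems(bond, want_mode, want_slaves, sysfs_root="/sys/class/net"):
    """List the ways a bond differs from what the config asked for."""
    mode, slaves = read_bond(bond, sysfs_root)
    problems = []
    if mode != want_mode:
        problems.append(f"{bond}: mode is {mode}, expected {want_mode}")
    missing = sorted(set(want_slaves) - set(slaves))
    if missing:
        problems.append(f"{bond}: missing slaves {', '.join(missing)}")
    return problems


# Mirrors the configuration above.
EXPECTED = {
    "bond0": ("balance-tlb", ["enp0s20f0", "enp0s20f1"]),
    "bond1": ("balance-tlb", ["enp0s20f2", "enp0s20f3"]),
}

if __name__ == "__main__":
    for name, (want_mode, want_slaves) in EXPECTED.items():
        try:
            issues = bond_problems(name, want_mode, want_slaves)
        except OSError:
            issues = [f"{name}: not present on this host"]
        for issue in issues or [f"{name}: OK"]:
            print(issue)
```

An empty problem list for both bonds means the mode change took effect and all four interfaces joined their bonds.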

I am currently setting up a core router instance (with OpenBSD 6.1) and an OpenNebula instance (with Gentoo AMD64/musl libc).