3 Ubiquiti Pro Max switches in a rack, demonstrating the Etherlighting feature that makes cable ends glow.

Taking a fresh look at Ubiquiti

Having spent most of the last 5 years steeped in the HPE/Aruba networking ecosystem, I didn’t pay much attention to what Ubiquiti has been doing, as their primarily prosumer and small-business market segment didn’t significantly overlap with the large global customers I was working with at Aruba. HPE did venture into the SMB space with the InstantON product line, and has been reasonably successful at it, but the consulting work I was doing rarely went anywhere near that space. Now that I’m free of any competitive constraints, I’m taking another look at a product line that has clearly evolved.

About a year or so ago, I saw a marketing slide that surprised me: the #3 player in the Wi-Fi space (measured by total access points deployed), behind 800-pound gorilla Cisco and #2 Aruba, was, in fact, Ubiquiti.

Ubiquiti is a company that originally made a name for themselves by selling proprietary (AirMax) Wi-Fi-ish point-to-point and service provider gear, and eventually jumped into Wi-Fi. They quietly built themselves into a nearly US$2B/year operation by selling inexpensive (but broadly adequate) Wi-Fi hardware that appealed to small businesses, and they were largely ignored by the “Big Players” for years.

It had long been my own experience that the Ubiquiti ecosystem was decent but had some significant scalability limitations in terms of their management platform. This was especially clear after I got down and dirty with some absolutely massive deployments while working at Aruba. Over the years prior, I’d deployed a number of Ubiquiti-based solutions and periodically watched them release a few products that were real stinkers (both hardware and software) or distractions from the core product line (anyone remember mFi? Their solar product? The first 802.11ac access point to ship?). They gained a reputation (largely deserved) for marketing the product as plug-and-play, requiring minimal engineering skill, and for shipping with a bunch of defaults that really weren’t doing anyone much good. They were the “cheap” brand, preferred by minimally competent “trunk-slammers”, customer bean counters, and IT departments that didn’t really understand Wi-Fi, leading to a lot of really bad installations. They were also plagued by supply chain and quality challenges for a very long time.

But over the last several years, other consumer-focused companies like Netgear, TP-Link, and Linksys (fresh off the utter debacle that was Cisco’s ownership of the brand), new players like Google (Nest) and Amazon (Eero), and eventually larger legacy players like Aruba (InstantON) started realizing that Ubiquiti did in fact have the prosumer market largely locked up. They decided they wanted a piece of that pie, right around the same time that the consumer market started realizing it needed multi-AP solutions, especially in the US, where homes were growing ever larger, and all the while ISPs were providing equipment that was almost criminally cheap and awful.

This new competition ultimately proved to be a good thing for Ubiquiti, as complacency had been taking hold. The competition breathed new life into the company, which began to make some really smart hiring decisions on the product and development side.

Meanwhile, the founder of Ubiquiti, Robert Pera, decided to do as so many company founders with mountains of wealth do, and invested some of it into a professional sports team, the Memphis Grizzlies of the National Basketball Association. Along with the team came the management and operation of their home arena, the FedExForum. As it would never do for a competitor’s networking product to be in the arena, the team set about deploying Ubiquiti’s Wi-Fi product to serve the arena, at a scale at which the product had never been deployed before. This ultimately proved that the product can in fact scale, if given the proper engineering attention by someone who actually knows what they’re doing (their lead network architect is a good friend and colleague), and product support and development that can integrate feedback and the unique challenges posed by Large Public Venues. The experience drove some much-needed improvements to both the hardware and the software.

Most recently, Ubiquiti put in a long-awaited appearance at Mobility Field Day 11, where they got to present their product portfolio and roadmap to deeply experienced professionals in the wireless networking space. They came to announce that they have been listening to extensive feedback from Wi-Fi pros (including yours truly), and that they’re starting to make moves into offering truly enterprise-grade products and services. In doing so, they’ve announced that they’re coming to play, and I’m hopeful that they provide some much-needed competition to the likes of Cisco (especially Meraki), Aruba, Juniper, Extreme, and Ruckus, in the same way the consumer players did for them. The big players have gotten complacent too, and it’s high time the market got shaken up.

I was fortunate enough to be part of the beta test audience for their MFD presentation and received a rather generous shipment of lab equipment in exchange for my time and expertise. Over the next several posts, I’ll go over the unboxing and commissioning of the gear and provide my thoughts on it. Stay tuned!

A nice cup of MoCA…

Let’s jump into the time machine and head back to the turn of the century (21 years ago, y’all… can you believe it?). It was a time when cable TV was king and you could usually count on a cable outlet in almost every room of the house, when a cable TV package could easily come with half a dozen converter boxes, before the term “cord-cutter” struck fear into the hearts of cable executives, and when Netflix was an upstart DVD-by-mail company. This was also when a brand new technology called “Wi-Fi” had just shown up on the scene. Broadband internet (a whole 5 megabits!) was starting to find its way into homes served by cable TV, and it made dialup look severely lame. Usually these “cable modems” were hooked directly up to a single computer, either via USB, or via Ethernet if your computer was really snazzy. Often, these computers were directly connected to the internet with no firewall software, which led to all kinds of shenanigans.

Ah, those were the days.

If you had a home built around that time, chances are the builder put coaxial cable into every room they could think of so you could have TV everywhere. And they’d usually string a daisy-chained run of Cat5 for telephones. If they were really fancy, they would run each cable and phone outlet back to a central point where you could pick and choose where the signals went.

The challenge is that while technology changes every few years, the wiring in a house is generally put in place with little thought given to even the near future. In 2000, only the serious nerds (such as yours truly) had computers (plural) in their homes. The idea of the networked home and the Internet of Things was still a long way off.

If you were a nerd with computers (plural) and so fortunate as to have a home whose Cat5 phone cables were “home-run” back to a central interconnect (where they were usually all spliced together on a single pair for voice), you could reterminate them on both ends with a modular jack and use them for Ethernet (the idea of a router at home with NAT was still pretty new back then as well). In most cases, the runs were short enough that when gigabit Ethernet started showing up, you could still make the Cat5 work.

Recently, I had to figure out how to connect up a bunch of access points in a few homes that were built in the 1999-2000 time frame. One is the rental I just moved into, and the other is a moderately sized home owned by a client who has found himself and his family working from home a lot more lately, just like the rest of us.

My home was wired to nearly every room with home run Cat5 and coax (lucky me!). Since I have buckets full of Cat5e jacks, it was a pretty simple swap on both ends and I got gigabit. Didn’t require much effort, and thankfully didn’t require causing any damage to the rental house, which the landlord tends to get cranky about.

The client’s home, on the other hand, had daisy-chained telephone line and coaxial cable throughout. And since it’s a higher-end home, running Ethernet cable to each room is a non-starter (not to mention expensive and disruptive). And, of course, the cable modem/router/wireless/waffle iron/juicer/vacuum combo device provided by the cable company is as far across the house from the home office as you can possibly get without actually putting it in the neighbors’ house. Cable installers love outside walls, which are about the worst possible place to put a wireless access point. Zoom calls can get a little frustrating and embarrassing when you’re the presenter and your connection sucks…

So how do you get a decent connection up to the office and elsewhere in the client’s house to blanket it with Wi-Fi? Thankfully, 20 years of innovation has happened, and the chip makers and the cable companies got together to solve this problem, because they needed to deliver services over IP within the home as well. What they came up with is the deliciously named “MoCA”, which stands for “Multimedia over Coax Alliance”. They figured out a way to run a digital network signal over the existing coax wires present in most houses, and to make it compatible with Ethernet.

Early versions weren’t very fast (version 1.0 in 2006 was capable of 100Mbps), but as they applied some of the same RF tricks that Wi-Fi used, they were able to make it perform at a much higher level (Version 2.5, released in 2016, is capable of 2.5Gbps). Version 3 aims to provide 10Gbps.

MoCA will support up to 16 nodes on the wire, and can coexist with some shockingly bad signal conditions. It operates from 1125MHz up to 1675MHz, which is above where cable TV signals live but still quite functional over short distances with existing coaxial cable and splitters. It forms a full mesh where each node talks directly to the other nodes it needs to, using a combination of Time-Division Multiple Access (TDMA) and Orthogonal Frequency-Division Multiple Access (OFDMA), a trick that is also used by Wi-Fi 6/802.11ax to make better use of airtime.
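As a quick sanity check on those numbers, the band math works out roughly like this. Caveat: the ~100MHz channel width and per-channel throughput below are my own rough assumptions for illustration, not figures pulled from the MoCA spec:

```python
# Back-of-the-envelope math for the MoCA band described above.
# ASSUMPTIONS (mine, not from the spec): channels are roughly 100 MHz wide
# and each carries on the order of 500 Mbps of usable throughput.
BAND_START_MHZ = 1125
BAND_END_MHZ = 1675
ASSUMED_CHANNEL_WIDTH_MHZ = 100
ASSUMED_MBPS_PER_CHANNEL = 500

usable_spectrum_mhz = BAND_END_MHZ - BAND_START_MHZ          # 550 MHz
channels = usable_spectrum_mhz // ASSUMED_CHANNEL_WIDTH_MHZ  # ~5 channels

print(f"Usable spectrum: {usable_spectrum_mhz} MHz")
print(f"Room for about {channels} bonded channels")
print(f"Ballpark capacity: {channels * ASSUMED_MBPS_PER_CHANNEL} Mbps")  # ~2500 Mbps
```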

If you want a quick summary of how it works, device maker GoCoax has a great rundown on their home page.

MoCA also requires putting in a filter between the pole and your house so that your MoCA signals don’t end up putting your neighbors on the same network or screwing with the cable company’s lines.

Most current cable-company-provided gateways also support MoCA, and adding a MoCA transceiver to a live coaxial port on the wall of your house basically gives you another Ethernet port on the gateway device. Cable companies commonly use this for IP-based set-top boxes (over coax!) and additional wireless access points (such as Cox’s “Panoramic WiFi” and XFinity’s “XFi pods”).

While I haven’t tested the cable company’s wireless offerings (because I’m not a masochist, and I have access to vastly better Wi-Fi gear), I did want to find out how well MoCA performed as a straight Ethernet bridge for connecting up the client’s access points, so that I didn’t have to use wireless meshing, which performs quite poorly in most residential environments.

So I grabbed a couple of MoCA adapters (and a splitter) from Amazon and tried it out in a couple of different configurations. Testing was done from a MacBook Pro connected to the network via Ethernet, and a WLANpi connected on the other end of a MoCA adapter.

The test setup.

The first thing I noticed is that these devices are truly “plug and play”. I hooked one up to the coax in my office and plugged the Ethernet side into my switch. I then hooked three more up around the house: an access point on two of them, and the WLANpi on the third. The access points came up and showed up in the controller just like they would on Ethernet (caveat: I had to power them externally). The WLANpi grabbed a DHCP address, and I started testing, using the librespeed web speed test built into the WLANpi, as well as iperf3, also built into the Pi.
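If you’d rather script that kind of test than click through librespeed, here’s a rough sketch of how the laptop side could drive iperf3 and pull the throughput out of its JSON output. It assumes iperf3 is installed locally and the iperf3 server is running on the WLANpi; the address below is just a placeholder:

```python
import json
import subprocess

# Hypothetical address of the WLANpi on the far side of the MoCA link.
WLANPI_HOST = "192.168.1.50"

def run_iperf3(host: str, reverse: bool = False) -> float:
    """Run a 10-second iperf3 TCP test against `host` and return Mbps."""
    cmd = ["iperf3", "-c", host, "-t", "10", "-J"]
    if reverse:
        cmd.append("-R")  # server sends to us, i.e. the download direction
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    report = json.loads(result.stdout)
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1_000_000

if __name__ == "__main__":
    print(f"Upload:   {run_iperf3(WLANPI_HOST):.0f} Mbps")
    print(f"Download: {run_iperf3(WLANPI_HOST, reverse=True):.0f} Mbps")
```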

First, the baseline with the WLANpi connected directly to the switch. Pretty solid, about what you would expect from a gigabit network.

Next: The WLANpi at the other end of a 4-node MoCA 2.5 network:

An ever so slight reduction in throughput, and an extra few milliseconds of latency.

Directly connecting two nodes performed similarly.

So, bottom line: MoCA is a pretty solid option if all you have available is coax. It delivers nearly full wire speed, and doesn’t introduce the kind of latency that a Wi-Fi mesh does.

Downside: The MoCA spec doesn’t seem to provide for any means of powering converters centrally, or pushing PoE to the Ethernet device.

Other MoCA devices worth looking at:

  • Kiwee Broadband, which has a passthrough port as well as a second Ethernet port.
  • GoCoax, another inexpensive option that supports MoCA 2.5.

“It’s ALWAYS DNS (or DHCP)”

There’s a common saying among my network engineering peers: “It’s ALWAYS DNS!”. For those not familiar with the concept, this refers to the alarming regularity with which networking troubles end up being caused by something trivial, such as name resolution. And when it’s not DNS, it’s usually DHCP. Those two troublemakers alone are responsible for some ridiculously large percentage of network support issues. (At least until someone at a tier 1 provider inserts a typo into a route table advertised to half the internet via BGP, and takes everything down, but I digress.)

Last weekend, I rebuilt my home wireless network from an Aruba Instant cluster back to a controller based network, using ClearPass as an authentication and authorization backend for the home network. Gross overkill for a home network, but it gives me stick time on stuff that I need to know for work, at a much grander scale.

But first, a little background on the Aruba Way of doing things: In an Instant cluster, the wireless networks are bridged to a VLAN that is trunked to the access point. You can also do this with campus networking, but managing all those VLANs on every port that feeds an access point is usually a recipe for forgetting something vital. So the campus model lets you build a single access VLAN on your AP ports, and the AP establishes a GRE tunnel back to the controller cluster (which also allows for some great redundancy and high-availability options). The various VLANs terminate on the user anchor controllers: each user has their own tunnel back to the controller, which allows you to segment their traffic out and handle it at layers 4-7 based on a variety of rules. The only thing going over the wire is an encrypted tunnel, which is a significantly better security posture should someone unethically decide to monitor traffic on a switch port when they shouldn’t.

This is also where ClearPass comes into play. How user sessions and traffic are handled is defined in roles, and each role consists of various rules. How roles are applied is defined by policies. You can map roles to users and/or machines with the magic of ClearPass, and then when someone connects to the wireless network, ClearPass can return a role (and it can return a different role based on whether you authenticated with a username/password, a certificate, or any one of a number of other data inputs). Basically, when ClearPass returns the OK to the controller, it also includes a bunch of attributes for that user, including roles. It’s extremely powerful magic, and when wielded wrong, it can cause no end of heartache trying to figure out just what exactly went haywire. And I’m still very much a ClearPass n00b.
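If you want to see that attribute exchange for yourself, a generic RADIUS test client is enough. This is not a ClearPass API; it’s just a sketch using the pyrad library to send a PAP Access-Request and dump whatever attributes (such as the Aruba-User-Role VSA) come back in the Access-Accept. The server address, shared secret, credentials, and dictionary files below are all placeholders:

```python
# Send a PAP Access-Request with pyrad and print the reply attributes.
# NOT a ClearPass API -- just a generic RADIUS client for poking at it.
# Server, secret, credentials, and dictionary paths are placeholders; the
# dictionary is assumed to include the Aruba VSAs (e.g. Aruba-User-Role).
from pyrad.client import Client
from pyrad.dictionary import Dictionary
import pyrad.packet

srv = Client(
    server="10.0.0.10",              # hypothetical ClearPass address
    secret=b"testing123",            # hypothetical shared secret
    dict=Dictionary("dictionary", "dictionary.aruba"),
)

req = srv.CreateAuthPacket(
    code=pyrad.packet.AccessRequest,
    User_Name="testuser",
    NAS_Identifier="lab-controller",
)
req["User-Password"] = req.PwCrypt("testpassword")

reply = srv.SendPacket(req)
if reply.code == pyrad.packet.AccessAccept:
    # The interesting bit: the attributes riding along with the accept,
    # which is where a role like Aruba-User-Role would show up.
    for attr in reply.keys():
        print(attr, reply[attr])
else:
    print("Access rejected")
```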

Which brings me back to my newly built and ClearPass-enabled network. And so like every good story…

No $#!+, there I was…

When I connected, it would take a good 10 minutes before I could access the internet. And so, I’m wondering what I screwed up in my ClearPass setup that would have done this… But the roles were being assigned correctly, and the rules associated with those roles were pretty straightforward: “allow all”. So why in the heck were devices on the home network taking forever and a week to get an address? This was not happening on my IoT and guest networks.

First, I realized that my devices were associating just fine, so ClearPass and the role derivation were working correctly, which immediately acquitted the Wi-Fi (though as far as the others in my house were concerned, the Wi-Fi was still screwed up). That also meant I had a good Layer 2 connection. I verified that the VLAN was properly connected from the pfSense router to the core switch, and that the controllers (running in VMWare) were properly trunking to the distributed vSwitch and also out to the core switch. Everything on that front looked good. I tried manually assigning IPs to the wireless clients on the home LAN, and they worked great. So L3 worked, which implied L2 did as well. And when clients on the home network did eventually get an IP address, they worked fine too. So nothing was being bottlenecked anywhere either (I should hope not, as the VMWare hosts and the router are all connected to the core switch with dual 10-gigabit fiber links!).

After a few days of racking my brain over this, and hearing the people who live in my house continue to complain about network weirdness (thankfully, my family is not doing virtual school/work… except for me), I finally resigned myself to doing what I should have done in the first place: breaking out Wireshark and figuring out just what was actually happening on the network. DHCP is pretty simple, so finding out what broke should be straightforward, right?

Quick refresher on DHCP: The process of obtaining a DHCP address goes like this:

  • Discover: the client broadcasts, looking for any available DHCP server.
  • Offer: a server responds, offering an address.
  • Request: the client formally requests the offered address.
  • Acknowledge (ACK): the server confirms the lease, and the client starts using the address.

Since I knew I had good L2 connectivity, I fired up Wireshark on my laptop, capturing what was going on at L2, with the plan of moving to other points in the network if I needed to. The first thing I saw was that a residential network, even with guest and IoT traffic isolated and nobody else in the house using it, is a fairly chatty place. I saw a bunch of multicast traffic (I have a lot of Apple devices), and even some IP broadcast traffic. And there, among all that, was the DHCP process. Discover. Discover. Discover. Offer. Request. Request. Request. Discover. Discover. Offer. Request. Request. Request. Discover. Discover. Discover. Offer. Request. Request. Request. The more astute among you may have noticed something missing from this sequence. Something rather… important.
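For what it’s worth, you don’t strictly need the Wireshark GUI to watch that exchange. Here’s a rough scapy sketch that prints each DHCP message type as it goes by, which makes a missing ACK jump out immediately (it needs root privileges, and the interface name is a placeholder):

```python
# Minimal DHCP watcher using scapy: prints the DORA message types as they
# appear on the wire. Requires root privileges; "en0" is a placeholder
# interface name -- use whatever your wired interface is called.
from scapy.all import sniff, DHCP, Ether

# DHCP option 53 message types we care about.
MESSAGE_TYPES = {1: "Discover", 2: "Offer", 3: "Request", 5: "ACK", 6: "NAK"}

def show_dhcp(pkt):
    if DHCP in pkt:
        for opt in pkt[DHCP].options:
            if isinstance(opt, tuple) and opt[0] == "message-type":
                name = MESSAGE_TYPES.get(opt[1], f"type {opt[1]}")
                print(f"{pkt[Ether].src} -> {pkt[Ether].dst}: DHCP {name}")

# Capture only DHCP traffic (UDP 67/68).
sniff(filter="udp and (port 67 or port 68)", prn=show_dhcp, iface="en0", store=False)
```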

Turns out, my DHCP server was making an offer, and then ghosting my devices as soon as they responded to that offer. And periodically, a DHCP ACK would sneak through. And by now, it had started happening on my IoT network as well, as half my Nest Protect alarms were now showing offline. But that told me one very important thing: that my DHCP server was in fact online, reachable, and responding. Up until that very last point.

So I then did what any sane engineer would do:

I had already restarted the dhcpd on my pfSense box, so I didn’t have much faith in the curative effects of a digital boot to the head, but what the heck, can’t hurt, right?

And that’s when I saw it. I went down to my lab, and there, on the front of the DL360 that is running my router, is an angry orange light which should normally be a happy little green. Uh-oh.

So, I pop out the handy little SID tray, to see what it’s angry about… And this is not something a server admin wants to see:

Yep, that’s flagging all three memory modules in Processor 1’s Bank A. This just became more than a simple reboot. Sure enough, when it went through POST, it flagged all three modules. Power off, slide out the server (rails FTW), and perform that tried-and-true troubleshooting method I learned and perfected in the Air Force a quarter century ago: swaptronics. Move a suspected bad component and see if the problem follows. So I swapped all the DIMMs from Bank A with those in Bank B. If the fault stayed with Bank A, then I had a bum system board. If the fault followed the DIMMs to Bank B, then the fault was in the DIMMs. I really wanted the fault to follow the DIMMs.

Plug it all back in, and fire it up, and the fault was…

NOW IN BANK B!!!! Hallelujah, I don’t have a bad server on my hands!

So now I shut it down, tossed the bad DIMMs in the recycling bin (yes, our recycling pickup actually takes e-waste, which is really nice when you’re a nerd with way too many electronic bits), and repopulated/balanced the banks (I also had to remove a fourth DIMM to keep things even, but it’s a known good part, so it did not go to recycling).

I fire the machine back up, and yay, it’s no longer grumpy about the bad memory, although it is briefly perplexed by the fact that it now only has 24GB instead of 32GB, having somehow realized that it just had a partial lobotomy. After a few minutes of much more intensive self-testing than usual, it boots up pfSense and gives me the happy beeps that pfSense makes when it’s fully booted (for those of us who run our pfSense boxes headless!).

The moment of truth: I connect my laptop to the Wi-Fi (with Wireshark still circling)… and sure enough, the DHCP ACK comes through on the first try. So as near as I can tell, whatever part of system RAM contained the bit of code required to send the DHCP ACK had suffered some kind of stroke, but not one severe enough to take the whole box or even the operating system down.

See? It’s always DHCP.

EDIT: Turns out there was also more to this. Wired clients (and access points) started getting DHCP right away after fixing this, but wireless was still giving me fits. As it turned out, there was something about the Aruba mobility controllers terminating user sessions that played havoc with the hashing algorithms that VMWare uses to handle NIC teaming on switch uplinks, and the ACKs were coming back through a different path and getting lost along the way.

For the moment, I disabled one of the 10G links to the switch until I can figure out what magic incantations I need to make on the vSwitch to get the hashing algorithms to properly use the multiple connections with the VMCs – or I may just use the second 10G interfaces for vMotion or something.
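To make that failure mode a bit more concrete: with an IP-hash style teaming policy, the uplink is chosen from a hash of a packet’s addresses, so a conversation that gets re-sourced by a tunnel endpoint can legitimately hash onto a different uplink than the flow it belongs to. Here’s a toy illustration of the idea (not VMWare’s actual algorithm, and the addresses are made up):

```python
# Toy illustration of IP-hash style uplink selection. This is NOT VMWare's
# actual hashing algorithm; it just shows why traffic re-sourced by a tunnel
# endpoint can come back on a different uplink than the one it went out on.
import ipaddress

NUM_UPLINKS = 2

def pick_uplink(src_ip: str, dst_ip: str) -> int:
    """Pick an uplink index from a simple XOR of the two addresses."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % NUM_UPLINKS

# A client talking straight to the DHCP server hashes to one uplink...
print(pick_uplink("10.0.20.15", "10.0.0.1"))   # uplink 0
# ...but the same conversation re-sourced from a controller's address can
# hash to the other uplink, so the reply rides a different physical path.
print(pick_uplink("10.0.99.6", "10.0.0.1"))    # uplink 1
```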

And that, kids, is how I used Wireshark to diagnose a system memory problem.