Aruba AP Provisioning

As part of trying to wrap my own head around the various profile dependencies in actually provisioning an Aruba AP , I’ve mapped it out. This is the <stuff> that goes into this process:

provision-ap
read-bootinfo {wired-mac|ip-addr|ap-name} <data>
<stuff>
reprovision {serial|wired-mac|ip-addr|ap-name} <data>

As you go to provision an AP, start on the outside of this map and work your way in. This will make sure that all the various profiles you need are in place. The web UI hides some of this stuff from you and doesn’t organize it as logically as one might expect.

When doing this on the CLI in Mobility Master Conductor, make sure you’re in the right corner of your hierarchy (namely, /md or /md/GROUP). And remember that on MMMCR, show run is not nearly as useful as show config effective… And config purge-pending sure comes in handy when you goof something up.

You can also do show profile-hierarchy but that only shows the profile entries… And it doesn’t fit neatly in a terminal window…

Caveat: This is not comprehensive by any stretch. There are dozens more options, these are just the more common ones. If I goofed, let me know. All the gory details can be found in the ArubaOS User Guide.

“It’s ALWAYS DNS (or DHCP)”

There’s a common saying among my network engineering peers: “It’s ALWAYS DNS!”. For those not familiar with the concept, this refers to the alarming regularity with which networking troubles end up being caused by something trivial, such as name resolution. And when it’s not DNS, it’s usually DHCP. Those two troublemakers alone are responsible for some ridiculously large percentage of network support issues. (At least until someone at a tier 1 provider inserts a typo into a route table advertised to half the internet via BGP, and takes everything down, but I digress.)

Last weekend, I rebuilt my home wireless network from an Aruba Instant cluster back to a controller based network, using ClearPass as an authentication and authorization backend for the home network. Gross overkill for a home network, but it gives me stick time on stuff that I need to know for work, at a much grander scale.

But first, a little background into the Aruba Way of doing things: In an Instant cluster, the wireless networks are bridged to a VLAN that is trunked to the access point. You can also do this with campus networking, but managing all those VLANs on every port that feeds an access point is usually a recipe for forgetting something vital. So the campus model lets you build a single access VLAN on your AP ports, and the AP establishes a GRE tunnel back to the controller cluster (which also allows for some great redundancy and high availability options), and the various VLANs terminate on the user anchor controllers (because each user has their own tunnel back to the controller, which allows you to segment their traffic out and handle it at layers 4-7 based on a variety of rules, and the only thing going over the wire is an encrypted tunnel, which is a significantly better security posture should someone unethically decide to monitor traffic on a switch port when they shouldn’t.

This is also where ClearPass comes into play – How user sessions and traffic are handled is defined in roles. Each role consists of various rules. How roles are applied are defined by policies. You can map roles to users and/or machines with the magic of ClearPass, and then when someone connects to the wireless network, ClearPass can return a role (and it can map a different role based on whether you authenticated with a username/password, a certificate, or any one of a number of other data inputs). Basically, when ClearPass returns the OK to the controller, it also includes a bunch of attributes for that user, including roles. It’s extremely powerful magic, and when wielded wrong, it can cause no end of heartache trying to figure out just what exactly went haywire. And I’m still very much a ClearPass n00b.

Which brings me back to my newly built and ClearPass-enabled network. And so like every good story…

No $#!+, there I was…

When I connected, it would take a good 10 minutes before I could access the internet. And so, I’m wondering what I screwed up in my ClearPass setup that would have done this… But the roles were being assigned correctly, and the rules associated with those roles were pretty straightforward: “allow all”. So why in the heck were devices on the home network taking forever and a week to get an address? This was not happening on my IoT and guest networks.

First, I realized that my devices were associating just fine, so ClearPass and the role derivation were working correctly, which immediately acquitted the Wi-Fi (but as far as the others in my house were concerned, the Wi-Fi was still screwed up). But that meant I had a good Layer 2 connection. I tried to make sure that the VLAN was properly connected from the pfSense router to the core switch, and the controllers (running in VMWare) were properly trunking to the distributed vSwitch and also out to the core switch. Everything on that front looked good. I tried manually assigning IPs to the wireless clients on the home LAN, and they worked great. So L3 worked, which implied L2 did as well. And when clients on the home network did eventually get an IP address, they worked fine as well. So nothing was being bottlenecked anywhere either (I should hope not, as the VMWare hosts and the router are all connected to the core switch with dual 10-gigabit fiber links!).

After a few days of racking my brain over this, and hearing the people who live in my house continue to complain about network weirdness (thankfully, my family is not doing virtual school/work… except for me), I finally resigned myself to doing what I should have done in the first place: Breaking out Wireshark and figure out just what was actually happening on the network. DHCP is pretty simple, so finding out what broke should be straightforward, right?

Quick refresher on DHCP: The process of obtaining a DHCP address goes like this:

Since I knew I had good L2 connectivity, I fired up Wireshark on my laptop, capturing what was going on at L2, and would move to other points in the network if I needed to. The first thing I saw is that a residential network, even with isolated guest and IoT traffic, while nobody else in the house is using it, is a fairly chatty place. I saw a bunch of multicast traffic (I have a lot of Apple devices), even IP broadcast traffic. And there, among all that, was the DHCP process. Discover. Discover. Discover. Offer. Request. Request. Request. Discover. Discover. Offer. Request. Request. Request. Discover. Discover. Discover. Offer. Request. Request. Request. The more astute among you may have noticed something missing from this sequence. Something rather… important.

Turns out, my DHCP server was making an offer, and then ghosting my devices as soon as they responded to that offer. And periodically, a DHCP ACK would sneak through. And by now, it had started happening on my IoT network as well, as half my Nest Protect alarms were now showing offline. But that told me one very important thing: that my DHCP server was in fact online, reachable, and responding. Up until that very last point.

So I then did what any sane engineer would do:

I had already restarted the dhcpd on my pfSense box, so I didn’t have much faith in the curative effects of a digital boot to the head, but what the heck, can’t hurt, right?

And that’s when I saw it. I went down to my lab, and there, on the front of the DL360 that is running my router, is an angry orange light which should normally be a happy little green. Uh-oh.

So, I pop out the handy little SID tray, to see what it’s angry about… And this is not something a server admin wants to see:

Yep, that’s flagging all three memory modules in Processor 1’s Bank A. This just became more than a simple reboot. Sure enough, when it went through POST, it flagged all three modules. Power off, slide out the server (rails FTW), and perform that tried and true troubleshooting method I learned and perfected in the Air Force a quarter century ago: Swaptronics. Move a suspected bad component and see if the problem follows. So, I switched all the DIMMs from bank A with those in Bank B. If the fault stayed with Bank A, then I had a bum system board. If the fault followed the DIMMs to Bank B, then the fault was in the DIMMs. I really wanted the fault to follow the DIMMs.

Plug it all back in, and fire it up, and the fault was…

NOW IN BANK B!!!! Hallelujah, I don’t have a bad server on my hands!

So now I shut it down, tossed the bad DIMMs in the recycling bin (yes, our recycling pickup actually takes e-waste, which is really nice when you’re a nerd with way too many electronic bits), and repopulated/balanced the banks (I also had to remove a fourth DIMM to keep things even, but it’s a known good part, so it did not go to recycling).

I fire the machine back up, and yay, it’s no longer grumpy about the bad memory, although it is briefly perplexed by the fact that it now only has 24GB instead of 32GB, and has somehow realized that it just had a partial lobotomy. After a few minutes of much more intensive self-testing than usual, it boots up pfSense, and gives me the happy beeps that pfSense does when it’s fully booted (for those of us who run our pfSense boxes headless!)

The moment of truth: I connect my laptop to the Wi-Fi (with the wireshark still circling)… and sure enough, the DHCP ACK comes through on the first try… So as near as I can tell, whatever part of the system RAM contained the bit of code required to send the DHCP ACK had suffered some kind of stroke, but not one severe enough to take the whole box or even the operating system down.

See? It’s always DHCP.

EDIT: Turns out there was also more to this – Wired clients (and access points) started getting DHCP right away after fixing this, but wireless was still giving me fits. As it turned out, There was something about the Aruba mobility controllers terminating user sessions that played havoc with the hashing algorithms that VMWare uses to handle NIC teaming on switch uplinks, and the ACKs were coming back through a different path and getting lost along the way.

For the moment, I disabled one of the 10G links to the switch until I can figure out what magic incantations I need to make on the vSwitch to get the hashing algorithms to properly use the multiple connections with the VMCs – or I may just use the second 10G interfaces for vMotion or something.

and that, kids, is how I used Wireshark to diagnose a system memory problem.

Hands On : Aruba Instant

After our quick little tour of Aruba InstantON, I’m going to move up to the next level of Aruba gear: Instant.

The naming can be a little confusing to the ArubaNoob, but Instant has been part of Aruba’s product offering for a very long time. While it appears controllerless, it still makes use of a virtual controller that lives inside the APs on the network (and in case the AP running the controller goes offline, the remaining APs on the network decide on a new leader by holding a rap battle or a dance-off. OK, just kidding. They actually do a sort of digital version of Rock, Paper, Scissors, Lizard, Spock.

This virtual controller concept has also been done by Ruckus with their Unleashed platform, which in terms of functionality is somewhere between Instant and InstantON, and Cisco’s Mobility Express. I’m not 100% sure, but I think Aruba had it first.

In previous generations of Aruba access points, you either purchased an Instant AP (IAP), a Campus AP (CAP) , or a Remote AP (RAP). The latter two required a Mobility Controller (MC). You definitely couldn’t RAP without an MC. Now, all APs ship as Universal APs and figure out which mode to be when they boot up, and can be easily converted from one to the other (in the dog park that is Ruckus Unleashed, you would have to reimage the AP with new firmware).

Who it’s meant for

Instant is designed for small and medium business environments, and home labs of geeks who subscribe to the idea of “if it’s worth doing, it’s worth overdoing” (My home wireless network right now consists of 7 APs in an Instant cluster). It also is very useful in large enterprises that consist of many small locations, especially once you start managing them all with Central. If you have a chain of coffee shops or boutiques that only require a few APs, then Instant+Central is definitely something you should look at. If you only have one, InstantON is more your speed.

Instant does not require any per-AP licensing, but it still includes a lot of the features you find on the campus systems. It even includes an internal RADIUS server and user database so you can do enterprise authentication (as of 8.7 which was just released in July 2020, you can even do up to 24 unique passphrases with MPSK before having to get ClearPass involved, which is real handy for IoT networks that use crappy chipsets that don’t support enterprise auth). It will also do an internal captive portal. It still has role-based access control, which provides layer 3 policy enforcement at the AP, including content filtering. And much like the InstantON APs can do, you can even use an Instant AP as your internet gateway (guess where InstantON learned it from?). You can even use it with ClearPass and all the goodies that come with that.

When a Universal AP powers up, it goes through the following process:

If setup mode is not accessed within a period of 15 minutes, the UAP reboots and goes through the process again. It can be a lonely existence. (this mode is not unusual to find in large campus networks where there exists a network disconnect at Layer 2 or Layer 3 between the AP and the controller. Chasing these down on a cruise ship is maddening… but it gets you a lot of steps.)

Setup Mode

Once the AP is in setup mode, it will broadcast an open SSID called SetMeUp-DD:BE:EF (where the last half is the last half of the wired MAC address of the AP). Connecting to this SSID will bring you to the configuration page (it will even conveniently pop it up in the captive portal window if your OS has such a thing). You can also access this by opening a browser to https://setmeup.arubanetworks.com, which it looks up via mDNS. (Caveat: This doesn’t work so great if the AP does not have an uplink and an IP address on the network, even if that IP is not routable… And accessing it via IP address only redirects to the hostname, and mDNS doesn’t really like not having a network to do its thing. So give it an uplink, even if it’s just a WLANpi.)

I once was traveling through a midwestern airport where I was scanning the wifi (it’s a wifi nerd thing) when I saw a lone AP broadcasting “Instant” (which is what Instant used to do before AOS 8.x). I eventually found the AP in a restaurant, where it was sitting all by itself on the ceiling, still in setup mode with the defaults… A quick peek into the setup page showed that this thing had never been configured… I found the manager to let them know that someone didn’t finish a job they were likely paid handsomely for, and she told me it had been there for almost 3 years and nobody had any idea what it was for or remembered who installed it or when. The airport’s installed public system was Meraki.

Once you’re in the setup interface, you can then configure it to your heart’s content. Then, when you bring up a second and subsequent access points on the network, they will find the first one, grab the configuration, and join the party. This scales surprisingly well – you can run several dozen access points on a network like this (There’s no actual hard limit, and it’s been officially tested up to 128 APs, but this is definitely not recommended – that’s well into Campus AP territory). It may not be truly instantaneous (we do love instant gratification), but it’s pretty darn close.

Limitations

There are a few limitations to this mode of operation, in addition to the aforementioned scaling issues (if you’re used to a SOHO/SMB system like Ubiquiti, 100 APs will sound like a lot to you. Once you get into controller based networks with Aruba, even a thousand APs is middle of the road – I routinely work with networks well in excess of this).

A few of the things you can’t do with Instant:

  • AP Groups
  • AirMatch (Instant uses the older ARM techniques for RF management)
  • Tunneling to controller (yet…)
  • I’m probably forgetting some things…

Perhaps the most useful aspect of Instant is that it can either be managed in the cloud with Aruba Central (if you’re used to Meraki, you’ll love Central), or if your network requirements grow to where you need to get a controller involved, switching the APs over to that mode is quick and easy, and you don’t have to buy new gear.

Labbing It Up

If you want to play around with Instant, it’s pretty easy: Buy an AP. Or more. If you have to fund your own lab gear, there’s a ton of used and refurbished Aruba gear on Amazon or eBay (If you go with HPE Renew, you still get HPE’s legendary lifetime warranty on network equipment). Recently, I saw a whole bunch of Renewed AP-345s on ebay for under $200. Just make sure you get the correct country code (US or RW) – the two can’t coexist on the same Instant cluster (in a controller environment, the controller country code takes over and ignores the AP setting).

If you’re new to the Aruba product line, here’s a quick cheat sheet to figure out what kind of AP you’re getting. It’s not 100% exact, but it should give you a general idea of what you should be getting.

The first digit of the 3-digit model number indicates product generation:

  • AP-0XX (or just AP-XX): 802.11g
  • AP-1XX: 802.11n
  • AP-2XX: 802.11ac Wave 1
  • AP-3XX: 802.11ac Wave 2 with integrated BLE
  • AP-5XX: 802.11ax with integrated BLE and ZigBee

The second digit indicates capabilities (1XX series and up)

  • AP-X0X: 2 spatial streams
  • AP-X1X: 3 spatial streams (although the 51X series is 2SS on 2.4GHz and 4SS on 5GHz)
  • AP-X2X: 3 spatial streams, second Ethernet port
  • AP-X3X: 4 spatial streams, SmartRate port, Gigabit Port
  • AP-X4X: 4 spatial streams, dual SmartRate ports, dual-5GHz,
  • AP-X5X: 8 spatial streams, three radios (only AP-555 for now… that thing is a monster)
  • AP-X6X: Outdoor AP with 2 Spatial streams
  • AP-X7X: Outdoor AP with 4 spatial streams
  • AP-X8X: Outdoor AP with 60GHz (only AP-387)

The last digit indicates the antenna type. Odd numbers are internal, even numbers are external.

  • AP-XX3: Internal Omni
  • AP-XX4: Connectorized
  • AP-XX5: Internal Omni
  • AP-XX7: Internal Directional
  • AP-XX8: Connectorized and ruggedized,

APs with the H suffix indicate a wallplate mount designed for the hospitality industry. These APs also have a built-in switch. I love these APs.

Naturally, if you want to get the gory details, head on over to Aruba and look for the data sheet.

Stay tuned for the next Hands On post in which I will discuss Aruba Central.

Disclaimer: Aruba is my employer, but this post reflects my personal experience as a wi-fi nerd with Aruba products. Some APs were purchased on the open market, some were provided to me by my employer for lab use. This is not a paid promotion, and is not official Aruba communication. I am not part of the Instant product team.