Force10 and vSphere vDS Interoperability Issue

June 10, 2016

Recently I had an opportunity to work with the Dell FX2 platform from the design and delivery point of view. I was deploying an FX2s chassis with FC630 blades and FN410S 10Gb I/O aggregators.

I ran into an interesting interoperability glitch between Force10 and the vSphere distributed switch when using LLDP. LLDP is the open-standard equivalent of Cisco CDP. It allows vSphere administrators to determine which physical switch port a given vSphere distributed switch uplink is connected to. If you enable both Listen and Advertise modes, network administrators get similar visibility from the physical switch side.

In my scenario, when LLDP was enabled on a vSphere distributed switch, uplinks on all ESXi hosts started disconnecting and connecting back intermittently, with log errors similar to this:

Lost uplink redundancy on DVPorts: “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”. Physical NIC vmnic1 is down.

Network connectivity restored on DVPorts: “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”. Physical NIC vmnic1 is up

Uplink redundancy restored on DVPorts: “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”. Physical NIC vmnic1 is up
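To get a feel for how often each uplink flaps, you can count the "is down" events per physical NIC. The snippet below is a minimal sketch, assuming the event messages have been exported as plain strings; the function name and the abbreviated sample messages are illustrative, not taken from vCenter.

```python
import re
from collections import Counter

# Matches the trailing "Physical NIC vmnic1 is down." / "... is up" part of an event.
NIC_STATE = re.compile(r"Physical NIC (\S+) is (up|down)")


def count_link_downs(messages):
    """Return how many times each physical NIC was reported down."""
    downs = Counter()
    for message in messages:
        match = NIC_STATE.search(message)
        if match and match.group(2) == "down":
            downs[match.group(1)] += 1
    return downs


# Sample messages, abbreviated from the events above (DVPort IDs omitted).
events = [
    "Lost uplink redundancy on DVPorts. Physical NIC vmnic1 is down.",
    "Network connectivity restored on DVPorts. Physical NIC vmnic1 is up",
    "Lost uplink redundancy on DVPorts. Physical NIC vmnic1 is down.",
]

for nic, downs in count_link_downs(events).items():
    print(f"{nic}: {downs} link-down events")
```

A NIC that keeps accumulating link-down events while the cabling is untouched is a good hint that the problem is a negotiation issue rather than a physical one.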

Issue Troubleshooting

FX2 I/O aggregator logs were reviewed for potential errors and the following log entries were found:

%STKUNIT0-M:CP %DIFFSERV-5-DSM_DCBX_PFC_PARAMETERS_MISMATCH: PFC Parameters MISMATCH on interface: Te 0/2

%STKUNIT0-M:CP %IFMGR-5-OSTATE_DN: Changed interface state to down: Te 0/2

%STKUNIT0-M:CP %IFMGR-5-OSTATE_UP: Changed interface state to up: Te 0/2

Each PFC parameter mismatch is immediately followed by the interface going down and coming back up, which points to a DCB negotiation issue between Force10 and the vSphere distributed switch.
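If the aggregator log is long, a short script can confirm that the PFC mismatches and the interface flaps hit the same ports. This is a rough sketch that assumes the log lines look exactly like the three entries above; the function name and the summary keys are my own.

```python
import re
from collections import defaultdict

# Log mnemonics seen in the entries above, mapped to short counter names.
KIND = {
    "DIFFSERV-5-DSM_DCBX_PFC_PARAMETERS_MISMATCH": "pfc_mismatch",
    "IFMGR-5-OSTATE_DN": "down",
    "IFMGR-5-OSTATE_UP": "up",
}

# Captures the mnemonic and the interface name at the end of the message.
EVENT = re.compile(
    r"%(" + "|".join(KIND) + r"):.*?: (Te \d+/\d+)"
)


def summarize(lines):
    """Count PFC mismatch, down and up events per interface."""
    summary = defaultdict(lambda: {"pfc_mismatch": 0, "down": 0, "up": 0})
    for line in lines:
        match = EVENT.search(line)
        if match:
            mnemonic, interface = match.groups()
            summary[interface][KIND[mnemonic]] += 1
    return dict(summary)


log = [
    "%STKUNIT0-M:CP %DIFFSERV-5-DSM_DCBX_PFC_PARAMETERS_MISMATCH: PFC Parameters MISMATCH on interface: Te 0/2",
    "%STKUNIT0-M:CP %IFMGR-5-OSTATE_DN: Changed interface state to down: Te 0/2",
    "%STKUNIT0-M:CP %IFMGR-5-OSTATE_UP: Changed interface state to up: Te 0/2",
]

for interface, counts in summarize(log).items():
    print(interface, counts)
```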

Root Cause

Priority Flow Control (PFC) is one of the protocols in the Data Center Bridging (DCB) family. DCB was purpose-built for converged network environments where 10Gb links carry both Ethernet and FC traffic in the form of FCoE. In such a scenario, PFC can pause traffic in individual priority classes, so that latency- and loss-sensitive storage traffic is protected when a link becomes congested.

In my case, the NIC ports on the QLogic 57840 adaptors were used for 10Gb Ethernet and iSCSI, not FCoE (FCoE is very uncommon unless you’re using a Cisco UCS blade chassis). So the question is: why were the Force10 switches trying to negotiate FCoE? And what did that have to do with enabling LLDP on the vDS?

The answer is simple. LLDP advertises not only port numbers but also port capabilities. Data Center Bridging Exchange Protocol (DCBX) uses LLDP to convey the capabilities and configuration of FCoE features between neighbours. This is why enabling LLDP on the vDS triggered the issue. When the Force10 switches determined that the vDS uplinks were CNA adaptors (which was in fact true, I was just not using FCoE), they started to negotiate FCoE using DCBX, which didn’t go well.
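To make the "DCBX rides on LLDP" point concrete, here is a sketch of how a PFC setting can be pulled out of an LLDP payload. It assumes the IEEE 802.1Qaz flavour of DCBX, where PFC is carried in an organizationally specific TLV (type 127, OUI 00-80-C2, subtype 0x0B). I did not capture which DCBX version the aggregators and adaptors actually spoke, so treat the byte layout and the sample frames as assumptions made for illustration.

```python
import struct

IEEE_8021_OUI = b"\x00\x80\xc2"  # OUI of IEEE 802.1 organizationally specific TLVs
PFC_CONFIG_SUBTYPE = 0x0B        # PFC Configuration TLV (assumed IEEE 802.1Qaz DCBX)
ORG_SPECIFIC_TLV = 127


def iter_tlvs(lldpdu):
    """Yield (type, value) for each TLV in a raw LLDPDU payload."""
    offset = 0
    while offset + 2 <= len(lldpdu):
        # TLV header: 7-bit type followed by a 9-bit length.
        (header,) = struct.unpack_from("!H", lldpdu, offset)
        tlv_type, length = header >> 9, header & 0x1FF
        offset += 2
        if tlv_type == 0:  # End of LLDPDU
            return
        yield tlv_type, lldpdu[offset:offset + length]
        offset += length


def pfc_enabled_priorities(lldpdu):
    """Return priorities with PFC enabled, or None if no PFC TLV is present."""
    for tlv_type, value in iter_tlvs(lldpdu):
        is_pfc = (
            tlv_type == ORG_SPECIFIC_TLV
            and len(value) >= 6
            and value[:3] == IEEE_8021_OUI
            and value[3] == PFC_CONFIG_SUBTYPE
        )
        if is_pfc:
            enable_bitmap = value[5]  # one bit per priority, bit 0 = priority 0
            return [p for p in range(8) if enable_bitmap & (1 << p)]
    return None


def sample_pdu(flags, enable_bitmap):
    """Build a hypothetical LLDPDU holding only a PFC TLV and the end marker."""
    header = struct.pack("!H", (ORG_SPECIFIC_TLV << 9) | 6)
    body = IEEE_8021_OUI + bytes([PFC_CONFIG_SUBTYPE, flags, enable_bitmap])
    return header + body + b"\x00\x00"


switch_pdu = sample_pdu(flags=0x08, enable_bitmap=0x08)  # PFC on priority 3
host_pdu = sample_pdu(flags=0x88, enable_bitmap=0x00)    # PFC on no priority

if pfc_enabled_priorities(switch_pdu) != pfc_enabled_priorities(host_pdu):
    print("PFC parameters mismatch")
```

In this made-up example the two sides advertise different PFC priorities, which is the kind of disagreement the aggregator reported as a PFC parameters mismatch.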

Solution

The easiest solution to this problem is to disable DCB on the Force10 switches using the following commands:

# conf t
# no dcb enable

Alternatively, you can try disabling FCoE on the ESXi side by running the following commands from the host CLI:

# esxcli fcoe nic list
# esxcli fcoe nic disable -n vmnic0

Once FCoE has been disabled on all NICs, run the following command and you should get an empty list:

# esxcli fcoe adapter list
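With more than a couple of NICs per host, typing the disable command for each one gets tedious. The helper below is a small sketch that only generates the command lines shown above for a list of NIC names; the names vmnic0 to vmnic3 are placeholders, so substitute whatever esxcli fcoe nic list reports on your hosts.

```python
import shlex


def fcoe_disable_commands(nics):
    """Build one 'esxcli fcoe nic disable' command per NIC, plus a final check."""
    commands = [f"esxcli fcoe nic disable -n {shlex.quote(nic)}" for nic in nics]
    # The adapter list should come back empty once FCoE is disabled everywhere.
    commands.append("esxcli fcoe adapter list")
    return commands


# Hypothetical NIC names, for illustration only.
for command in fcoe_disable_commands(["vmnic0", "vmnic1", "vmnic2", "vmnic3"]):
    print(command)
```

The output can be pasted into an SSH session on the host or fed to whatever remote execution tool you already use.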

Conclusion

It is still not clear why a PFC mismatch would cause vDS uplinks to start flapping. If a switch cannot establish an FCoE connection, it should just ignore it. That doesn’t seem to be the case on Force10. So if you run into a similar issue, simply disable DCB on the switches and it should fix it.

Painless Dell FX2 Firmware Upgrade

April 10, 2016

Overview

Recently I’ve had a chance to play with Dell’s FX2 chassis for a bit. Dell FX2 falls into the category of blade chassis and can hold up to 8 blades with Atom or 4 blades with Xeon CPUs in a 2U chassis.

Besides the compute blades FX2 also supports storage blades, which you can dedicate to particular compute blades and use as additional storage.

On the networking side you can choose from either pass-through modules or three types of I/O aggregators – four 10G SFP+ ports, four 10GBASE-T ports, or two Fibre Channel plus two SFP+ external ports.

The chassis itself also comes in two flavors – FX2 or FX2s. The main difference between the two is that FX2s additionally has PCIe slots at the back, which can be mapped to the server blades to provide additional connectivity.

The first step of every hardware solution deployment is a firmware upgrade. But when it comes to firmware on Dell blade equipment, be it M1000e, VRTX or FX2, you can quickly get confused, especially when you go to the blade section and see a dozen hardware components. Downloading and updating each of them individually would be daunting. Fortunately there is a better way.

CMC Firmware

The upgrade starts with the chassis management controller, which has two components: Chassis Infrastructure Firmware (or Main Board) and the CMC itself. You can find them on the Chassis Overview > Update tab.

CMC firmware comes as an .exe package, which you can extract. You really need just the fx2_cmc.bin file. During the upgrade you will lose access to the CMC for 5-10 minutes while it reboots.

For the infrastructure firmware you’ll need the fx2_mainboard.bin file. The gotcha with the infrastructure firmware upgrade is that all blades need to be powered off. So if you have just one chassis this might be tricky.

Blade Firmware

Blade firmware is where this gets interesting. You can certainly upgrade all blades from the CMC by downloading firmware from the Dell support web-site and choosing one component at a time in the Chassis Overview > Server Overview > Update section. The CMC is capable of upgrading, say, iDRAC across all blades simultaneously, but that still leaves about a dozen components to go through.

The easier approach would be to use Dell Repository Manager (DRM). DRM can download firmware for virtually any blade or rack server (including some of the storage and network hardware) and build a bootable ISO image for an easy upgrade.

To build a bootable ISO, follow these steps:

  • Download and install Dell Repository Manager from the Dell support web-site
  • Add a source by going to Source > View Dell Online Catalog
  • Create a repository by going to Repository > New > Create New Repository
  • In the wizard select your hardware (I selected PowerEdge FC630 from the Blade category) and choose Linux (32-bit and 64-bit) as a DUP format (I’ll explain that later).
  • Go to the newly created repository, select the bundle and click Export

DRM can export bundles in multiple forms. We are interested in a bootable ISO, which is why we selected the Linux DUP format when we created the repository. DRM creates a Linux bootable ISO, so there was no point in selecting Windows bundles.

  • Select “Create Bootable ISO (Linux Only)” and continue with the default settings for the rest

As a result you will get an .iso file, which you can mount to the server via iDRAC Remote Console and boot from it for a firmware upgrade.

Network I/O Aggregators

FX2 I/O aggregators are Dell Force10 switches, which use Force10 OS (FTOS). FTOS firmware is NOT available from the Dell web-site. You’ll need to register an account at https://www.force10networks.com to download the firmware.

Make sure to download the firmware release built specifically for FX2 I/O aggregators, which can be found in the M-Series Software section.

To upgrade the aggregators go to Chassis Overview > I/O Module Overview > Update. Aggregators reboot after the upgrade, so make sure to upgrade them one at a time. If you stacked them instead of using VLT or standalone mode, you’ll need to schedule downtime, as stacked switches reboot together.
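The ordering rule can be written down as a tiny planning helper. This is only a sketch of the logic described above, not something the CMC provides; the function name and the module labels A1 and A2 are placeholders.

```python
def upgrade_batches(aggregators, stacked):
    """Group aggregators into batches that go down together during an upgrade."""
    if stacked:
        # Stacked switches reboot together, so the whole stack is one batch
        # and needs a downtime window.
        return [list(aggregators)]
    # VLT or standalone: upgrade one module at a time to keep an uplink path alive.
    return [[name] for name in aggregators]


# Placeholder module labels, for illustration only.
print(upgrade_batches(["A1", "A2"], stacked=False))
print(upgrade_batches(["A1", "A2"], stacked=True))
```

A plan with a single batch holding every aggregator means there is no redundancy left during the reboot, so it has to go into a maintenance window.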

Conclusion

There is nothing fancy about upgrading firmware on a blade chassis. You just want it to be quick and painless. Make sure to use Dell Repository Manager for blade upgrades. It may save you heaps of time and make your life easier.