Posts Tagged ‘Data Center Bridging’

Force10 and vSphere vDS Interoperability Issue

June 10, 2016

dell-force10Recently I had an opportunity to work with Dell FX2 platform from the design and delivery point of view. I was deploying a FX2s chassis with FC630 blades and FN410S 10Gb I/O aggregators.

I ran into an interesting interoperability glitch between Force10 and vSphere distributed switch when using LLDP. LLDP is an equivalent of Cisco CDP, but is an open standard. And it allows vSphere administrators to determine which physical switch port a given vSphere distributed switch uplink is connected to. If you enable both Listen and Advertise modes, network administrators can get similar visibility, but from the physical switch side.

In my scenario, when LLDP was enabled on a vSphere distributed switch, uplinks on all ESXi hosts started disconnecting and connecting back intermittently, with log errors similar to this:

Lost uplink redundancy on DVPorts: “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”. Physical NIC vmnic1 is down.

Network connectivity restored on DVPorts: “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”. Physical NIC vmnic1 is up

Uplink redundancy restored on DVPorts: “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”, “1549/03 4b 0b 50 22 3f d7 8f-28 3c ff dd a4 76 26 15”. Physical NIC vmnic1 is up

Issue Troubleshooting

FX2 I/O aggregator logs were reviewed for potential errors and the following log entries were found:

%STKUNIT0-M:CP %DIFFSERV-5-DSM_DCBX_PFC_PARAMETERS_MISMATCH: PFC Parameters MISMATCH on interface: Te 0/2

%STKUNIT0-M:CP %IFMGR-5-OSTATE_DN: Changed interface state to down: Te 0/2

%STKUNIT0-M:CP %IFMGR-5-OSTATE_UP: Changed interface state to up: Te 0/2

This clearly looks like some DCB negotiation issue between Force10 and the vSphere distributed switch.

Root Cause

Priority Flow Control (PFC) is one of the protocols from the Data Center Bridging (DCB) family. DCB was purposely built for converged network environments where you use 10Gb links for both Ethernet and FC traffic in the form of FCoE. In such scenario, PFC can pause Ethernet frames when FC is not having enough bandwidth and that way prioritise the latency sensitive storage traffic.

In my case NIC ports on Qlogic 57840 adaptors were used for 10Gb Ethernet and iSCSI and not FCoE (which is very uncommon unless you’re using Cisco UCS blade chassis). So the question is, why Force10 switches were trying to negotiate FCoE? And what did it have to do with enabling LLDP on the vDS?

The answer is simple. LLDP not only advertises the port numbers, but also the port capabilities. Data Center Bridging Exchange Protocol (DCBX) uses LLDP when conveying capabilities and configuration of FCoE features between neighbours. This is why enabling LLDP on the vDS triggered this. When Force10 switches determined that vDS uplinks were CNA adaptors (which was in fact true, I was just not using FCoE) it started to negotiate FCoE using DCBX. Which didn’t really go well.

Solution

The easiest solution to this problem is to disable DCB on the Force10 switches using the following command:

# conf t
# no dcb enable

Alternatively you can try and disable FCoE from the ESXi end by using the following commands from the host CLI:

# esxcli fcoe nic list
# esxcli fcoe nic disable -n vmnic0

Once FCoE has been disabled on all NICs, run the following command and you should get an empty list:

# esxcli fcoe adapter list

Conclusion

It is still not clear why PFC mismatch would cause vDS uplinks to start flapping. If switch cannot establish a FCoE connection it should just ignore it. Doesn’t seem to be the case on Force10. So if you run into a similar issue, simply disable DCB on the switches and it should fix it.

Traffic Load Balancing in Cisco UCS

December 21, 2015

Whenever I deploy a Cisco UCS at a customer the question I get asked a lot is how traffic flows within the system between VMs running on the blades and FEX modules, FEX modules and Fabric Interconnects and finally how it’s uplinked to the network core.

Cisco has a range of CNA cards for UCS blades. With VIC 1280 you get 8 x 10Gb ports split between two FEX modules for redundancy. And FEX modules on their own can have up to 8 x 10Gb Fabric Interconnect facing interfaces, which can give you up to 160Gb of bandwidth per chassis. And all these numbers may sound impressive, but unless you understand how your VMs traffic flows through UCS it’s easy to make wrong assumptions on what per VM and aggregate bandwidth you can achieve. So let’s dive deep into UCS and shed some light on how VM traffic is load-balanced within the system.

UCS Hardware Components

Each Fabric Extender (FEX) has external and internal ports. External FEX ports are patched to FIs and internal ports are internally wired to the blade adapters. FEX 2204 has 4 external and 16 internal and FEX 2208 has 8 external and 32 internal ports.

External ports are connected to FIs in powers of two: 1, 2, 4 or 8 ports per FEX and form a port channel (make sure to use “Port Channel” link grouping preference under Chassis/FEX Discovery Policy). Same rule is applied to blade Virtual Interface Cards (VIC). The most common VIC 1240 and 1280 have 4 x 10Gb and 8 x 10Gb ports respectively and also form a port channel to the internal FEX ports. Every VIC adaptor is connected to both FEX modules for redundancy.

chassis_network

Fabric Interconnects are then patched to your network core and FC Fabric (if you have one). Whether Ethernet uplinks will be individual uplinks or port channels will depend on your network topology. For fibre uplinks the rule of thumb is to patch FI A to your FC Fabric A and FI B to FC Fabric B, which follows the common FC traffic isolation principle.

Virtual Circuits

To provide network and storage connectivity to blades you create virtual NICs and virtual HBAs on each blade. Since internally UCS uses FCoE to transfer FC frames, both vNICs and vHBAs use the same 10GbE uplinks to send and receive traffic. Worth mentioning that Cisco uses Data Center Bridging (DCB) protocol with it’s sub-protocols Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS), which guarantee that FC frames have higher priority in the queue and are processed first to ensure low latency. But I digress.

UCS assigns a virtual circuit to each virtual adaptor, which is a representation of how the traffic traverses the system all the way from the VIC port to a FEX internal port, then FEX external port, FI server port and finally a FI uplink. You can trace the full path of each virtual adaptor in UCS Manager by selecting a Service Profile and viewing the VIF Paths tab.

vif_paths

In this example we have a blade with four vNICs and two vHBAs which are split between two fabrics. All virtual adaptors on fabric A are connected through VIC port channel PC-1283 which is represented as port channel PC-1025 on the FEX A side. Then traffic leaves FEX A and reaches the Fabric Interconnect A which sends the traffic out to the network core through port channel A/PC-1.

You can also get the list of port channels from the FI CLI:

# connect nxos
# show port-channel summary

ucs_portchannels

Network Load Balancing

Now that we know how all components are interconnected to each other, let’s discuss the traffic flow in a typical VMware environment and how we achieve the massive network throughput that UCS provides.

As an example let’s take a look at the vSwitch where your VM Network port group is configured. vSwitch will have two uplinks – one goes to Fabric A and the other one to Fabric B for redundancy. Default load balancing policy on a vSwitch is “Route based on the originating port ID”, which essentially pins all traffic for a VM to a particular uplink. vSphere makes sure that VMs are evenly distributed between the uplinks to use all network bandwidth available.

From each uplink (or vNIC in UCS world) traffic is forwarded through an adapter port channel to a FEX, then to a Fabric Interconnect and leaves UCS from a FI uplink. Within UCS traffic is distributed between port channel members using source/destination IP hash algorithm. Which is even more granular and is capable of very efficient traffic distribution between all members of a port channel all the way up to your network core.

ucs_loadbalancing

If you look at the vSwitch you’ll see that with UCS each uplink shows the maximum available bandwidth from vNIC and is not limited to a port channel member speed of 10Gb. Why is this so powerful? Because with UCS you don’t need to slice adapter’s available bandwidth between different types of traffic. Even though you provision multiple vNICs and vHBAs for the vSphere hosts, UCS uses the same port channel links (20Gb in the example below) from the VIC adapter to transfer all traffic and takes care of load balancing for you.

vswitch_uplinks

You may legitimately ask, if UCS uses the same pipe to transfer all data regardless of which vSwitch uplink is being used, then how can I make sure that different types of traffic, such as vMotion, storage, VM traffic, replication, etc, do not compete for the same pipe? First you need to ask yourself if you can saturate that much bandwidth with your workloads. If the answer is yes, then you can use another great feature available in UCS, which is QoS. QoS lets you assign a minimum available bandwidth guarantee on a per vNIC/vHBA basis. But that’s a topic for another blog post.

References

In this post I tried to summarise the logic behind UCS traffic distribution. If you want to dig deeper in UCS network architecture, then there’re a lot of great bloggers out there. I would like to call out the following authors: