Posts Tagged ‘stack’

Dell Force10 Part 2: VLT Basics

July 10, 2016

dell-force10Last time I made a blog post on initial configuration of Force10 switches, which you can find here. There I talked about firmware upgrade and basic features, such as STP and Flow Control. In this blog post I would like to touch on such a key feature of Force10 switches as Virtual Link Trunking (VLT).

VLT is Force10’s implementation of Multi-Chassis Link Aggregation Group (MLAG), which is similar to Virtual Port Channels (vPC) on Cisco Nexus switches. The goal of VLT is to let you establish one aggregated link to two physical network switches in a loop-free topology. As opposed to two standalone switches, where this is not possible.

You could say that switch stacking gives you similar capabilities and you would  be right. The issue with stacked switches, though, is that they act as a single switch not only from the data plane point of view, but also from the control plane point of view. The implication of this is that if you need to upgrade a switch stack, you have to reboot both switches at the same time, which brings down your network. If you have an iSCSI or NFS storage array connected to the stack, this may cause trouble, especially in enterprise environments.

With VLT you also have one data plane, but individual control planes. As a result, each switch can be managed and upgraded separately without full network downtime.

VLT Terminology

Virtual Link Trunking uses the following set of terms:

  • VLT peer – one of the two switches participating in VLT (you can have a maximum of two switches in a VLT domain)
  • VLT interconnect (VLTi) – interconnect link between the two switches to synchronize the MAC address tables and other VLT-related data
  • VLT backup link – heartbeat link to send keep alive messages between the two switches, it’s also used to identify switch state if VLTi link fails
  • VLT – this is the name of the feature – Virtual Link Trunking, as well as a VLT link aggregation group – Virtual Link Trunk. We will call aggregated link a VLT LAG to avoid ambiguity.
  • VLT domain – grouping of all of the above

VLT Topology

This’s what a sample VLT domain looks like. S4048-ON switches have six 40Gb QSFP+ ports, two of which we use for a VLT interconnect. It’s recommended to use a static LAG for VLTi.

basic_vlt

Two 1Gb links are used for VLT backup. You can use switch out-of-band management ports for this. Four 10Gb links form a VLT LAG to the upstream core switch.

Use Cases

So where is this actually helpful? Vast majority of today’s environments are virtualized and do not require LAGs. vSphere already uses teaming on vSwitch uplinks for traffic distribution across all network ports by default. There are some use cases in VMware environments, where you can create a LAG to a vSphere Distributed Switch for faster link failure convergence or improved packet switching. Unless you have a really large vSphere environment this is generally not required, but you may use this option later on if required. Read Chris Wahl’s blog post here for more info.

Where VLT is really helpful is in building a loop-free network topology in your datacenter. See, all your vSphere hosts are connected to both Force10 switches for redundancy. Since traffic comes to either of the switches depending on which uplink is being picked on a ESXi host, you have to make sure that VMs on switch 1 are able to communicate to VMs on switch 2. If all you had in your environment were two Force10 switches, you would establish a LAG between the two and be done with it. But if your network topology is a bit larger than this and you have at least a single additional core switch/router in your environment you’d be faced with the following dilemma. How can you ensure efficient traffic switching in your network without creating loops?

stp_loop

You can no longer create a LAG between the two Force10 switches, as it will create a loop. Your only option is to keep switches connected only to the core and not to each other. And by doing that you will cause all traffic from VMs on switch 1 destined to VMs on switch 2 and vise versa to traverse the core.

east_west_traffic

And that’s where VLT comes into play. All east-west traffic between servers is contained within the VLT domain and doesn’t need to traverse the core. As shown above, if we didn’t use VLT, traffic from one switch to another would have to go from switch 1 to core and then back from core to switch 2. In a VLT domain traffic between the switches goes directly form switch 1 to switch 2 using VLTi.

Conclusion

That’s a brief introduction to VLT theory. In the next few posts we will look at how exactly VLT is configured and map theory to practice.

Advertisements

Force10 MXL: Firmware Upgrade

March 19, 2015

Uploading new firmware image

MXL switches keep two firmware images – A and B. You can set either one of them to be active. Use the following command to list firmware version of all stack members and see which one is active:

 # show boot system stack-unit all

mxl_firmware

To upload firmware you will need a TFTP server. You can use TFTPD64 (also called TFTPD32), which can be downloaded from Philippe Jounin page here .

If the active firmware image is in A, upload new firmware to B. You’ll also be asked if you want to upload firmware to all switches in the stack.

# upgrade system tftp://10.0.0.1/FTOS-XL-9.5.0.0P2.bin b:

firmware_upload

At the time of upgrade the latest version was 9.7. This version was 2 weeks old and wasn’t recommended for use in production. Version 9.6 had a major bug. As a result version 9.5SP2 was chosen for the upgrade.

Double-check that new firmware has been successfully distributed to all switches:

 # show boot system stack-unit all

check_firmware

Backing up startup config

Make sure you do not miss this step. If something goes wrong and switch looses its config, you’ll have to recreate all configuration from scratch. Imagine the consequences.

# copy start tftp://10.0.0.1/MXL_01.01.15.conf

Reloading the stack

Once firmware is uploaded and config is saved, reload the stack. Be mindful that it’s a disruptive procedure and all links connected to the stack will go down. A reboot shouldn’t take more than a couple of minutes, but make sure you do that after hours.

# conf t
# boot system stack-unit all primary system b:
# exit
# copy run start
# reload

Confirm that the active firmware image is now B. And that concludes the switch stack upgrade.

firmware_upgraded

 

Force10 MXL Switch: Stacking

March 3, 2015

Overview

There are two typical scenarios for stacking MXL’s – within the chassis and across the chassis. In both cases it’s recommended to use ring topology. Daisy chaining is also supported, but not desirable because of the lack of redundancy.

In this post I will be describing the more common case, which is intra-stacking. For inter-stacking configuration you can refer to Dell or Force10 documentation.

Cabling

dell_chassis

In my case I have four MXL switches in bays A1, B1, B2, A2. Cabling is simple, you basically daisy chain all switches and then plug the last switch to the first one.

Stack roles and unit numbers

When stack is built each switch is assigned an ID starting 0 and a role in the stack. There are three roles: Master, Standby and Member:

  • Master – is the switch you’ll use for all configuration. If you currently have IPs assigned to all your MXL switches, all of them except for one will be reset and only the Master will be accessible via SSH.
  • Standby – is the switch which takes over if Master switch fails. Master switch IP address is transferred to Standby in a failover scenario and stack continues to be managed via the same IP.
  • Member switch provides port capacity and doesn’t play any additional roles in the stack.

When you plug cables in, assign stack ports and restart the switches, they will go through election process and automatically pick up roles, as well as IDs. There’s an algorithm that assigns stack IDs and roles, which switches follow. But this algorithm has nothing to do with interconnect bay IDs in the chassis or order in which you cable the switches. You end up with pretty much random numbering.

If order matters, then you’ll have to reboot switches one by one in a particular order to have the desired IDs assigned. In that case IDs are assigned sequentially in a controlled fashion.

Stack configuration

If you don’t have any additional 40GbE modules in slots 0 and 1, then you’ll end up with two QSFP+ ports in a built-in module – ports 33 and 37 (refer to my Force10 MXL Switch: Port Numbering post for port numbering details). All you need to do is to designate them as stack ports on all switches, save config and reboot.

# stack-unit 0 stack-group 0
# stack-unit 0 stack-group 1
# copy run start
# reload

By default each switch is unit 0 in its own stack and stack-group is basically just a 40GbE stack port. You can have maximum of six such ports numbered from 0 to 5. To check that stack ports have been enabled run:

# do show system stack-unit 0 stack-group configured

enabled_ports

It could be that your 40GbE ports are in quad 10GbE mode and are not shown. You’ll need to convert them back to 40GbE mode to proceed. To show the list of available ports type in the command below. Switch shows empty expansion slots as stack ports as well (port 0/41 and 0/45), which is a bit confusing.

# show system stack-unit 0 stack-group

port_list

After a reboot, switches will join the stack and get a role and an id. This process is automatic by default. To see if stack ports have come up after a reboot type:

# show system stack-port status

stack_up

Conclusion

In my example I let switches to go through election process and select roles and IDs on their own. If you want to control the assignment process refer to Dell and Force10 documentation for instructions.

Now you may wonder if unit IDs are assigned automatically, how do you know which stack unit corresponds to which chassis bay ID. The hint for that is to show system inventory and map them by the Service Tag ID which is also shown in the Chassis Management Controller:

# show system brief
# show inventory

NetApp Active/Active Cabling

October 9, 2011

Cabling for active/active NetApp cluster is defined in Active/Active Configuration Guide. It’s described in detail but may be rather confusing for beginners.

First of all we use old DATA ONTAP 7.2.3. Much has changed since it’s release, particularly in disk shelves design. If documentation says:

If your disk shelf modules have terminate switches, set the terminate switches to Off on all but the last disk shelf in loop 1, and set the terminate switch on the last disk shelf to On.

You can be pretty much confident that you won’t have any “terminate switches”. Just skip this step.

To configuration types. We have two NetApp Filers and four disk shelves – two FC and two SATA. You can connect them in several ways.

First way is making two stacks (formely loops) each will be built from shelves of the same type. Each filer will own its stack. This configuration also allows you to implement multipathing. Lets take a look at picture from NetApp flyer:

Solid blue lines show primary connection. Appliance X (AX) 0a port is connected to Appliance X Disk Shelf 1 (AXDS1) A-In port, AXDS1 A-Out port is connected to AXDS2 A-In port. This comprises first stack. Then AY 0c port is connected to AYDS1 A-In port, AYDS1 A-Out port is connected to AYDS2 A-In port. This comprises second stack. If you leave it this way you will have to fully separate stacks.

If you want to implement active/active cluster you should do the same for B channels. As you can see in the picture AX 0c port is connected to AYDS1 B-In port, AYDS1 B-Out port is connected to AYDS2 B-In port. Then AY 0a port is connected to AXDS1 B-In port, AXDS1 B-Out port is connected to AXDS2 B-In port. Now both filers are connected to both stacks and in case of one filer failure the other can takeover.

Now we have four additional free ports: A-Out and B-Out in AXDS2 and AYDS2. You can use these ports for multipathing. Connect AX 0d to AXDS2 B-Out, AYe0d to AXDS2 A-Out, AX 0b to AYDS2 A-Out and AY 0b to AXDS2 B-Out. Now if disk shelf module, connection, or host bus adapter fails there is also a redundant path.

Second way which we implemented assumes that each filer owns one FC and one SATA disk shelf. It requires four loops instead of two, because FC and SATA shelves can’t be mixed in one loop. The shortcoming of such configuration is inability to implement multipathing, because each Filer has only four ports and each of it will be used for its own loop.

This time cabling is simpler. AX 0a is connected to AXDS1 A-In, AX 0b is connected to AYDS1 A-In, AY 0a is connected to AXDS2 A-In, AY 0b is connected to AYDS2 A-In. And to implement clustering you need to connect AX 0c to AXDS2 B-In, AX 0d to AYDS2 B-In, AYe0c to AXDS1 B-In and AY 0d to AYDS1 B-In.

Also I need to mention hardware and software disk ownership. In older system ownership was defined by cable connections. Filer which had connection to shelf A-In port owned all disks in this shelf or stack if there were other shelves daisy chained to channel A. Our FAS3020 for instance already supports software ownership where you can assign any disk to any filer in cluster. It means that it doesn’t matter now which port you use for connection – A-In or B-In. You can reassign disks during configuration.