Posts Tagged ‘topology’

Dell Force10 Part 2: VLT Basics

July 10, 2016

dell-force10Last time I made a blog post on initial configuration of Force10 switches, which you can find here. There I talked about firmware upgrade and basic features, such as STP and Flow Control. In this blog post I would like to touch on such a key feature of Force10 switches as Virtual Link Trunking (VLT).

VLT is Force10’s implementation of Multi-Chassis Link Aggregation Group (MLAG), which is similar to Virtual Port Channels (vPC) on Cisco Nexus switches. The goal of VLT is to let you establish one aggregated link to two physical network switches in a loop-free topology. As opposed to two standalone switches, where this is not possible.

You could say that switch stacking gives you similar capabilities and you would  be right. The issue with stacked switches, though, is that they act as a single switch not only from the data plane point of view, but also from the control plane point of view. The implication of this is that if you need to upgrade a switch stack, you have to reboot both switches at the same time, which brings down your network. If you have an iSCSI or NFS storage array connected to the stack, this may cause trouble, especially in enterprise environments.

With VLT you also have one data plane, but individual control planes. As a result, each switch can be managed and upgraded separately without full network downtime.

VLT Terminology

Virtual Link Trunking uses the following set of terms:

  • VLT peer – one of the two switches participating in VLT (you can have a maximum of two switches in a VLT domain)
  • VLT interconnect (VLTi) – interconnect link between the two switches to synchronize the MAC address tables and other VLT-related data
  • VLT backup link – heartbeat link to send keep alive messages between the two switches, it’s also used to identify switch state if VLTi link fails
  • VLT – this is the name of the feature – Virtual Link Trunking, as well as a VLT link aggregation group – Virtual Link Trunk. We will call aggregated link a VLT LAG to avoid ambiguity.
  • VLT domain – grouping of all of the above

VLT Topology

This’s what a sample VLT domain looks like. S4048-ON switches have six 40Gb QSFP+ ports, two of which we use for a VLT interconnect. It’s recommended to use a static LAG for VLTi.

basic_vlt

Two 1Gb links are used for VLT backup. You can use switch out-of-band management ports for this. Four 10Gb links form a VLT LAG to the upstream core switch.

Use Cases

So where is this actually helpful? Vast majority of today’s environments are virtualized and do not require LAGs. vSphere already uses teaming on vSwitch uplinks for traffic distribution across all network ports by default. There are some use cases in VMware environments, where you can create a LAG to a vSphere Distributed Switch for faster link failure convergence or improved packet switching. Unless you have a really large vSphere environment this is generally not required, but you may use this option later on if required. Read Chris Wahl’s blog post here for more info.

Where VLT is really helpful is in building a loop-free network topology in your datacenter. See, all your vSphere hosts are connected to both Force10 switches for redundancy. Since traffic comes to either of the switches depending on which uplink is being picked on a ESXi host, you have to make sure that VMs on switch 1 are able to communicate to VMs on switch 2. If all you had in your environment were two Force10 switches, you would establish a LAG between the two and be done with it. But if your network topology is a bit larger than this and you have at least a single additional core switch/router in your environment you’d be faced with the following dilemma. How can you ensure efficient traffic switching in your network without creating loops?

stp_loop

You can no longer create a LAG between the two Force10 switches, as it will create a loop. Your only option is to keep switches connected only to the core and not to each other. And by doing that you will cause all traffic from VMs on switch 1 destined to VMs on switch 2 and vise versa to traverse the core.

east_west_traffic

And that’s where VLT comes into play. All east-west traffic between servers is contained within the VLT domain and doesn’t need to traverse the core. As shown above, if we didn’t use VLT, traffic from one switch to another would have to go from switch 1 to core and then back from core to switch 2. In a VLT domain traffic between the switches goes directly form switch 1 to switch 2 using VLTi.

Conclusion

That’s a brief introduction to VLT theory. In the next few posts we will look at how exactly VLT is configured and map theory to practice.

Advertisements

First Look at UCS Performance Manager

May 12, 2016

Overview

perf_gaugeCisco UCS has been in the market for seven years now. It was quite expensive blade chassis when it was first introduced by Cisco in March 2009, but has reached the price parity with most of the server vendors these days.

Over the course of the last seven years Cisco has built a great set of products, which helps UCS customers in various areas:

  • UCS Central for configuration management across multiple Cisco UCS domains
  • UCS Director for infrastructure automation not only of UCS, but also network, storage and virtualization layers (don’t expect it to support any other vendors than Cisco for IP networks, though)
  • UCS Performance Manager for performance monitoring and capacity planning, which can also tap into your network, storage, virtualization and even individual virtual machines

UCS Performance Manager

UCS Performance Manager was first released in October 2014. The product comes in two versions – full and express. PM Express covers only servers, hypervisors and operating systems. The full version on top of that supports storage and network devices. Product is licensed on a per UCS server basis. So you don’t pay for additional network/storage devices or hypervisors.

PM supports vSphere hypervisor (plus Hyper-V), Cisco networking and EMC VNX / EMC VMAX / NetApp FAS storage arrays. By the list of the supported products you may quickly guess that the full version of Performance Manager is targeted mainly at NetApp FlexPod, VCE Vblock and EMC VSPEX customers.

Product architecture

UCS Performance Manager can be downloaded and quickly deployed as a virtual appliance. You might be shocked when you start it up first time, as the appliance by default comes configured with 8 vCPUs and 40GB of RAM. If you’re using it for demo purposes you can safely reduce it to something like 2-4 vCPUs and 8-12GB of RAM. You will experience some slowdowns during the startup, but performance will be acceptable overall.

UCS PM is built on Zenoss monitoring software and is essentially a customized version of Zenoss Service Dynamics with Cisco UCS ZenPacks. You may notice references to Zenoss throughout the management GUI.

ucspm_zenoss

Two main components of the solution are the Control Center and the Performance Manager itself. Control Center is a container orchestration product, which runs Performance Manager as an application in Docker containers (many containers).

ucspm_docker

When deploying Performance Manager you start with one VM and then you can scale to up to four VMs total. Each of the VMs can run in two modes – master or agent. When you deploy the first VM you will have to select it’s role at first login. You have to have one master host, which also runs an agent. And if you need to scale you can deploy three additional agent VMs and build a ZooKeeper cluster. One master host can support up to 500 UCS servers, when configured with 8 vCPUs and 64GB of RAM. Depending on your deployment size you may not ever need to scale to more than one Performance Manager VM.

Installation

After you’ve deployed the OVA you will need to log in to the VM’s CLI and change the password, configure the host as a master, set up a static IP, DNS, time zone, hostname and reboot.

Then you connect to Control Center and click “+ Application” button in the Applications section and deploy UCS PM on port 4979. For the hostname use Control Center’s hostname.

deploy_ucspm

Once the UCS PM application is deployed, click on the Start button next to UCS PM line in the Applications section

start_ucspm

Performance manager is accessible from a separate link which is Control Center’s hostname prefixed with “ucspm”. So if your CC hostname is ucspm01.domain.local, UCS PM link will be https://ucspm.ucspm01.domain.local:443. You can see it in Virtual Host Names column. You will have to add an alias in DNS which would point from ucspm.ucspm01.domain.local to ucspm01.domain.local, otherwise you won’t be able to connect to it.

When you finally open UCS PM you will see a wizard which will ask you to add the licences, set an admin account and add your UCS chassis, VMware vCenters and UCS Central if you happen to have one. In the full version you will have a chance to add storage and network devices as well.

ucspm_wizard

UCS performance monitoring

Probably the easiest way to start working with Performance Manager is to jump from the dashboard to the Topology view. Topology view shows your UCS domain topology and provides an easy way to look at various components from one screen.

ucspm_topology

Click on the fabric interconnect and you can quickly see the uplink utilization. Click on the chassis and you will get summarized FEX port statistics. How about drilling down to a particular port-channel or service profile or vNIC? UCS Performance Manager can give you the most comprehensive information about every UCS component with historical data up to 1 year based on the default storage configuration.

north_traffic

Another great feature you may want to straight away drill down into is Bandwidth Usage, which gives you an overview of bandwidth utilization across all UCS components, which you can look at from a server or network perspective. This can let you quickly identify such things as uneven workload distribution between the blades or maybe uneven traffic distribution between fabric interconnect A and B side or SAN/LAN uplinks going to the upstream switches.

ucspm_bandwidth

You can of course also generate various reports to determine your total capacity utilization or if you’re for example planning to add memory to your blades, you can quickly find out the number of DIMM slots available in the corresponding report.

memory_slots

VMware performance monitoring

UCS Performance Manager is not limited to monitoring only Cisco UCS blade chassis even in the Express version. You can add your hypervisors and also individual virtual machines. Once you add your vCenter to the list of the monitored devices you get a comprehensive list of VMware components, such as hosts, VMs, datastores, pNICs, vNICs and associated performance monitoring graphs, configuration information, events, etc.

Performance Manager can correlate VMware to UCS components and for example for a given VM provide you FC uplink utilization on the corresponding fabric interconnects of the chassis where this VM is running:

vmware_stats

If you want to go further, you can add individual VMs to Performance Manager, connected via WinRM/SSH or SNMP. Some cool additional functionality you get, which is not available in VMware section is the Dynamic View. Dynamic View lets you see VM connectivity from the ESXi host it’s running on all the way through to blade, chassis, vNIC, VIC, backplane port, I/O module and fabric interconnect. Which is very helpful for troubleshooting connectivity issues:

dynamic_view

Conclusion

UCS Performance Manager is not the only product for performance monitoring in virtualized environments. There are many others, VMware vRealize Operations Manager is one of the most popular of its kind. But if you’re a Cisco UCS customer you can definitely benefit from the rich functionality this product offers for monitoring UCS blade chassis. And if you are a lucky owner of NetApp FlexPod, VCE Vblock or EMC VSPEX, UCS Performance Manager for you is a must.

pm_dashboard

Force10 MXL Switch: Stacking

March 3, 2015

Overview

There are two typical scenarios for stacking MXL’s – within the chassis and across the chassis. In both cases it’s recommended to use ring topology. Daisy chaining is also supported, but not desirable because of the lack of redundancy.

In this post I will be describing the more common case, which is intra-stacking. For inter-stacking configuration you can refer to Dell or Force10 documentation.

Cabling

dell_chassis

In my case I have four MXL switches in bays A1, B1, B2, A2. Cabling is simple, you basically daisy chain all switches and then plug the last switch to the first one.

Stack roles and unit numbers

When stack is built each switch is assigned an ID starting 0 and a role in the stack. There are three roles: Master, Standby and Member:

  • Master – is the switch you’ll use for all configuration. If you currently have IPs assigned to all your MXL switches, all of them except for one will be reset and only the Master will be accessible via SSH.
  • Standby – is the switch which takes over if Master switch fails. Master switch IP address is transferred to Standby in a failover scenario and stack continues to be managed via the same IP.
  • Member switch provides port capacity and doesn’t play any additional roles in the stack.

When you plug cables in, assign stack ports and restart the switches, they will go through election process and automatically pick up roles, as well as IDs. There’s an algorithm that assigns stack IDs and roles, which switches follow. But this algorithm has nothing to do with interconnect bay IDs in the chassis or order in which you cable the switches. You end up with pretty much random numbering.

If order matters, then you’ll have to reboot switches one by one in a particular order to have the desired IDs assigned. In that case IDs are assigned sequentially in a controlled fashion.

Stack configuration

If you don’t have any additional 40GbE modules in slots 0 and 1, then you’ll end up with two QSFP+ ports in a built-in module – ports 33 and 37 (refer to my Force10 MXL Switch: Port Numbering post for port numbering details). All you need to do is to designate them as stack ports on all switches, save config and reboot.

# stack-unit 0 stack-group 0
# stack-unit 0 stack-group 1
# copy run start
# reload

By default each switch is unit 0 in its own stack and stack-group is basically just a 40GbE stack port. You can have maximum of six such ports numbered from 0 to 5. To check that stack ports have been enabled run:

# do show system stack-unit 0 stack-group configured

enabled_ports

It could be that your 40GbE ports are in quad 10GbE mode and are not shown. You’ll need to convert them back to 40GbE mode to proceed. To show the list of available ports type in the command below. Switch shows empty expansion slots as stack ports as well (port 0/41 and 0/45), which is a bit confusing.

# show system stack-unit 0 stack-group

port_list

After a reboot, switches will join the stack and get a role and an id. This process is automatic by default. To see if stack ports have come up after a reboot type:

# show system stack-port status

stack_up

Conclusion

In my example I let switches to go through election process and select roles and IDs on their own. If you want to control the assignment process refer to Dell and Force10 documentation for instructions.

Now you may wonder if unit IDs are assigned automatically, how do you know which stack unit corresponds to which chassis bay ID. The hint for that is to show system inventory and map them by the Service Tag ID which is also shown in the Chassis Management Controller:

# show system brief
# show inventory

EIGRP enhancements

August 19, 2012

Enhanced Interior Gateway Routing Protocol (EIGRP) is a Cisco proprietary IGP. So if you have several vendors inside your corporate LAN like HP or Juniper then it’s probably not your choice. However, EIGRP has several enhancements that make it even faster in convergence time in comparison to OSPF.

One of the main drawbacks of OSPF is that it consumes considerable amount of memory to maintain LSDB and CPU power to run Dijkstra on it. EIGRP doesn’t do that. Routers with EIGRP enabled on their interfaces exchange only partial information with their neighbors, as OSPF does. But EIGRP routers don’t maintain the whole topology. On that matters they behave more like RIP. Each router holds information about networks and next hop routers to reach them. But unlike RIP, for each network EIGRP finds primary and secondary (if possible) routes. So that in case of link failure router could immediately switch to the backup route. In EIGRP terminology main route is called successor route and alternative route is feasible successor route.

Also, EIGRP has more sophisticated metric calculation. It considers not only bandwidth, but also delay. The formula is:

metric  = (10^7 / least-bandwidth + cumulative-delay) * 256

Here least-bandwidth is the slowest link speed in kbps along the path and cumulative-delay is sum of all delays from the network to the router in tens of microseconds.

To understand how EIGRP preventsloops there is a need for another two terms. Feasible Distance (FD) is a metric of the best route to reach a subnet, as calculated on a router. And Reported Distance (RD) is a metric as calculated on a neighboring router and then reported and learned in an EIGRP update. The trick here is that route can be a feasible successor route only if its RD is less than FD. It guarantees that this route doesn’t go through this router. Because otherwise it would obviously be greater than FD.

Again, EIGRP is better IGP from all perspectives. The only barrier that restricts its proliferation is proprietary nature of the protocol.