Posts Tagged ‘UCS’

First Look at UCS Performance Manager

May 12, 2016

Overview

perf_gaugeCisco UCS has been in the market for seven years now. It was quite expensive blade chassis when it was first introduced by Cisco in March 2009, but has reached the price parity with most of the server vendors these days.

Over the course of the last seven years Cisco has built a great set of products, which helps UCS customers in various areas:

  • UCS Central for configuration management across multiple Cisco UCS domains
  • UCS Director for infrastructure automation not only of UCS, but also network, storage and virtualization layers (don’t expect it to support any other vendors than Cisco for IP networks, though)
  • UCS Performance Manager for performance monitoring and capacity planning, which can also tap into your network, storage, virtualization and even individual virtual machines

UCS Performance Manager

UCS Performance Manager was first released in October 2014. The product comes in two versions – full and express. PM Express covers only servers, hypervisors and operating systems. The full version on top of that supports storage and network devices. Product is licensed on a per UCS server basis. So you don’t pay for additional network/storage devices or hypervisors.

PM supports vSphere hypervisor (plus Hyper-V), Cisco networking and EMC VNX / EMC VMAX / NetApp FAS storage arrays. By the list of the supported products you may quickly guess that the full version of Performance Manager is targeted mainly at NetApp FlexPod, VCE Vblock and EMC VSPEX customers.

Product architecture

UCS Performance Manager can be downloaded and quickly deployed as a virtual appliance. You might be shocked when you start it up first time, as the appliance by default comes configured with 8 vCPUs and 40GB of RAM. If you’re using it for demo purposes you can safely reduce it to something like 2-4 vCPUs and 8-12GB of RAM. You will experience some slowdowns during the startup, but performance will be acceptable overall.

UCS PM is built on Zenoss monitoring software and is essentially a customized version of Zenoss Service Dynamics with Cisco UCS ZenPacks. You may notice references to Zenoss throughout the management GUI.

ucspm_zenoss

Two main components of the solution are the Control Center and the Performance Manager itself. Control Center is a container orchestration product, which runs Performance Manager as an application in Docker containers (many containers).

ucspm_docker

When deploying Performance Manager you start with one VM and then you can scale to up to four VMs total. Each of the VMs can run in two modes – master or agent. When you deploy the first VM you will have to select it’s role at first login. You have to have one master host, which also runs an agent. And if you need to scale you can deploy three additional agent VMs and build a ZooKeeper cluster. One master host can support up to 500 UCS servers, when configured with 8 vCPUs and 64GB of RAM. Depending on your deployment size you may not ever need to scale to more than one Performance Manager VM.

Installation

After you’ve deployed the OVA you will need to log in to the VM’s CLI and change the password, configure the host as a master, set up a static IP, DNS, time zone, hostname and reboot.

Then you connect to Control Center and click “+ Application” button in the Applications section and deploy UCS PM on port 4979. For the hostname use Control Center’s hostname.

deploy_ucspm

Once the UCS PM application is deployed, click on the Start button next to UCS PM line in the Applications section

start_ucspm

Performance manager is accessible from a separate link which is Control Center’s hostname prefixed with “ucspm”. So if your CC hostname is ucspm01.domain.local, UCS PM link will be https://ucspm.ucspm01.domain.local:443. You can see it in Virtual Host Names column. You will have to add an alias in DNS which would point from ucspm.ucspm01.domain.local to ucspm01.domain.local, otherwise you won’t be able to connect to it.

When you finally open UCS PM you will see a wizard which will ask you to add the licences, set an admin account and add your UCS chassis, VMware vCenters and UCS Central if you happen to have one. In the full version you will have a chance to add storage and network devices as well.

ucspm_wizard

UCS performance monitoring

Probably the easiest way to start working with Performance Manager is to jump from the dashboard to the Topology view. Topology view shows your UCS domain topology and provides an easy way to look at various components from one screen.

ucspm_topology

Click on the fabric interconnect and you can quickly see the uplink utilization. Click on the chassis and you will get summarized FEX port statistics. How about drilling down to a particular port-channel or service profile or vNIC? UCS Performance Manager can give you the most comprehensive information about every UCS component with historical data up to 1 year based on the default storage configuration.

north_traffic

Another great feature you may want to straight away drill down into is Bandwidth Usage, which gives you an overview of bandwidth utilization across all UCS components, which you can look at from a server or network perspective. This can let you quickly identify such things as uneven workload distribution between the blades or maybe uneven traffic distribution between fabric interconnect A and B side or SAN/LAN uplinks going to the upstream switches.

ucspm_bandwidth

You can of course also generate various reports to determine your total capacity utilization or if you’re for example planning to add memory to your blades, you can quickly find out the number of DIMM slots available in the corresponding report.

memory_slots

VMware performance monitoring

UCS Performance Manager is not limited to monitoring only Cisco UCS blade chassis even in the Express version. You can add your hypervisors and also individual virtual machines. Once you add your vCenter to the list of the monitored devices you get a comprehensive list of VMware components, such as hosts, VMs, datastores, pNICs, vNICs and associated performance monitoring graphs, configuration information, events, etc.

Performance Manager can correlate VMware to UCS components and for example for a given VM provide you FC uplink utilization on the corresponding fabric interconnects of the chassis where this VM is running:

vmware_stats

If you want to go further, you can add individual VMs to Performance Manager, connected via WinRM/SSH or SNMP. Some cool additional functionality you get, which is not available in VMware section is the Dynamic View. Dynamic View lets you see VM connectivity from the ESXi host it’s running on all the way through to blade, chassis, vNIC, VIC, backplane port, I/O module and fabric interconnect. Which is very helpful for troubleshooting connectivity issues:

dynamic_view

Conclusion

UCS Performance Manager is not the only product for performance monitoring in virtualized environments. There are many others, VMware vRealize Operations Manager is one of the most popular of its kind. But if you’re a Cisco UCS customer you can definitely benefit from the rich functionality this product offers for monitoring UCS blade chassis. And if you are a lucky owner of NetApp FlexPod, VCE Vblock or EMC VSPEX, UCS Performance Manager for you is a must.

pm_dashboard

Advertisement

Upgrading Cisco UCS Fabric Interconnects

March 17, 2016

I have to do this first, as this is a high-risk change for any environment:

disclaimerDISCLAMER: I ACCEPT NO RESPONSIBILITY FOR ANY DAMAGE OR CORRUPTION OF DATA THAT MAY OCCUR AS A RESULT OF CARRYING OUT STEPS DESCRIBED BELOW. YOU DO THIS AT YOUR OWN RISK.

And now to the point. Cisco has two generations of Fabric Interconnects with the third generation released just recently. There is 6100 series, which includes 6120XP and 6140XP. Second generation is 6200 series, which introduced unified ports and also has two models in its range – 6248UP and 6296UP. And there is now a third generation of 40Gb fabric interconnects with 6324, 6332 and 6332-16UP models.

We are yet to see mass adoption of 40Gb FIs. And some of the customers are still upgrading from the first to the second generation.

In this blog post we will go through the process of upgrading 6100 fabric interconnects to 6200 by using 6120 and 6248 as an example.

Prerequisites

Cisco UCS has a pair of fabric interconnects which work in an active/passive mode from a control plane perspective. This lets us do an in-place upgrade of a FI cluster by upgrading interconnects one at a time without any further reconfiguration needed in UCS Manager in most cases.

For a successful upgrade old and new interconnects MUST run on the same firmware revision. That means you will need to upgrade the first new FI to the same firmware before you can join it to the cluster to replace the first old FI.

This can be done by booting the FI in a standalone mode, giving it an IP address and installing firmware via UCS Manager.

The second FI won’t need a manual firmware update, because when a FI of the same hardware model is joined to a cluster it’s upgraded automatically from the other FI.

Preparation tasks

It’s a good idea to make a record of all connections from the current fabric interconnects and make a configuration backup before an upgrade.

ucs_backup

If you have any unused connections which you’re not planning to move, it’s a good time to disconnect the cables and disable these ports.

Cisco strongly suggests to also upgrade the firmware on all software and hardware components of the existing UCS to the latest recommended version first.

Upgrading firmware on the first new FI

Steps to upgrade firmware on the first new fabric interconnect are as follows:

  • Rack and stack the new FI close enough to the old interconnects to make sure all cables can reach it.
  • Connect a console cable to the new FI, boot it up and when you are asked “Is this Fabric interconnect part of a cluster”, select NO to boot the FI in a standalone mode.
  • Assign an IP address to the FI and connect to it using UCS Manager.
  • Upgrade the firmware, which will reboot the fabric interconnect.
  • Reset the configuration on the FI, which will cause another reboot:
    • # connect local-mgmt
      # erase config

  • Once the FI is upgraded and reset to factory defaults you can proceed with joining it to the cluster.

Replacing the first FI

  • Determine which old FI is in the subordinate mode (upgrade a FI only if it’s in subordinate mode!) and disable server ports on it.
  • Shut down the old subordinate FI.
  • Move L1/L2, management, server and Ethernet/FC/FCoE uplink ports to the new FI.
  • Boot the new FI. This time the new FI will detect the presence of the peer FI. When you see the following prompt type YES:
    • Installer has detected the presence of a peer Fabric interconnect. This Fabric interconnect will be added to the cluster. Continue (y/n) ?

  • Follow the console prompts and assign an IP address to the new FI. The rest of the settings will be pulled from the peer FI.

Once the new FI joins the cluster you should see the following equipment topology in UCS Manager (This screenshot was made after the primary role had been moved to the new FI. Initially you should see the new FI as subordinate.):

two_fis

  • At this stage make sure that all configuration has been applied to the new FI and you can see all LAN and SAN uplinks and port channels.
  • Enable server ports on the new FI and reacknowledge all chassis.

Reacknowledging a chassis might be disruptive to the traffic flow from the blades. So make sure you don’t have any production workloads running on it. If you have two chassis and enough capacity to run all VMs on either of them, you can temporarily move VMs between the chassis and reacknowledge one chassis at a time.

Replacing the second FI

You will need to promote the new FI to be the primary, before proceeding with an upgrade of the second FI. To change the roles, use SSH to log in to the old FI, which is currently the primary (you can’t change roles from the subordinate FI) and run the following commands:

# connect local-mgmt
# cluster lead b
# show cluster state

The rest of the process is exactly the same.

After the upgrade, if needed, reconfigure any of the links which may have had their port numbers changed, such as if you had an expansion module in the old FIs, but not on the new FIs.

References

Cisco has a guide which has a step by step procedures for upgrading fabric interconnects, I/O modules, VIC cards as well as rack-mount servers. Refer to this guide for any further clarifications:

 

Traffic Load Balancing in Cisco UCS

December 21, 2015

Whenever I deploy a Cisco UCS at a customer the question I get asked a lot is how traffic flows within the system between VMs running on the blades and FEX modules, FEX modules and Fabric Interconnects and finally how it’s uplinked to the network core.

Cisco has a range of CNA cards for UCS blades. With VIC 1280 you get 8 x 10Gb ports split between two FEX modules for redundancy. And FEX modules on their own can have up to 8 x 10Gb Fabric Interconnect facing interfaces, which can give you up to 160Gb of bandwidth per chassis. And all these numbers may sound impressive, but unless you understand how your VMs traffic flows through UCS it’s easy to make wrong assumptions on what per VM and aggregate bandwidth you can achieve. So let’s dive deep into UCS and shed some light on how VM traffic is load-balanced within the system.

UCS Hardware Components

Each Fabric Extender (FEX) has external and internal ports. External FEX ports are patched to FIs and internal ports are internally wired to the blade adapters. FEX 2204 has 4 external and 16 internal and FEX 2208 has 8 external and 32 internal ports.

External ports are connected to FIs in powers of two: 1, 2, 4 or 8 ports per FEX and form a port channel (make sure to use “Port Channel” link grouping preference under Chassis/FEX Discovery Policy). Same rule is applied to blade Virtual Interface Cards (VIC). The most common VIC 1240 and 1280 have 4 x 10Gb and 8 x 10Gb ports respectively and also form a port channel to the internal FEX ports. Every VIC adaptor is connected to both FEX modules for redundancy.

chassis_network

Fabric Interconnects are then patched to your network core and FC Fabric (if you have one). Whether Ethernet uplinks will be individual uplinks or port channels will depend on your network topology. For fibre uplinks the rule of thumb is to patch FI A to your FC Fabric A and FI B to FC Fabric B, which follows the common FC traffic isolation principle.

Virtual Circuits

To provide network and storage connectivity to blades you create virtual NICs and virtual HBAs on each blade. Since internally UCS uses FCoE to transfer FC frames, both vNICs and vHBAs use the same 10GbE uplinks to send and receive traffic. Worth mentioning that Cisco uses Data Center Bridging (DCB) protocol with it’s sub-protocols Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS), which guarantee that FC frames have higher priority in the queue and are processed first to ensure low latency. But I digress.

UCS assigns a virtual circuit to each virtual adaptor, which is a representation of how the traffic traverses the system all the way from the VIC port to a FEX internal port, then FEX external port, FI server port and finally a FI uplink. You can trace the full path of each virtual adaptor in UCS Manager by selecting a Service Profile and viewing the VIF Paths tab.

vif_paths

In this example we have a blade with four vNICs and two vHBAs which are split between two fabrics. All virtual adaptors on fabric A are connected through VIC port channel PC-1283 which is represented as port channel PC-1025 on the FEX A side. Then traffic leaves FEX A and reaches the Fabric Interconnect A which sends the traffic out to the network core through port channel A/PC-1.

You can also get the list of port channels from the FI CLI:

# connect nxos
# show port-channel summary

ucs_portchannels

Network Load Balancing

Now that we know how all components are interconnected to each other, let’s discuss the traffic flow in a typical VMware environment and how we achieve the massive network throughput that UCS provides.

As an example let’s take a look at the vSwitch where your VM Network port group is configured. vSwitch will have two uplinks – one goes to Fabric A and the other one to Fabric B for redundancy. Default load balancing policy on a vSwitch is “Route based on the originating port ID”, which essentially pins all traffic for a VM to a particular uplink. vSphere makes sure that VMs are evenly distributed between the uplinks to use all network bandwidth available.

From each uplink (or vNIC in UCS world) traffic is forwarded through an adapter port channel to a FEX, then to a Fabric Interconnect and leaves UCS from a FI uplink. Within UCS traffic is distributed between port channel members using source/destination IP hash algorithm. Which is even more granular and is capable of very efficient traffic distribution between all members of a port channel all the way up to your network core.

ucs_loadbalancing

If you look at the vSwitch you’ll see that with UCS each uplink shows the maximum available bandwidth from vNIC and is not limited to a port channel member speed of 10Gb. Why is this so powerful? Because with UCS you don’t need to slice adapter’s available bandwidth between different types of traffic. Even though you provision multiple vNICs and vHBAs for the vSphere hosts, UCS uses the same port channel links (20Gb in the example below) from the VIC adapter to transfer all traffic and takes care of load balancing for you.

vswitch_uplinks

You may legitimately ask, if UCS uses the same pipe to transfer all data regardless of which vSwitch uplink is being used, then how can I make sure that different types of traffic, such as vMotion, storage, VM traffic, replication, etc, do not compete for the same pipe? First you need to ask yourself if you can saturate that much bandwidth with your workloads. If the answer is yes, then you can use another great feature available in UCS, which is QoS. QoS lets you assign a minimum available bandwidth guarantee on a per vNIC/vHBA basis. But that’s a topic for another blog post.

References

In this post I tried to summarise the logic behind UCS traffic distribution. If you want to dig deeper in UCS network architecture, then there’re a lot of great bloggers out there. I would like to call out the following authors:

 

Troubleshooting Cisco UCS LDAP

December 4, 2015

If you ever configured LDAP integration on a blade chassis or a storage array, you know that troubleshooting authentication is painful on these things. It will accept all your configuration settings and if you’ve made a mistake somewhere all you get when you try to log in is “Authentication Error” message with no clue of what the actual error is.

Committing configuration changes

There three common places where you can make a mistake when setting up LDAP authentication on UCS. Number one is committing configuration changes to the Fabric Interconnects in UCS Manager.

There are four configuration options which you need to set to enable Active Directory authentication to your domain:

  • LDAP Providers – these are your domain controllers
  • LDAP Provider Groups – are used to group multiple domain controllers of the same domain
  • LDAP Group Maps – where you give permissions to your AD groups and users
  • Authentication Domains – final configuration step where you enable authentication via the domain

Now if you decide to delete a LDAP Provider Group which is configured under an Authentication Domain in attempt to change the settings, this may become an issue.

What is confusing here is UCS Manager will let you delete the LDAP Provider Group, save the changes and LDAP Provider Group will disappear from the list. And you may legitimately conclude that it’s deleted from UCS, but it’s actually not. This is what you’ll see in UCS Manager logs:

[FSM:STAGE:STALE-FAIL]: external aaa server configuration to primary(FSM-STAGE:sam:dme:AaaEpUpdateEp:SetEpLocal)
[FSM:STAGE:REMOTE-ERROR]: Result: resource-unavailable Code: ERR-ep-set-error Message: Re-ordering/Deletion of Providers cannot be applied while ldap is used for yourdomain.com(Domain) authentication(sam:dme:AaaEpUpdateEp:SetEpLocal)

The record will stay on the UCS and you may encounter very confusing issues where you change your LDAP Provider settings but changes are not reflected on UCS. So make sure to delete the object from the higher level entity first.

Distinguished Name typos

There are two ways to group Active Directory entities on a domain controller – Security Groups and Organizational Units. When configuring your AD bind account in LDAP Providers section and setting up permissions in LDAP Group Maps, make sure to not confuse the two. The best advice I can give – always use ADSI Edit tool to find the exact DN. Why? As an example let’s say you want to give permissions to the builtin administrator group and you use the following DN:

CN=Administrators,OU=Builtin,DC=yourdomain,DC=com

This won’t work, because even though Builtin container may look like a OU, it’s actually a CN in AD, as well as Users and Computers containers.

adsi_edit.JPG

ADSI Edit will give you the exact Distinguished Name. Make sure to use it to save yourself the hassle.

Group Authorization settings

Last but not least are the following two LDAP Provider configuration settings:

  • Group Authorization – whether UCS searches within groups when authenticating
  • Group Recursion – whether UCS searches groups recursively

If you add an AD group which the user is a part of in LDAP Group Maps and do not enable Group Authorization, UCS simply won’t search within the group. Enable this option unless you give permissions only on a per user basis.

Second option enables recursive search within AD groups. If you have nested groups in AD (which most people have) enable recursive search or UCS won’t look deeper than 1 level.

If you get really stuck

If you’ve set all the settings up and are certain they the are correct, but authentication still doesn’t work, then there is a relatively easy way to localize the issue.

First step is to check whether UCS can bind to your LDAP Providers and authenticate users. Pick a user (LDAP Group Maps don’t matter at this point), SSH to a Fabric Interconnect and type the following:

ucs # connect nxos
ucs(nxos)# test aaa server ldap yourdc.yourdomain.com john password123

yourdc.yourdomain.com – is the domain controller you’ve configured in LDAP Providers section. If authentication doesn’t work, then the issue is in LDAP Provider settings.

If you can authenticate, then the next step is to make sure that UCS searches through the right AD groups. To check that you will need to enable LDAP authentication logging on a Fabric Interconnect:

ucs # connect nxos
ucs(nxos)# debug ldap aaa-request-lowlevel

Now try to authenticate and look through the list of groups which UCS is searching through. If you can’t see the group which your user is a part of, then you most likely using a wrong DN in LDAP Group Maps.

In my case the settings are configured correctly and I can see that UCS is searching in the Builtin Administrators group:

2015 Dec 1 14:12:19.581737 ldap: value: CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581747 ldap: ldap_add_to_groups: Discarding. group map not configured for CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581756 ldap: value: CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581767 ldap: ldap_add_to_groups: successfully added group:CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581777 ldap: value: CN=Exchange Organization Administrators,OU=Microsoft Exchange Security Groups,DC=yourdomain,DC=com

Make sure to disable logging when you’re done:

ucs(nxos)# undebug all

References: