Archive for the ‘Server’ Category

Dell Repository Manager: Bootable ISO Issues

May 23, 2016

problem_solutionIn one of my previous posts I described the process of upgrading a Dell FX2 chassis firmware using Dell Repository Manager (DRM).

In an ideal world you just follow the process and in an hour or two you can get your chassis upgraded. You may sometimes run into issues. I want to go through some of them in this post, including possible remediation.

Issue Description

When exporting firmware to a bootable ISO you can find DRM not being able to download some of the bundle components with the following error in the Job Result:

Processing failed:
Failed downloading files:
Diagnostics_Application_PWMC8_LN64_OSC_1.1_A00.BIN

And errors in the Log:


60. 24/03/2016 5:58:50 PM Export to Bootable ISO : Downloaded 34 / 56
61. 24/03/2016 5:59:44 PM Export to Bootable ISO : Error downloading some files
62. 24/03/2016 5:59:45 PM Export to Bootable ISO : Failed exporting to Bootable ISO.

Workaround #1: Skip the Component

You can try the following option “Continue download irrespective of any error (in the selected components)” in the export dialog. It won’t help to get the component downloaded, but you will got a bootable ISO.

However, DRM will still keep the failed component in the bundle and try to install it during the upgrade, which will obviously fail (update 16/56):

failed_update

Once the upgrade is finished you will get the following error at the end:

Note: Some update requires machine reboot. Please reboot to CD/DVD to continue for the failed update because of the dependency…

upgrade_status

No matter how many times you reboot you will obviously get the same errors. You can ignore it if you 100% sure this is what causes the upgrade to fail or use Workaround #2.

Workaround #2: Create Custom ISO

When you create a repository in DRM it’s populated with pre-built components and bundles. But you can create custom repositories. The idea is that you can exclude the failed component from the repository by creating it manually.

Assuming you already have the base repository configured, do the following:

  • Open the existing repository and click on the Components tab
  • Deselect the failed component in the component list (in my case it was Diagnostics_Application_PWMC8_LN64_OSC_1.1_A00.BIN)
  • Click on the “Copy To” button:

custom_components

  • In the opened dialogue select “Create NEW Repository and copy component(s) into it”
  • Follow the wizard and when you click finish, components will be copied to the newly created repository
  • Open the new repository and click on the Componenets tab
  • Select all components and click on the “Copy to” button once again
  • This time select “Create a NEW Bundle in the same repository and add component(s) into it”
  • On the next screen give the bundle a name and make sure to choose “Linux 32-bit and 64-bit” in the OS Type

custom_bundle

As a result you should get a new bundle created which you can export to a bootable ISO using the same process.

Workaround #3: Use Server Update Utility

If none of the above helps you can fall back to a proven upgrade approach and use Server Update Utility (SUU). SUU is a huge 12GB ISO to download, but you can use Dell Download Manager, which supports resuming interrupted downloads. Make sure to disable proxy! Dell Download Manager does not support resuming an interrupted download if you’re using a proxy server.

SUU is not a bootable ISO. Previously you had to use Dell Systems Build and Update Utility (SBUU) to boot from it first and then mount the ISO to proceed with the upgrade. Starting with Dell 11G servers you don’t need it anymore and can upgrade firmware straight form Dell Lifecycle Controller (LC).

You’ll need to boot into the Lifecycle Controller and choose Firmware Update > Launch Firmware Update > Local Drive(CD or DVD or USB). Mount the SUU ISO and the rest is fairly straightforward. LC will upgrade the firmware and reboot the blade.

lc_upgrade

Conclusion

Dell Repository Manager is the recommended approach to upgrade firmware on Dell hardware. Unlike SUU, DRM downloads the latest updates and only the necessary components. It is also capable of making a bootable ISO.

If you have issues, rely on Server Update Utility as it’s bulletproof and always work. But be prepared to download a 12GB ISO image and make sure you have an option to bypass proxy.

Advertisement

First Look at UCS Performance Manager

May 12, 2016

Overview

perf_gaugeCisco UCS has been in the market for seven years now. It was quite expensive blade chassis when it was first introduced by Cisco in March 2009, but has reached the price parity with most of the server vendors these days.

Over the course of the last seven years Cisco has built a great set of products, which helps UCS customers in various areas:

  • UCS Central for configuration management across multiple Cisco UCS domains
  • UCS Director for infrastructure automation not only of UCS, but also network, storage and virtualization layers (don’t expect it to support any other vendors than Cisco for IP networks, though)
  • UCS Performance Manager for performance monitoring and capacity planning, which can also tap into your network, storage, virtualization and even individual virtual machines

UCS Performance Manager

UCS Performance Manager was first released in October 2014. The product comes in two versions – full and express. PM Express covers only servers, hypervisors and operating systems. The full version on top of that supports storage and network devices. Product is licensed on a per UCS server basis. So you don’t pay for additional network/storage devices or hypervisors.

PM supports vSphere hypervisor (plus Hyper-V), Cisco networking and EMC VNX / EMC VMAX / NetApp FAS storage arrays. By the list of the supported products you may quickly guess that the full version of Performance Manager is targeted mainly at NetApp FlexPod, VCE Vblock and EMC VSPEX customers.

Product architecture

UCS Performance Manager can be downloaded and quickly deployed as a virtual appliance. You might be shocked when you start it up first time, as the appliance by default comes configured with 8 vCPUs and 40GB of RAM. If you’re using it for demo purposes you can safely reduce it to something like 2-4 vCPUs and 8-12GB of RAM. You will experience some slowdowns during the startup, but performance will be acceptable overall.

UCS PM is built on Zenoss monitoring software and is essentially a customized version of Zenoss Service Dynamics with Cisco UCS ZenPacks. You may notice references to Zenoss throughout the management GUI.

ucspm_zenoss

Two main components of the solution are the Control Center and the Performance Manager itself. Control Center is a container orchestration product, which runs Performance Manager as an application in Docker containers (many containers).

ucspm_docker

When deploying Performance Manager you start with one VM and then you can scale to up to four VMs total. Each of the VMs can run in two modes – master or agent. When you deploy the first VM you will have to select it’s role at first login. You have to have one master host, which also runs an agent. And if you need to scale you can deploy three additional agent VMs and build a ZooKeeper cluster. One master host can support up to 500 UCS servers, when configured with 8 vCPUs and 64GB of RAM. Depending on your deployment size you may not ever need to scale to more than one Performance Manager VM.

Installation

After you’ve deployed the OVA you will need to log in to the VM’s CLI and change the password, configure the host as a master, set up a static IP, DNS, time zone, hostname and reboot.

Then you connect to Control Center and click “+ Application” button in the Applications section and deploy UCS PM on port 4979. For the hostname use Control Center’s hostname.

deploy_ucspm

Once the UCS PM application is deployed, click on the Start button next to UCS PM line in the Applications section

start_ucspm

Performance manager is accessible from a separate link which is Control Center’s hostname prefixed with “ucspm”. So if your CC hostname is ucspm01.domain.local, UCS PM link will be https://ucspm.ucspm01.domain.local:443. You can see it in Virtual Host Names column. You will have to add an alias in DNS which would point from ucspm.ucspm01.domain.local to ucspm01.domain.local, otherwise you won’t be able to connect to it.

When you finally open UCS PM you will see a wizard which will ask you to add the licences, set an admin account and add your UCS chassis, VMware vCenters and UCS Central if you happen to have one. In the full version you will have a chance to add storage and network devices as well.

ucspm_wizard

UCS performance monitoring

Probably the easiest way to start working with Performance Manager is to jump from the dashboard to the Topology view. Topology view shows your UCS domain topology and provides an easy way to look at various components from one screen.

ucspm_topology

Click on the fabric interconnect and you can quickly see the uplink utilization. Click on the chassis and you will get summarized FEX port statistics. How about drilling down to a particular port-channel or service profile or vNIC? UCS Performance Manager can give you the most comprehensive information about every UCS component with historical data up to 1 year based on the default storage configuration.

north_traffic

Another great feature you may want to straight away drill down into is Bandwidth Usage, which gives you an overview of bandwidth utilization across all UCS components, which you can look at from a server or network perspective. This can let you quickly identify such things as uneven workload distribution between the blades or maybe uneven traffic distribution between fabric interconnect A and B side or SAN/LAN uplinks going to the upstream switches.

ucspm_bandwidth

You can of course also generate various reports to determine your total capacity utilization or if you’re for example planning to add memory to your blades, you can quickly find out the number of DIMM slots available in the corresponding report.

memory_slots

VMware performance monitoring

UCS Performance Manager is not limited to monitoring only Cisco UCS blade chassis even in the Express version. You can add your hypervisors and also individual virtual machines. Once you add your vCenter to the list of the monitored devices you get a comprehensive list of VMware components, such as hosts, VMs, datastores, pNICs, vNICs and associated performance monitoring graphs, configuration information, events, etc.

Performance Manager can correlate VMware to UCS components and for example for a given VM provide you FC uplink utilization on the corresponding fabric interconnects of the chassis where this VM is running:

vmware_stats

If you want to go further, you can add individual VMs to Performance Manager, connected via WinRM/SSH or SNMP. Some cool additional functionality you get, which is not available in VMware section is the Dynamic View. Dynamic View lets you see VM connectivity from the ESXi host it’s running on all the way through to blade, chassis, vNIC, VIC, backplane port, I/O module and fabric interconnect. Which is very helpful for troubleshooting connectivity issues:

dynamic_view

Conclusion

UCS Performance Manager is not the only product for performance monitoring in virtualized environments. There are many others, VMware vRealize Operations Manager is one of the most popular of its kind. But if you’re a Cisco UCS customer you can definitely benefit from the rich functionality this product offers for monitoring UCS blade chassis. And if you are a lucky owner of NetApp FlexPod, VCE Vblock or EMC VSPEX, UCS Performance Manager for you is a must.

pm_dashboard

Painless Dell FX2 Firmware Upgrade

April 10, 2016

Overview

Recently I’ve had a chance to play with Dell’s FX2 chassis for a bit. Dell FX2 falls into the category of blade chassis and can hold up to 8 blades with Atom or 4 blades with Xeon CPUs in a 2U chassis.

Dell_FX2

Besides the compute blades FX2 also supports storage blades, which you can dedicate to particular compute blades and use as additional storage.

On the networking side you can choose from either pass-through modules or three types of I/O aggregators – four 10G SFP+ ports, four 10GBASE-T ports, or two Fibre Channel plus two SFP+ external ports.

The chassis itself also comes in two flavors – FX2 or FX2s. The main difference between the two is that FX2s additionally has PCIe slots at the back, which can be mapped to the server blades to provide additional connectivity.

Dell_FX2_Rear

First step of every hardware solution deployment is a firmware upgrade. But when it comes to firmware on Dell blade equipment be it M1000e, VRTX or FX2 you can quickly get confused. Especially when you go to the blade section and see a dozen of hardware components. Download and update each of them individually would be daunting. Fortunately there is an better way.

blade_firmware

CMC Firmware

Upgrade starts from the chassis management controller, which has two components: Chassis Infrastructure Firmware (or Main Board) and the CMC itself. You can find them on the Chassis Overview > Update tab.

CMC firmware comes as an .exe package, which you can extract. You really need just the fx2_cmc.bin file. During upgrade you will lose access to CMC for 5-10 minutes, while CMC is rebooting.

For the infrastructure firmware you’ll need the fx2_mainboard.bin file. The gotcha with the infrastructure firmware upgrade is that you’ll need all blades to be powered off. So if you have just one chassis this might be tricky.

Blade Firmware

Blades firmware is where this gets interesting. You can certainly upgrade all blades from the CMC by downloading firmware from the Dell support web-site and choosing one component at a time in Chassis Overview > Server Overview > Update section. CMC is capable of upgrading say iDRAC across all blades simultaneously, but it’s still about a dozen components.

The easier approach would be to use Dell Repository Manager (DRM). DRM can download firmware for virtually any blade or rack server (including some of the storage and network hardware) and build a bootable ISO image for an easy upgrade.

To build a bootable ISO follow the following steps:

  • Download and install Dell Repository Manager from the Dell support web-site
  • Add a source by going to Source > View Dell Online Catalog
  • Create a repository by going to Repository > New > Create New Repository
  • In the wizard select your hardware (I selected PowerEdge FC630 from the Blade category) and choose Linux (32-bit and 64-bit) as a DUP format (I’ll explain that later).
  • Go to the newly created repository, select the bundle and click Export

export_bundle

DRM can export bundles in multiple forms, we are interested in a bootable ISO and this is why we selected the Linux DUP format when we created the repository. DRM creates a Linux bootable ISO, so there was no point selecting Windows bundles.

  • Select “Create Bootable ISO (Linux Only)” and continue with the default settings for the rest

As a result you will get an .iso file, which you can mount to the server via iDRAC Remote Console and boot from it for a firmware upgrade.

Network I/O Aggregators

FX2 I/O aggregators are Dell Force10 switches, which use Force10 OS (FTOS). FTOS firmware is NOT available from the Dell web-site. You’ll need to register an account at https://www.force10networks.com to download the firmware.

Make sure to download firmware release specifically built for FX2 I/O aggregators, which can be found in M-Series Software section.

aggregators_firmware

To upgrade the aggregators go to Chassis Overview > I/O Module Overview > Update. Aggregators reset after a reboot, so make sure to upgrade them one at a time. Or if you stacked them instead of using VLT or standalone mode, you’ll have to have a downtime, as stacked switches reboot together.

Conclusion

There is nothing fancy in upgrading firmware on a blade chassis, you want it to be quick and painless. Make sure to use Dell Repository Manager for blades upgrade. It may save you heaps of time and make your life easier.

Upgrading Cisco UCS Fabric Interconnects

March 17, 2016

I have to do this first, as this is a high-risk change for any environment:

disclaimerDISCLAMER: I ACCEPT NO RESPONSIBILITY FOR ANY DAMAGE OR CORRUPTION OF DATA THAT MAY OCCUR AS A RESULT OF CARRYING OUT STEPS DESCRIBED BELOW. YOU DO THIS AT YOUR OWN RISK.

And now to the point. Cisco has two generations of Fabric Interconnects with the third generation released just recently. There is 6100 series, which includes 6120XP and 6140XP. Second generation is 6200 series, which introduced unified ports and also has two models in its range – 6248UP and 6296UP. And there is now a third generation of 40Gb fabric interconnects with 6324, 6332 and 6332-16UP models.

We are yet to see mass adoption of 40Gb FIs. And some of the customers are still upgrading from the first to the second generation.

In this blog post we will go through the process of upgrading 6100 fabric interconnects to 6200 by using 6120 and 6248 as an example.

Prerequisites

Cisco UCS has a pair of fabric interconnects which work in an active/passive mode from a control plane perspective. This lets us do an in-place upgrade of a FI cluster by upgrading interconnects one at a time without any further reconfiguration needed in UCS Manager in most cases.

For a successful upgrade old and new interconnects MUST run on the same firmware revision. That means you will need to upgrade the first new FI to the same firmware before you can join it to the cluster to replace the first old FI.

This can be done by booting the FI in a standalone mode, giving it an IP address and installing firmware via UCS Manager.

The second FI won’t need a manual firmware update, because when a FI of the same hardware model is joined to a cluster it’s upgraded automatically from the other FI.

Preparation tasks

It’s a good idea to make a record of all connections from the current fabric interconnects and make a configuration backup before an upgrade.

ucs_backup

If you have any unused connections which you’re not planning to move, it’s a good time to disconnect the cables and disable these ports.

Cisco strongly suggests to also upgrade the firmware on all software and hardware components of the existing UCS to the latest recommended version first.

Upgrading firmware on the first new FI

Steps to upgrade firmware on the first new fabric interconnect are as follows:

  • Rack and stack the new FI close enough to the old interconnects to make sure all cables can reach it.
  • Connect a console cable to the new FI, boot it up and when you are asked “Is this Fabric interconnect part of a cluster”, select NO to boot the FI in a standalone mode.
  • Assign an IP address to the FI and connect to it using UCS Manager.
  • Upgrade the firmware, which will reboot the fabric interconnect.
  • Reset the configuration on the FI, which will cause another reboot:
    • # connect local-mgmt
      # erase config

  • Once the FI is upgraded and reset to factory defaults you can proceed with joining it to the cluster.

Replacing the first FI

  • Determine which old FI is in the subordinate mode (upgrade a FI only if it’s in subordinate mode!) and disable server ports on it.
  • Shut down the old subordinate FI.
  • Move L1/L2, management, server and Ethernet/FC/FCoE uplink ports to the new FI.
  • Boot the new FI. This time the new FI will detect the presence of the peer FI. When you see the following prompt type YES:
    • Installer has detected the presence of a peer Fabric interconnect. This Fabric interconnect will be added to the cluster. Continue (y/n) ?

  • Follow the console prompts and assign an IP address to the new FI. The rest of the settings will be pulled from the peer FI.

Once the new FI joins the cluster you should see the following equipment topology in UCS Manager (This screenshot was made after the primary role had been moved to the new FI. Initially you should see the new FI as subordinate.):

two_fis

  • At this stage make sure that all configuration has been applied to the new FI and you can see all LAN and SAN uplinks and port channels.
  • Enable server ports on the new FI and reacknowledge all chassis.

Reacknowledging a chassis might be disruptive to the traffic flow from the blades. So make sure you don’t have any production workloads running on it. If you have two chassis and enough capacity to run all VMs on either of them, you can temporarily move VMs between the chassis and reacknowledge one chassis at a time.

Replacing the second FI

You will need to promote the new FI to be the primary, before proceeding with an upgrade of the second FI. To change the roles, use SSH to log in to the old FI, which is currently the primary (you can’t change roles from the subordinate FI) and run the following commands:

# connect local-mgmt
# cluster lead b
# show cluster state

The rest of the process is exactly the same.

After the upgrade, if needed, reconfigure any of the links which may have had their port numbers changed, such as if you had an expansion module in the old FIs, but not on the new FIs.

References

Cisco has a guide which has a step by step procedures for upgrading fabric interconnects, I/O modules, VIC cards as well as rack-mount servers. Refer to this guide for any further clarifications:

 

Traffic Load Balancing in Cisco UCS

December 21, 2015

Whenever I deploy a Cisco UCS at a customer the question I get asked a lot is how traffic flows within the system between VMs running on the blades and FEX modules, FEX modules and Fabric Interconnects and finally how it’s uplinked to the network core.

Cisco has a range of CNA cards for UCS blades. With VIC 1280 you get 8 x 10Gb ports split between two FEX modules for redundancy. And FEX modules on their own can have up to 8 x 10Gb Fabric Interconnect facing interfaces, which can give you up to 160Gb of bandwidth per chassis. And all these numbers may sound impressive, but unless you understand how your VMs traffic flows through UCS it’s easy to make wrong assumptions on what per VM and aggregate bandwidth you can achieve. So let’s dive deep into UCS and shed some light on how VM traffic is load-balanced within the system.

UCS Hardware Components

Each Fabric Extender (FEX) has external and internal ports. External FEX ports are patched to FIs and internal ports are internally wired to the blade adapters. FEX 2204 has 4 external and 16 internal and FEX 2208 has 8 external and 32 internal ports.

External ports are connected to FIs in powers of two: 1, 2, 4 or 8 ports per FEX and form a port channel (make sure to use “Port Channel” link grouping preference under Chassis/FEX Discovery Policy). Same rule is applied to blade Virtual Interface Cards (VIC). The most common VIC 1240 and 1280 have 4 x 10Gb and 8 x 10Gb ports respectively and also form a port channel to the internal FEX ports. Every VIC adaptor is connected to both FEX modules for redundancy.

chassis_network

Fabric Interconnects are then patched to your network core and FC Fabric (if you have one). Whether Ethernet uplinks will be individual uplinks or port channels will depend on your network topology. For fibre uplinks the rule of thumb is to patch FI A to your FC Fabric A and FI B to FC Fabric B, which follows the common FC traffic isolation principle.

Virtual Circuits

To provide network and storage connectivity to blades you create virtual NICs and virtual HBAs on each blade. Since internally UCS uses FCoE to transfer FC frames, both vNICs and vHBAs use the same 10GbE uplinks to send and receive traffic. Worth mentioning that Cisco uses Data Center Bridging (DCB) protocol with it’s sub-protocols Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS), which guarantee that FC frames have higher priority in the queue and are processed first to ensure low latency. But I digress.

UCS assigns a virtual circuit to each virtual adaptor, which is a representation of how the traffic traverses the system all the way from the VIC port to a FEX internal port, then FEX external port, FI server port and finally a FI uplink. You can trace the full path of each virtual adaptor in UCS Manager by selecting a Service Profile and viewing the VIF Paths tab.

vif_paths

In this example we have a blade with four vNICs and two vHBAs which are split between two fabrics. All virtual adaptors on fabric A are connected through VIC port channel PC-1283 which is represented as port channel PC-1025 on the FEX A side. Then traffic leaves FEX A and reaches the Fabric Interconnect A which sends the traffic out to the network core through port channel A/PC-1.

You can also get the list of port channels from the FI CLI:

# connect nxos
# show port-channel summary

ucs_portchannels

Network Load Balancing

Now that we know how all components are interconnected to each other, let’s discuss the traffic flow in a typical VMware environment and how we achieve the massive network throughput that UCS provides.

As an example let’s take a look at the vSwitch where your VM Network port group is configured. vSwitch will have two uplinks – one goes to Fabric A and the other one to Fabric B for redundancy. Default load balancing policy on a vSwitch is “Route based on the originating port ID”, which essentially pins all traffic for a VM to a particular uplink. vSphere makes sure that VMs are evenly distributed between the uplinks to use all network bandwidth available.

From each uplink (or vNIC in UCS world) traffic is forwarded through an adapter port channel to a FEX, then to a Fabric Interconnect and leaves UCS from a FI uplink. Within UCS traffic is distributed between port channel members using source/destination IP hash algorithm. Which is even more granular and is capable of very efficient traffic distribution between all members of a port channel all the way up to your network core.

ucs_loadbalancing

If you look at the vSwitch you’ll see that with UCS each uplink shows the maximum available bandwidth from vNIC and is not limited to a port channel member speed of 10Gb. Why is this so powerful? Because with UCS you don’t need to slice adapter’s available bandwidth between different types of traffic. Even though you provision multiple vNICs and vHBAs for the vSphere hosts, UCS uses the same port channel links (20Gb in the example below) from the VIC adapter to transfer all traffic and takes care of load balancing for you.

vswitch_uplinks

You may legitimately ask, if UCS uses the same pipe to transfer all data regardless of which vSwitch uplink is being used, then how can I make sure that different types of traffic, such as vMotion, storage, VM traffic, replication, etc, do not compete for the same pipe? First you need to ask yourself if you can saturate that much bandwidth with your workloads. If the answer is yes, then you can use another great feature available in UCS, which is QoS. QoS lets you assign a minimum available bandwidth guarantee on a per vNIC/vHBA basis. But that’s a topic for another blog post.

References

In this post I tried to summarise the logic behind UCS traffic distribution. If you want to dig deeper in UCS network architecture, then there’re a lot of great bloggers out there. I would like to call out the following authors: