Posts Tagged ‘monitoring’

First Look at UCS Performance Manager

May 12, 2016

Overview

perf_gaugeCisco UCS has been in the market for seven years now. It was quite expensive blade chassis when it was first introduced by Cisco in March 2009, but has reached the price parity with most of the server vendors these days.

Over the course of the last seven years Cisco has built a great set of products, which helps UCS customers in various areas:

  • UCS Central for configuration management across multiple Cisco UCS domains
  • UCS Director for infrastructure automation not only of UCS, but also network, storage and virtualization layers (don’t expect it to support any other vendors than Cisco for IP networks, though)
  • UCS Performance Manager for performance monitoring and capacity planning, which can also tap into your network, storage, virtualization and even individual virtual machines

UCS Performance Manager

UCS Performance Manager was first released in October 2014. The product comes in two versions – full and express. PM Express covers only servers, hypervisors and operating systems. The full version on top of that supports storage and network devices. Product is licensed on a per UCS server basis. So you don’t pay for additional network/storage devices or hypervisors.

PM supports vSphere hypervisor (plus Hyper-V), Cisco networking and EMC VNX / EMC VMAX / NetApp FAS storage arrays. By the list of the supported products you may quickly guess that the full version of Performance Manager is targeted mainly at NetApp FlexPod, VCE Vblock and EMC VSPEX customers.

Product architecture

UCS Performance Manager can be downloaded and quickly deployed as a virtual appliance. You might be shocked when you start it up first time, as the appliance by default comes configured with 8 vCPUs and 40GB of RAM. If you’re using it for demo purposes you can safely reduce it to something like 2-4 vCPUs and 8-12GB of RAM. You will experience some slowdowns during the startup, but performance will be acceptable overall.

UCS PM is built on Zenoss monitoring software and is essentially a customized version of Zenoss Service Dynamics with Cisco UCS ZenPacks. You may notice references to Zenoss throughout the management GUI.

ucspm_zenoss

Two main components of the solution are the Control Center and the Performance Manager itself. Control Center is a container orchestration product, which runs Performance Manager as an application in Docker containers (many containers).

ucspm_docker

When deploying Performance Manager you start with one VM and then you can scale to up to four VMs total. Each of the VMs can run in two modes – master or agent. When you deploy the first VM you will have to select it’s role at first login. You have to have one master host, which also runs an agent. And if you need to scale you can deploy three additional agent VMs and build a ZooKeeper cluster. One master host can support up to 500 UCS servers, when configured with 8 vCPUs and 64GB of RAM. Depending on your deployment size you may not ever need to scale to more than one Performance Manager VM.

Installation

After you’ve deployed the OVA you will need to log in to the VM’s CLI and change the password, configure the host as a master, set up a static IP, DNS, time zone, hostname and reboot.

Then you connect to Control Center and click “+ Application” button in the Applications section and deploy UCS PM on port 4979. For the hostname use Control Center’s hostname.

deploy_ucspm

Once the UCS PM application is deployed, click on the Start button next to UCS PM line in the Applications section

start_ucspm

Performance manager is accessible from a separate link which is Control Center’s hostname prefixed with “ucspm”. So if your CC hostname is ucspm01.domain.local, UCS PM link will be https://ucspm.ucspm01.domain.local:443. You can see it in Virtual Host Names column. You will have to add an alias in DNS which would point from ucspm.ucspm01.domain.local to ucspm01.domain.local, otherwise you won’t be able to connect to it.

When you finally open UCS PM you will see a wizard which will ask you to add the licences, set an admin account and add your UCS chassis, VMware vCenters and UCS Central if you happen to have one. In the full version you will have a chance to add storage and network devices as well.

ucspm_wizard

UCS performance monitoring

Probably the easiest way to start working with Performance Manager is to jump from the dashboard to the Topology view. Topology view shows your UCS domain topology and provides an easy way to look at various components from one screen.

ucspm_topology

Click on the fabric interconnect and you can quickly see the uplink utilization. Click on the chassis and you will get summarized FEX port statistics. How about drilling down to a particular port-channel or service profile or vNIC? UCS Performance Manager can give you the most comprehensive information about every UCS component with historical data up to 1 year based on the default storage configuration.

north_traffic

Another great feature you may want to straight away drill down into is Bandwidth Usage, which gives you an overview of bandwidth utilization across all UCS components, which you can look at from a server or network perspective. This can let you quickly identify such things as uneven workload distribution between the blades or maybe uneven traffic distribution between fabric interconnect A and B side or SAN/LAN uplinks going to the upstream switches.

ucspm_bandwidth

You can of course also generate various reports to determine your total capacity utilization or if you’re for example planning to add memory to your blades, you can quickly find out the number of DIMM slots available in the corresponding report.

memory_slots

VMware performance monitoring

UCS Performance Manager is not limited to monitoring only Cisco UCS blade chassis even in the Express version. You can add your hypervisors and also individual virtual machines. Once you add your vCenter to the list of the monitored devices you get a comprehensive list of VMware components, such as hosts, VMs, datastores, pNICs, vNICs and associated performance monitoring graphs, configuration information, events, etc.

Performance Manager can correlate VMware to UCS components and for example for a given VM provide you FC uplink utilization on the corresponding fabric interconnects of the chassis where this VM is running:

vmware_stats

If you want to go further, you can add individual VMs to Performance Manager, connected via WinRM/SSH or SNMP. Some cool additional functionality you get, which is not available in VMware section is the Dynamic View. Dynamic View lets you see VM connectivity from the ESXi host it’s running on all the way through to blade, chassis, vNIC, VIC, backplane port, I/O module and fabric interconnect. Which is very helpful for troubleshooting connectivity issues:

dynamic_view

Conclusion

UCS Performance Manager is not the only product for performance monitoring in virtualized environments. There are many others, VMware vRealize Operations Manager is one of the most popular of its kind. But if you’re a Cisco UCS customer you can definitely benefit from the rich functionality this product offers for monitoring UCS blade chassis. And if you are a lucky owner of NetApp FlexPod, VCE Vblock or EMC VSPEX, UCS Performance Manager for you is a must.

pm_dashboard

Monitoring ESX Storage Queues

July 30, 2013

6a00d8341c328153ef01774354e2fd970d-500wiQueue Limits

I/O data goes through several storage queues on its way to disk drives. VMware is responsible for VM queue, LUN queue and HBA queue. VM and LUN queues are usually equal to 32 operations. It means that each ESX host at any moment can have no more than 32 active operations to a LUN. Same is true for VMs. Each VM can have as many as 32 active operations to a datastore. And if multiple VMs share the same datastore, their combined I/O flow can’t go over the 32 operations limit (per LUN queue for QLogic HBAs has been increased from 32 to 64 operations in vSphere 5). HBA queue size is much bigger and can hold several thousand operations (4096 for QLogic, however I can see in my config that driver is configured with 1014 operations).

Queue Monitoring

You can monitor storage queues of ESX host from the console. Run “esxtop”, press “d” to view disk adapter stats, then press “f” to open fields selection and add Queue Stats by pressing “d”.

AQLEN column will show the queue depth of the storage adapter. CMDS/s is the real-time number of IOPS. DAVG is the latency which comes from the frame traversing through the “driver – HBA – fabric – array SP” path and should be less than 20ms. Otherwise it means that storage is not coping. KAVG shows the time which operation spent in hypervisor kernel queue and should be less than 2ms.

Press “u” to see disk device statistics. Press “f” to open the add or remove fields dialog and select Queue Stats “f”. Here you’ll see a number of active (ACTV) and queue (QUED) operations per LUN.  %USD is the queue load. If you’re hitting 100 in %USD and see operations under QUED column, then again it means that your storage cannot manage the load an you need to redistribute your workload between spindles.

Some useful documents:

PowerShell script for disk space

November 3, 2011

I’ve written simple PowerShell script for monitoring of free space on one of our important volumes. As well as inserting it as plaint text below, for the sake of convenience I uploaded the script as a text file on FileDen file hosting service. Here is the link to the file.

$path = "F:\script\dspace.txt"
Clear-Content $path
$server = "SERVERNAME"
$recps = "qwe@contoso.com", "wer@contoso.com", "ert@contoso.com"
$sender = "alerts@contoso.com"
$user = "alerts"
$passw = "123456"
$thold = 3
$drives = Get-WmiObject -ComputerName $server Win32_LogicalDisk | `
Where-Object {$_.DriveType -eq 3}
$i = 0

echo "This message is to inform you that server $server is low on disk space.
Please carry out corrective actions. Find details below. `n" >> $path

foreach($drive in $drives) {
	$size_gb = $drive.size / 1GB
	$size_gb_fmt = "{0:N2}" -f $size_gb
	$free_gb = $drive.freespace / 1GB
	$free_gb_fmt = "{0:N2}" -f $free_gb
	$ID = $drive.DeviceID
	$pct = $free_gb / $size_gb * 100
	$pct_fmt = "{0:N0}" -f $pct

	if (($ID -eq "F:") -and ($free_gb -lt $thold)) {
		echo "Server Name: $server" >> $path
		echo "Drive Letter: $ID" >> $path
		echo "Drive Size: $size_gb_fmt GB" >> $path
		echo "Free Space: $free_gb_fmt GB" >> $path
		echo "Percent Free: $pct_fmt%" >> $path
		$i++
        }
}

if ($i -gt 0) {
	$smtpServer = "mail.contoso.com"
	$smtp = New-Object Net.Mail.SmtpClient($smtpServer)
	$smtp.Credentials = New-Object System.Net.NetworkCredential($user, $passw)
	$msg = new-object Net.Mail.MailMessage
	$msg.From = $sender
	foreach ($recp in $recps) {
		$msg.To.Add($recp)
	}
	$subject = "Disk space on $server is under $thold GB"
	$msg.Subject = $subject
	foreach ($line in Get-Content $path) {
		$body += "$line `n"
	}
	$msg.Body = $body
	$smtp.Send($msg)
}

To run this script from scheduler just write another simple .bat file with following line inside:

powershell -command “& ‘.\dspace.ps1′”