Posts Tagged ‘cisco’

Puppet Camp 2016 Recap

December 4, 2016

puppet-campLast week I had a chance to attend Puppet Camp 2016. Puppet Camp is a one day event that is held once a year in many places around the world including Australia. This time it was the fourth Melbourne conference, which gathered 240 attendees and several key partners, such as NetApp, Diaxion and Katana1.

In this blog post I want to give a quick overview of the keynote, customer and partner sessions, as well as my key takeaways from the conference.

First Impressions

I’ve never been to Puppet Camp before and this was my first experience. Sheer number of participants clearly shows that areas of configuration management and DevOps in general attracts a lot of attention of both customers and channel.

imag5067_2

You may have heard how Cisco in Q3 of 2015 announced Puppet support for the Nexus 3000 and 9000 series switches. This was not just an accident. I had a chance to speak to NetApp, who was one of the vendors presented at the conference, and they now have Puppet integration with their Data ONTAP / FAS platform, as well as E-Series and recently acquired SolidFire line of storage arrays. I’m sure many other hardware vendors will follow.

Keynote and Puppet Update

The conference had one track of sessions spread out throughout the day and was opened by a keynote from Robert Finn, APJ Sales Director at Puppet, who was talking about the raising complexity of modern IT environments and challenges that come with it. We have gone from tens of servers to hundreds of VMs and are now on the verge of the next evolution from hundreds of VMs to thousands of containers. We can no longer manage environments manually and that is where tools such as Puppet come into play and let us manage configuration and provisioning at scale.

Rob also mentioned the “State of DevOps Report” an annual survey Puppet has been running now for five years in a row. In 2016 they collected responses from 4600 technical professionals and shared a lot of their findings in a public report, which I’ll link in the references section below.

state_of_devops

Key takeaways: introducing configuration management in their software development practices organizations were able to achieve 3x lower change failure rate and 24x faster recovery from failures.

Ronny Sabapathee, Puppet Solutions Engineer gave an overview of the new features in the latest Puppet Enterprise 2016.4, such as corrective change reporting, changes to Puppet Orchestrator, enhancements in Code Manager and API improvements.

Key takeaways: Puppet ecosystem is growing quickly with Docker module, Jenkins plugin, significant enhancements in Azure module and VMware vRealize Automation/Orchestrator integration coming soon.

Customer Sessions

Rob Kenefik from SpecSavers spoke about their journey of scaling free version of Puppet from 10 to 290 nodes, what issues they came across and what adjustments they had to make, especially around the DB back-end.

Key takeaways: don’t use embedded Puppet database for production deployments. PostgreSQL (which is now default) provides required scalability.

Steve Curtis from ANZ briefly discussed how they automated deployment of Application Performance Monitoring (APM) agents using Puppet. Steve also has a post in Puppet blog, which I’ll link below.

Chris Harwood from Healthdirect Australia touched on a sensitive topic of organizational silos and how teams become too focused on their own performance forgetting about the customers, who should be the key priority for businesses offering customer-facing services.

Then he showed how Healthdirect moved some of the ops people to development teams giving devs access to infrastructure and making them autonomous, which significantly improved their development workflows and release frequency.

Key takeaways: DevOps key challenges are around people and processes, not technology. Teams not collaborating and lengthy infrastructure change management processes can significantly hinder development teams’ performance.

Partner Sessions

Dinesh Siriwardhane who represented Versent compared pros and cons of master/agent vs. masterless Puppet deployments and showed a demo on Puppet certificate management.

Key takeaways: Puppet master simplifies centralized management, provides reporting capabilities, but can be a single point of failure. Agentless deployment using GitHub has no single points of failure and is free, but can have major security repercussions if Git repository is compromised.

imag5085_1

Kieran Sweet and Pedram Sanayei from Sourced made a presentation on Puppet integration with Azure and how using Puppet instead of just the low-level Azure APIs and PowerShell, can significantly simplify deployment and configuration management in the Microsoft cloud.

Key takeaways: Azure Resource Manager is a big step forward from the old Azure Service Management (classic deployment model). In light of the significant recent enhancements in the Azure Puppet module, this can become a reasonable alternative to AWS.

Scott Coulton from Autopilot closed the conference with a session on Puppet integration with Docker and more specifically around container orchestration tools, such as Docker Swarm, Kubernetes, Mesos and Flocker. Be sure to check Scott’s blog and GitHub repository where you can find a Puppet module for Docker Swarm, Vagrant template and more.

Key takeaways: Docker can be used to deploy containers, but Puppet is still essential to keep configuration across the hosts consistent.

Conclusion

I spoke to a lot of customers at the conference and what became apparent to me was that Puppet is not just another DevOps tool amongst the many. It has a wide ecosystem of partners and has gone a long way since they started as a small start-up 12 years ago in 2005.

It has a strong use case for general configuration management in Linux environments, as well as providing application configuration consistency as part of CI/CD pipelines.

Speaking of the conference itself I was pleasantly surprised by the quality of sessions and organization in general. Puppet Camp will definitely stay on my radar. I’d love to come back next year and geek out with the DevOps crowd again.

References

First Look at UCS Performance Manager

May 12, 2016

Overview

perf_gaugeCisco UCS has been in the market for seven years now. It was quite expensive blade chassis when it was first introduced by Cisco in March 2009, but has reached the price parity with most of the server vendors these days.

Over the course of the last seven years Cisco has built a great set of products, which helps UCS customers in various areas:

  • UCS Central for configuration management across multiple Cisco UCS domains
  • UCS Director for infrastructure automation not only of UCS, but also network, storage and virtualization layers (don’t expect it to support any other vendors than Cisco for IP networks, though)
  • UCS Performance Manager for performance monitoring and capacity planning, which can also tap into your network, storage, virtualization and even individual virtual machines

UCS Performance Manager

UCS Performance Manager was first released in October 2014. The product comes in two versions – full and express. PM Express covers only servers, hypervisors and operating systems. The full version on top of that supports storage and network devices. Product is licensed on a per UCS server basis. So you don’t pay for additional network/storage devices or hypervisors.

PM supports vSphere hypervisor (plus Hyper-V), Cisco networking and EMC VNX / EMC VMAX / NetApp FAS storage arrays. By the list of the supported products you may quickly guess that the full version of Performance Manager is targeted mainly at NetApp FlexPod, VCE Vblock and EMC VSPEX customers.

Product architecture

UCS Performance Manager can be downloaded and quickly deployed as a virtual appliance. You might be shocked when you start it up first time, as the appliance by default comes configured with 8 vCPUs and 40GB of RAM. If you’re using it for demo purposes you can safely reduce it to something like 2-4 vCPUs and 8-12GB of RAM. You will experience some slowdowns during the startup, but performance will be acceptable overall.

UCS PM is built on Zenoss monitoring software and is essentially a customized version of Zenoss Service Dynamics with Cisco UCS ZenPacks. You may notice references to Zenoss throughout the management GUI.

ucspm_zenoss

Two main components of the solution are the Control Center and the Performance Manager itself. Control Center is a container orchestration product, which runs Performance Manager as an application in Docker containers (many containers).

ucspm_docker

When deploying Performance Manager you start with one VM and then you can scale to up to four VMs total. Each of the VMs can run in two modes – master or agent. When you deploy the first VM you will have to select it’s role at first login. You have to have one master host, which also runs an agent. And if you need to scale you can deploy three additional agent VMs and build a ZooKeeper cluster. One master host can support up to 500 UCS servers, when configured with 8 vCPUs and 64GB of RAM. Depending on your deployment size you may not ever need to scale to more than one Performance Manager VM.

Installation

After you’ve deployed the OVA you will need to log in to the VM’s CLI and change the password, configure the host as a master, set up a static IP, DNS, time zone, hostname and reboot.

Then you connect to Control Center and click “+ Application” button in the Applications section and deploy UCS PM on port 4979. For the hostname use Control Center’s hostname.

deploy_ucspm

Once the UCS PM application is deployed, click on the Start button next to UCS PM line in the Applications section

start_ucspm

Performance manager is accessible from a separate link which is Control Center’s hostname prefixed with “ucspm”. So if your CC hostname is ucspm01.domain.local, UCS PM link will be https://ucspm.ucspm01.domain.local:443. You can see it in Virtual Host Names column. You will have to add an alias in DNS which would point from ucspm.ucspm01.domain.local to ucspm01.domain.local, otherwise you won’t be able to connect to it.

When you finally open UCS PM you will see a wizard which will ask you to add the licences, set an admin account and add your UCS chassis, VMware vCenters and UCS Central if you happen to have one. In the full version you will have a chance to add storage and network devices as well.

ucspm_wizard

UCS performance monitoring

Probably the easiest way to start working with Performance Manager is to jump from the dashboard to the Topology view. Topology view shows your UCS domain topology and provides an easy way to look at various components from one screen.

ucspm_topology

Click on the fabric interconnect and you can quickly see the uplink utilization. Click on the chassis and you will get summarized FEX port statistics. How about drilling down to a particular port-channel or service profile or vNIC? UCS Performance Manager can give you the most comprehensive information about every UCS component with historical data up to 1 year based on the default storage configuration.

north_traffic

Another great feature you may want to straight away drill down into is Bandwidth Usage, which gives you an overview of bandwidth utilization across all UCS components, which you can look at from a server or network perspective. This can let you quickly identify such things as uneven workload distribution between the blades or maybe uneven traffic distribution between fabric interconnect A and B side or SAN/LAN uplinks going to the upstream switches.

ucspm_bandwidth

You can of course also generate various reports to determine your total capacity utilization or if you’re for example planning to add memory to your blades, you can quickly find out the number of DIMM slots available in the corresponding report.

memory_slots

VMware performance monitoring

UCS Performance Manager is not limited to monitoring only Cisco UCS blade chassis even in the Express version. You can add your hypervisors and also individual virtual machines. Once you add your vCenter to the list of the monitored devices you get a comprehensive list of VMware components, such as hosts, VMs, datastores, pNICs, vNICs and associated performance monitoring graphs, configuration information, events, etc.

Performance Manager can correlate VMware to UCS components and for example for a given VM provide you FC uplink utilization on the corresponding fabric interconnects of the chassis where this VM is running:

vmware_stats

If you want to go further, you can add individual VMs to Performance Manager, connected via WinRM/SSH or SNMP. Some cool additional functionality you get, which is not available in VMware section is the Dynamic View. Dynamic View lets you see VM connectivity from the ESXi host it’s running on all the way through to blade, chassis, vNIC, VIC, backplane port, I/O module and fabric interconnect. Which is very helpful for troubleshooting connectivity issues:

dynamic_view

Conclusion

UCS Performance Manager is not the only product for performance monitoring in virtualized environments. There are many others, VMware vRealize Operations Manager is one of the most popular of its kind. But if you’re a Cisco UCS customer you can definitely benefit from the rich functionality this product offers for monitoring UCS blade chassis. And if you are a lucky owner of NetApp FlexPod, VCE Vblock or EMC VSPEX, UCS Performance Manager for you is a must.

pm_dashboard

Upgrading Cisco UCS Fabric Interconnects

March 17, 2016

I have to do this first, as this is a high-risk change for any environment:

disclaimerDISCLAMER: I ACCEPT NO RESPONSIBILITY FOR ANY DAMAGE OR CORRUPTION OF DATA THAT MAY OCCUR AS A RESULT OF CARRYING OUT STEPS DESCRIBED BELOW. YOU DO THIS AT YOUR OWN RISK.

And now to the point. Cisco has two generations of Fabric Interconnects with the third generation released just recently. There is 6100 series, which includes 6120XP and 6140XP. Second generation is 6200 series, which introduced unified ports and also has two models in its range – 6248UP and 6296UP. And there is now a third generation of 40Gb fabric interconnects with 6324, 6332 and 6332-16UP models.

We are yet to see mass adoption of 40Gb FIs. And some of the customers are still upgrading from the first to the second generation.

In this blog post we will go through the process of upgrading 6100 fabric interconnects to 6200 by using 6120 and 6248 as an example.

Prerequisites

Cisco UCS has a pair of fabric interconnects which work in an active/passive mode from a control plane perspective. This lets us do an in-place upgrade of a FI cluster by upgrading interconnects one at a time without any further reconfiguration needed in UCS Manager in most cases.

For a successful upgrade old and new interconnects MUST run on the same firmware revision. That means you will need to upgrade the first new FI to the same firmware before you can join it to the cluster to replace the first old FI.

This can be done by booting the FI in a standalone mode, giving it an IP address and installing firmware via UCS Manager.

The second FI won’t need a manual firmware update, because when a FI of the same hardware model is joined to a cluster it’s upgraded automatically from the other FI.

Preparation tasks

It’s a good idea to make a record of all connections from the current fabric interconnects and make a configuration backup before an upgrade.

ucs_backup

If you have any unused connections which you’re not planning to move, it’s a good time to disconnect the cables and disable these ports.

Cisco strongly suggests to also upgrade the firmware on all software and hardware components of the existing UCS to the latest recommended version first.

Upgrading firmware on the first new FI

Steps to upgrade firmware on the first new fabric interconnect are as follows:

  • Rack and stack the new FI close enough to the old interconnects to make sure all cables can reach it.
  • Connect a console cable to the new FI, boot it up and when you are asked “Is this Fabric interconnect part of a cluster”, select NO to boot the FI in a standalone mode.
  • Assign an IP address to the FI and connect to it using UCS Manager.
  • Upgrade the firmware, which will reboot the fabric interconnect.
  • Reset the configuration on the FI, which will cause another reboot:
    • # connect local-mgmt
      # erase config

  • Once the FI is upgraded and reset to factory defaults you can proceed with joining it to the cluster.

Replacing the first FI

  • Determine which old FI is in the subordinate mode (upgrade a FI only if it’s in subordinate mode!) and disable server ports on it.
  • Shut down the old subordinate FI.
  • Move L1/L2, management, server and Ethernet/FC/FCoE uplink ports to the new FI.
  • Boot the new FI. This time the new FI will detect the presence of the peer FI. When you see the following prompt type YES:
    • Installer has detected the presence of a peer Fabric interconnect. This Fabric interconnect will be added to the cluster. Continue (y/n) ?

  • Follow the console prompts and assign an IP address to the new FI. The rest of the settings will be pulled from the peer FI.

Once the new FI joins the cluster you should see the following equipment topology in UCS Manager (This screenshot was made after the primary role had been moved to the new FI. Initially you should see the new FI as subordinate.):

two_fis

  • At this stage make sure that all configuration has been applied to the new FI and you can see all LAN and SAN uplinks and port channels.
  • Enable server ports on the new FI and reacknowledge all chassis.

Reacknowledging a chassis might be disruptive to the traffic flow from the blades. So make sure you don’t have any production workloads running on it. If you have two chassis and enough capacity to run all VMs on either of them, you can temporarily move VMs between the chassis and reacknowledge one chassis at a time.

Replacing the second FI

You will need to promote the new FI to be the primary, before proceeding with an upgrade of the second FI. To change the roles, use SSH to log in to the old FI, which is currently the primary (you can’t change roles from the subordinate FI) and run the following commands:

# connect local-mgmt
# cluster lead b
# show cluster state

The rest of the process is exactly the same.

After the upgrade, if needed, reconfigure any of the links which may have had their port numbers changed, such as if you had an expansion module in the old FIs, but not on the new FIs.

References

Cisco has a guide which has a step by step procedures for upgrading fabric interconnects, I/O modules, VIC cards as well as rack-mount servers. Refer to this guide for any further clarifications:

 

Troubleshooting Cisco UCS LDAP

December 4, 2015

If you ever configured LDAP integration on a blade chassis or a storage array, you know that troubleshooting authentication is painful on these things. It will accept all your configuration settings and if you’ve made a mistake somewhere all you get when you try to log in is “Authentication Error” message with no clue of what the actual error is.

Committing configuration changes

There three common places where you can make a mistake when setting up LDAP authentication on UCS. Number one is committing configuration changes to the Fabric Interconnects in UCS Manager.

There are four configuration options which you need to set to enable Active Directory authentication to your domain:

  • LDAP Providers – these are your domain controllers
  • LDAP Provider Groups – are used to group multiple domain controllers of the same domain
  • LDAP Group Maps – where you give permissions to your AD groups and users
  • Authentication Domains – final configuration step where you enable authentication via the domain

Now if you decide to delete a LDAP Provider Group which is configured under an Authentication Domain in attempt to change the settings, this may become an issue.

What is confusing here is UCS Manager will let you delete the LDAP Provider Group, save the changes and LDAP Provider Group will disappear from the list. And you may legitimately conclude that it’s deleted from UCS, but it’s actually not. This is what you’ll see in UCS Manager logs:

[FSM:STAGE:STALE-FAIL]: external aaa server configuration to primary(FSM-STAGE:sam:dme:AaaEpUpdateEp:SetEpLocal)
[FSM:STAGE:REMOTE-ERROR]: Result: resource-unavailable Code: ERR-ep-set-error Message: Re-ordering/Deletion of Providers cannot be applied while ldap is used for yourdomain.com(Domain) authentication(sam:dme:AaaEpUpdateEp:SetEpLocal)

The record will stay on the UCS and you may encounter very confusing issues where you change your LDAP Provider settings but changes are not reflected on UCS. So make sure to delete the object from the higher level entity first.

Distinguished Name typos

There are two ways to group Active Directory entities on a domain controller – Security Groups and Organizational Units. When configuring your AD bind account in LDAP Providers section and setting up permissions in LDAP Group Maps, make sure to not confuse the two. The best advice I can give – always use ADSI Edit tool to find the exact DN. Why? As an example let’s say you want to give permissions to the builtin administrator group and you use the following DN:

CN=Administrators,OU=Builtin,DC=yourdomain,DC=com

This won’t work, because even though Builtin container may look like a OU, it’s actually a CN in AD, as well as Users and Computers containers.

adsi_edit.JPG

ADSI Edit will give you the exact Distinguished Name. Make sure to use it to save yourself the hassle.

Group Authorization settings

Last but not least are the following two LDAP Provider configuration settings:

  • Group Authorization – whether UCS searches within groups when authenticating
  • Group Recursion – whether UCS searches groups recursively

If you add an AD group which the user is a part of in LDAP Group Maps and do not enable Group Authorization, UCS simply won’t search within the group. Enable this option unless you give permissions only on a per user basis.

Second option enables recursive search within AD groups. If you have nested groups in AD (which most people have) enable recursive search or UCS won’t look deeper than 1 level.

If you get really stuck

If you’ve set all the settings up and are certain they the are correct, but authentication still doesn’t work, then there is a relatively easy way to localize the issue.

First step is to check whether UCS can bind to your LDAP Providers and authenticate users. Pick a user (LDAP Group Maps don’t matter at this point), SSH to a Fabric Interconnect and type the following:

ucs # connect nxos
ucs(nxos)# test aaa server ldap yourdc.yourdomain.com john password123

yourdc.yourdomain.com – is the domain controller you’ve configured in LDAP Providers section. If authentication doesn’t work, then the issue is in LDAP Provider settings.

If you can authenticate, then the next step is to make sure that UCS searches through the right AD groups. To check that you will need to enable LDAP authentication logging on a Fabric Interconnect:

ucs # connect nxos
ucs(nxos)# debug ldap aaa-request-lowlevel

Now try to authenticate and look through the list of groups which UCS is searching through. If you can’t see the group which your user is a part of, then you most likely using a wrong DN in LDAP Group Maps.

In my case the settings are configured correctly and I can see that UCS is searching in the Builtin Administrators group:

2015 Dec 1 14:12:19.581737 ldap: value: CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581747 ldap: ldap_add_to_groups: Discarding. group map not configured for CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581756 ldap: value: CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581767 ldap: ldap_add_to_groups: successfully added group:CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581777 ldap: value: CN=Exchange Organization Administrators,OU=Microsoft Exchange Security Groups,DC=yourdomain,DC=com

Make sure to disable logging when you’re done:

ucs(nxos)# undebug all

References:

Initial Cisco switch configuration

June 28, 2012

First steps you need to do when you unpack your Cisco switch, for example Catalyst 2960, are configuring passwords and IP access via telnet and ssh. Cisco networking switches and routers have two primary operation modes: User (unprivileged) and Enable (privileged). In User mode you can simply look around, but in Enable mode you can reboot a switch, change configuration info, as well as screw everything up. You are safe in User mode. Switch also has tons of hierarchical configuration modes where you perform actual configuration.

Switch has three passwords: two for User mode (for connection from serial console and for external telnet and ssh connections) and one for Enable mode. Here is how you configure passwords after you unpack your switch and connect the serial cable.

Enter configuration mode:

enable
configure terminal

Configure console password:

line console 0
password pass1
login
exit

Configure ssh and telnet password:

line vty 0 15
password pass2
login
exit

Configure Enable password:

enable secret pass3
exit

‘login’ command tells switch to ask for User mode password. It doesn’t do that by default. Switch has 16 virtual (ssh and telent) consoles, that is why you see ‘0 15’ range in ‘line vty 0 15’ command.

Now to get IP access to the switch you need to configure so-called ‘VLAN 1 interface’:

enable
configure terminal
interface vlan 1
ip address 192.168.1.200 255.255.255.0
no shutdown
exit

ip default-gateway 192.168.1.1
exit

VLANs are not subject of this topic. But to make it a bit more clear, VLAN 1 is a special VLAN where all switch ports are connected. It’s done so that you could connect to the switch by telnet/ssh from any port. ‘no shutdown’ command here brings interface up. It’s disabled by default.

After you’ve made an initial configuration, your changes are active but not saved. After a reload you will have empty switch configuration. To save the configuration changes run:

copy running-config startup-config

Cheers!

Random DC pictures

January 19, 2012

Several pictures of server room hardware with no particular topic.

Click pictures to enlarge.

10kVA APC UPS.

UPS’s Network Management Card (NMC) (with temperature sensor) connected to LAN.

Here you can see battery extenders (white plugs). They allow UPS to support 5kVA of load for 30 mins.

Two Dell PowerEdge 1950 server with 8 cores and 16GB RAM each configured as VMware High Availability (HA) cluster.

Each server has 3 virtual LANs. Each virtual LAN has its own NIC which in its turn has multi-path connection to Cisco switch by two cables, 6 cables in total.

Two Cisco switches which maintain LAN connections for NetApp filers, Dell servers, Sun tape library and APC NMC card. Two switches are tied together by optic cable. Uplink is a 2Gb/s trunk.

HP rack with 9 HP ProLiant servers, HP autoloader and MSA 1500 storage.

HP autoloader with 8 cartridges.

HP MSA 1500 storage which is completely FC.

Hellova cables.

DC Networking

November 26, 2011

Misc photos from our server rooms. This time networking.

Here I post pictures only from central building. There are also several networking rooms in other buildings. Network core is build primarily on Cisco routers and switches. Branches use HP.

Click pictures to enlarge.

Here are the routers, switches, fibre distribution panels, etc.

Cable mess.