Extracting vRealize Operations Data Using REST API

September 17, 2017

Scripting today is an important skill if you’re a part of IT operations team. It is common to use PowerShell or any other scripting language of your choice to automate repetitive tasks and be efficient in what you do. Another use case for scripting and automation, which is often missed, is the fact that they let you do more. Public APIs offered by many software and hardware solutions let you manipulate their data and call functions in the way you need, without being bound by the workflows provided in GUI.

Recently I was asked to extract data from vRealize Operations Manager that was not available in GUI or a report in the format I needed. At first it looked like a non-trivial task as it required scripting and using REST APIs to pull the data. But after some research it turned out to be much easier than I thought.

Using Python this can be done in a few lines of code using existing Python libraries that do most of the work for you. The goal of this blog post is to show that scripting does not have to be hard and using the right tools for the right job you can get things done in a matter of minutes, not hours or days.

Scenario

To demonstrate an example of using vRealize Operations Manager REST APIs we will retrieve the list of vROps adapters, which vROps uses to pull information from many hardware and software solutions it supports, such as Nimble Storage or Microsoft SQL Server.

vROps APIs are obviously much more powerful than that and you can use the same approach to pull other information such as: active and inactive alerts, performance statistics, reports. Full vROps API documentation can be found at https://your-vrops-hostname/suite-api/.

Install Python and Libraries

We will be using two Python libraries: “Requests” to make REST calls and “ElementTree” for XML parsing. ElementTree comes with Python, so we will need to install the Requests package only.

I already made a post here on how to install Python interpreter and Python libraries, so we will dive right into vROps APIs.

Retrieve the List of vROps Adapters

To get the list of all installed vROps adapters we need to make a GET REST call using the “get” method from Requests library:

import requests
from requests.auth import HTTPBasicAuth

akUrl = 'https://vrops/suite-api/api/adapterkinds'
ak = requests.get(akUrl, auth=HTTPBasicAuth('user', 'pass'))

In this code snippet using the “import” command we specify that we are using Requests library, as well as its implementation of basic HTTP authentication. Then we request the list of vROps adapters using the “get” method from Request library, and save the XML response into the “ak” variable. Add “verify=False” to the list of the get call parameters if you struggle with SSL certificate issues.

As a result you will get the full list of vROps adapters in the format similar to the following. So how do we navigate that? Using ElementTree XML library.

Parsing XML Response Sequentially

vRealize Operations Manager returns REST API responses in XML format. ElementTree lets you parse these XML responses to find the data you need, which you can output in a human-readable format, such as CSV and then import into an Excel spreadsheet.

Parsing XML tree requires traversing from top to bottom. You start from the root element:

import xml.etree.ElementTree as ET

akRoot = ET.fromstring(ak.content)

Then you can continue by iterating through child elements using nested loops:

for adapter in akRoot:
  print adapter.tag, adapter.attrib['key']
    for adapterProperty in adapter:
      print adapterProperty.name, adapterProperty.text

Childs of <ops:adapter-kinds> are <ops:adapter-kind> elements. Childs of <ops:adapter-kind> elements are <ops:name>, <ops:adapterKindType>, <ops:describeVersion> and <ops:resourceKinds>. So the output of the above code will be:

adapter-kind CITRIXNETSCALER_ADAPTER
name Citrix NetScaler Adapter
adapterKindType GENERAL
describeVersion 1
resourceKinds citrix_netscaler_adapter_instance
resourceKinds appliance
…

As you could’ve already noticed, all XML elements have tags and can additionally have attributes and associated text. From above example:

  • Tags: adapter-kind, name, adapterKindType
  • Attribute: key
  • Text: Citrix NetScaler Adapter, GENERAL, 1

Finding Interesting Elements

Typically you are looking for specific information and don’t need to traverse the whole XML tree. So instead of walking through the tree sequentially, you can iterate trough interesting elements using the “iterfind” method. For instance if we are looking only for adapter names, the code would look as the following:

ns = {'vrops': 'http://webservice.vmware.com/vRealizeOpsMgr/1.0/'}
for akItem in akRoot.iterfind('vrops:adapter-kind', ns):
  akNameItem = akItem.find('vrops:name', ns)
  print akNameItem.text

All elements in REST API responses are usually prefixed with a namespace. To avoid using the long XML element names, such as http://webservice.vmware.com/vRealizeOpsMgr/1.0/adapter-kind, ElementTree methods support using namespaces, that can be then passed as a variable, as the “ns” variable in this code snippet.

Resulting output will be similar to:

Citrix NetScaler Adapter
Container
Dell EMC PowerEdge
Dell Storage Adapter
EP Ops Adapter
F5 BIG-IP Adapter
HP Servers Adapter

Additional Information

I intentionally tried to keep this post short to give you all information required to start using Python to parse REST API responses in XML format.

I have written two scripts that are more practical and shared them on my GitHub page here:

  • vrops_object_types_1.0.py – extracts adapters, object types and number of objects. Script gives you an idea of what is actually being monitored in vROps, by providing the number of objects you have in your vROps instance for each adapter and object type.
  • vrops_alert_definitions_1.0.py – extracts adapters, object types, alert names, criticality and impact. As opposed to the first script, this script provides the list of alerts for each adapter and object type, which is helpful to identify potential alerts that can be triggered.

Feel free to download these scripts from GitHub and play with them or adapt them according to your needs.

Helpful Links

Advertisements

Python for Windows: Quick Installation

September 7, 2017

Only recently I added Python to the list of tools I use in my job. I had always used PowerShell if I needed to script something, until I saw how easy Python is to use. I will be keeping it in my arsenal from now on.

In near future I plan to make a blog post on how to use Python with REST APIs and in this blog post I wanted to provide quick instructions on how to install Python in Windows, that I can later use as a reference.

Installing Python

Latest version of Python for Windows can be downloaded from https://www.python.org/downloads/windows/. Executable installer is probably the easiest. When installing, make sure to check “Add Python 3.6 to PATH” option, it makes life much easier.

Installing Libraries

Python installation already includes lots of libraries that you can use for scripting. ElementTree library, for example, which is used for XML parsing, comes with the interpreter.

Depending on what you want to use Python for, you may need to install additional libraries. For instance, if you want to call REST APIs, you may need Requests – library for HTTP cals.

Python uses a package manager called “pip”. If it is not already in your PATH variable, find pip under Python installation directory and run from command line as administrator:

pip install requests

Once the library is installed you can use it by importing it into your scripts:

import requests

Writing Code

At this point you can call Python interpreter in Windows command line and start running Python commands. If you want to write a script, however, you will need an IDE. Nothing wrong in using Notepad, but there are more efficient ways to do that.

Python for Windows comes with IDE, which is called simply IDLE. It is very basic, but it provides all essential features, such as as code completion, syntax highlighting and a primitive debugger. It is not perfect but it has everything to get you started.

Conclusion

That is a quick crash course with three simple steps to get Python up and running. I tried to keep it short to demonstrate how you can start using Python with minimum effort.

AWS Cloud Protection Manager Part 3: Backup and Restore

August 21, 2017

Backup

Backups are created according to the schedule specified in the backup policy. We discussed how to configure backup policies in the previous blog post of the series. The list of backups you see on the Backup Monitor tab are your restore points. Backups that are older then the specified retention policy will be purged from the list and you will not see them there, unless you move them to “Freezer”.

It is important to understand that apart from volume snapshots, for each backed up instance CPM also creates an AMI. Those who has hands-on experience with AWS may already know, that AMIs is the only way to create clones of Windows EC2 instances in AWS. If you go to AWS console and try to find a clone action under the instance Action menu, you won’t find any. You will have “Create Image” instead. It creates an AMI, from which you can then spin up a clone of an instance the image was created from.

CPM does exactly that. For each backup policy the instance is under, it creates one AMI. In our example we have four backup policies, that will result in four AMIs for each of the instances. Every AMI has to have at least one storage volume. So CPM will include the root volume of each instance into AMI, just because it has to. But AMIs are required only to restore EC2 instance configuration. Data is restored from volume snapshots, that can be used to create new volumes from them and then attach them to the instance. You can click on the View button under Snapshots to find the corresponding snapshot and AMI IDs.

There is a backup log for each job run as well that is helpful for issue troubleshooting.

Restore

To perform a restore click on the Recover button next to the backup job and you will get the list of the instances you can recover. CPM offers you three options: instance recovery, volume recovery and file recovery. Let’s go back to front.

File recovery is probably the most used recovery option. As it lets you restore individual files. When you click on the “Explore” button, CPM creates new volumes from the snapshots you are restoring from and mount them to the CPM instance. You are then presented with a simple file system browser where you can find the file and click on the green down arrow icon in Download column to save the file to your computer.

If you click on “Volume Only”, you can restore particular volumes. Restored volumes are not attached to any instance, unless you specify it under “Attach to Instance” column. You can then select under “Attach Behaviour” what CPM should do if such volume is already attached to the instance or if you want to automatically detach the original volume, but the instance is running (you can do it only if instance is stopped).

And the last option is “Instance”. It will create a clone of the original instance using the pre-generated AMI and volume snapshots, as we discussed in the Backup section of this blog post. You can specify many options under Advanced Options section, including recovery to another VPC or different availability zone. If anything, make sure you specify a new IP address for the instance, otherwise you’ll have a conflict and your restore will fail. Ideally you should also shut down the original EC2 instance before spinning up a restore clone.

Advanced Features

There are quite a few worth mentioning. So far we have looked at simple EC2 instance restore. But you don’t have to backup whole instances, you can also backup individual volumes. On top of that, CPM supports RDS database, Aurora and Redshift cluster backups.

If you run MS Exchange, Sharepoint or SQL on your EC2 instances, you can install CPM backup agent on them to ensure you have application-consistent backups via VSS, as opposed to crash-consistent backups you get if agent is not used. If you install the agent, you can also run a script on the instance before and after the backup is taken.

Last but not least is DR. Restoring to another availability zone within the region is already supported on instance recovery level. You can choose availability zone you want to restore to. It is not possible to recover to another region, though. Because AWS snapshots and AMIs are local to the region they are created in. If you want to be able to recover to another region, you can configure DR in CPM, which will utilise AWS AMI and snapshot copy functionality to copy backups to another region at configured frequency.

Conclusion

Overall, I found Cloud Protection Manager very easy to install, configure and use. If you come from infrastructure background, at first glance CPM may look to you like a very basic tool, compared to such feature-rich solutions like Veeam or Commvault. But that feeling is misleading. CPM is simple, because AWS simple. All infrastructure complexity is hidden under the covers. As a result, all AWS backup tools need to do is create snapshots and CPM does it well.

AWS Cloud Protection Manager Part 2: Configuration

August 14, 2017

Overview

As we discussed in Part 1 of this series, snapshots serve as a good basis to implement backup in AWS. But AWS does not provide an out-of-the-box tool that can manage snapshots at scale and perform snapshot creation/deletion based on a defined retention. Rich AWS APIs allow you to build such tool yourself or you can use an existing backup solution built for AWS. In this blog post we are looking at one such product, called Cloud Protection Manager.

You will be surprised to know that the first version of Cloud Protection Manager was released back in 2013. The product has matured over the years and the current CPM version 2.1 according to N2W web-site has become quite popular amongst AWS customers.

CPM is offered in four different versions: Standard, Advanced, Enterprise and Enterprise Plus. Functionality across all four versions is mostly the same, with the key difference being the number of instances you can backup. Ranging from 20 instances in Standard and $5 per instance in Enterprise Plus.

Installation

CPM offers a very straightforward consumption model. You purchase it from AWS marketplace and pay by the month. Licensing costs are billed directly to your account. There are no additional steps involved.

To install CPM you need to find the version you want to purchase in AWS Marketplace, specify instance settings, such as region, VPC subnet, security group, then accept the terms and click launch. AWS will spin up a new CPM server as an EC2 instance for you. You also have an option to run a 30-day trial if you want to play with the product before making a purchasing decision.

Note that CPM needs to be able to talk to AWS API endpoints to perform snapshots, so make sure that the appliance has Internet access by means of a public IP address, Elastic IP address or a NAT gateway. Similarly, the security group you attach it to should at least have HTTPS out allowed.

Initial Configuration

Appliance is then configured using an initial setup wizard. Find out what private IP address has been assigned to the instance and open a browser session to it. The wizard is reasonably straightforward, but there are two things I want to draw your attention to.

You will be asked to create a data volume. This volume is needed purely to keep CPM configuration and metadata. Backups are kept in S3 and do not use this volume. The default size is 5GB, which is enough for roughly 50 instances. If you have a bigger environment allocate 1GB per every 10 AWS instances.

You will also need to specify AWS credentials for CPM to be able to talk to AWS APIs. You can use your AWS account, but this is not a security best practice. In AWS you can assign a role to an EC2 instance, which is what you should be using for CPM. You will need to create IAM policies that essentially describe permissions for CPM to create backups, perform restores, send notifications via AWS SNS and configure EC2 instances. Just refer to CPM documentation, copy and paste configuration for all policies, create a role and specify the role in the setup wizard.

Backup Policies

Once you are finished with the initial wizard you will be able to log in to the appliance using the password you specified during installation. As in most backup solutions you start with backup policies, which allow you to specify backup targets, schedule and retention.

One thing that I want to touch on here is backup schedules, that may be a bit confusing at start. It will be easier to explain it in an example. Say you want to implement a commonly used GFS backup schedule, with 7 daily, 4 weekly, 12 monthly and 7 yearly backups. Daily backup should run every day at 8pm and start from today. Weekly backups run on Sundays.

This is how you would configure such schedule in CPM:

  • Daily
    • Repeats Every: 1 Days
    • Start Time: Today Date, 20:00
    • Enabled on: Mon-Sat
  • Weekly
    • Repeats Every: 1 Weeks
    • Start Time: Next Sunday, 20:00
    • Enabled on: Mon-Sun
  • Monthly
    • Repeats Every: 1 Months
    • Start Time: 28th of this month, 21:00
    • Enabled on: Mon-Sun
  • Yearly:
    • Repeats Every: 12 Months
    • Start Time: 31st of December, 22:00
    • Enabled on: Mon-Sun

Some of the gotchas here:

  • “Enabled on” setting is relevant only to the Daily backup, the rest of the schedules are based on the date you specify in “Start Time” field. For instance, if you specify a date in the Weekly backup Start Time that is a Sunday, your weekly backups will run every Sunday.
  • Make sure to run your Monthly backup on 28th day of every month, to guarantee you have a backup every month, including February.
  • It’s not possible to prevent Weekly backup to not run on the last week of every month. So make sure to adjust the Start Time for the Monthly backup so that Weekly and Monthly backups don’t run at the same time if they happen to fall on the same day.
  • Same considerations are true for the Yearly backup as well.

Then you create your daily, weekly, monthly and yearly backup policies using the corresponding schedules and add EC2 instances that require protection to every policy. Retention is also specified at the policy level. According to our scenario we will have 6 generations for Daily, 4 generations for Weekly, 12 generations for Monthly and 7 generations for Yearly.

Notifications

CPM uses AWS Simple Notification Service (SNS) to send email alerts. If you gave CPM instance SNS permissions in IAM role you created previously, you should be able to simply go to Notification settings, enable Alerts and select “Create new topic” and “Add user email as recipient” options. CPM will create a SNS topic in AWS for you automatically and use email address you specified in the setup wizard to send notifications to. You can change or add more email addresses to the SNS topic in AWS console later on if you need to.

Conclusion

This is all you need to get your Cloud Protection Manager up and running. In the next blog post we will look at how instances are backed up and restored and discuss some of the advanced backup options CPM offers.

AWS Cloud Protection Manager Part 1: Intro to Snapshots

August 7, 2017

Cloud computing conceptOverview

We have all been dealing with on-premises infrastructure for years before public cloud became a thing. Depending on complexity and size of your environment, retention and other requirements you can choose from plethora of products, such as Veeam, Commvault, NetWorker, you name it.

When public cloud started getting more traction we realised that the paradigm has shifted. Public cloud providers are pushing us to treat instances of application servers as cattle versus pets and build fault tolerance into code and not rely on underlying infrastructure reliability.

If application is no longer monolithic and is distributed among multiple instances of the same server, loss of any one instance has little overall impact on the application and does not require a restore. We can simply spin up a new instance instead.

However, majority of applications in today’s IT environments are still monolithic and need to be backed up. Data needs to be protected as well, if it gets corrupted or if for any other reason we need to roll back our application to a certain point in time, backups are necessary.

AWS Snapshots

When I started researching the topic of backup in the cloud I realised that it is still a wild west. Speaking of AWS specifically, if you are using one of the native AWS services, such as RDS, you would usually have the backup feature built-in. Otherwise, if you simply want to backup some application or service running inside an AWS instance, all you have is snapshots.

We have all heard the mantra, that snapshot is not a backup, because usually it is kept in the same place as the original data it was created from. This is true for most of the traditional storage arrays. AWS is slightly different.

Typically when you start an EC2 instance, its disk volumes are kept on Elastic Block Store (EBS), which is an equivalent of block storage in traditional infrastructure world. But when you create a snapshot of an EBS volume, the snapshot is transferred to S3 object storage.

When the first snapshot is created, the whole volume needs to be copied to S3. It may take some time to complete, depending on the volume size, but the copy process happens in the background and doesn’t impact the original EC2 instance. All subsequent snapshots will be significantly quicker, unless you overwrite large amount of data between snapshots.

CloudWatch Event Rules

As we can see, unlike traditional storage arrays, AWS snapshots are kept separately from the original data. They can even be restored to another availability zone or copied to another region to protect from region-wide failures, which makes AWS snapshots a decent backup solution. But devil is in the details.

Creating a volume snapshot manually is simple, you pick “Create Snapshot” from the Actions menu and the rest is handled by AWS. To automate the process, AWS offers CloudWatch event rules for scheduled snapshot creation. By going to CloudWatch > Events > Create Rule, setting up a schedule, selecting “EC2 CreateSnapshot API call” and the volume ID in the Target section you can implement a simple backup in AWS.

cloudwatch_snapshot1.jpg

However, there are two immediate problems with such approach. The first is scalability of it. Snapshots in AWS are created on volumes, not instances. This has interesting implications, such as it is not possible to create a consistent snapshot across multiple volumes. But more importantly, if you have more than one volume on every instance, you may end up having dozens of CloudWatch rules, that are hard to manage and keep track of.

cloudwatch_snapshot2.jpg

Second is retention. CloudWatch events let you create snapshots, but you cannot specify how long you want to keep them for. You will need to write your own scripts using AWS APIs to delete snapshots, which makes the whole solution questionable, as you can then use APIs to create snapshots as well.

Conclusion

As you can see, AWS does not offer a simple out of the box solution for EC2 backup. You either need to write your own scripts using AWS APIs or lean on backup solutions built specifically to solve the problem of backups in AWS. One of such solutions Cloud Protection Manager we will discuss in the next blog post of the series.

First Look at AWS Management Portal for vCenter Part 2: Administration

June 30, 2017

aws_migrationIn part 1 of the series we looked at the Management Portal deployment. Let’s move on to an overview of the portal functionality.

Portal Dashboard

Once you open the portal you are asked to pick your region (region preferences can later be changed only from Web Client). You then proceed to the dashboard where you can see all instances you already have running in AWS. If you don’t see your VPCs, make sure the user you’re using to log in is on the list of administrators in AMP (user and domain names are case sensitive).

default_env

Here you can find detailed configuration information of each instance (Summary page), performance metrics (pulled from CloudWatch) and do some simple tasks, such as stopping/rebooting/terminating an instance, creating an AMI (Amazon Machine Image). You can also generate a Windows password from a key pair if you need to connect to VM via RDP or SSH.

Virtual Private Cloud Configuration

If the dashboard tab is more operational-focused, VPC tab is configuration-centric. Here you can create new VPCs, subnets and security groups. This can be handy if you want to add a rule to a security group to for instance allow RDP access to AWS instances from a certain IP.

edit_sg

If you spend most of the time in vCenter this can be helpful as you don’t need to go to AWS console every time to perform such simple day to day tasks.

Virtual Machine Provisioning

Portal supports simple instance provisioning from Amazon Machine Images (AMIs). You start with creating an environment (Default Environment can’t be used to deploy new instances). Then you create a template, where you can pick an AMI and specify configuration options, such as instance type, subnets and security groups.

create_template

Note: when creating a template, make sure to search for AMIs by AMI ID. AMI IDs in quick start list are not up-to-date and will cause instance deployment to fail with the following error:

Failed to launch instance due to EC2 error: The specified AMI is no longer available or you are not authorized to use it.

You can then go ahead and deploy an instance from a template.

Virtual Machine Migration

Saving the best for the last. VM migration – this is probably one of the coolest portal features. Right-click on a VM in vCenter inventory and select Migrate to EC2. You will be asked where you want to place the VM and how AWS instance should be configured.

ec2_migrate

When you hit the button AMP will first export VM as an OVF image and then upload the image to AWS. As a result, you get a copy of your VM in AWS VPC with minimal effort.

ec2_migration2

When it comes to VM migration to AWS, there is, of course, much more to it than just copying the data. Machine gets a new SID, which not all applications and services like. There are compatibility considerations, data gravity, network connectivity and others. But all the heavy lifting AMP does for you.

Conclusion

I can’t say that I was overly impressed with the tool, it’s very basic and somewhat limited. Security Groups can be created, but cannot be applied to running instances. Similarly, templates can be created, but not edited.

But I would still recommend to give it a go. Maybe you will find it useful in your day to day operations. It gives you visibility into your AWS environment, saving time jumping between two management consoles. And don’t underestimate the migration feature. Where other vendors ask for a premium, AWS Management Portal for vCenter gives it to you for free.

First Look at AWS Management Portal for vCenter Part 1: Deployment

December 18, 2016

Cloud has been a hot topic in IT for quite a while, for such valid reasons and benefits it brings as agility and economies of scale. More and more customers start to embark on the cloud journey, whether it’s DR to cloud, using cloud as a Tier 3 storage or even full production migrations for the purpose of shrinking the physical data center footprint.

vmware_aws

Even though full data center migrations to cloud are not that uncommon, many customers use cloud for certain use cases and keep other more static workloads on-premises, where it may be more cost-effective. What it means is that they end up having two environments, that they have to manage separately. This introduces complexity into operational models as each environment has its own management tools.

Overview

AWS Management Portal for vCenter helps to bridge this gap by connecting your on-premises vSphere environment to AWS and letting you perform basic management tasks, such as creating VPCs and security groups, deploying EC2 instances from AMI templates and even migrating VMs from vSphere to cloud, all without leaving the familiar vCenter user interface.

connector_architecture

Solution consists of two components: AWS Management Portal for vCenter, which is configured in AWS and AWS Connector for vCenter, which is a Linux appliance deployed on-prem. Let’s start with the management portal first.

Configure Management Portal

AWS Management Portal for vCenter or simply AMP, can be accessed by the following link https://amp.aws.amazon.com. Configuration is wizard-based and its main purpose is to set up authentication for vCenter users to be able to access AWS cloud through the portal.

aws_amp.jpg

You have an option of either using SAML, which has pre-requisites, or simply choosing the connector to be your authentication provider, which is the easiest option.

If you choose the latter, you will need to pre-configure a trust relationship between AWS Connector and the portal. First step of the process is to create an Identity and Access Management (IAM) user in AWS Management Console and assign “AWSConnector” IAM policy to it (connector will then use this account to authenticate to AWS). This step is explained in detail in Option 1: Federation Authentication Proxy section of the AWS Management Portal for vCenter User Guide.

add_admin

You will also be asked to specify vCenter accounts that will have access to AWS and to generate an AMP-Connector Key. Save your IAM account Access Key / Secret and AMP-Connector Key. You will need them in AWS Connector registration wizard.

Configure AWS Connector

AWS Connector is distributed as an OVA, which you can download here:

To assign a static IP address to the appliance you will need to open VM console and log in as ec2-user with the password ec2pass. Run the setup script and change network settings as desired. Connector also supports connecting to AWS through a proxy if required.

# sudo setup.rb

Browse to the appliance IP address to link AWS Connector to your vCenter and set up appliance’s password. You will then be presented with the registration wizard.

Wizard will ask you to provide a service account for AWS Connector (create a non-privileged domain account for it) and credentials of the IAM trust account you created previously. You will also need your trust role’s ARN (not user’s ARN) which you can get from the AMP-Connector Federation Proxy section of AWS Management Portal for vCenter setup page.

If everything is done correctly, you will get to the plug-in registration page with the configuration summary, which will look similar to this:

registration_complete

Summary

AWS Connector will register a vCenter plug-in, which you will see both in vSphere client and Web Client.

aws_wclient

That completes the deployment part. In the next blog post of the series we will talk in more detail on how AWS Management Portal can be leveraged to manage VPCs and EC2 instances.

Puppet Camp 2016 Recap

December 4, 2016

puppet-campLast week I had a chance to attend Puppet Camp 2016. Puppet Camp is a one day event that is held once a year in many places around the world including Australia. This time it was the fourth Melbourne conference, which gathered 240 attendees and several key partners, such as NetApp, Diaxion and Katana1.

In this blog post I want to give a quick overview of the keynote, customer and partner sessions, as well as my key takeaways from the conference.

First Impressions

I’ve never been to Puppet Camp before and this was my first experience. Sheer number of participants clearly shows that areas of configuration management and DevOps in general attracts a lot of attention of both customers and channel.

imag5067_2

You may have heard how Cisco in Q3 of 2015 announced Puppet support for the Nexus 3000 and 9000 series switches. This was not just an accident. I had a chance to speak to NetApp, who was one of the vendors presented at the conference, and they now have Puppet integration with their Data ONTAP / FAS platform, as well as E-Series and recently acquired SolidFire line of storage arrays. I’m sure many other hardware vendors will follow.

Keynote and Puppet Update

The conference had one track of sessions spread out throughout the day and was opened by a keynote from Robert Finn, APJ Sales Director at Puppet, who was talking about the raising complexity of modern IT environments and challenges that come with it. We have gone from tens of servers to hundreds of VMs and are now on the verge of the next evolution from hundreds of VMs to thousands of containers. We can no longer manage environments manually and that is where tools such as Puppet come into play and let us manage configuration and provisioning at scale.

Rob also mentioned the “State of DevOps Report” an annual survey Puppet has been running now for five years in a row. In 2016 they collected responses from 4600 technical professionals and shared a lot of their findings in a public report, which I’ll link in the references section below.

state_of_devops

Key takeaways: introducing configuration management in their software development practices organizations were able to achieve 3x lower change failure rate and 24x faster recovery from failures.

Ronny Sabapathee, Puppet Solutions Engineer gave an overview of the new features in the latest Puppet Enterprise 2016.4, such as corrective change reporting, changes to Puppet Orchestrator, enhancements in Code Manager and API improvements.

Key takeaways: Puppet ecosystem is growing quickly with Docker module, Jenkins plugin, significant enhancements in Azure module and VMware vRealize Automation/Orchestrator integration coming soon.

Customer Sessions

Rob Kenefik from SpecSavers spoke about their journey of scaling free version of Puppet from 10 to 290 nodes, what issues they came across and what adjustments they had to make, especially around the DB back-end.

Key takeaways: don’t use embedded Puppet database for production deployments. PostgreSQL (which is now default) provides required scalability.

Steve Curtis from ANZ briefly discussed how they automated deployment of Application Performance Monitoring (APM) agents using Puppet. Steve also has a post in Puppet blog, which I’ll link below.

Chris Harwood from Healthdirect Australia touched on a sensitive topic of organizational silos and how teams become too focused on their own performance forgetting about the customers, who should be the key priority for businesses offering customer-facing services.

Then he showed how Healthdirect moved some of the ops people to development teams giving devs access to infrastructure and making them autonomous, which significantly improved their development workflows and release frequency.

Key takeaways: DevOps key challenges are around people and processes, not technology. Teams not collaborating and lengthy infrastructure change management processes can significantly hinder development teams’ performance.

Partner Sessions

Dinesh Siriwardhane who represented Versent compared pros and cons of master/agent vs. masterless Puppet deployments and showed a demo on Puppet certificate management.

Key takeaways: Puppet master simplifies centralized management, provides reporting capabilities, but can be a single point of failure. Agentless deployment using GitHub has no single points of failure and is free, but can have major security repercussions if Git repository is compromised.

imag5085_1

Kieran Sweet and Pedram Sanayei from Sourced made a presentation on Puppet integration with Azure and how using Puppet instead of just the low-level Azure APIs and PowerShell, can significantly simplify deployment and configuration management in the Microsoft cloud.

Key takeaways: Azure Resource Manager is a big step forward from the old Azure Service Management (classic deployment model). In light of the significant recent enhancements in the Azure Puppet module, this can become a reasonable alternative to AWS.

Scott Coulton from Autopilot closed the conference with a session on Puppet integration with Docker and more specifically around container orchestration tools, such as Docker Swarm, Kubernetes, Mesos and Flocker. Be sure to check Scott’s blog and GitHub repository where you can find a Puppet module for Docker Swarm, Vagrant template and more.

Key takeaways: Docker can be used to deploy containers, but Puppet is still essential to keep configuration across the hosts consistent.

Conclusion

I spoke to a lot of customers at the conference and what became apparent to me was that Puppet is not just another DevOps tool amongst the many. It has a wide ecosystem of partners and has gone a long way since they started as a small start-up 12 years ago in 2005.

It has a strong use case for general configuration management in Linux environments, as well as providing application configuration consistency as part of CI/CD pipelines.

Speaking of the conference itself I was pleasantly surprised by the quality of sessions and organization in general. Puppet Camp will definitely stay on my radar. I’d love to come back next year and geek out with the DevOps crowd again.

References

Dell Force10 Part 3: VLT Domain Configuration

July 31, 2016

dell-force10In my previous post here I went through VLT basics and how it helps to establish a loop-free network topology in a modern datacenter. Now lets dive deeper and see how VLT is configured from FTOS CLI.

VLT Configuration

The first step is to configure the backup links and VLT interconnect. Dell S4048-ON switches have six 40Gb QSFP+ ports, two of which 1/49 and 1/50 we will use for VLTi. Repeat the same configuration on both switches.

# int range fo 1/49-1/50
# no shutdown

# interface port-channel 127
# description “VLT interconnect”
# channel-member fo 1/49
# channel-member fo 1/50
# no shutdown

Now that we have a VLT interconnect set up, let’s join the first switch to a VLT domain:

# vlt domain 1
# back-up destination 172.10.10.11
# peer-link port-channel 127
# primary-priority 1

First switch points to the second switch management IP for a backup destination, uses port channel 127 as a VLT interconnect and becomes a primary peer, because it’s given the lowest priority of 1.

Do the same on the second switch, but now point to the first switch management IP for backup and use the highest priority to make this switch a secondary peer:

# vlt domain 1
# back-up destination 172.10.10.10
# peer-link port-channel 127
# primary-priority 8192

To confirm the VLT state use the following command:

# sh vlt brief

vlt_brief

As you can see, the VLTi and backup links are up and the switch can see its peer. For some additional VLT specific information use these commands:

# sh vlt statistics
# sh vlt backup-link

I would also recommend to use the following command to see the port channel state and confirm that both VLTi links are in up state:

# sh int po127

po_state

Conclusion

In this part of the Dell Force10 switch configuration series we quickly went through the initial VLT setup. We haven’t touched on VLT LAG configuration yet. We will take a closer look at it in the next blog post.

Dell Force10 Part 2: VLT Basics

July 10, 2016

dell-force10Last time I made a blog post on initial configuration of Force10 switches, which you can find here. There I talked about firmware upgrade and basic features, such as STP and Flow Control. In this blog post I would like to touch on such a key feature of Force10 switches as Virtual Link Trunking (VLT).

VLT is Force10’s implementation of Multi-Chassis Link Aggregation Group (MLAG), which is similar to Virtual Port Channels (vPC) on Cisco Nexus switches. The goal of VLT is to let you establish one aggregated link to two physical network switches in a loop-free topology. As opposed to two standalone switches, where this is not possible.

You could say that switch stacking gives you similar capabilities and you would  be right. The issue with stacked switches, though, is that they act as a single switch not only from the data plane point of view, but also from the control plane point of view. The implication of this is that if you need to upgrade a switch stack, you have to reboot both switches at the same time, which brings down your network. If you have an iSCSI or NFS storage array connected to the stack, this may cause trouble, especially in enterprise environments.

With VLT you also have one data plane, but individual control planes. As a result, each switch can be managed and upgraded separately without full network downtime.

VLT Terminology

Virtual Link Trunking uses the following set of terms:

  • VLT peer – one of the two switches participating in VLT (you can have a maximum of two switches in a VLT domain)
  • VLT interconnect (VLTi) – interconnect link between the two switches to synchronize the MAC address tables and other VLT-related data
  • VLT backup link – heartbeat link to send keep alive messages between the two switches, it’s also used to identify switch state if VLTi link fails
  • VLT – this is the name of the feature – Virtual Link Trunking, as well as a VLT link aggregation group – Virtual Link Trunk. We will call aggregated link a VLT LAG to avoid ambiguity.
  • VLT domain – grouping of all of the above

VLT Topology

This’s what a sample VLT domain looks like. S4048-ON switches have six 40Gb QSFP+ ports, two of which we use for a VLT interconnect. It’s recommended to use a static LAG for VLTi.

basic_vlt

Two 1Gb links are used for VLT backup. You can use switch out-of-band management ports for this. Four 10Gb links form a VLT LAG to the upstream core switch.

Use Cases

So where is this actually helpful? Vast majority of today’s environments are virtualized and do not require LAGs. vSphere already uses teaming on vSwitch uplinks for traffic distribution across all network ports by default. There are some use cases in VMware environments, where you can create a LAG to a vSphere Distributed Switch for faster link failure convergence or improved packet switching. Unless you have a really large vSphere environment this is generally not required, but you may use this option later on if required. Read Chris Wahl’s blog post here for more info.

Where VLT is really helpful is in building a loop-free network topology in your datacenter. See, all your vSphere hosts are connected to both Force10 switches for redundancy. Since traffic comes to either of the switches depending on which uplink is being picked on a ESXi host, you have to make sure that VMs on switch 1 are able to communicate to VMs on switch 2. If all you had in your environment were two Force10 switches, you would establish a LAG between the two and be done with it. But if your network topology is a bit larger than this and you have at least a single additional core switch/router in your environment you’d be faced with the following dilemma. How can you ensure efficient traffic switching in your network without creating loops?

stp_loop

You can no longer create a LAG between the two Force10 switches, as it will create a loop. Your only option is to keep switches connected only to the core and not to each other. And by doing that you will cause all traffic from VMs on switch 1 destined to VMs on switch 2 and vise versa to traverse the core.

east_west_traffic

And that’s where VLT comes into play. All east-west traffic between servers is contained within the VLT domain and doesn’t need to traverse the core. As shown above, if we didn’t use VLT, traffic from one switch to another would have to go from switch 1 to core and then back from core to switch 2. In a VLT domain traffic between the switches goes directly form switch 1 to switch 2 using VLTi.

Conclusion

That’s a brief introduction to VLT theory. In the next few posts we will look at how exactly VLT is configured and map theory to practice.