Posts Tagged ‘script’

Troubleshooting vSphere Guest Operations API

October 4, 2019

What is vSphere Guest Operations

Recently I’ve been heavily utilizing the vSphere Guest Operations API for automating vCenter patching. vSphere Guest Operations (GuestOps) is an API that allows you to run commands on a virtual machine without needing to connect to it over the network. All you need are credentials to the vCenter managing the virtual machine and to the virtual machine itself.

GuestOps can be called by using the Invoke-VMScript PowerCLI cmdlet in the following format:

> Invoke-VMScript -ScriptText "uname -a" -vm vc01 -GuestUser root -GuestPassword VMware1!

The cmdlet talks to vCenter, vCenter talks to the ESXi host, the ESXi host talks to VMware Tools and, eventually, VMware Tools runs the command on the guest OS.

This worked well for me when I was running commands on a VCSA 6.0 VM (managed by another vCenter), but after patching and upgrading this VM to VCSA 6.7 I encountered the following error:

Error occured while executing script on guest OS in VM 'vc01'. Could not locate "Powershell" script interpreter in any of the expected locations. Probably you do not have enough permissions to execute command within guest.

It’s obvious from the error message that the cmdlet is doing something wrong, since it’s supposed to use Bash on Linux, not PowerShell.

Enable Debugging in VMware Tools

To better understand what was going on, I logged in to the VCSA via SSH, enabled VMware Tools debugging (see KB1007873 for instructions on how to do that) and restarted Open VM Tools:

# systemctl restart vmtoolsd.service

After running the Invoke-VMScript cmdlet again, this is what I noticed in the vmsvc.log debug log:

[vix] VixTools_StartProgram: User: root args: progamPath: 'cmd.exe', arguments: '/C powershell -NonInteractive -EncodedCommand cABvAHcAZQByAHMAaABl…

So it wasn’t just a misleading PowerCLI error message; Invoke-VMScript was actually trying to call a PowerShell command using the Windows command interpreter on a Linux VM.

Solution

My guess is that since VMware changed the underlying operating system of the VCSA from SUSE Linux to Photon OS, Invoke-VMScript can no longer properly identify the underlying OS and defaults to Windows.

A simple solution to this problem is to give the Invoke-VMScript cmdlet a helping hand and specify the interpreter using the -ScriptType Bash parameter, for example:
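
> Invoke-VMScript -ScriptText "uname -a" -vm vc01 -GuestUser root -GuestPassword VMware1! -ScriptType Bash

With the interpreter specified explicitly, this is what a proper resulting debug log message looks like: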

[vix] VixToolsStartProgramImpl: started '"/bin/bash" -c "bash > /tmp/vmware-root/powerclivmware159 2>&1 -c \"uname -a\""', pid 7456


Scripted CIFS Shares Migration

March 8, 2018

I don’t usually blog about Windows Server and Microsoft products in general, but the need for file server migration comes up in my work quite frequently, so I thought I’d make a quick post on that topic.

There are many use cases: it can be a migration from a NAS storage array to a Windows Server, or between an on-premises file server and the cloud. Every such migration involves copying data and recreating shares. Doing it manually is almost impossible, unless you have only a handful of shares. If you want to replicate all NTFS and share-level permissions consistently from source to destination, scripting is almost the only way to go.

Copying data

I’m sure there are plenty of tools that can perform this task accurately and efficiently. But if you don’t have any special requirements, such as data in transit encryption, Robocopy is probably the simplest tool to use. It comes with every Windows Server installation and, starting from Windows Server 2008, supports multithreading.

Below are the command line options I use:

robocopy \\file_server\source_folder D:\destination_folder /E /ZB /DCOPY:T /COPYALL /R:1 /W:1 /V /TEE /MT:128 /XD "System Volume Information" /LOG:D:\robocopy.log

Most of them are common, but there are a few worth pointing out:

  • /MT – use multithreading, 8 threads per Robocopy process by default. If you’re dealing with lots of small files, this can significantly improve performance.
  • /R:1 and /W:1 – Robocopy doesn’t copy locked files, to avoid data inconsistencies. The default behaviour is to keep retrying until the file is unlocked. This is important for the final data synchronisation, but for data seeding I recommend one retry and a one-second wait to avoid unnecessary delays.
  • /COPYALL and /DCOPY:T will copy all file and directory attributes and permissions, as well as timestamps.
  • /XD "System Volume Information" is useful if you’re copying an entire volume. If you don’t exclude the System Volume Information folder, you may end up copying deduplication and DFSR data, which, in addition to wasting disk space, will break these features on the destination server.

Robocopy is typically scheduled to run at certain times of the day, preferably after hours. You can put it in a batch script and schedule it using Windows Task Scheduler. Just keep in mind that if you configure the job to stop after running for a certain number of hours, Task Scheduler will stop only the batch script, while the Robocopy process will keep running. As a workaround, you can schedule another job with the following command to kill all Robocopy processes at a certain time of the day, say 6am:

taskkill /f /im robocopy.exe
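
For reference, both jobs could be registered from an elevated command prompt roughly like this (the task names, script path and times here are illustrative, not prescriptive):

schtasks /create /tn "Robocopy Seeding" /tr "D:\scripts\robocopy_seed.cmd" /sc daily /st 22:00
schtasks /create /tn "Kill Robocopy" /tr "taskkill /f /im robocopy.exe" /sc daily /st 06:00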

Duplicating shares

For copying CIFS shares I’ve been using the “sharedup” utility from EMC’s “CIFS Tools” collection. To get the tool, register a free account on https://support.emc.com. You can do that even if you’re not an EMC customer and don’t own an EMC storage array. From there you will be able to search for and download CIFS Tools.

If your source and destination file servers are completely identical, you can use sharedup to duplicate CIFS shares in one command. But that’s rarely the case. Often you want to exclude some of the shares or change paths if your disk drives or directory structure have changed. Sharedup supports input and output file command line options: you can generate a shares list first, edit it, and then import the shares to the destination file server.

To generate the list of shares first run:

sharedup \\source_server \\destination_server ALL /SD /LU /FO:D:\shares.txt /LOG:D:\sharedup.log

Resulting file will have records similar to this:

#
@Drive:E
:Projects ;Projects ;C:\Projects;
#
@Drive:F
:Home;Home;C:\Home;

Delete the shares you don’t want to migrate and update the target path from C:\ to where your data actually is. Don’t change the “@Drive:E” headers, they specify the location of the source share, not the destination. Also worth noting that you won’t see permissions listed anywhere in this file. This file lists shares and share paths only; permissions are checked and copied at runtime.

Once you’re happy with the list, use the following command to import shares to the destination file server:

sharedup \\source_server \\destination_server ALL /R /SD /LU /FI:D:\shares.txt /LOG:D:\sharedup.log

For server local users and groups, sharedup will check if they exist on the destination. So if you run into an error similar to the following, make sure to first create those groups on the destination file server:

10:13:07 : WARNING : The local groups "WinRMRemoteWMIUsers__" and "source_server_WinRMRemoteWMIUsers__" do not exist on the \\destination_server server !
10:13:09 : WARNING : Please use lgdup utility to duplicate the missing local user(s) or group(s) from \\source_server to \\destination_server.
10:13:09 : WARNING : Unable to initialize the Security Descriptor translator

Conclusion

I created this post as a personal howto note, but I’d love to hear if it’s helped anyone else. Or if you have better tool suggestions to accomplish this task, please let me know!

vRealize Automation Disaster Recovery

January 14, 2018

Introduction

VMware has invested a lot of time and effort in vRealize Automation high availability. For medium and large deployment scenarios VMware recommends using a load balancer (Citrix, F5, NSX) to distribute traffic between vRA appliance and infrastructure components, as well as database clustering (such as MS SQL availability groups) for database high availability. Additionally, in vRA 7.3 VMware added support for automatic failover of vRA appliance’s embedded PostgreSQL database, which was a manual process prior to that.

There is a clear distinction, however, between high availability and disaster recovery. Generally speaking, HA covers redundancy within the site and is not intended to protect from full site failure. Site Recovery Manager (or another replication product) is required to protect vRA in a DR scenario, which is described in more detail in VMware’s vRealize Suite 7.0 Disaster Recovery by Using Site Recovery Manager 6.1 document.

In my opinion, there are two important aspects missing from the aforementioned document, which I want to cover in this blog post: restoring VM UUIDs and changing the vRA IP address. I will cover them in the order in which these tasks would usually be performed if you were to fail over vRA to DR:

  1. Exporting VM UUIDs
  2. Changing IP addresses
  3. Importing VM UUIDs

I will also briefly touch on how to change VM reservations, which is an important step as well, but one that is already very well covered in VMware documentation.

Note: this blog post does not provide configuration guidelines for VM replication software, such as Site Recovery Manager, Zerto or RecoverPoint and is focused only on DR aspects related to vRA itself. Refer to official documentation of corresponding products to determine how to set up VM replication to your disaster recovery site.

Exporting VM UUIDs

VMware uses two UUIDs to identify a VM. The BIOS UUID (uuid.bios in the .vmx file) was the original VM identifier and is derived from the hardware the VM is provisioned on. But it’s not unique: if a VM is cloned, the clone will have the same BIOS UUID. So a second identifier was introduced, called the Instance UUID (vc.uuid in the .vmx file), which is generated by vCenter and is unique within a single vCenter (two VMs in different vCenters can have the same Instance UUID).

When VMs are failed over, Instance UUIDs change. Compare VirtualMachine.Admin.AgentID (Instance UUID) and VirtualMachine.Admin.UUID (BIOS UUID) on the original and failed over VMs.

Why does this matter? Because vRA uses Instance UUIDs to keep track of managed VMs. If Instance UUIDs change, vRA will show the corresponding VMs as missing under Infrastructure > Managed Machines, and you won’t be able to manage them.

So it’s important to export the VM Instance UUIDs before failover, so they can later be used to restore the original values. This is how you can get the Instance UUID of a given VM using PowerCLI:

> (Get-VM vm_name).extensiondata.config.InstanceUUID

Here, on my GitHub page, you can find a script that I have put together to export the Instance UUIDs of all VMs in CSV format.
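
In essence, the export boils down to a one-liner along these lines (a simplified sketch, not the script itself; the file name matches the vm_vc_uuids.csv file used later in this post, and the column names are my own):

> Get-VM | Select-Object Name, @{N='InstanceUuid';E={$_.ExtensionData.Config.InstanceUuid}} | Export-Csv -NoTypeInformation -Path vm_vc_uuids.csv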

Changing IP addresses

Once you’ve saved the Instance UUIDs, you can move on to failover. vRA components should be started in the following order:

  1. MS SQL database
  2. vRA appliance
  3. IaaS server

If the network subnets that all components are connected to are stretched between the two sites, no additional reconfiguration is required when VMs are brought up at DR. But usually that’s not the case and servers need to be re-IP’ed. IaaS server network settings are changed the same way as on any other Windows server.

vRealize Appliance network settings are changed in the vRA appliance management interface, which can be accessed at https://vra-appliance-hostname:5480, under the Network > Address tab. The problem is, if IP addresses change at DR, it will be challenging to reach the vRA appliance over the network. To work around that, connect to the vRA VM console and run the following script from the CLI to change the appliance’s network settings:

# /opt/vmware/share/vami/vami_config_net

Don’t forget to update the DNS record for the vRA appliance. For the IaaS server it’s not needed, as long as you allow Dynamic DNS (DDNS) updates.

Importing VM UUIDs

After the failover, all of your VMs will have a missing status in vRA. To make vRA recognize the failed over VMs you will need to revert the Instance UUIDs back to their original values. In PowerCLI this can be done in the following way:

> $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
> $spec.instanceUuid = '52da9b14-0060-dc51-4733-3b01e912edd2'
> $vm = Get-VM -Name vm_name
> $vm.Extensiondata.ReconfigVM_Task($spec)

I’ve written another script that will perform this task for you, which you can find on my GitHub page.

You will need two files to make the script work: the vm_vc_uuids.csv file you generated before, with the list of original VM Instance UUIDs, as well as the list of missing VMs in CSV format, which you can export from vRA after the failover on the Infrastructure > Managed Machines page.
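
At its core, the script re-applies the saved values, roughly like this (a simplified sketch that reads only the vm_vc_uuids.csv file produced by the export sketch above, assuming its Name and InstanceUuid columns):

$uuids = Import-Csv vm_vc_uuids.csv
foreach ($row in $uuids) {
    # Build a reconfiguration spec carrying the original Instance UUID
    $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $spec.instanceUuid = $row.InstanceUuid
    # Apply the spec to the failed over VM
    (Get-VM -Name $row.Name).ExtensionData.ReconfigVM_Task($spec) | Out-Null
}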


Once the script has done its job, you will need to run inventory data collection from the Infrastructure > Compute Resources > Compute Resources page. vRA will discover the VMs and update their status to “On”.

Updating reservations

If you try to run any Day 2 operation on a VM with the old reservation in place, you will get an error similar to this:

Error processing [Shutdown], error details:
Error getting property 'runtime.powerState' from managed object (null)
Inner Exception: Object reference not set to an instance of an object.

To manually update a VM reservation, on the Infrastructure > Managed Machines page hover over the VM and select Change Reservation.

This process is obviously not scalable, as it can take hours if you have hundreds of VMs. VMware offers an alternative solution that lets you update all VMs by using the Bulk Import feature available from Infrastructure > Administration > Bulk Imports. The idea is that you can export all VM configuration details into a CSV file, update the compute and storage reservation columns, and import it back into vRA. vRealize Suite 7.0 Disaster Recovery by Using Site Recovery Manager 6.1 gives very detailed instructions on how to do that in the “Bulk Import, Update, or Migrate Virtual Machines” section.

Conclusion

I hope this blog post has helped to cover some gaps in VMware documentation. If you have any questions or comments, as always, feel free to leave them in the comments section below.

[SOLVED] Migrating vCenter Notifications

January 6, 2018

Why is this a problem?

VMware upgrades and migrations still comprise a large chunk of what I do in my job. An in-place upgrade is often more straightforward: the main consideration is making sure all compatibility checks pass. But if it is a rebuild, things get a bit more complicated.

Take, for example, a vCenter Server to vCenter Server Appliance migration. If you are migrating between 5.5, 6.0 and 6.5 you are covered by the vCenter Server Migration Tool. Recently I came across a customer using vSphere 5.1 (yes, it is not as uncommon as you might think). The vCenter Server Migration Tool does not support migration from vSphere 5.1, which is fair enough, as its end of support was August 2016. But as a result, you end up on your own with your upgrade endeavours and have to do a lot of things manually. One such thing is migrating vCenter notifications.

You can go and do it by hand. Using a few PowerCLI commands you can list the currently configured notifications and then recreate them on the new vCenter. But knowing how clunky and slow this process is, I doubt you are looking forward to spending half a day configuring dozens of notifications one by one by hand (I sure am not).

I offer an easy solution

You may have seen a comic over on xkcd called “Is It Worth The Time?“, which gives you an estimate of how long you can work on making a routine task more efficient before you are spending more time than you save (across five years). As an example, if you can save one hour by automating a task that you do monthly, even if you spend two days on automating it, you will still break even over five years.

Knowing how often I do VMware upgrades, it is well worth it for me to invest time in automating them with a script. Since you probably do not do upgrades that often, it may not be worth it for you, so I wrote this script for you.

If you simply want to get the job done, you can go ahead and download it from my GitHub page here (you will also need VMware PowerCLI installed on your machine for it to work) and then run it like so:

.\copy-vcenter-alerts-v1.0.ps1 -SourceVcenter old-vc.acme.com -DestinationVcenter new-vc.acme.com

The script includes help topics, which you can view by running the following command:

Get-Help -full .\copy-vcenter-alerts-v1.0.ps1

Or, if you are curious, you can read further to better understand how the script works.

How does this work?

First of all, it is important to understand the terminology used in vSphere:

  • Alarm trigger – a set of conditions that must be met for an alarm warning and alert to occur.
  • Alarm action – operations that occur in response to triggered alarms. For example, email notifications.

The script takes the source and destination vCenter IP addresses or host names as parameters and starts by retrieving the list of existing alerts. Then it compares alert definitions, and if an alert doesn’t exist on the destination, it is skipped, so be aware of that. The script will show you a warning and you will be able to decide what to do with such alerts later.

Then, for each of the source alerts that exists on the destination, the script recreates the actions with the exact same triggers. Trigger settings, such as repeats (enabled/disabled) and trigger state changes (green to yellow, yellow to red, etc.), are also copied.

The script will not attempt to recreate an action that already exists, so feel free to run the script multiple times if you need to.

What the script does not do

  1. The script does not copy custom alerts – if you have custom alert definitions, you will have to recreate them manually. It was not worth investing time in such a feature at this stage, as custom alerts are rare and, even if encountered, there is usually just a handful, which can be moved manually.
  2. Only email notification actions are supported – because they are the most common. If you use other actions, like SNMP traps, let me know and maybe I will include them in the next version.

PowerCLI cmdlets used

These are some of the useful VMware PowerCLI cmdlets I used to write the script; a simplified sketch of how they fit together follows the list:

  • Get-AlarmDefinition
  • Get-AlarmAction
  • Get-AlarmActionTrigger
  • New-AlarmAction
  • New-AlarmActionTrigger
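
Put together, the core copy logic looks roughly like this (a simplified sketch, not the actual script; the real thing also handles the default trigger that New-AlarmAction creates and actions that already exist):

$src = Connect-VIServer old-vc.acme.com
$dst = Connect-VIServer new-vc.acme.com

foreach ($alarm in Get-AlarmDefinition -Server $src) {
    # Skip alarm definitions that have no counterpart on the destination vCenter
    $target = Get-AlarmDefinition -Name $alarm.Name -Server $dst -ErrorAction SilentlyContinue
    if (-not $target) { Write-Warning "$($alarm.Name) is missing on the destination, skipping"; continue }

    foreach ($action in (Get-AlarmAction -AlarmDefinition $alarm -ActionType SendEmail)) {
        # Recreate the email action on the destination alarm...
        $newAction = New-AlarmAction -AlarmDefinition $target -Email -To $action.To
        # ...and copy its trigger state changes and repeat settings
        foreach ($trigger in (Get-AlarmActionTrigger -AlarmAction $action)) {
            New-AlarmActionTrigger -AlarmAction $newAction -StartStatus $trigger.StartStatus -EndStatus $trigger.EndStatus -Repeat:$trigger.Repeat | Out-Null
        }
    }
}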

Extracting vRealize Operations Data Using REST API

September 17, 2017

Scripting today is an important skill if you’re part of an IT operations team. It is common to use PowerShell or any other scripting language of your choice to automate repetitive tasks and be efficient in what you do. Another use case for scripting and automation, which is often missed, is the fact that it lets you do more. Public APIs offered by many software and hardware solutions let you manipulate their data and call functions in the way you need, without being bound by the workflows provided in the GUI.

Recently I was asked to extract data from vRealize Operations Manager that was not available in the GUI or in a report in the format I needed. At first it looked like a non-trivial task, as it required scripting and using REST APIs to pull the data. But after some research it turned out to be much easier than I thought.

Using Python, this can be done in a few lines of code using existing Python libraries that do most of the work for you. The goal of this blog post is to show that scripting does not have to be hard and that, using the right tools for the job, you can get things done in a matter of minutes, not hours or days.

Scenario

To demonstrate an example of using vRealize Operations Manager REST APIs we will retrieve the list of vROps adapters, which vROps uses to pull information from many hardware and software solutions it supports, such as Nimble Storage or Microsoft SQL Server.

vROps APIs are obviously much more powerful than that, and you can use the same approach to pull other information, such as active and inactive alerts, performance statistics and recommendations. Full vROps API documentation can be found at https://your-vrops-hostname/suite-api/.

Install Python and Libraries

We will be using two Python libraries: “Requests” to make REST calls and “ElementTree” for XML parsing. ElementTree comes with Python, so we will need to install the Requests package only.

I already made a post here on how to install the Python interpreter and Python libraries, so we will dive right into the vROps APIs.

Retrieve the List of vROps Adapters

To get the list of all installed vROps adapters, we need to make a GET REST call using the “get” method from the Requests library:

import requests
from requests.auth import HTTPBasicAuth

akUrl = 'https://vrops/suite-api/api/adapterkinds'
ak = requests.get(akUrl, auth=HTTPBasicAuth('user', 'pass'))

In this code snippet, using the “import” command we specify that we are using the Requests library, as well as its implementation of basic HTTP authentication. Then we request the list of vROps adapters using the “get” method from the Requests library and save the XML response into the “ak” variable. Add “verify=False” to the get call parameters if you struggle with SSL certificate issues.

As a result you will get the full list of vROps adapters in XML. A trimmed response, reconstructed here from the elements we will walk through below, looks roughly like this:
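
<ops:adapter-kinds xmlns:ops="http://webservice.vmware.com/vRealizeOpsMgr/1.0/">
  <ops:adapter-kind key="CITRIXNETSCALER_ADAPTER">
    <ops:name>Citrix NetScaler Adapter</ops:name>
    <ops:adapterKindType>GENERAL</ops:adapterKindType>
    <ops:describeVersion>1</ops:describeVersion>
    <ops:resourceKinds>citrix_netscaler_adapter_instance</ops:resourceKinds>
    <ops:resourceKinds>appliance</ops:resourceKinds>
  </ops:adapter-kind>
  …
</ops:adapter-kinds>

So how do we navigate that? Using the ElementTree XML library.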

Parsing XML Response Sequentially

vRealize Operations Manager returns REST API responses in XML format. ElementTree lets you parse these XML responses to find the data you need, which you can output in a human-readable format, such as CSV and then import into an Excel spreadsheet.

Parsing an XML tree requires traversing it from top to bottom. You start from the root element:

import xml.etree.ElementTree as ET

akRoot = ET.fromstring(ak.content)

Then you can continue by iterating through child elements using nested loops:

for adapter in akRoot:
    # each child of the root is an adapter-kind element with a "key" attribute
    print(adapter.tag, adapter.attrib['key'])
    for adapterProperty in adapter:
        print(adapterProperty.tag, adapterProperty.text)

Children of <ops:adapter-kinds> are <ops:adapter-kind> elements. Children of <ops:adapter-kind> elements are <ops:name>, <ops:adapterKindType>, <ops:describeVersion> and <ops:resourceKinds>. So the output of the above code will be:

adapter-kind CITRIXNETSCALER_ADAPTER
name Citrix NetScaler Adapter
adapterKindType GENERAL
describeVersion 1
resourceKinds citrix_netscaler_adapter_instance
resourceKinds appliance
…

As you may have already noticed, all XML elements have tags and can additionally have attributes and associated text. From the above example:

  • Tags: adapter-kind, name, adapterKindType
  • Attribute: key
  • Text: Citrix NetScaler Adapter, GENERAL, 1

Finding Interesting Elements

Typically you are looking for specific information and don’t need to traverse the whole XML tree. So instead of walking through the tree sequentially, you can iterate through interesting elements using the “iterfind” method. For instance, if we are looking only for adapter names, the code would look as follows:

ns = {'vrops': 'http://webservice.vmware.com/vRealizeOpsMgr/1.0/'}
for akItem in akRoot.iterfind('vrops:adapter-kind', ns):
  akNameItem = akItem.find('vrops:name', ns)
  print(akNameItem.text)

All elements in REST API responses are usually prefixed with a namespace. To avoid spelling out the long XML element names, such as http://webservice.vmware.com/vRealizeOpsMgr/1.0/adapter-kind, ElementTree methods support namespaces, which can be passed in as a variable, like the “ns” variable in this code snippet.

The resulting output will be similar to:

Citrix NetScaler Adapter
Container
Dell EMC PowerEdge
Dell Storage Adapter
EP Ops Adapter
F5 BIG-IP Adapter
HP Servers Adapter

Additional Information

I intentionally tried to keep this post short and give you all the information required to start using Python to parse REST API responses in XML format.

I have written two scripts that are more practical and shared them on my GitHub page here:

  • vrops_object_types_1.0.py – extracts adapters, object types and the number of objects. The script gives you an idea of what is actually being monitored in vROps by providing the number of objects you have in your vROps instance for each adapter and object type.
  • vrops_alert_definitions_1.0.py – extracts adapters, object types, alert names, criticality and impact. As opposed to the first script, this script provides the list of alerts for each adapter and object type, which is helpful for identifying potential alerts that can be triggered in vROps.

Feel free to download these scripts from GitHub and play with them or adapt them according to your needs.

Python for Windows: Quick Installation

September 7, 2017

Only recently did I add Python to the list of tools I use in my job. I had always used PowerShell when I needed to script something, until I saw how easy Python is to use. I will be keeping it in my arsenal from now on.

In the near future I plan to write a blog post on how to use Python with REST APIs, and in this post I wanted to provide quick instructions on how to install Python on Windows that I can later use as a reference.

Installing Python

The latest version of Python for Windows can be downloaded from https://www.python.org/downloads/windows/. The executable installer is probably the easiest option. When installing, make sure to check the “Add Python 3.6 to PATH” option, it makes life much easier.

Installing Libraries

The Python installation already includes lots of libraries that you can use for scripting. The ElementTree library, for example, which is used for XML parsing, comes with the interpreter.

Depending on what you want to use Python for, you may need to install additional libraries. For instance, if you want to call REST APIs, you will need Requests – a library for HTTP calls.

Python uses a package manager called “pip”. If it is not already in your PATH variable, find pip under the Python installation directory and run it from a command line as administrator:

pip install requests

Once the library is installed you can use it by importing it into your scripts:

import requests

Writing Code

At this point you can call the Python interpreter from the Windows command line and start running Python commands. If you want to write a script, however, you will need an IDE. There is nothing wrong with using Notepad, but there are more efficient ways to do it.

Python for Windows comes with an IDE, simply called IDLE. It is very basic, but it provides all the essential features, such as code completion, syntax highlighting and a primitive debugger. It is not perfect, but it has everything to get you started.

Conclusion

That is a quick crash course with three simple steps to get Python up and running. I tried to keep it short to demonstrate how you can start using Python with minimum effort.

vSphere 6 Dump / Syslog Collector: PowerCLI Script

November 17, 2015

This is a quick update to a post I previously wrote on configuring vSphere 5 Syslog and Network Dump Collectors, which you can find here. This post covers the changes in version 6.

The scripts I reposted for version 5 no longer work for version 6, so I thought I’d do an update. If you’re looking just for the updated scripts, simply scroll down to the end of the post.

What’s new in vSphere 6

If you look at the scripts, all that’s changed is the order and number of the arguments, which is not overly exciting.

What’s more interesting is that with vSphere 6 Syslog and ESXi Dump Collectors are no longer a separate install. They’re bundled with vCenter and you won’t see them as separate line items in the vCenter installer.

What I’ve also noticed is that the ESXi Dump Collector service is not started automatically, so make sure to go to the services on the vCenter VM and start it manually.

The Dump Collector vCenter plugin doesn’t seem to exist any more either. But you are still able to see Syslog Collector settings in vCenter.


Another thing worth mentioning here is the directories where the logs and dumps are kept. In vCenter 6 they can be found at these paths:

C:\ProgramData\VMware\vCenterServer\data\vmsyslogcollector

C:\ProgramData\VMware\vCenterServer\data\netdump\Data


PowerShell Get-EsxCli Cmdlet

I also want to quickly touch on the fact that the scripts below are written using the Get-EsxCli cmdlet to get an EsxCli object and then directly invoke its methods. I find this not very ideal, as it’s not clear what each of the arguments actually means, and the script gets broken every time the number or order of the arguments changes. Which is exactly what has happened here.

There are the Set-VMHostSyslogConfig and Set-VMHostDumpCollector cmdlets, which use argument names such as -SyslogServer and -Protocol that are self-explanatory. I may end up rewriting these scripts if I have time. But at the end of the day, both ways will get the job done.

One hint: if you’re lost and not sure about the order of the arguments, run this cmdlet on an EsxCli object to find out what each argument actually means:

$esxcli.system.coredump.network | Get-Member


ESXi Dump Collector PowerCLI script:

# Check the current Dump Collector configuration on each host
Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.coredump.network.get()
}

# Point each host at the Dump Collector and enable it
Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.coredump.network.set($null, "vmk0", $null, "10.10.10.10", 6500);
$esxcli.system.coredump.network.set($true)
}

There are a couple of commands to check the ESXi Dump Collector configuration, as it’s not always clear whether it’s able to write a core dump until a PSOD actually happens.

The first command checks if the Dump Collector service on an ESXi host can connect to the Dump Collector server, and the second one actually forces the ESXi host to purple screen, if you want to be 100% sure that a core dump can be written. Make sure to put the ESXi host into maintenance mode if you want to go that far.

# esxcli system coredump network check

# vsish
# set /reliability/crashMe/Panic

Syslog Collector PowerCLI script:

# Check the current syslog configuration on each host
Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.syslog.config.get()
}

# Set the remote syslog server, open the firewall rule and reload syslog
Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.syslog.config.set($null, $null, $null, $null, $null, $null, $null, $null, "udp://vcenter.domain.local:514", $null, $null);
$esxcli.network.firewall.ruleset.set($null, $true, "syslog")
$esxcli.system.syslog.reload()
}

For the Syslog Collector it’s important to remember that there’s a firewall rule on each ESXi host, which needs to be enabled (the firewall ruleset command in the script).

For the Dump Collector there’s no firewall rule. So if you’re looking for it and can’t find it, that’s normal; it doesn’t exist by default.

vSphere Dump / Syslog Collector: PowerCLI Script

March 12, 2015

Overview

If you install ESXi hosts on, say, 2GB flash cards in your blades, which are smaller than the required 6GB, then you won’t have what’s called persistent storage on your hosts. Both your kernel dumps and logs will be kept on a RAM drive and deleted after a reboot, which is less than ideal.

You can use vSphere Dump Collector and Syslog Collector to redirect them to another host, usually the vCenter machine, if it’s not an appliance.

If you have a bunch of ESXi hosts, you’ll have to manually go through each one of them to change the settings, which might be a tedious task. Syslog can be done via Host Profiles, but an Enterprise Plus licence is not a very common thing among customers. The simplest way is to use PowerCLI.

Amendments to the scripts

These scripts originate from Mike Laverick’s blog; I didn’t write them. The original blog post is here: Back To Basics: Installing Other Optional vCenter 5.5 Services.

The purpose of my post is to make a few corrections to the original Syslog script, as it has a few mistakes:

First – a typo in the system.syslog.config.set() statement. It requires an additional $null argument before the hostname. If you run it as is, you will probably get an error that looks like this:

Message: A specified parameter was not correct.
argument[0];
InnerText: argument[0]

Second – you need to open the outgoing syslog ports, otherwise traffic won’t flow. It seems that Dump Collector traffic is enabled by default, even though there is no rule for it in the firewall (the former netDump rule doesn’t exist anymore). Odd, but that’s how it is. Syslog, on the other hand, requires an explicit rule, which is reflected in the script by the network.firewall.ruleset.set() command.

Below are the correct versions of both scripts. If you copy and paste them everything should just work.

vSphere Dump Collector

Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.coredump.network.get()
}

Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.coredump.network.set($null, "vmk0", "10.0.0.1", "6500")
$esxcli.system.coredump.network.set($true)
}

vSphere Syslog Collector

Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.syslog.config.get()
}

Foreach ($vmhost in (Get-VMHost))
{
$esxcli = Get-EsxCli -VMHost $vmhost
$esxcli.system.syslog.config.set($null, $null, $null, $null, $null, "udp://10.0.0.1:514")
$esxcli.network.firewall.ruleset.set($null, $true, "syslog")
$esxcli.system.syslog.reload()
}

Consistent VMware snapshots on NetApp

March 16, 2012

If you use NetApp as storage for your VMware hard drives, it’s wise to utilize NetApp’s powerful snapshot capabilities as an instant backup tool. I briefly mentioned in my previous post that you should disable the default snapshot schedule. A snapshot is done very quickly on NetApp, but it’s still not instantaneous. If a VM is running, you can get .vmdks with inconsistent data. Here I’d like to describe how you can perform consistent snapshots of VM hard drives that sit on NetApp volumes exported via NFS. Obviously it won’t work for iSCSI LUNs, since you will get LUN snapshots, which are almost useless for backups.

What makes the VMware virtualization platform far superior to other well-known solutions on the market is the VI API. The VI API is a set of web services hosted on Virtual Center and ESX hosts that provides interfaces to all components and operations. In particular, there is a Perl interface to the VI API, called the VMware Infrastructure Perl Toolkit, which you can download and install for free. Using the VI Perl Toolkit you can write a script that will, every day, put your VMs into so-called hot backup mode and make NetApp snapshots as well. Practically, hot backup mode is also a snapshot: when you create a VM snapshot, the original VM hard drive is left intact and VMware starts to write deltas to another file. It means that the VM hard drive won’t change while the NetApp snapshot is being made, and you will get consistent .vmdk files. Now let’s move on to the implementation.

I will write excerpts from the actual script here, because the lines in the script are quite long and everything would get messed up on the blog page. I uploaded the full script to FileDen; here is the link. I apologize if you are reading this blog entry far later than it was published and my account or the FileDen service itself no longer exists.

The VI Perl Toolkit is effectively a set of Perl scripts that you run as ready-to-use utilities. We will use snapshotmanager.pl, which lets you create VMware VM snapshots. In the first step you make snapshots of all VMs:

\"$perl_path\perl\" -w \"$perl_toolkit_path\snapshotmanager.pl\" --server vc_ip --url https://vc_ip/sdk/vimService --username snapuser --password 123456 --operation create --snapshotname \"Daily Backup Snapshot\"

For the sake of security, I created a Snapshot Manager role and a respective user account in Virtual Center with only two allowed operations: Create Snapshot and Remove Snapshot. The run line is self-explanatory. I execute it using the system($run_line) command.

After VM snapshots are created you make a NetApp snapshot:

\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap create vm_sata snap_name

To connect to the NetApp terminal I use the PuTTY ssh client. putty.exe itself has a GUI, while plink.exe is meant for batch scripting. Using this command you create a snapshot of a particular NetApp volume; in our case, the volumes that hold the .vmdks.

To bring all VMs out of hot backup mode, run:

\"$perl_path\perl\" -w \"$perl_toolkit_path\snapshotmanager.pl\" --server vc_ip --url https://vc_ip/sdk/vimService --username snapuser --password 123456 --operation remove --snapshotname \"Daily Backup Snapshot\" --children 0

The --children 0 argument here tells it not to remove the child snapshots as well.

Now that we have familiarized ourselves with the main commands, let’s move on to the script logic. Most likely you will want to have several snapshots, for example 7 of them, one for each day of the week. It means that each day, before making a new snapshot, you will need to remove the oldest one and rename the others. The renaming is just for clarity. You can name your snapshots vmsnap.1, vmsnap.2, … , vmsnap.7, where vmsnap.7 is the oldest. Each night you put your VMs into hot backup mode and delete the oldest snapshot:

\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap delete vm_sata vmsnap.7

Then you rename the other snapshots:

\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.6 vmsnap.7
\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.5 vmsnap.6
\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.4 vmsnap.5
\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.3 vmsnap.4
\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.2 vmsnap.3

And create the new one:

\"$plink_path\" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap create vm_sata vmsnap.1

As a last step you bring your VMs out of hot backup mode.

Using this technique you can create short-term backups of your virtual infrastructure and use them for long-term retention with the help of standalone backup solutions, like backing up data from the snapshots to a tape library using Symantec Backup Exec. I’m going to talk about this in later posts.

Run scheduled job with user logged off

February 16, 2012

Lots of people have problem with Windows Server 2008 scheduler when their job isn’t run if they use an option “Run whether user is logged in or not”. In my case problem was related to output from script. It seems that since job is run in background it’s not allowed to neither read from input nor write to output. I commented out all output and job started to run with this option switched on.