Using VMware SDK Support

December 5, 2019

I predict this post will get a single hit over its lifetime, but if it helps at least one person desperately trying to find out how to open a VMware SDK support request, that’s good enough for me.

Quick Overview

Not everyone knows, but VMware, along with support for vSphere, NSX and all the other software products, also provides SDK and API support. If you are a partner developing a solution that integrates with a VMware product or even a customer writing your own vSphere plug-in using vSphere Management SDK, you can reach out to VMware for help.

It’s a paid service. You can find detailed description of it on its landing page here: VMware SDK and API Support

How to open SRs

One thing that is not very obvious about the SDK support is how to open support requests if you’re a customer. The goal of this short post is to demonstrate where to find it on VMware support portal:

  1. Log in to My VMware portal using your account credentials
  2. Under the Support section click Get Support
  3. On the opened page, under “Technical” category, choose your issue type, such as “Fault/Crash”
  4. In the provided list of Supported Products expand SDK Support Services
  5. Select VMware SDK Support
  6. Click Continue and proceed with describing your issue and opening the ticket, as usual

This is a screenshot of what it will look like, if your account has been entitled to SDK support:

If you’re working with SpringSource, there is also a range of support option under the SprinSource Open Source Support sub-category.

Conclusion

I’ve had only brief interaction with SDK Support team, but can only say good things about them. One of the examples was a question I had on parameter specifications of a particular vSphere Web Management SDK function and I not only got an answer to my question, but I was also provided with code snippets, which I didn’t even ask for. So if you are serious about using VMware SDKs and think you may require technical support, I can certainly recommend this service.

Multiple vCenter Connections in PowerCLI

November 30, 2019

Connect-VIServer is a PowerCLI cmdlet which most of the PowerCLI scripts out there start with. It creates a connection to a vCenter server, which you can then use to run queries against a particular vSphere environment.

If you only need one vCenter connection, you don’t need to worry about how your subsequent PowerCLI cmdlets are ordered. They will all use this single vCenter by default for all requests. But if you need multiple simultaneous connections, you have to be more careful with you code or you can accidentally end up running commands against a wrong vCenter. Not a situation you want to find yourself in, especially if you’re making changes to the environment.

Goal of this blog post is to explain PowerCLI behaviour, when multiple vCenter connections are used. And how to best write your code to avoid common mistakes.

vCenter server parameter

If you are connected to more than one vCenter, the easiest way to specify which vCenter to use in a particular PowerCLI cmdlet is by using the -Server parameter. For example:

Get-VM -Server vcenter1.domain.local

By doing so, you will avoid any ambiguity in your code, since vCenter is always specified explicitly. This is the theory. In reality, I find, you will always have that one command where you forgot to provide -Server parameter, which will cause problems when you least expect it.

DefaultVIServer variable

If you’ve done any PowerCLI scripting before, you’ve most likely come across the $global:DefaultVIServer variable. When you connect to a vCenter using Connect-VIServer cmdlet, DefaultVIServer variable is set to that vCenter name. That way when you run PowerCLI cmdlets without using -Server parameter, they implicitly use vCenter from the DefaultVIServer variable as a target for the query. When you disconnect from the vCenter using Disconnect-VIServer, this variable is emptied.

The challenge with this approach is that the DefaultVIServer variable can only hold one vCenter name. If you connect/disconnect to/from multiple vCenter servers, you may end up in a situation where DefaultVIServer variable becomes empty, even though you still have an active vCenter connection. It’s easier to demonstrate it in the following script output:

Connecting to vCenter server vcenter1.domain.local
DefaultVIServer value: vcenter1.domain.local
Connecting to vCenter server vcenter2.domain.local
DefaultVIServer value: vcenter2.domain.local
Disconnecting from vCenter server vcenter2.domain.local
DefaultVIServer value:
Get-VM : 29/11/2019 6:17:41 PM Get-VM You are not currently connected to any servers. Please connect first using a Connect cmdlet.

As you can see, after we disconnect from the current default vCenter server vcenter2.domain.local, the variable becomes blank, until we connect to any other vCenter again. As a result, Get-VM cmdlet fails.

The error message is misleading. Connection is there and you can still run commands against the vcenter1.domain.local by specifying it in the -Server parameter. However, it defeats the purpose of DefaultVIServer variable, when using multiple simultaneous vCenter connections.

There is a way to fix that.

DefaultVIServer mode

This leads us up to the third option – changing DefaultVIServer mode. PowerCLI supports multiple default vCenter servers if you change DefaultVIServer to Multiple (default is Single):

Set-PowerCLIConfiguration -DefaultVIServerMode Multiple

As a result of this change, PowerCLI will start using DefaultVIServers array, which tracks all currently connected vCenters.

There are two implications of using Multiple connection mode:

  1. If you run a PowerCLI cmdlet without the -Server parameter, it will run against all connected vCenters at the same time – which is fine, as long as this is what you want.
  2. If you expect other users to run this script and they use Single connection mode, it can break your script. If that’s the case, make sure to explicitly set DefaultVIServer mode in the beginning of your script to avoid any unexpected behaviour.

Conclusion

Each option has its pros and cons. It’s up to you to choose what works best in your particular situation. But if you asked me, I would recommend disconnecting vCenter server session as soon as you no longer needed it. That way you can avoid any potential ambiguity in your code. Use multiple simultaneous vCenter connections only if you absolutely need it.

Joining ESXi to AD in Disjoint Namespace

November 4, 2019

What is Disjoint Namespace?

Typically, when using Microsoft Active Directory you use AD-integrated DNS and your AD domain name matches you DNS domain name, but you don’t have to. This is quite rare, but I’ve seen cases where the two don’t match. For example, you might have a Linux-based DNS, where you register an esx01.example.com DNS record for your ESXi host and then you join it to an Active Directory domain called corp.local.

That’s called a disjoint namespace. You can read this Microsoft article if you want to know more details: Disjoint Namespace.

In my personal opinion, using a disjoint namespace is asking for trouble, but it will still work if you really want to use it.

Problem

If you end up going down that route, there’s one caveat you should be aware of. When you joining a machine to AD, among other things, it needs to populate DNS name field property of the AD computer object. This is an example of ESXi computer object in Active Directory Users and Computers snap-in:

If you configure example.com domain in your ESXi Default TCP/IP stack, like so:

And then you try to, for example, join your ESXi host to corp.local AD domain, it will attempt to use esx-01a.example.com for computer object DNS name field. If you’re using a domain account with privileges restricted only to domain join, this operation will fail.

This is how the problem manifested itself in my case in ESXi host logs:

Failed to run provider specific request (request code = 8, provider = ‘lsa-activedirectory-provider’) -> error = 40315, symbol = LW_ERROR_LDAP_CONSTRAINT_VIOLATION, client pid = 2099303

If you’re using host profiles to join ESXi host to the domain, remediation will fail and you will see the following in /var/log/syslog.log:

WARNING: Domain join failed; retry count 1.

WARNING: Domain join failed; retry count 2.

Likewise (ActiveDirectory) Domain Join operation failed while joining new domain via username and password..

Note: this problem is specific to joining domain using a restricted service account. If you use domain administrator account, it will force the controller to add the computer object with a DNS name, which doesn’t match the AD name.

Solution

Make sure ESXi domain name setting matches the Active Directory domain name, not DNS domain name. You can still use the esx-01a.example.com record to add the ESXi hosts to vCenter, but you have to specify corp.local domain in DNS settings (or leave it blank), because this is what is going to be used to add the host to AD, like so:

This way your domain controller will be happy and ESXi host will successfully join the domain.

Additional Notes

While troubleshooting this issue I saw a few errors in ESXi host logs, which were a distraction, ignore them, as they don’t constitute an error.

This just means that the ESXi host Active Directory service is running, but host is not joined to a domain yet:

lsass: Failed to run provider specific request (request code = 12, provider = ‘lsa-activedirectory-provider’) -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 2111366

IPC is inter-process communication. Likewise consists of multiple services that talk to each other. They open and close connections, this is normal:

lsass-ipc: (assoc:0x8ed7e40) Dropping: Connection closed by peer

I also found this command to be useful for deeper packet inspection between an ESXi host and AD domain controllers:

tcpdump-uw -i vmk0 not port 22 and not arp

References

Host Profile Customization Issue

November 1, 2019

vSphere Host Profiles is a great feature for consistent ESXi host configuration and compliance checks, but can at times be flaky.

I’ve noticed an issue recently with Host Profiles in vSphere 6.7, where after providing host customization values the following error is shown in vSphere Web Client:

The “Update host customizations” operation failed for the entity with the following error message.

Host settings validation failed.

This is how the error message looks like in the client:

Even though it’s a bit annoying, I found it to be a furphy. Customizations are actually saved successfully and the error can be ignored. You can find the following messages in ESXi host’s /var/log/syslog.log file, which confirm that it works:


INFO: Execute completed
INFO: Validating AnswerFile Status1 = success
INFO: Cleaned up Host Configuration
INFO: GetAnswerFile completed

I’ve also found that this error doesn’t appear when you provide host customization values first time straight after attaching a profile to the host. Only when you update them. It also doesn’t show up in HTML5, only Web Client. I guess, one more reason to switch to HTML5.

Hope this blog post helps someone who searched in Google, but couldn’t find any information related to this error message.

Reminder: Disable Firewall on NSX ECMP Edge

October 15, 2019

ECMP and Stateful Services

It’s not new, this topic has already been discussed many times before, examples are here, here, here and here. When NSX Edges are configured in ECMP mode, none of the stateful services like VPN, NAT or Load Balancing are supported.

From NSX Design Guide:

In ECMP mode, only routing service is available. Stateful services cannot be supported due to asymmetric routing inherent in ECMP-based forwarding.

Even if you didn’t read documentation, but have networking skills, you’d know that protocols like NAT need to track network session state and even if you configure the same NAT rule on all of your ECMP-enabled edges, it won’t work, because due to ECMP, traffic can flow through one ESG on ingress and another ESG on egress. Since NAT tables are not synchronized, ESGs won’t be able to find the corresponding network flow in translation table and will drop the traffic.

ECMP and Firewall

But there’s another issue that doesn’t always come across or simply get forgotten about. You can deploy ESGs in ECMP mode, not configure any of the stateful services like VPN, NAT or LB, but still get network communication issues. Why? Because when you deploy an ESG, you always end up with firewall in enabled state. Firewall is also considered a stateful service.

From VVD 5.1 documentation:

SDDC-VISDN-032: For all ESGs deployed as ECMP North-South routers, disable the firewall. Use of ECMP on the ESGs is a requirement. Leaving the firewall enabled, even in allow all traffic mode, results in sporadic network connectivity. Services such as NAT and load balancing cannot be used when the firewall is disabled.

In fact, firewall is what actually tracks sessions and drops packets that don’t match existing network flows, not NAT itself. That’s also the reason why services like NAT and LB don’t work without firewall being enabled.

It often throws people off, because even having no rules in the firewall and setting default policy to accept will not prevent this issue from happening.

Demo

Here is a quick demonstration. I’m trying to establish an SSH session to a VM connected to a DLR behind two ESGs in ECMP mode.

I’m showing packet debug on both ESGs using the following command:

> debug packet display follow interface vNic_1 port_22

As you can see ingress traffic goes through E1 and egress traffic goes through E2:

E1: Packet Capture

E2: Packet Capture

Since session originated on E1, E2 interprets packets as invalid and immediately drops them:

From NSX Troubleshooting Guide:

Check for an incrementing value of a DROP invalid rule in the POST_ROUTING section of the show firewall command. Typical reasons include:

  • Asymmetric routing issues

Conclusion

It’s easy to end up in this situation, because firewall is enabled by default on a newly deployed ESG. And it’s hard to troubleshoot this issue, since it’s not quite obvious what’s actually going on unless you’ve already worked with ECMP before. So the best advice in this case is just to remember, if you want to use ECMP in NSX, make sure to disable firewall on ECMP-enabled ESGs. Use distributed firewall (DFW) instead.

Troubleshooting vSphere Guest Operations API

October 4, 2019

What is vSphere Guest Operations

Recently I’ve been heavily utilizing vSphere Guest Operations API for automating vCenter patching. vSphere Guest Operations (GuestOps) is an API, which allows you to run commands on a virtual machine without needing to connect to it over the network. All you need is credentials to the vCenter managing the virtual machine and to the virtual machine itself.

GuestOps can be called by using an Invoke-VMScript PowerCLI cmdlet in the following format:

> Invoke-VMScript -ScriptText “uname -a” -vm vc01 -GuestUser root -GuestPassword VMware1!

Cmdlet will talk to the vCenter, vCenter will talk to ESXi host, ESXi host will talk to VMware Tools and, eventually, VMware Tools will run the command on the Guest OS.

It worked well for me when I was running commands on VCSA 6.0 VM (managed by another vCenter), but after patching and upgrading this VM to VCSA 6.7 I encountered the following error:

Error occured while executing script on guest OS in VM ‘vc01’. Could not locate “Powershell” script interpreter in any of the expected locations. Probably you do not have enough permissions to execute command within guest.

It’s obvious from the error message that cmdlet is doing something wrong, since it’s supposed to use bash in Linux, not PowerShell.

Enable Debugging in VMware Tools

To better understand what was going on, I logged in to VCSA via SSH and enabled VMware Tools debugging (see KB1007873 for instructions on how to do that) and restarted Open VM Tools:

# systemctl restart vmtoolsd.service

After running the Invoke-VMScript cmdlet again, this is what I noticed in vmsvc.log debug log:

[vix] VixTools_StartProgram: User: root args: progamPath: ‘cmd.exe’, arguments: ‘/C powershell -NonInteractive -EncodedCommand cABvAHcAZQByAHMAaABl…

So it wasn’t just a misleading PowerCLI error message, Invoke-VMScript was actually trying to call a PowerShell command using Windows command interpreter on a Linux VM.

Solution

My guess is that since VMware has changed underlying operating system on VCSA from SUSE Linux to Photon OS, Invoke-VMScript can no longer properly identify the underlying OS and defaults to Windows.

Simple solution to this problem is to give a helping hand to Invoke-VMScript cmdlet and specify interpreter using -ScriptType Bash parameter. This is what a proper resulting debug log message will look like:

[vix] VixToolsStartProgramImpl: started ‘”/bin/bash” -c “bash > /tmp/vmware-root/powerclivmware159 2>&1 -c \”uname -a\””‘, pid 7456

Run CLI Commands on NSX Manager Using REST API

August 29, 2019

Over the last few years I’ve had a chance to work with NSX-V REST APIs in many different shapes and forms. Directly from vRealize Orchestrator and PowerShell/PowerNSX, indirectly from vRealize Automation or simply by making calls from Postman, which is sometimes required during NSX deployment and upgrades.

To date I haven’t been able to find any gaps in the API and can say only good things about it. It is very well documented. You can find detailed descriptions of all requests in NSX API Guide PDF or interactively browse it in API explorer on https://code.vmware.com.

But at the end of the day, NSX REST API is only a subset of what you can do from CLI and there are situations where it’s not sufficient. I’ll give you an example. Let’s say you want to know how much storage is available on NSX Manager appliance log partition. There’s a REST API call, which will give you a response similar to this:

GET https://nsxm/api/1.0/appliance-management/system/storageinfo

<storageInfo>
  <totalStorage>86G</totalStorage>
  <usedStorage>22G</usedStorage>
  <freeStorage>64G</freeStorage>
  <usedPercentage>25</usedPercentage>
</storageInfo>

As you can see, it answers the question of how much total space is available on the appliance, but doesn’t provide a full per partition breakdown available from the CLI via “show filesystem”:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.6G  1.2G  4.1G  23% /
tmpfs           7.9G  232K  7.9G   1% /run
devtmpfs        7.9G     0  7.9G   0% /dev
/dev/sda6        44G   19G   24G  44% /common
/dev/loop0       16G   45M   15G   1% /common/vdisk_mnt

So what are the options here? What is not widely known is that you can use NSX central command-line interface to remotely invoke appliance CLI commands using REST API.

Invoking CLI Commands

NSX REST API has a handy POST call https://nsxm/api/1.0/nsx/cli?action=execute. All you need to provide in addition to Authorization credentials using “Basic Auth” option is the following body in XML format:

<nsxcli>
  <command>show filesystem</command>
</nsxcli>

In response you will get a body in “text/plain” format, which is the only drawback of this method. You will need to parse the response in your scripting language of choice. In PowerShell, if you made the original call using Invoke-WebRequest cmdlet and saved it into the $response variable, it can look something like this:

$responseRows = $response.Content -split "`n"
foreach($row in $responseRows) {
  if($row -Like "*/dev/sda6*") {
    $pctUsed = $row.Split(" ",[StringSplitOptions]"RemoveEmptyEntries")[4]
    $pctUsedValue = $pctUsed.Substring(0, $pctUsed.Length-1)
    Write-Host "Space utilization on the log partition is $pctUsed."
    break
  }
}

Conclusion

For most use cases NSX REST API provides all the necessary information about NSX component configuration in structured JSON or XML format. This method is more of an exception rather than a rule, but it’s a nice tool to have in your tool belt, when you run out of options.

vSphere 6.0 REST API: A History Lesson

August 23, 2019

I’m glad to see how VMware products are becoming more and more automation-focused these days. NSX has always had rich REST API capabilities, which I can’t complain about. And vSphere is now starting to catch up. vSphere 6.5 was the first release where REST API started getting much more attention. See these official blog posts for example:

But not many people know that vSphere 6.5 wasn’t the first release where REST API became available. Check this forum thread on VMTN “Does vCenter 6.0 support RESTFUL api?”:

I think its only supported for 6.5 as below blogs has a customer asked the same question and reply is no..

It’s not entirely true, even though I know why the OP got a “No” answer. Let me explain.

vSphere 6.0 REST API

VMware started to make first steps towards REST API starting from 6.0 release. If you have a legacy vSphere 6.0 environment you can play with, you can easily test that by opening the following URL:

https://vcenter/rest/com/vmware/vapi/metadata/cli/command

You will get a long list of commands available in 6.0 release:

It may look impressive, but if you look closely you will quickly notice that they are all Content Library or Tagging related. Quote from the referenced blog post:

VMware vCenter has received some new extensions to its REST based API. In vSphere 6.0, this API set provides the ability to manage the Content Library and Tagging but now also includes the ability to manage and configure the vCenter Server Appliance (VCSA) based functionality and basic VM management.

That is right, in vSphere 6.0 REST API is very limited, you won’t get inventory data, backup or update API. All you can do is manage Content Library and Tagging, which, frankly, is not very practical.

Making REST API Calls

If Content Library and Tagging use cases are applicable to you or you are just feeling adventurous this is an example of how you can make a call to vSphere 6.0 REST API via Postman.

All calls are POST-based and action (get, list, create, etc.) is specified as a parameter, so pay close attention to request format.

First you will need to generate authentication token by making a POST call to https://vcenter/rest/com/vmware/cis/session, using “Basic Auth” for Authorization and you will get a token in response:

Then change Authorization to “No auth” and specify the token in “vmware-api-session-id” header in your next call. In this example I’m getting a list of all content libraries (you will obviously get an empty response if you haven’t actually created one):

Some commands require a body, to determine the body format use the following POST call to https://vcenter/rest/com/vmware/vapi/metadata/cli/command?~action=get, with the following body in JSON format:

{
	"identity": {
        "name": "get",
        "path": "com.vmware.content.library"
	}
}

Where “path” is the operation and “name” is the action from the https://vcenter/rest/com/vmware/vapi/metadata/cli/command call above.

If you’re looking for more detailed information, I found this blog post by Mitch Tulloch very useful:

Conclusion

There you have it. vSphere 6.0 does support REST API, it’s just not very useful, that’s why no one talks about it.

This blog post won’t help you if you are stuck in a stone age and need to manage vSphere 6.0 via REST API, but it at least gives you a definitive answer of whether REST API is supported in vSphere 6.0 and what you can do with it.

If you do find yourself in such situation, I recommend to fall back on PowerCLI, if possible.

Connecting to PostgreSQL Database Backing VMware Products

August 19, 2019

Most of the VMware products these days are standardised on PostgreSQL. Yes, you can still deploy vCenter for Windows, for instance, and use MS SQL or Oracle as a back-end database, but it’s now deprecated and vSphere 6.7 is the last release where it’s supported. Other products, like vRealize Automation are moving in the same direction.

VCSA, vRA, vRO are all distributed as appliances and shouldn’t be modified in any way by the end user. But I’ve had times before when I needed to directly connect to the PostgreSQL database to better understand certain parts of the product. One of the recent examples was encryption in vRO. I needed to ensure that the passwords I save in SecureString attributes (the ones shown as asterisks) in my workflows are not kept as plain text in vRO. So let’s see how I validated this assumption by looking at the vRO database.

vRO Database

I first SSH’ed into the appliance and connected to the database using PostgreSQL interactive terminal:

# psql vmware postgres

I then listed all database table names:

> SELECT * FROM pg_catalog.pg_tables;

When I found the table I was looking for, I listed its contents:

> SELECT * FROM vmo_workflowcontent;

And simply searched for my attribute name in the output, which was encrypted indeed.

Exporting the Database

You won’t always know what table you’re looking for, so the easiest way to go about it is to simply export the whole database in plain text and use search in a text file:

# su -m -c “/opt/vmware/vpostgres/current/bin/pg_dump -Fp vmware > /tmp/vmware.sql” postgres

“-Fp” here is for plain text (default is custom format, which is compressed), “vmware” is the database and “postgres” is the user.

VCSA and vRA Databases

You will find that database names aren’t the same for different products, for instance vCenter’s database name is “VCDB” (capital letters) and vRA is “vcac” (username is also “vcac”). So if you need to connect to VCSA database you will use the following syntax:

# psql VCDB postgres

For vRA it will look like this:

# psql vcac vcac

Then you can use the same approach demonstrated for vRO to read table data or simply export the whole database.

Conclusion

I hope it helps you with your tinkering adventures. Just make sure to use this only for research and not change anything in the database, unless specifically advised by GSS.

NSX Optimistic Locking and PowerNSX

August 3, 2019

Recently, when working on some NSX-V automation, I came across an interesting issue, which I want to discuss here, since there’s almost no information on the Internet (while I’m writing this), that would help to solve it or even point you in the right direction. It has to do with PowerNSX and Optimistic Locking in NSX (which technically is not even a locking mechanism), but let’s start from the beginning.

If you ever used PowerNSX module to automate NSX via PowerShell you noticed that most of the code examples use pipelines to run PowerNSX cmdlets. So instead of using variables, like so:

$Edge = Get-NsxEdge vRA7 _ edge
$LoadBalancer = Get-NsxLoadBalancer -Edge $Edge
Set-NsxLoadBalancer -LoadBalancer $LoadBalancer -enabled
New-NsxLoadBalancerApplicationProfile -LoadBalancer $LoadBalancer -Name $WebAppProfileName -Type $VipProtocol –SslPassthrough

all commands are run this way instead:

Get-NsxEdge vRA7_edge | Get-NsxLoadBalancer | Set-NsxLoadBalancer -enabled
Get-NsxEdge vRA7_dge | Get-NsxLoadBalancer | New-NsxLoadBalancerApplicationProfile -Name $WebAppProfileName -Type $VipProtocol –SslPassthrough

What’s the difference you may ask, besides the fact the the second variant is slower, because you retrieve edge and load balancer objects multiple times, instead of once, compared to the first variant? There’s actually a strong reason for it. More specifically, it is the following error that you gonna get if you don’t use pipelines:

invoke-nsxwebrequest : Invoke-NsxWebRequest : The NSX API response received indicates a failure. 409 : Conflict : Response Body: {“errorCode”:101, “details”:”Concurrent object access error. Refresh UI or fetch the latest copy of the object and retry the operation.”, “rootCauseString”:null, “moduleName”:null, “errorData”:null}

See, NSX uses Optimistic Locking (yes, there’s Pessimistic Locking as well) to handle concurrency. Its purpose is to make sure that if you’re making a change to an object in NSX you are aware of its current state. In the above example, you saved load balancer into a variable, changed the load balancer state to enabled and then tried to create an application profile, supplying load balancer saved in a variable as a parameter to the cmdlet. But the load balancer (and edge) state has changed and you’re basically using an old (stale) version of the object. You either have to retrieve the current state of the object again or avoid this issue all together by simply using pipelines and retrieve the up-to-date version of the object with every call.

Read this article if you want to know more about Optimistic Locking:

If you found this useful, please leave a comment, smash that like button and hit notification bell to not miss new blog posts ever again.