Posts Tagged ‘solution’

HCX Perftest Issue

May 9, 2021

Introduction

VMware HCX is a great tool, which simplifies VM migrations between on-prem to on-prem or on-prem to cloud at scale. I’ve worked with many different VM migration tools before and what I particularly like about HCX is it’s ability to stretch network subnets between source and destination environments. It reduces (or completely removes) the need to re-IP VMs, which simplifies the migration and reduces the risk of inadvertently introducing issues into migrated applications.

Perftest Tool

HCX is a complex set of technologies and getting initial deployment right is key to building a reliable migration fabric. Perftest is a CLI tool available on interconnect (IX) and network extension (NE) HCX appliances, which allows you to perform validation testing to ensure everything is functioning correctly, as well as provide you a performance baseline. To run this tool you will need to SSH into HCX Manager, enter CCLI and then go to one of your IX or NE appliances:

# ccli
# list
# go 0
# perftest all

Issue Description

There is one issue you can come across, when running perftest, where it partially completes with the following errors:

Message Error: map[string]interface {}{“grpc_code”:14, “http_code”:503, “http_status”:”Service Unavailable”, “message”:”rpc error: code = Unavailable desc = transport is closing”}

and

Internal failure happens. Err: http.Post(https://appliance_ip:9443/perftest/stoptest) return statusCode: 503

Solution

The reason for this error is blocked connectivity on port TCP/4500. HCX uses ports UDP/4500 and UDP/500 for establishing tunnels between IX and NE appliance pairs, but that’s not enough for perftest.

In the very beginning of the perftest it gives you a hint, but it’s easy to overlook. This requirement is not well documented (at least at the time of writing), so keep that in mind next time you deploy HCX.

Advertisement

Host Profile Customization Issue

November 1, 2019

vSphere Host Profiles is a great feature for consistent ESXi host configuration and compliance checks, but can at times be flaky.

I’ve noticed an issue recently with Host Profiles in vSphere 6.7, where after providing host customization values the following error is shown in vSphere Web Client:

The “Update host customizations” operation failed for the entity with the following error message.

Host settings validation failed.

This is how the error message looks like in the client:

Even though it’s a bit annoying, I found it to be a furphy. Customizations are actually saved successfully and the error can be ignored. You can find the following messages in ESXi host’s /var/log/syslog.log file, which confirm that it works:


INFO: Execute completed
INFO: Validating AnswerFile Status1 = success
INFO: Cleaned up Host Configuration
INFO: GetAnswerFile completed

I’ve also found that this error doesn’t appear when you provide host customization values first time straight after attaching a profile to the host. Only when you update them. It also doesn’t show up in HTML5, only Web Client. I guess, one more reason to switch to HTML5.

Hope this blog post helps someone who searched in Google, but couldn’t find any information related to this error message.

Unable to Delete vCenter Endpoint in vRealize Automation

December 7, 2018

vRealize Automation Error

More than once in my experience I’ve had a need to delete an endpoint in vRealize Automation. Maybe configuration has changed or you simply made a typo in vCenter hostname or credentials. Once you’ve specified vCenter address and saved the endpoint you can no longer change it (only delete and re-add).

But even when you try to delete it, you will get an error something along the lines of:

You cannot delete this endpoint because 1 compute resources and 0 storage paths use it.

CloudClient Error

There is a KB article that walks you through the process of how to do that using a special tool called CloudClient: Error “This endpoint is being used by # compute resources and # storage paths and cannot be deleted” when you attempt to delete an endpoint in vRA 7.x (2150548)

But even that approach not always work. When you run this command from the KB article “vra computeresource inactive list” you may get the following error:

Error: Something went wrong while processing your request. Please check the application logs for details.

Solution

There is almost no mention of this second error on the Internet and I can see how someone can keep banging his head trying to solve it, so I thought I’d share a solution here. And it’s simple – open a GSS ticket. They can delete the endpoint for you. If you see this error, there’s no other way that I know of to solve this problem without involving GSS.

Clean-up

You can see an error similar to the following in vRA logs if you didn’t stop proxy agents before deleting the endpoint:

Error processing ping response
Error occurred while executing stored proc usp_InsertUpdateHost The INSERT statement conflicted with the FOREIGN KEY constraint “FK_ManagementEndpoint_Host”. The conflict occurred in database “vRa_IaaS”, table “dbo.ManagementEndpoints”, column ‘ManagementEndpointID’.
The statement has been terminated.
Inner Exception: The INSERT statement conflicted with the FOREIGN KEY constraint “FK_ManagementEndpoint_Host”. The conflict occurred in database “vRa_IaaS”, table “dbo.ManagementEndpoints”, column ‘ManagementEndpointID’.
The statement has been terminated.

All you need to do to get rid of it is restart your proxy agents.

Conclusion

Hope this post saves someone the hassle of hours searching for the answer in blogs and forums.

Error When Deploying VCSA or PSC

October 31, 2017

Recently when helping a customer to deploy a new greenfield VMware 6.5 environment I ran into an issue where brand new vCenter Server Appliance and Platform Service Controller 6.5 build 5973321 fail to deploy to an ESXi host build 5969303.

Stage 1 (install) of the deployment completes successfully. In Stage 2 (setup) VCSA installer both for vCenter and PSC first shows a prompt asking for credentials.

PSC Issue Description

After providing credentials, when installing an external PSC, installation fails with the following error:

Error:
Unable to connect to vCenter Single Sign-On: Failed to connect to SSO; uri:https://psc-hostname/sts/STSService/vsphere.local
Failed to register vAPI Endpoint Service with CM
Failed to configure vAPI Endpoint Service at the firstboot time

Resolution:
Please file a bug against VAPI

Installation wizard shows the following resulting error:

Failure:
A problem occurred during setup. Refresh this page and try again.

A problem occurred during setup. Services might not be working as expected.

A problem occurred while – Starting VMware vAPI Endpoint…

Appliance shows the following error in console:

Failed to start services. Firstboot Error.

Alternatively PSC can fail with the following error:

Error:
Unexpected failure: }
Failed to register vAPI Endpoint Service with CM
Failed to configure vAPI Endpoint Service at the firstboot time

Resolution:
Please file a bug against VAPI

VCSA Issue Description

After providing credentials, when installing vCenter with embedded PSC, installation fails with the following error:

Error:
Unable to start the Service Control Agent.

Resolution:
Search for these symptoms in the VMware knowledge base for any known issues and possible workarounds. If none can be found, collect a support bundle and open a support request.

Installation wizard shows the following resulting error:

Failure:
A problem occurred during setup. Refresh this page and try again.

A problem occurred during setup. Services might not be working as expected.

A problem occurred while – Starting VMware Service Control Agent…

Appliance shows the same error in console.

Alternatively VCSA can fail with the following error:

Error:
Encountered an internal error. Traceback (most recent call last): File “/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py”, line 1852, in main vmidentityFB.boot() File “/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py”, line 359, in boot self.checkSTS(self.__stsRetryCount, self.__stsRetryInterval) File “/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py”, line 1406, in checkSTS raise Exception(‘Failed to initialize Secure Token Server.’) Exception: Failed to initialize Secure Token Server.

Resolution:
This is an unrecoverable error, please retry install. If you run into this error again, please collect a support bundle and open a support request.

Issue Workaround

This issue happens when VCSA or PSC installation was cancelled and is attempted for the second time to the same ESXi host.

Identified workaround for this issue is to use another ESXi host, which has never been used to deploy PSC or VCSA to.

Issue Resolution

VMware is aware of the bug and working on the resolution.

Dell Repository Manager: Bootable ISO Issues

May 23, 2016

problem_solutionIn one of my previous posts I described the process of upgrading a Dell FX2 chassis firmware using Dell Repository Manager (DRM).

In an ideal world you just follow the process and in an hour or two you can get your chassis upgraded. You may sometimes run into issues. I want to go through some of them in this post, including possible remediation.

Issue Description

When exporting firmware to a bootable ISO you can find DRM not being able to download some of the bundle components with the following error in the Job Result:

Processing failed:
Failed downloading files:
Diagnostics_Application_PWMC8_LN64_OSC_1.1_A00.BIN

And errors in the Log:


60. 24/03/2016 5:58:50 PM Export to Bootable ISO : Downloaded 34 / 56
61. 24/03/2016 5:59:44 PM Export to Bootable ISO : Error downloading some files
62. 24/03/2016 5:59:45 PM Export to Bootable ISO : Failed exporting to Bootable ISO.

Workaround #1: Skip the Component

You can try the following option “Continue download irrespective of any error (in the selected components)” in the export dialog. It won’t help to get the component downloaded, but you will got a bootable ISO.

However, DRM will still keep the failed component in the bundle and try to install it during the upgrade, which will obviously fail (update 16/56):

failed_update

Once the upgrade is finished you will get the following error at the end:

Note: Some update requires machine reboot. Please reboot to CD/DVD to continue for the failed update because of the dependency…

upgrade_status

No matter how many times you reboot you will obviously get the same errors. You can ignore it if you 100% sure this is what causes the upgrade to fail or use Workaround #2.

Workaround #2: Create Custom ISO

When you create a repository in DRM it’s populated with pre-built components and bundles. But you can create custom repositories. The idea is that you can exclude the failed component from the repository by creating it manually.

Assuming you already have the base repository configured, do the following:

  • Open the existing repository and click on the Components tab
  • Deselect the failed component in the component list (in my case it was Diagnostics_Application_PWMC8_LN64_OSC_1.1_A00.BIN)
  • Click on the “Copy To” button:

custom_components

  • In the opened dialogue select “Create NEW Repository and copy component(s) into it”
  • Follow the wizard and when you click finish, components will be copied to the newly created repository
  • Open the new repository and click on the Componenets tab
  • Select all components and click on the “Copy to” button once again
  • This time select “Create a NEW Bundle in the same repository and add component(s) into it”
  • On the next screen give the bundle a name and make sure to choose “Linux 32-bit and 64-bit” in the OS Type

custom_bundle

As a result you should get a new bundle created which you can export to a bootable ISO using the same process.

Workaround #3: Use Server Update Utility

If none of the above helps you can fall back to a proven upgrade approach and use Server Update Utility (SUU). SUU is a huge 12GB ISO to download, but you can use Dell Download Manager, which supports resuming interrupted downloads. Make sure to disable proxy! Dell Download Manager does not support resuming an interrupted download if you’re using a proxy server.

SUU is not a bootable ISO. Previously you had to use Dell Systems Build and Update Utility (SBUU) to boot from it first and then mount the ISO to proceed with the upgrade. Starting with Dell 11G servers you don’t need it anymore and can upgrade firmware straight form Dell Lifecycle Controller (LC).

You’ll need to boot into the Lifecycle Controller and choose Firmware Update > Launch Firmware Update > Local Drive(CD or DVD or USB). Mount the SUU ISO and the rest is fairly straightforward. LC will upgrade the firmware and reboot the blade.

lc_upgrade

Conclusion

Dell Repository Manager is the recommended approach to upgrade firmware on Dell hardware. Unlike SUU, DRM downloads the latest updates and only the necessary components. It is also capable of making a bootable ISO.

If you have issues, rely on Server Update Utility as it’s bulletproof and always work. But be prepared to download a 12GB ISO image and make sure you have an option to bypass proxy.