Posts Tagged ‘host’

Requirements for Unmounting a VMware Datastore

December 30, 2015

I have come across issues unmounting VMware datastores multiple times myself. In recent vSphere versions vCenter shows you a warning if some of the requirements are not fulfilled. That is not the case in older vSphere versions, which makes it harder to identify the issue.

Interestingly, there are some prerequisites which even vCenter does not prompt you about. I will discuss all of the requirements in this post.

General Requirements

In this category I combine all requirements which vCenter checks against, such as:

Requirement: No virtual machine resides on the datastore.

Action: You have to make sure that the host you are unmounting the datastore from has no virtual machines (running or stopped) registered on this datastore. If you are unmounting just one datastore from just one host, you can simply vMotion all VMs residing on the datastore from this host to the remaining hosts. If you are unmounting the datastore from all hosts, you'll have to either Storage vMotion all VMs to the remaining datastores or shut down the VMs and unregister them from vCenter.
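If you prefer PowerCLI to clicking through the client, a rough sketch along these lines should do it; the datastore, host and destination names below are made up for illustration:

# List all VMs registered on the datastore
Get-VM -Datastore (Get-Datastore "iSCSI1")

# vMotion them to another host...
Get-VM -Datastore (Get-Datastore "iSCSI1") | Move-VM -Destination (Get-VMHost "esx02.lab.local")

# ...or Storage vMotion them to another datastore
Get-VM -Datastore (Get-Datastore "iSCSI1") | Move-VM -Datastore (Get-Datastore "iSCSI2")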


Requirement: The datastore is not part of a Datastore Cluster.

Requirement: The datastore is not managed by storage DRS.

Action: Drag and drop the datastore out of the Datastore Cluster in vCenter. The second requirement is redundant, because Storage DRS is only enabled on a datastore which is configured within a Datastore Cluster. By removing a datastore from a Datastore Cluster you automatically disable Storage DRS on it.
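For what it's worth, this can also be scripted. A minimal sketch, under the assumption that your PowerCLI version has the Move-Datastore cmdlet and that moving the datastore into a plain datastore folder pulls it out of the cluster (the folder and datastore names are illustrative):

Move-Datastore -Datastore (Get-Datastore "iSCSI1") -Destination (Get-Folder -Type Datastore -Name "datastore")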

Requirement: Storage I/O control is disabled for this datastore.

Action: Go to the datastore properties and uncheck the Storage I/O Control option. On a SIOC-enabled datastore vSphere creates a folder named after the block device ID and keeps a file called "slotsfile" in it. Its size will change to 0.00 KB once SIOC is disabled.
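If you have many datastores to flip, PowerCLI can do it in one line; a quick sketch, with the datastore name made up for illustration:

Get-Datastore "iSCSI1" | Set-Datastore -StorageIOControlEnabled $false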

Requirement: The datastore is not used for vSphere HA heartbeat.

Action: vSphere HA automatically selects two VMware datastores, creates .vSphere-HA folders on them and uses them to keep HA heartbeats. If you have more than two datastores in your cluster, you can control which datastores are selected. Go to cluster properties > Datastore Heartbeating (under the vSphere HA section) and select the preferred datastores from the list. This works if you are unmounting just one datastore. If you need to unmount all datastores, you will have to disable HA at the cluster level altogether.
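If you want to check which datastores HA has actually picked for heartbeating before you unmount anything, the information is exposed through the API. A hedged PowerCLI sketch, assuming a vSphere 5.x cluster (the cluster name is made up):

$das = (Get-Cluster "Cluster1" | Get-View).RetrieveDasAdvancedRuntimeInfo()
$das.HeartbeatDatastoreInfo | Foreach-Object { (Get-View $_.Datastore).Name }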


Additional Requirements

Requirements which fall into this category are not checked by vCenter, but they still have to be satisfied. Otherwise vCenter will not let you unmount the datastore.

Requirement: The datastore is not used for swap.

Action: When a VM is powered on, by default it creates a swap file with a .vswp extension in the VM directory. You can change this default behavior and, on a per-host basis, select a dedicated datastore where the host will create the swap files for its virtual machines. This setting is enabled in the cluster properties, in the Swapfile Location section. The datastore is then selected for each host in the Virtual Machine Swapfile Location settings on the host Configuration tab.
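The same settings can be pushed with PowerCLI. A quick sketch, assuming hypothetical cluster and datastore names ("Cluster1" and "SwapDS"):

# Let each host decide where VM swap files go
Get-Cluster "Cluster1" | Set-Cluster -VMSwapfilePolicy InHostDatastore -Confirm:$false

# Point every host in the cluster at the dedicated swap datastore
Get-Cluster "Cluster1" | Get-VMHost | Set-VMHost -VMSwapfileDatastore (Get-Datastore "SwapDS")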

What the host also does when you enable this option is create a host-local swap file, which is named something like this: sysSwap-hls-55de2f14-6c5d-4d50-5cdf-000c296fc6a7.swp

There are scenarios where you need to unmount the swap datastore, such as when you need to reconnect all of your storage from FC to iSCSI. Even if you shut down all of your VMs, the datastore unmount will fail because the host swap files are still there, and you will see an error such as this:

The resource ‘Datastore Name: iSCSI1 VMFS uuid: 55de473c-7f3ae2b5-f9f8-000c29ba113a’ is in use.

See the error stack for details on the cause of the problem.

Error Stack:

Call “HostStorageSystem.UnmountVmfsVolume” for object “storageSystem-29” on vCenter Server “VC.lab.local” failed.

Cannot unmount volume ‘Datastore Name: iSCSI1 VMFS uuid: 55de473c-7f3ae2b5-f9f8-000c29ba113a’ because file system is busy. Correct the problem to retry the operation.

The workaround is to change the cluster-level setting back to storing the VM swap file in the VM directory and reboot all hosts. After a reboot the host .swp file will disappear.

If rebooting the hosts is not desirable, you can SSH to each host and type the following command:

# esxcli sched swap system set --hostlocalswap-enabled false

To confirm that the change has taken effect run:

# esxcli sched swap system get

Then check the datastore and the .swp files should no longer be there.
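If SSH-ing to every host is impractical, the same check can be scripted through Get-EsxCli; a sketch, assuming the v1 esxcli object maps the sched swap namespace the same way it does the coredump and syslog namespaces:

Foreach ($vmhost in (get-vmhost))
{
$esxcli = Get-EsxCli -vmhost $vmhost
# The host-local swap setting should show as disabled once the change has been applied
$esxcli.sched.swap.system.get()
}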

Conclusion

If you satisfy all of the above requirements you should have no problems when unmounting VMware datastores. vSphere creates a few additional system folders on each of the datastores, such as .sdd.sf and .dvsData, but I personally have never had issues with them.


vSphere Dump / Syslog Collector: PowerCLI Script

March 12, 2015

Overview

If you install your ESXi hosts on, say, 2GB flash cards in your blades, which are smaller than the required 6GB, then you won't have what's called persistent storage on your hosts. Both your kernel dumps and logs will be kept on a RAM drive and deleted after a reboot, which is less than ideal.

You can use vSphere Dump Collector and Syslog Collector to redirect them to another host, usually the vCenter machine, if it's not an appliance.

If you have a bunch of ESXi hosts, you'll have to go through each one of them manually to change these settings, which can be a tedious task. Syslog can be configured via Host Profiles, but an Enterprise Plus license is not a very common thing among customers. The simplest way is to use PowerCLI.

Amendments to the scripts

These scripts originate from Mike Laverick's blog; I didn't write them. The original blog post is here: Back To Basics: Installing Other Optional vCenter 5.5 Services.

The purpose of my post is to make a few corrections to the original Syslog script, as it has a few mistakes:

First, there is a typo in the system.syslog.config.set() statement: it requires an additional $null argument before the hostname. If you run it as is, you will probably get an error which looks like this:

Message: A specified parameter was not correct.
argument[0];
InnerText: argument[0]

Second, you need to open the outgoing syslog ports, otherwise traffic won't flow. It seems that Dump Collector traffic is allowed by default, even though there is no rule for it in the firewall (the former netDump rule doesn't exist anymore). Odd, but that's how it is. Syslog, on the other hand, requires an explicit rule, which is reflected in the script by the network.firewall.ruleset.set() command.

Below are the corrected versions of both scripts. If you copy and paste them, everything should just work.

vSphere Dump Collector

Foreach ($vmhost in (get-vmhost))
{
$esxcli = Get-EsxCli -vmhost $vmhost
$esxcli.system.coredump.network.get()
}

Foreach ($vmhost in (get-vmhost))
{
$esxcli = Get-EsxCli -vmhost $vmhost
$esxcli.system.coredump.network.set($null, "vmk0", "10.0.0.1", "6500")
$esxcli.system.coredump.network.set($true)
}

vSphere Syslog Collector

Foreach ($vmhost in (get-vmhost))
{
$esxcli = Get-EsxCli -vmhost $vmhost
$esxcli.system.syslog.config.get()
}

Foreach ($vmhost in (get-vmhost))
{
$esxcli = Get-EsxCli -vmhost $vmhost
$esxcli.system.syslog.config.set($null, $null, $null, $null, $null, "udp://10.0.0.1:514")
$esxcli.network.firewall.ruleset.set($null, $true, "syslog")
$esxcli.system.syslog.reload()
}

VMware ESXi Core Processes

July 12, 2013

There is not much information on the VMware ESXi architecture out there. There is an old whitepaper, "The Architecture of VMware ESXi", which dates back to 2007. In fact, from the management perspective there are only two OS processes critical to ESXi management. These are hostd (vmware-hostd in ESXi 4) and vpxa (vmware-vpxa in ESXi 4), which are called the "Management Agents".

The hostd process is a communication layer between the vmkernel and the outside world. It invokes all management operations on VMs, storage, network, etc. by talking directly to the OS kernel. If hostd dies, you won't be able to connect to the host from either vCenter or the VI client, but you will still have the option of connecting directly to the host console via SSH.

vpxa is an intermediate layer between vCenter and hostd and is called the "vCenter Server Agent". If you have a communication failure between vCenter and an ESXi host, it's the first thing to blame.

Say you have a storage LUN failure and hostd can't release the datastore. You can restart hostd, or both of the processes, using their init scripts:

# /etc/init.d/hostd restart
# /etc/init.d/vpxa restart
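If restarting the two agents individually doesn't help, you can also restart all of the management agents on the host in one go (expect the host to briefly drop out of vCenter while they come back up):

# services.sh restart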

Basic UPC compiler installation

October 8, 2012

There was a time when I worked heavily on a UPC-related project and had several issues with the installation of the Berkeley UPC compiler. I don't want that information to go to waste, so I will share it here with everyone over several posts. I worked with Berkeley UPC versions up to 2.14.0, so this post may already be obsolete for you.

Compilation

The Berkeley UPC compiler consists of a runtime and a translator (you can use the online translator if you want), which are installed separately. I used several flags at the configure stage that I'd like to explain.

The first flag is --without-mpi-cc. UPC supports several underlying transports to exchange messages between threads. The most basic is udp; I worked primarily with ibv (InfiniBand). UPC also installs the mpi transport by default. It's slow and requires an MPI installation, so I never used it and preferred to disable it.

The flag --disable-aligned-segments is usually a must in Linux environments. There is a security feature which randomizes the virtual address space, and this doesn't allow UPC threads to use the same base memory address on all nodes. Using the flag introduces some additional pointer arithmetic when dereferencing a UPC pointer-to-shared. So you either disable the Linux virtual address space randomization feature or use this flag.

It is stated that UPC can have issues with GCC 4.0.x through 4.2.x as a backend compiler: GCC can misoptimize a shared-local access such that it deterministically reads or writes an incorrect value. Because of this, you cannot install UPC with those GCC versions without using the --enable-allow-gcc4 flag. I never had any issues with GCC myself, so you can safely use it.
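For reference, putting the flags above together, a runtime build might look roughly like this (the version number and install prefix are just placeholders):

> tar xzf berkeley_upc-2.14.0.tar.gz && cd berkeley_upc-2.14.0
> ./configure --prefix=/opt/bupc --without-mpi-cc --disable-aligned-segments --enable-allow-gcc4
> make && make install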

Post-installation tasks

After the installation is completed you need to point the UPC runtime to your locally installed translator, otherwise it will try to use the online translator on the Berkeley web site. Under each UPC build subdirectory (opt, dbg, etc.) change the translator directive in etc/upcc.conf to:

translator = /opt/translator-installation-dir/targ

You need to correctly configure NFS and SSH on your nodes, so that they can access and run your application binary files without a password. If you use a firewall you need to open all the necessary ports (an example rule set follows after the port configuration below). For me they were:

111 tcp, udp for portmapper
2049 tcp for nfs
892 tcp, udp for mountd
32803 tcp, 32769 udp for lockd
662 tcp,udp for statd

Since lockd uses dynamic ports, uncomment static port configuration in /etc/sysconfig/nfs:

LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662
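With the static ports pinned, the firewall rules become predictable. A sketch for an iptables-based setup (an assumption on my side; adjust the chains and the save mechanism to whatever your distribution uses):

# iptables -A INPUT -p tcp -m multiport --dports 111,2049,892,662,32803 -j ACCEPT
# iptables -A INPUT -p udp -m multiport --dports 111,892,662,32769 -j ACCEPT
# service iptables save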

SSH is also just a walk in the park:

# su - fred
> ssh-keygen -t rsa
> cp /home/fred/.ssh/id_rsa.pub /home/fred/.ssh/authorized_keys
> chmod 600 /home/fred/.ssh/authorized_keys
> chown fred:fred /home/fred/.ssh/authorized_keys

Usage example

> upcc --network=udp -o bin_file source_code.c
> UPC_NODES="node1 node2 node3 node4" upcrun -n 32 bin_file

You choose the conduit with the --network flag, the UPC_NODES environment variable sets the hosts which will run the code, and -n sets the number of threads.
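If you need something trivial to test the toolchain with, a minimal UPC program (my own throwaway example, not from any real project) could look like this; compile and run it the same way as above:

/* hello.upc: print a line from every UPC thread */
#include <upc.h>
#include <stdio.h>

int main(void)
{
    printf("Hello from thread %d of %d\n", MYTHREAD, THREADS);
    upc_barrier;        /* wait for all threads before exiting */
    return 0;
}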

Possible problems

You can encounter the following error when you run a UPC application:

*** FATAL ERROR: Got an xSocket while spawning slave process: connect() failed while creating a connect socket (111:Connection refused)
bash: line 1: 10535 Aborted ‘./a.out’ ‘__AMUDP_SLAVE_PROCESS__’ ‘node1:49655’

This could happen if you use a firewall and didn't uncomment the static port configuration for the lockd daemon. Each time lockd picks a random port which doesn't match what you entered in the firewall configuration, and communication fails.

If you get an error which starts with:

Address node1_ip_address maps to node1, but this does not map back to the address – POSSIBLE BREAK-IN ATTEMPT!
AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

or

AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

then you have an /etc/hosts misconfiguration. Don't add the compute node's hostname to the 127.0.0.1 line in /etc/hosts; the hostname should appear only on the line with its real address. /etc/hosts on each node should look something like this:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3