Posts Tagged ‘linux’

Troubleshooting vSphere Guest Operations API

October 4, 2019

What is vSphere Guest Operations

Recently I’ve been using the vSphere Guest Operations API heavily to automate vCenter patching. vSphere Guest Operations (GuestOps) is an API that lets you run commands on a virtual machine without connecting to it over the network. All you need is credentials for the vCenter managing the virtual machine and for the virtual machine itself.

GuestOps can be called with the Invoke-VMScript PowerCLI cmdlet in the following format:

> Invoke-VMScript -ScriptText "uname -a" -vm vc01 -GuestUser root -GuestPassword VMware1!

The cmdlet talks to vCenter, vCenter talks to the ESXi host, the ESXi host talks to VMware Tools and, eventually, VMware Tools runs the command in the guest OS.

It worked well for me when I was running commands on a VCSA 6.0 VM (managed by another vCenter), but after patching and upgrading this VM to VCSA 6.7 I encountered the following error:

Error occured while executing script on guest OS in VM ‘vc01’. Could not locate “Powershell” script interpreter in any of the expected locations. Probably you do not have enough permissions to execute command within guest.

It’s obvious from the error message that the cmdlet is doing something wrong, since it’s supposed to use bash on Linux, not PowerShell.

Enable Debugging in VMware Tools

To better understand what was going on, I logged in to the VCSA via SSH, enabled VMware Tools debugging (see KB1007873 for instructions) and restarted Open VM Tools:

# systemctl restart vmtoolsd.service

After running the Invoke-VMScript cmdlet again, this is what I noticed in the vmsvc.log debug log:

[vix] VixTools_StartProgram: User: root args: progamPath: ‘cmd.exe’, arguments: ‘/C powershell -NonInteractive -EncodedCommand cABvAHcAZQByAHMAaABl…

So it wasn’t just a misleading PowerCLI error message: Invoke-VMScript was actually trying to run a PowerShell command through the Windows command interpreter on a Linux VM.
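If you want to double-check what the cmdlet is sending, the -EncodedCommand argument can be decoded by hand. PowerShell expects it to be Base64 of a UTF-16LE string, so for plain ASCII text every second byte is NUL. The sketch below decodes only the part of the argument that is visible in the log line above (the rest is truncated in the log); the function name is mine, and dropping NUL bytes is a shortcut that only works for ASCII content.

```shell
# Decode the visible prefix of a PowerShell -EncodedCommand argument.
# The value is Base64 of UTF-16LE text; for ASCII text every second byte
# is NUL, so stripping NULs is enough for a quick look.
decode_encoded_command() {
    printf '%s' "$1" | base64 -d | tr -d '\000'
}

# Only the characters that are visible in the truncated log line.
decode_encoded_command 'cABvAHcAZQByAHMAaABl'
echo
# prints: powershe
```

The decoded text starts with "powershe", which fits the error: the payload is itself a PowerShell command.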

Solution

My guess is that since VMware changed the underlying operating system of VCSA from SUSE Linux to Photon OS, Invoke-VMScript can no longer properly identify the guest OS and defaults to Windows.

A simple solution is to give Invoke-VMScript a helping hand and specify the interpreter explicitly by adding the -ScriptType Bash parameter to the command. This is what the resulting debug log message looks like when it works:

[vix] VixToolsStartProgramImpl: started ‘”/bin/bash” -c “bash > /tmp/vmware-root/powerclivmware159 2>&1 -c \”uname -a\””‘, pid 7456

Basic UPC compiler installation

October 8, 2012

There was a time when I worked heavily on a UPC-related project. I had several issues with the installation of the Berkeley UPC compiler. I don’t want that information to be wasted, so I will share it here with everyone in several posts. I worked with Berkeley UPC versions up to 2.14.0, so this post may already be obsolete for you.

Compilation

The Berkeley UPC compiler consists of a runtime and a translator (you can use the online translator if you want). They are installed separately. I used several flags at the configure stage that I’d like to explain.

The first flag is --without-mpi-cc. UPC supports several underlying transports (conduits) for exchanging messages between threads. The most basic is udp; I worked primarily with ibv (InfiniBand). UPC also builds the mpi transport by default. It’s slow and requires an MPI installation, so I never used it and preferred to disable it.

The --disable-aligned-segments flag is usually a must in Linux environments. Linux has a security feature that randomizes the virtual address space, which prevents UPC threads from using the same base memory address on all nodes. With the flag set, UPC no longer relies on identical base addresses, at the cost of some additional pointer arithmetic when dereferencing a UPC pointer-to-shared. So you either disable the Linux address space randomization feature or use this flag.

The documentation states that UPC can have issues with GCC 4.0.x through 4.2.x as a backend compiler: GCC can misoptimize a shared-local access such that it deterministically reads or writes an incorrect value. Because of that, you cannot install UPC with these GCC versions unless you pass the --enable-allow-gcc4 flag. I never ran into this problem with GCC myself, so I used the flag without any trouble.
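Put together, the runtime configure call with the three flags above looks roughly like this. It is a dry run that only prints the command; the function name and the --prefix value are hypothetical and not something I took from the UPC documentation.

```shell
# Print (dry run) a configure command line with the three flags discussed
# above. The install prefix is a hypothetical example path.
upc_configure_cmd() {
    prefix="$1"
    printf './configure --prefix=%s %s %s %s\n' "$prefix" \
        --without-mpi-cc \
        --disable-aligned-segments \
        --enable-allow-gcc4
}

upc_configure_cmd /opt/upc-runtime
```

Drop --enable-allow-gcc4 if your backend compiler is not one of the affected GCC versions.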

Post-installation tasks

After the installation is completed you need to point the UPC runtime to your locally installed translator. Otherwise it will try to use the online translator on the Berkeley website. Under each UPC build subdirectory (opt, dbg, etc.) change the translator directive in etc/upcc.conf to:

translator = /opt/translator-installation-dir/targ
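Since the same change has to be made in every build subdirectory, it can be scripted. This is a minimal sketch, assuming the directive sits on a single line that starts with "translator ="; the function name is mine, and the path must not contain the | or & characters because it is pasted into a sed expression.

```shell
# Rewrite the "translator = ..." line of one upcc.conf.
# A copy of the original file is kept next to it with a .bak suffix.
set_upc_translator() {
    conf="$1"   # path to etc/upcc.conf of one build (opt, dbg, ...)
    targ="$2"   # local translator path
    sed -i.bak "s|^[[:space:]]*translator[[:space:]]*=.*|translator = $targ|" "$conf"
}

# Usage, once per build subdirectory (the install prefix is hypothetical):
#   set_upc_translator /opt/upc-runtime/opt/etc/upcc.conf /opt/translator-installation-dir/targ
#   set_upc_translator /opt/upc-runtime/dbg/etc/upcc.conf /opt/translator-installation-dir/targ
```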

You need to configure NFS and SSH correctly on your nodes, so that they can access and run your application binaries without a password. If you use a firewall you need to open all the necessary ports. For me they were:

111 tcp, udp for portmapper
2049 tcp for nfs
892 tcp, udp for mountd
32803 tcp, 32769 udp for lockd
662 tcp, udp for statd

Since lockd, mountd and statd use dynamic ports by default, uncomment the static port configuration in /etc/sysconfig/nfs:

LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662
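With the ports pinned, the firewall rules can be generated from the list above instead of typed by hand. The sketch below prints rules in the /etc/sysconfig/iptables format; the chain name RH-Firewall-1-INPUT is an assumption (it is the stock chain on CentOS 5), and the function name is mine.

```shell
# Emit iptables rules for the NFS-related ports listed above.
# The chain name defaults to RH-Firewall-1-INPUT, which is an assumption.
nfs_firewall_rules() {
    chain="${1:-RH-Firewall-1-INPUT}"
    for spec in tcp:111 udp:111 tcp:2049 tcp:892 udp:892 \
                tcp:32803 udp:32769 tcp:662 udp:662; do
        proto=${spec%%:*}   # part before the colon
        port=${spec##*:}    # part after the colon
        printf '%s %s -p %s -m %s --dport %s -j ACCEPT\n' \
            -A "$chain" "$proto" "$proto" "$port"
    done
}

nfs_firewall_rules
```

Paste the output into /etc/sysconfig/iptables above the final REJECT rule and restart iptables.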

SSH is also just a walk in the park:

# su - fred
> ssh-keygen -t rsa
> cp /home/fred/.ssh/id_rsa.pub /home/fred/.ssh/authorized_keys
> chmod 600 /home/fred/.ssh/authorized_keys
> chown fred:fred /home/fred/.ssh/authorized_keys

Usage example

> upcc --network=udp source_code.c
> UPC_NODES="node1 node2 node3 node4" upcrun -n 32 bin_file

You choose the conduit with the --network flag, the UPC_NODES environment variable lists the hosts that will run the code, and -n sets the number of threads.

Possible problems

You can encounter the following error when you run a UPC application:

*** FATAL ERROR: Got an xSocket while spawning slave process: connect() failed while creating a connect socket (111:Connection refused)
bash: line 1: 10535 Aborted ‘./a.out’ ‘__AMUDP_SLAVE_PROCESS__’ ‘node1:49655’

This can happen if you use a firewall and didn’t uncomment the static port configuration for the lockd daemon. Each time it starts it uses a random port that doesn’t match what you entered in the firewall configuration, and communication fails.

If you get an error which starts with:

Address node1_ip_address maps to node1, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

or

AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
from function sendPacket
at /root/install/berkeley_upc-2.8.0/gasnet/other/amudp/amudp_reqrep.cpp:99
reason: Invalid argument

then your /etc/hosts is misconfigured. Don’t add the compute node’s hostname to the 127.0.0.1 line in /etc/hosts; it should appear only on the line with its real address. /etc/hosts on each node should look something like this:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3
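A quick way to catch this misconfiguration on every node is to look for the node name on a loopback line. This is a minimal sketch with a function name of my own; it only inspects the file you give it and does not touch DNS.

```shell
# Succeed (exit status 0) if the given host name appears on a loopback
# line (127.x.x.x or ::1) of a hosts file, i.e. if it is misconfigured.
name_on_loopback() {
    hosts_file="$1"
    name="$2"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) if ($i == n) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Usage on a node:
#   name_on_loopback /etc/hosts node1 && echo "node1 is on a loopback line"
```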

Migrating physical Linux host to the VMware ESXi

May 2, 2012

Well, perhaps the easiest way to accomplish this is to use VMware Converter from the start. I believe there is a Linux version. However, I took another route. I already had an Acronis backup image, so my solution was to feed this image into the Windows version of VMware Converter, which in turn converted it to VMware format and automatically created a virtual machine on the ESXi server.

With this simple procedure you can usually get a working system. Not in my case. The original OS used a software RAID of two hard drives, so I had to boot from a live CD. Then I edited fstab and GRUB’s menu.lst to use /dev/sda1 (root volume) instead of /dev/md0 and /dev/sda2 (swap) instead of /dev/md1. Additionally, I had to reinstall GRUB’s boot files:

grub-install --root-directory=/media/sda1/boot /dev/sda

Then, if it’s SUSE, you will have to change the “resume” option in GRUB’s kernel boot line to /dev/null. After you boot into the system, recreate the swap partition and point the “resume” option back to it. If you don’t do that, you will end up with the following error during boot:

Kernel panic - not syncing: I/O error reading memory image

One tricky issue in this whole story was related to the kernel. As I’ve already mentioned, the original operating system ran on top of software RAID, and its initrd image wouldn’t detect an ordinary virtual SCSI hard drive during boot. So I had to boot from the SUSE installation CD and install the standard kernel on top of the original system, which solved the issue. Additionally, I had to choose Russian as the primary language during kernel installation, otherwise I ended up with unreadable symbols inside the system. That won’t be necessary in the majority of cases.

I hope my experience will be helpful for other sysadmins.

Present NetApp iSCSI LUN to Linux host

March 7, 2012

Consider the following scenario (which is in fact a real case). You have a High Performance Computing (HPC) cluster where users generate a hell of a lot of research data. Local hard drives on the frontend node are almost always insufficient. There are two options. The first is presenting an NFS share to both the frontend and all compute nodes. Since compute nodes usually connect only to a private network for communication with the frontend and don’t have public IP addresses, this means a lot of reconfiguration, not to mention possible security implications.

The simpler solution is to use iSCSI. Unlike the NFS option, which requires every node to reach the filer directly, with iSCSI you attach a LUN to the frontend only, and the compute nodes then work with it as an ordinary NFS share exported by the frontend over the private network. This requires configuring an iSCSI LUN on the NetApp filer and bringing up an iSCSI initiator in Linux.

The iSCSI configuration on the filer consists of several steps. First you create a FlexVol volume where your LUN will reside and then create a LUN inside it. The second step is creating an initiator group, which enables connectivity between the NetApp and a particular host. As the last step you map the LUN to the initiator group, which lets the Linux host see the LUN. In case you disabled iSCSI, don’t forget to enable it on the required interface.

vol create scratch aggrname 1024g
lun create -s 1024g -t linux /vol/scratch/lun0
igroup create -i -t linux hpc
igroup add hpc linux_host_iqn
lun map /vol/scratch/lun0 hpc
iscsi interface enable if_name

Linux host configuration is simple. Install the iscsi-initiator-utils package and enable it at startup. The iSCSI IQN that the OS uses to connect to iSCSI targets is read from /etc/iscsi/initiatorname.iscsi at startup. Once the iSCSI initiator is up and running you need to start the discovery process, and if everything goes fine you will see a new hard drive in the system (I had to reboot). Then you just create a partition, make a file system and mount it.

iscsiadm -m discovery -t sendtargets -p nas_ip
fdisk /dev/sdc
mke2fs -j /dev/sdc1
mount /dev/sdc1 /state/partition1/home
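The linux_host_iqn value for the igroup add command on the filer is the IQN stored on the Linux host. A small sketch for pulling it out of the file, assuming the usual single InitiatorName=... line; the function name is mine.

```shell
# Print the initiator IQN from an initiatorname.iscsi style file.
# The file is expected to hold a line of the form: InitiatorName=iqn.<...>
initiator_iqn() {
    sed -n 's/^InitiatorName=//p' "${1:-/etc/iscsi/initiatorname.iscsi}"
}

# Usage on the Linux host:
#   initiator_iqn
```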

I use it for the home directories in the ROCKS cluster suite. ROCKS automatically exports /home through NFS to the compute nodes, which in turn mount it via autofs. If you intend to use this volume for other purposes, you will need to configure your own NFS export.

Dovecot failing regularly

February 22, 2012

I ran into an issue where Dovecot dies when ntpd moves the time backwards on a Linux server. Here is the message that appears in the logs:

dovecot: Fatal: Time just moved backwards by 15 seconds. This might cause a lot of problems, so I’ll just kill myself now. http://wiki.dovecot.org/TimeMovedBackwards

A possible solution is to add the -x flag to the ntpd command line. I did that in the /etc/sysconfig/ntp file. With -x, ntpd slews the clock, i.e. adjusts it gradually, instead of stepping it for offsets of up to 600 seconds, so the time no longer jumps backwards.

The root cause of the problem could be a dying onboard lithium battery. But I hope the flag will solve the problem without my having to take the chassis cover off.
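The edit itself is a one-liner, but it is easy to apply twice. Here is a sketch that adds -x only once. The name of the options variable differs between distributions, so it is a parameter; the NTPD_OPTIONS default is an assumption, check what your own file uses. The function name is mine as well.

```shell
# Add -x to the ntpd options variable in a sysconfig-style file, once.
# The variable name (default NTPD_OPTIONS) is an assumption.
# A copy of the original file is kept with a .bak suffix.
add_ntpd_slew_flag() {
    file="$1"
    var="${2:-NTPD_OPTIONS}"
    if grep -Eq "^$var=\"(.* )?-x( .*)?\"" "$file"; then
        return 0    # -x is already there, nothing to do
    fi
    sed -i.bak "s/^$var=\"\(.*\)\"/$var=\"-x \1\"/" "$file"
}

# Usage, followed by a restart of ntpd:
#   add_ntpd_slew_flag /etc/sysconfig/ntp
```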

Installing Symantec Backup Exec Agent for Linux

October 7, 2011

The Symantec Backup Exec agent for Linux/Unix is called RALUS, which stands for Remote Agent for Linux and Unix Servers. I obtained my RALUS installation from the official Symantec CDs. If you don’t have them, you can probably download it from the Symantec website. Here is the sequence:

  1. Mount the CD or ISO image on your Linux host.
  2. Run the ./installralus script and follow the instructions. I use the defaults. The only thing you have to enter is the media server IP address. The installation script adds the agent to the rc*.d runlevels automatically.
  3. After the installation is completed, create a backup user, add it to the beoper group and set its password: # useradd backup -c "User for Symantec Backup Exec"; # usermod -G beoper backup; # passwd backup.
  4. Start the BE agent manually for the first time: # /etc/init.d/VRTSralus.init start

That’s it. Now you can see your server under the Linux/Unix Servers section when creating a backup job.

Add #1: If the agent doesn’t start and /var/VRTSralus/beremote.service.log shows an error about libstdc++.so.5 missing, install compat-libstdc++-33.

Add #2: If you have an active firewall you need to open additional ports. For me it was tcp 10000-10200: port 10000 plus the port range you can find on the media server under Tools->Options->Network and Security. For CentOS the firewall rule would be:

-A RH-Firewall-1-INPUT -m tcp -p tcp -s media_server_ip --dport 10000:10200 -j ACCEPT

Add #3: If you also write firewall rules for the OUTPUT chain, open outgoing tcp 10000:

-A RH-Firewall-1-OUTPUT -m tcp -p tcp -d media_server_ip --dport 10000 -j ACCEPT

If you don’t have RH-Firewall-1-OUTPUT, also add:

:RH-Firewall-1-OUTPUT - [0:0]
-A OUTPUT -j RH-Firewall-1-OUTPUT

I may be wrong, but the Backup Exec documentation says:

Symantec recommends having port 10000 open and available on the Backup Exec media server as well as on the remote systems.

Additional connections from the media server to the Remote Agent will be initiated on any available port.

I understand this as follows: the agent and the media server may each connect to the other’s port 10000, and the additional connections on ports 10001:10200 are initiated from the media server.

VMware Tools update issue

September 20, 2011

Recently I decided to update VMware Tools on my VMs because most of them showed Out of date in the VI client. For some reason several Linux VMs didn’t update even though the VI client showed no error. I tried to update from inside a VM by running /usr/sbin/vmware-tools-upgrade, and it reported that there was not enough space in /tmp. I enlarged /tmp from 128 to 512 MB and the update went fine this time.
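It is worth checking the free space in /tmp before starting the update on a batch of VMs. A small sketch with function names of my own; I don’t know the exact amount the upgrader needs, so the threshold is left as a parameter.

```shell
# Print free space, in whole megabytes, on the filesystem holding a directory.
free_mb() {
    df -Pm "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}

# Succeed if the directory has at least the given number of megabytes free.
has_free_mb() {
    [ "$(free_mb "$1")" -ge "$2" ]
}

# Usage inside a VM (the 400 MB threshold is a hypothetical value):
#   has_free_mb /tmp 400 || echo "not enough space in /tmp"
```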

Take into account that:

  1. A Windows VM will most likely be rebooted after the update.
  2. In Linux, VMware Tools may not start automatically. If that’s the case, start it manually by calling /etc/init.d/vmware-tools start.
  3. Network interfaces in Linux may go down after the VMware Tools update. Bring them up manually.

 

Enable XDMCP under CentOS 5

September 20, 2011

XDMCP is a handy protocol for graphical login to Linux from a Windows workstation. I use Xming for that. But before you can log in you have to do some extra configuration.

The first thing to do is to enable XDMCP in /usr/share/gdm/defaults.conf. Add the following line under the [xdmcp] section:

Enable=true
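If you have to do this on more than one machine, the edit can be scripted. This is a sketch, assuming the section header is written exactly as [xdmcp] on its own line; the function name is mine. It writes the result to stdout, so the original file stays untouched until you decide to replace it.

```shell
# Print a GDM-style config file with Enable=true set in the [xdmcp] section.
# An existing uncommented Enable= line in that section is dropped;
# all other sections are passed through unchanged.
enable_xdmcp() {
    awk '
        /^\[/ {
            in_xdmcp = ($0 == "[xdmcp]")   # track which section we are in
            print
            if (in_xdmcp) print "Enable=true"
            next
        }
        in_xdmcp && /^Enable=/ { next }    # drop the old value
        { print }
    ' "$1"
}

# Usage:
#   enable_xdmcp /usr/share/gdm/defaults.conf > /tmp/defaults.conf.new
```

Compare the new file with the original before copying it over.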

The second is opening the firewall ports. XDMCP works through UDP 177, TCP 6000-6005 and TCP 7100. I did that from the graphical interface. If you don’t have access to graphics, edit /etc/sysconfig/iptables.

Don’t forget to restart the X server.

SuSE autostart

September 20, 2011

Sometimes you are too lazy to write an init.d script, or the startup sequence is complex and already described in another hand-written script, as in my case for IBM WebSphere. Then it’s handy to put it in the system autostart script. Each OS has its own location for this file. It’s /etc/rc.d/rc.local for Slackware and CentOS. This is just a quick post to remember the location for SuSE: it’s /etc/init.d/boot.local.