Posts Tagged ‘communication’

VMware ESXi Core Processes

July 12, 2013

vmware_esxiThere are not much information on VMware ESXi architecture out there. There is an old whitepaper “The Architecture of VMware ESXi” which dates back to 2007. In fact, from the management perspective there are only two OS processes critical two ESXi management. These are: hostd (vmware-hostd in ESXi 4) and vpxa (vmware-vpxa in ESXi 4) which are called “Management Agents”.

hostd process is a communication layer between vmkernel and the outside world. It invokes all management operations on VMs, storage, network, etc by directly talking to the OS kernel. If hostd dies you won’t be able to connect to the host neither from vCenter nor VI client. But you will still have an option to connect directly to the host console via SSH.

vpxa is an intermediate layer between vCenter and hostd and is called “vCenter Server Agent”. If you have communication failure between vCenter and ESXi host, it’s the first thing to blame.

Say you have a storage LUN failure and hostd can’t release the datastore. You can restart hostd or both of the processes using these scripts:

# vmware-hostd restart
# vmware-vpxa restart

Advertisement

DFS Replication Troubleshooting

June 25, 2013

conceptual 3d rendered image of arrow isolated on whiteDFS Replication service doesn’t give you much information on how it’s replicating. It’s good to know some general commands to troubleshoot communication and data transfer issues.

Useful Commands

In Windows Server 2008 a new command was introduced to check what DFSR is doing at the moment. You won’t find it in Windows Server 2003:

> dfsrdiag replicationstate

If replication link isn’t feeling well you get lots of files in the backlog. To check if you have a backlog, run:

> dfsrdiag backlog /rgname:rgroup_name /rfname:folder_name /sendingmember:sending_server /receivingmember:receiving_server

If there are heaps of files in the backlog the best way to find the reason for it is to simply check the logs. DFSR logs are located in C:\Windows\debug. To get the most verbose information change the log severity level:

> wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set debuglogseverity=5

DFSR uses GUIDs to identify the replicated files, which look like: AC759213-00AF-4578-9C6E-EA0764FDC9AC. To get the meaningful data from the GUID use:

> dfsrdiag guid2name /guid:guid_identifier /rgname:group_name

There is one more command which allows you to find the exact path to the file in question. You should feed the uid field from the DFSR debug log to this command, which looks like {9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880:

> wmic.exe /namespace:\\root\microsoftdfs path dfsridrecordinfo.Uid=”uid_identifier” call getfullfilepath

Sample Errors

1. When replicating between Windows Server 2008 R2 and Windows Server 2003 R2. On the source: “Ghosting is not enabled”. On the destination: “A failure was reported by the remote partner”.

I solved this error by applying the following patch: KB2462352. The reason for the issue is incompatibilities between protocol implementations.

2. The following error pops up in logs: “The system cannot find the file specified”.

Solution is described in KB951010. In Windows Server 2003 ConflictAndDeleted folder sometimes fills up above the 660MB quota and ConflictAndDeletedManifest.xml file may get corrupted. To solve the problem you need to cleanup the folder and delete the file by issuing:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where “replicatedfolderguid='<GUID>'” call cleanupconflictdirectory

To get the GUIDs of replicated folders run:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfolderguid,replicatedfoldername

3. Near 100% CPU usage and the same error is written millions of times in the log files: “Failed to create stage file for GVSN gvsn_identitifer”.

I solved this issue by looking for the file specified by gvsn_identifier, which looks like {2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748 and deleting it. These are files that are located in the staging folder.

Other Hepful Tools

You can create a Health Report from the DFS Management Console to see how many files have been transfered between replication members since the DFS service start. And if there are any DFS errors in the members’ event logs.

You can also use DFSRMon tool. But I personally don’t find it very useful.