DFS Replication Troubleshooting

June 25, 2013

The DFS Replication service doesn’t give you much information on how it’s replicating, so it’s good to know some general commands for troubleshooting communication and data transfer issues.

Useful Commands

In Windows Server 2008 a new command was introduced to check what DFSR is doing at the moment. You won’t find it in Windows Server 2003:

> dfsrdiag replicationstate

If a replication link isn’t feeling well, you get lots of files in the backlog. To check whether you have a backlog, run:

> dfsrdiag backlog /rgname:rgroup_name /rfname:folder_name /sendingmember:sending_server /receivingmember:receiving_server
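The backlog is reported per direction (sending member to receiving member), so it is usually worth checking both directions of a link. A minimal Python sketch that only builds the two command lines from the template above; the function name and the sample server names are hypothetical, and names containing spaces would still need quoting:

```python
def backlog_commands(rgname, rfname, member_a, member_b):
    """Return the dfsrdiag backlog command line for each direction of a link."""
    template = (
        "dfsrdiag backlog /rgname:{rg} /rfname:{rf} "
        "/sendingmember:{src} /receivingmember:{dst}"
    )
    # Direction matters: A -> B and B -> A have separate backlogs.
    return [
        template.format(rg=rgname, rf=rfname, src=src, dst=dst)
        for src, dst in ((member_a, member_b), (member_b, member_a))
    ]


if __name__ == "__main__":
    # Placeholder names; substitute your own replication group and members.
    for command in backlog_commands("rgroup_name", "folder_name", "server1", "server2"):
        print(command)
```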

If there are heaps of files in the backlog, the best way to find the reason is simply to check the logs. DFSR logs are located in C:\Windows\debug. To get the most verbose information, raise the log severity level:

> wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set debuglogseverity=5

DFSR uses GUIDs to identify replicated files. They look like this: AC759213-00AF-4578-9C6E-EA0764FDC9AC. To get meaningful data from a GUID, use:

> dfsrdiag guid2name /guid:guid_identifier /rgname:group_name

There is one more command, which lets you find the exact path to the file in question. Feed it the uid field from the DFSR debug log, which looks like {9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880:

> wmic.exe /namespace:\\root\microsoftdfs path dfsridrecordinfo.Uid="uid_identifier" call getfullfilepath
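When a debug log is full of such uids, pulling them out by hand gets tedious. A small Python sketch, assuming the uid always has the {GUID}-vNNN shape shown above; the function names are hypothetical, and the command string simply mirrors the wmic call above:

```python
import re

# {GUID}-v<version>, as in {9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880
UID_PATTERN = re.compile(
    r"\{(?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})\}"
    r"-v(?P<version>\d+)"
)


def find_uids(text):
    """Return every (guid, version) pair found in a chunk of log text."""
    return [
        (m.group("guid"), int(m.group("version")))
        for m in UID_PATTERN.finditer(text)
    ]


def getfullfilepath_command(guid, version):
    """Build the wmic command line that resolves one uid to a file path."""
    uid = "{%s}-v%d" % (guid, version)
    return (
        r"wmic.exe /namespace:\\root\microsoftdfs path "
        r'dfsridrecordinfo.Uid="%s" call getfullfilepath' % uid
    )


if __name__ == "__main__":
    line = "uid:{9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880"
    for guid, version in find_uids(line):
        print(getfullfilepath_command(guid, version))
```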

Sample Errors

1. Replication between Windows Server 2008 R2 and Windows Server 2003 R2 fails. On the source: “Ghosting is not enabled”. On the destination: “A failure was reported by the remote partner”.

I solved this error by applying the following patch: KB2462352. The cause of the issue is an incompatibility between the protocol implementations.

2. The following error pops up in logs: “The system cannot find the file specified”.

The solution is described in KB951010. In Windows Server 2003 the ConflictAndDeleted folder sometimes fills up above the 660 MB quota, and the ConflictAndDeletedManifest.xml file may get corrupted. To solve the problem you need to clean up the folder and delete the file by issuing:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where "replicatedfolderguid='<GUID>'" call cleanupconflictdirectory

To get the GUIDs of replicated folders run:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfolderguid,replicatedfoldername

3. CPU usage is near 100% and the same error is written to the log files millions of times: “Failed to create stage file for GVSN gvsn_identifier”.

I solved this issue by looking for the file specified by gvsn_identifier, which looks like {2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748, and deleting it. These files are located in the staging folder.
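Hunting for that file by hand in a large staging folder is slow. A rough Python sketch that lists candidates, assuming the staged file's name contains the GVSN string as-is; the function name and the staging path in the usage example are hypothetical. It only lists files and deletes nothing:

```python
from pathlib import Path


def find_staged_files(staging_root, gvsn):
    """Return files under staging_root whose name contains the GVSN string."""
    needle = gvsn.lower()
    return sorted(
        path
        for path in Path(staging_root).rglob("*")
        if path.is_file() and needle in path.name.lower()
    )


if __name__ == "__main__":
    # Hypothetical location; use the staging path of your replicated folder.
    staging = r"D:\Data\DfsrPrivate\Staging"
    for path in find_staged_files(staging, "{2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748"):
        print(path)
```

Review the list before removing anything, since other staged files belong to healthy transfers.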

Other Helpful Tools

You can create a Health Report from the DFS Management Console to see how many files have been transferred between replication members since the DFS service started, and whether there are any DFS errors in the members’ event logs.

You can also use the DFSRMon tool, but I personally don’t find it very useful.


Configuring remote access to AIX

May 16, 2012

I work on an old AIX 5.1:

# oslevel -r

By default it has only telnet preinstalled, which works out of the box without additional configuration. However, there are several recommended steps to take.


First, check that you have a stable network connection. I had problems reconnecting to the AIX box after a connection timeout. It seemed that the telnet session somehow hung on the OS side and didn’t allow me to reconnect. To prevent that, you have two options. If you use PuTTY, go to Settings->Connection and set the number of seconds between keepalive packets to, say, 60, and PuTTY will maintain the connection automatically. Another workaround is to set the TMOUT variable in /etc/profile. By default AIX uses the ksh shell, which uses this parameter to detect idle sessions. If you set this variable to 120, then after two minutes of inactivity ksh will warn that the session will be closed in 60 seconds. In theory this means that if your telnet session breaks, ksh will automatically terminate its shell. (I checked that and it turned out that TMOUT doesn’t help here.)

TCP Wrapper

By default telnet access in AIX is open to everyone, which is surely not what you want. AIX has a built-in firewall (called AIX TCP/IP Filters), but it’s rather cumbersome to use just to restrict telnet access. I prefer TCP Wrapper, which is standard on Linux but optional on AIX. You can get the AIX LPP package from the Bull AIX freeware site. Then simply:

chmod +x tcp_wrappers-

Extract the package contents by running the executable. Then run smit from the directory where you extracted the files and go to Software Installation and Maintenance -> Install and Update Software -> Install Software. Enter the current directory in “INPUT device / directory for software”. You can view the available software by pressing F4 in the “SOFTWARE to install” field. Change “ACCEPT new license agreements?” to yes and press Enter.

When the package is installed, edit /etc/inetd.conf. Find the telnet line and change it:

#telnet stream tcp6 nowait root /usr/sbin/telnetd telnetd -a
telnet stream tcp6 nowait root /usr/local/bin/tcpd telnetd -a

And restart the inetd service:

stopsrc -s inetd && startsrc -s inetd
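If you have several boxes to change, the inetd.conf edit above can be scripted. A minimal Python sketch, assuming the usual inetd.conf field order (service, socket type, protocol, wait flag, user, program, arguments); the function name is hypothetical, and the rewritten line is joined with single spaces rather than preserving the original column spacing:

```python
def wrap_service(inetd_conf, service, tcpd_path="/usr/local/bin/tcpd"):
    """Comment out the active entry for `service` and add one that runs via tcpd."""
    out = []
    for line in inetd_conf.splitlines():
        fields = line.split()
        # Commented entries start with "#service", so they never match here.
        if len(fields) >= 7 and fields[0] == service and fields[5] != tcpd_path:
            out.append("#" + line)
            # Keep the first five fields and the argument list, swap the program.
            out.append(" ".join(fields[:5] + [tcpd_path] + fields[6:]))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = "telnet stream tcp6 nowait root /usr/sbin/telnetd telnetd -a\n"
    print(wrap_service(original, "telnet"), end="")
```

Running it twice on the same text changes nothing the second time, because entries that already point at tcpd are left alone.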

Now to limit telnet access create /etc/hosts.allow:


and /etc/hosts.deny:


Secure Shell

Telnet is a completely outdated and insecure protocol, so you’d probably prefer ssh on the server side. I believe SSH is bundled with AIX 5.1, but I simply downloaded it from the Bull site. In addition to the OpenSSH package you will have to set up the OpenSSL prerequisite.

Install OpenSSL simply by:

rpm -i openssl-0.9.7l-1.aix5.1.ppc.rpm

For OpenSSH you will need to gunzip it, untar it and install it using smit. But if you work on AIX with an old maintenance level (ML3 in my case), you can run into the following error when running the ssh service:

getnameinfo failed: Invalid argument

You can see it if you run sshd with the -D and -d flags. The solution is to download the AIX 5.1 ML9 and POSTML9 fixes from IBM Fix Central, extract them and install them via Software Installation and Maintenance -> Install and Update Software -> Update Installed Software to Latest Level (Update All).

SSH is a standalone service, so you do not need to edit /etc/inetd.conf. Just add a new sshd line to /etc/hosts.allow and you are good to go. However, if your sshd was built without wrapper support, then you have a problem. You can check that by calling:

# dump -H /usr/sbin/sshd


                        ***Loader Section***
                      Loader Header Information
VERSION#         #SYMtableENT     #RELOCent        LENidSTR
0x00000001       0x00000115       0x00000601       0x00000096

#IMPfilID        OFFidSTR         LENstrTBL        OFFstrTBL
0x00000006       0x00006224       0x0000075a       0x000062ba

                        ***Import File Strings***
INDEX  PATH                          BASE                MEMBER
0      /usr/lib:/lib:/opt/freeware/lib
1                                    libc.a              shr.o
2                                    libpthreads.a       shr_comm.o
3                                    libpthreads.a       shr_xpg5.o
4                                    libcrypto.a
5                                    libz.a    
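The check for libwrap.a in this listing can be automated. A small Python sketch that reads the "Import File Strings" table from saved `dump -H` output and reports whether libwrap.a is among the imported libraries; the function names are hypothetical, and the parsing assumes the column layout shown above:

```python
SAMPLE_DUMP = """\
                        ***Import File Strings***
INDEX  PATH                          BASE                MEMBER
0      /usr/lib:/lib:/opt/freeware/lib
1                                    libc.a              shr.o
2                                    libpthreads.a       shr_comm.o
3                                    libpthreads.a       shr_xpg5.o
4                                    libcrypto.a
5                                    libz.a
"""


def imported_libraries(dump_output):
    """Return the library names listed in the Import File Strings table."""
    libs = []
    in_imports = False
    for line in dump_output.splitlines():
        if "Import File Strings" in line:
            in_imports = True
            continue
        fields = line.split()
        if not in_imports or not fields or not fields[0].isdigit():
            continue
        # Skip path-like columns; the first remaining field is the library name.
        names = [field for field in fields[1:] if "/" not in field]
        if names:
            libs.append(names[0])
    return libs


def has_tcp_wrappers(dump_output):
    """True if the binary imports libwrap.a, i.e. was built with wrapper support."""
    return "libwrap.a" in imported_libraries(dump_output)


if __name__ == "__main__":
    print(imported_libraries(SAMPLE_DUMP))
    print("wrapper support:", has_tcp_wrappers(SAMPLE_DUMP))
```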

If there is no libwrap.a, then the only option you have is to run sshd under tcpd, which is itself run by inetd. To accomplish that, add the first line below to /etc/services and the second to /etc/inetd.conf:

ssh 22/tcp
ssh stream tcp6 nowait root /usr/local/bin/tcpd sshd -i

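The ‘-i’ switch tells sshd that it is being run from inetd. In this mode sshd may have to generate its server key for every connection, so unless a small key size is configured you can wait a significant amount of time for login prompts. Also don’t forget to remove the sshd startup and shutdown scripts from /etc/rc.d/rc2.d.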

Time Synchronization with AD

October 19, 2011

I had a weird issue with an Active Directory server where there was no Windows Time service at all. The w32time.dll and w32tm.exe files were in place, so I just registered w32time.dll with:

w32tm /register

After that the Windows Time service appeared under ‘services’. Start the service with:

net start w32time

Then, to check that the time service works on the DC, run from any client:

w32tm /stripchart /computer:dc_ip /samples:5 /dataonly
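To turn the stripchart samples into a single number you can watch, the output can be post-processed. A Python sketch under the assumption that each sample line looks like `12:00:00, +00.0123456s`; that format is an assumption and may differ on your system, and the function names are hypothetical. Header lines and error lines are simply skipped:

```python
import re

# Assumed sample format: "HH:MM:SS, +SS.fffffffs" (sign, seconds, trailing "s").
SAMPLE_LINE = re.compile(r"^\s*\d{1,2}:\d{2}:\d{2},\s*([+-]\d+\.\d+)s\s*$")


def offsets_seconds(stripchart_output):
    """Return the clock offsets (in seconds) found in stripchart output."""
    offsets = []
    for line in stripchart_output.splitlines():
        match = SAMPLE_LINE.match(line)
        if match:
            offsets.append(float(match.group(1)))
    return offsets


def max_abs_offset(stripchart_output):
    """Return the largest absolute offset, or None if no samples were parsed."""
    offsets = offsets_seconds(stripchart_output)
    return max(abs(value) for value in offsets) if offsets else None


if __name__ == "__main__":
    import sys

    # Example: w32tm /stripchart /computer:dc_ip /samples:5 /dataonly | python offsets.py
    print(max_abs_offset(sys.stdin.read()))
```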

To synchronize a client with the DC right away, run on the client:

w32tm /resync

If a client synchronizes with an outside NTP server instead of the DC, edit HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Parameters. If the ‘Type’ parameter is set to ‘NTP’, change it to ‘NT5DS’ and run:

w32tm /config /update