Archive for the ‘Windows’ Category

Joining ESXi to AD in Disjoint Namespace

November 4, 2019

What is Disjoint Namespace?

Typically, when using Microsoft Active Directory you use AD-integrated DNS, and your AD domain name matches your DNS domain name. But you don’t have to. It’s quite rare, but I’ve seen cases where the two don’t match. For example, you might have a Linux-based DNS, where you register an esx01.example.com DNS record for your ESXi host and then join that host to an Active Directory domain called corp.local.

That’s called a disjoint namespace. You can read this Microsoft article if you want to know more details: Disjoint Namespace.

In my personal opinion, using a disjoint namespace is asking for trouble, but it will still work if you really want to use it.

Problem

If you end up going down that route, there’s one caveat you should be aware of. When you join a machine to AD, among other things, it needs to populate the DNS name property of the AD computer object. This is an example of an ESXi computer object in the Active Directory Users and Computers snap-in:

If you configure the example.com domain in your ESXi Default TCP/IP stack, like so:

And then try to join your ESXi host to, for example, the corp.local AD domain, it will attempt to use esx-01a.example.com for the computer object’s DNS name field. If you’re using a domain account with privileges restricted to domain join only, this operation will fail.

This is how the problem manifested itself in my case in ESXi host logs:

Failed to run provider specific request (request code = 8, provider = 'lsa-activedirectory-provider') -> error = 40315, symbol = LW_ERROR_LDAP_CONSTRAINT_VIOLATION, client pid = 2099303

If you’re using host profiles to join ESXi host to the domain, remediation will fail and you will see the following in /var/log/syslog.log:

WARNING: Domain join failed; retry count 1.

WARNING: Domain join failed; retry count 2.

Likewise (ActiveDirectory) Domain Join operation failed while joining new domain via username and password..

Note: this problem is specific to joining the domain using a restricted service account. If you use a domain administrator account, it will force the domain controller to add the computer object with a DNS name that doesn’t match the AD name.

Solution

Make sure the ESXi domain name setting matches the Active Directory domain name, not the DNS domain name. You can still use the esx-01a.example.com record to add the ESXi host to vCenter, but you have to specify the corp.local domain in the DNS settings (or leave it blank), because this is what is going to be used to add the host to AD, like so:
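If you prefer the command line over the vSphere Client, the same change can be made with esxcli. A minimal sketch, assuming the host name and domain names used in this example:

esxcli system hostname set --host=esx-01a --domain=corp.local
esxcli network ip dns search add --domain=corp.local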

This way your domain controller will be happy and the ESXi host will successfully join the domain.

Additional Notes

While troubleshooting this issue, I saw a few entries in the ESXi host logs that were a distraction. You can safely ignore them, as they don’t constitute errors.

This just means that the ESXi host Active Directory service is running, but the host is not joined to a domain yet:

lsass: Failed to run provider specific request (request code = 12, provider = 'lsa-activedirectory-provider') -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 2111366

IPC is inter-process communication. Likewise consists of multiple services that talk to each other. They open and close connections; this is normal:

lsass-ipc: (assoc:0x8ed7e40) Dropping: Connection closed by peer

I also found this command to be useful for deeper packet inspection between an ESXi host and AD domain controllers:

tcpdump-uw -i vmk0 not port 22 and not arp
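You can also ask the Likewise agent itself about the current join state. This is an assumption on my part — verify the path on your ESXi build:

/usr/lib/vmware/likewise/bin/domainjoin-cli query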


Scripted CIFS Shares Migration

March 8, 2018

I don’t usually blog about Windows Server and Microsoft products in general, but the need for file server migration comes up in my work quite frequently, so I thought I’d make a quick post on that topic.

There are many use cases: it can be a migration from a NAS storage array to a Windows Server, or between an on-premises file server and the cloud. Every such migration involves copying data and recreating shares. Doing it manually is almost impossible unless you have only a handful of shares. If you want to replicate all NTFS and share-level permissions consistently from source to destination, scripting is almost the only way to go.

Copying data

I’m sure there are plenty of tools that can perform this task accurately and efficiently. But if you don’t have any special requirements, such as data-in-transit encryption, Robocopy is probably the simplest tool to use. It comes with every Windows Server installation and, starting with Windows Server 2008 R2, supports multithreading.

Below are the command line options I use:

robocopy \\file_server\source_folder D:\destination_folder /E /ZB /DCOPY:T /COPYALL /R:1 /W:1 /V /TEE /MT:128 /XD "System Volume Information" /LOG:D:\robocopy.log

Most of them are common, but there are a few worth pointing out:

  • /MT – use multithreading, 8 threads per Robocopy process by default. If you’re dealing with lots of small files, this can significantly improve performance.
  • /R:1 and /W:1 – Robocopy doesn’t copy locked files, to avoid data inconsistencies. The default behaviour is to keep retrying until the file is unlocked. That’s important for the final data synchronisation (see the sketch after this list), but for data seeding I recommend one retry and a one-second wait to avoid unnecessary delays.
  • /COPYALL and /DCOPY:T will copy all file and directory attributes, permissions, as well as timestamps.
  • /XD "System Volume Information" is useful if you’re copying an entire volume. If you don’t exclude the System Volume Information folder, you may end up copying deduplication and DFSR data, which, in addition to wasting disk space, will break these features on the destination server.
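For the final synchronisation, once users have been cut off from the source, a mirror pass is the usual pattern. This is a hedged sketch, not the exact command from my migrations — note that /MIR deletes destination files that no longer exist on the source, so double-check your paths first:

robocopy \\file_server\source_folder D:\destination_folder /MIR /ZB /DCOPY:T /COPYALL /R:5 /W:5 /V /TEE /MT:128 /XD "System Volume Information" /LOG:D:\robocopy_final.log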

Robocopy is typically scheduled to run at certain times of the day, preferably after hours. You can put it in a batch script and schedule it with Windows Task Scheduler. Just keep in mind that if you configure the job to stop after running for a certain number of hours, Task Scheduler will stop only the batch script; the Robocopy process will keep running. As a workaround, you can schedule another job with the following command to kill all Robocopy processes at a certain time of day, say 6am:

taskkill /f /im robocopy.exe
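If you want to script that second job too, a schtasks one-liner along these lines should work (the task name is just an example):

schtasks /create /tn "Stop Robocopy" /tr "taskkill /f /im robocopy.exe" /sc daily /st 06:00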

Duplicating shares

For copying CIFS shares I’ve been using the "sharedup" utility from EMC’s "CIFS Tools" collection. To get the tool, register a free account on https://support.emc.com. You can do that even if you’re not an EMC customer and don’t own an EMC storage array. From there you will be able to search for and download CIFS Tools.

If your source and destination file servers are completely identical, you can use sharedup to duplicate CIFS shares in one command. But that’s rarely the case. Often you want to exclude some of the shares, or change paths if your disk drives or directory structure have changed. Sharedup supports input and output file command line options: you can generate a list of shares first, edit it, and then import the shares to the destination file server.

To generate the list of shares, first run:

sharedup \\source_server \\destination_server ALL /SD /LU /FO:D:\shares.txt /LOG:D:\sharedup.log

The resulting file will have records similar to this:

#
@Drive:E
:Projects ;Projects ;C:\Projects;
#
@Drive:F
:Home;Home;C:\Home;

Delete the shares you don’t want to migrate and update the target paths from C:\ to where your data actually is. Don’t change the "@Drive:E" headers; they specify the location of the source share, not the destination. Also worth noting is that you won’t see permissions listed anywhere in this file. It lists shares and share paths only; permissions are checked and copied at runtime.

Once you’re happy with the list, use the following command to import shares to the destination file server:

sharedup \\source_server \\destination_server ALL /R /SD /LU /FI:D:\shares.txt /LOG:D:\sharedup.log

For server-local users and groups, sharedup will check if they exist on the destination. So if you run into an error similar to the following, make sure to first create those groups on the destination file server:

10:13:07 : WARNING : The local groups "WinRMRemoteWMIUsers__" and "source_server_WinRMRemoteWMIUsers__" do not exist on the \\destination_server server !
10:13:09 : WARNING : Please use lgdup utility to duplicate the missing local user(s) or group(s) from \\source_server to \\destination_server.
10:13:09 : WARNING : Unable to initialize the Security Descriptor translator

Conclusion

I created this post as a personal how-to note, but I’d love to hear if it’s helped anyone else. And if you have better tool suggestions to accomplish this task, please let me know!

DFS Replication Troubleshooting

June 25, 2013

The DFS Replication service doesn’t give you much information on how it’s replicating. It’s good to know some general commands to troubleshoot communication and data transfer issues.

Useful Commands

In Windows Server 2008 a new command was introduced to check what DFSR is doing at the moment. You won’t find it in Windows Server 2003:

> dfsrdiag replicationstate

If the replication link isn’t feeling well, you get lots of files in the backlog. To check if you have a backlog, run:

> dfsrdiag backlog /rgname:rgroup_name /rfname:folder_name /sendingmember:sending_server /receivingmember:receiving_server

If there are heaps of files in the backlog, the best way to find the reason is to simply check the logs. DFSR logs are located in C:\Windows\debug. To get the most verbose information, change the log severity level:

> wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set debuglogseverity=5
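Don’t forget to turn the verbosity back down once you’re done; as far as I remember, the default severity level is 4:

> wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set debuglogseverity=4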

DFSR uses GUIDs to identify the replicated files, which look like AC759213-00AF-4578-9C6E-EA0764FDC9AC. To get meaningful data from a GUID, use:

> dfsrdiag guid2name /guid:guid_identifier /rgname:group_name

There is one more command, which allows you to find the exact path to the file in question. Feed it the uid field from the DFSR debug log, which looks like {9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880:

> wmic.exe /namespace:\\root\microsoftdfs path dfsridrecordinfo.Uid="uid_identifier" call getfullfilepath
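For example, with the uid shown above, the call would look like this (substitute the uid from your own log):

> wmic.exe /namespace:\\root\microsoftdfs path dfsridrecordinfo.Uid="{9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880" call getfullfilepath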

Sample Errors

1. When replicating between Windows Server 2008 R2 and Windows Server 2003 R2, you see "Ghosting is not enabled" on the source and "A failure was reported by the remote partner" on the destination.

I solved this error by applying the following patch: KB2462352. The reason for the issue is incompatibilities between protocol implementations.

2. The following error pops up in the logs: "The system cannot find the file specified".

The solution is described in KB951010. In Windows Server 2003, the ConflictAndDeleted folder sometimes fills up above the 660MB quota and the ConflictAndDeletedManifest.xml file may get corrupted. To solve the problem, you need to clean up the folder and delete the file by issuing:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where "replicatedfolderguid='<GUID>'" call cleanupconflictdirectory

To get the GUIDs of replicated folders run:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfolderguid,replicatedfoldername

3. Near-100% CPU usage, and the same error written millions of times in the log files: "Failed to create stage file for GVSN gvsn_identifier".

I solved this issue by looking for the file specified by gvsn_identifier, which looks like {2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748, and deleting it. These files are located in the staging folder.

Other Helpful Tools

You can create a Health Report from the DFS Management console to see how many files have been transferred between replication members since the DFS service started, and whether there are any DFS errors in the members’ event logs.

You can also use the DFSRMon tool, but I personally don’t find it very useful.

Windows MPIO with IBM storage

September 17, 2012

IBM mid-range storage systems (like the DS3950) work in active/passive mode. It means that access to each LUN is given through one controller, in contrast to active/active storage, where data between the host and the two controllers can flow in round-robin fashion. So the redundant path here is used only for failover. The software that provides this failover functionality is called Multipath I/O (MPIO) and has implementations for all operating systems. I’ll describe how to configure the MPIO version for Windows.

Installation

Prior to Windows Server 2008, Microsoft didn’t have its own MPIO implementation, and MPIO was distributed with the IBM DS Storage Manager product. Now you can install MPIO from the "Features" sub-menu of Windows Server 2008 Server Manager. After the installation is complete, you will find MPIO configuration options under Control Panel and in Administrative Tools.

IBM storage works well with the default Windows MPIO implementation; however, it’s recommended to install the IBM MPIO DSM (device-specific module) from the Storage Manager installation bundle. In my case the MPIO installation file was called SMIA-WSX64-01.03.0305.0608.

Enable multipathing

Initially you will see two hard drives for each LUN in Device Manager. You can enable MPIO for a particular hardware ID (in other words, storage system) on the Discover Multi-Paths tab of the MPIO control panel. You can’t do that with per-LUN granularity. After you add the selected devices and reboot, you will see them on the "MPIO Devices" tab. Now each LUN will be seen as a single hard drive in Device Manager.

Configure preferred path

MPIO supports several load-balancing policies, which are configured on a per-LUN basis from the MPIO tab of a hard drive in Device Manager. As the Load Balance Policy, select Fail Over Only. Then for each path select which is Active/Optimized and which is Standby. Also make the active path Preferred, so that after a failover the LUN fails back to it.

Don’t be confused by iSCSI in the figure; it’s the same for pure FC and is shown just for reference.

Check configuration

When you configure active and passive paths, you assume that the first path listed goes to controller A and the second to controller B. But, in fact, there is no indication of that on the configuration page, and you can neither confirm nor deny it. The only IDs you see are adapter ports, but they don’t even map to the actual ports on the HBAs.
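On Windows Server 2008 R2 and later, the built-in mpclaim utility can also help cross-check what the GUI shows. A quick sketch: the first command lists all MPIO disks, the second shows the path states for disk 0 (the disk number is an example):

mpclaim -s -d
mpclaim -s -d 0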

To be able to check your configuration, you need to install the SMdevices utility, which comes with IBM DS Storage Manager. Run the DS Storage Manager installation and go for Custom Installation; there you only need to check the Utilities part. In the SMdevices output you can see which path is preferred for each LUN and whether it’s configured as active (In Use):

C:\Program Files\IBM_DS\util>SMdevices
IBM System Storage DS Storage Manager Devices
. . .
\\.\PHYSICALDRIVE1 [Storage Subsystem ITSO5300, Logical Drive 1, LUN 0, Logical Drive ID <600a0b80002904de000005a246d82e17>, Preferred Path (Controller-A): In Use]

References

The best reference I found on the topic is the IBM Midrange System Storage Hardware Guide (SG24-7676-01), from p.453: DS5000 logical drive representation in Windows Server 2008, as well as the Installing and Configuring MPIO guide from Microsoft.

Permanently map network drive in Windows

May 2, 2012

Have you ever run into an issue where, after mapping a network drive and saving the login and password, you end up with a disconnected drive after a reboot? To overcome this problem, use the command line and pass the following switches to net use:

net use w: \\server\share /savecred /persistent:yes

Then enter your username and password and that seems to be it.

But I had a case where the network drive wouldn’t map, failing with an "Invalid username/password" error even though the credentials were correct. If you run into a similar problem, include the username and password in the command like this:

net use w: \\server\share password /savecred /persistent:yes /user:username
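If a stale saved credential is the culprit, it may also help to inspect and clear what /savecred stored, using the built-in cmdkey utility (the target name below is an example):

cmdkey /list
cmdkey /delete:server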

How to reset Active Directory DSRM password

March 29, 2012

Log in to the domain controller and run ntdsutil from the command line. Then enter:

set dsrm password
reset password on server null

After that you will be asked for the new password. When you’re done, type q twice (once to leave the password menu and once to exit ntdsutil).

GFS backup scheme in Symantec Backup Exec

March 23, 2012

Grandfather-Father-Son is an industry-standard backup scheme, where you have 5 daily backups, 5 weekly backups and as many monthly backups as you need. Symantec Backup Exec has a prebuilt policy for GFS, but before going into configuring the backup scheme itself, let’s talk a little bit about general backup job configuration in Backup Exec.

Basic Terminology

Inside the user interface you see Jobs, Policies, Selection Lists and Media Sets. First of all, you need to create a Selection List, which describes what you want to back up. There you select files and folders from your Windows, Unix or NDMP servers. Then you create a Media Set, which is a collection of tapes with particular append and retention periods. The append period specifies how long data can be added to the same tape, and the retention period tells for how long data cannot be overwritten. The retention period starts from the time of the last append to the tape. Then you create a Policy. A Policy, by means of templates, defines when backup jobs are run, where backups are stored and what the type of backup is – incremental, differential or full. One policy can consist of several templates. In a template you specify the backup date and time, as well as the target tape library.

GFS Implementation

Backup Exec has a template for the GFS backup rotation scheme. Click "New policy using wizard", choose the GFS scheme, and then select the schedule, target backup device and media sets for daily, weekly and monthly backups. By default, Backup Exec suggests the following configuration.

Three tape media sets:

  • Daily Media Set – 1 week overwrite, 1 week append
  • Weekly Media Set – 5 weeks overwrite, 5 weeks append
  • Monthly Media Set – 1 year overwrite, 1 year append

Policy with three templates:

  • Daily Backup – Monday to Friday, Incremental
  • Weekly Backup – every Friday, Full
  • Monthly Backup – first Saturday of each month, Full

Backup Exec also automatically creates rules to resolve conflicts. For example, when both daily and weekly backups try to run on a Friday, the jobs do not conflict, because weekly backups always supersede daily ones. The same goes for monthly backups.

I personally prefer another schedule. First of all, if you run your jobs after midnight, you will need to shift your schedules from Mon – Fri to Tue – Sat. Additionally, I run the monthly backup on the first Saturday of the month. Backup Exec by default (taking into consideration my one-day shift) would suggest the first Sunday for the monthly backup. However, it doesn’t make much sense to have a weekly on Saturday and then a monthly the next day on Sunday; you would just consume more space without any benefit. You could also schedule the monthly on the last Saturday of the month, but if the last day of the month is a Thursday, for example, you will lose four business days from your monthly backup.

After the policy is created, you need to create backup jobs using this policy by clicking "New jobs using policy". All three jobs will be created automatically, according to the Selection List as well as the policy’s Schedule, Target and Backup Type parameters.

I’d also recommend that everyone configure notifications. There are general Alerts properties, as well as notification settings inside each job.

Highly available Windows network infrastructure

February 27, 2012

When the number of computers in a company starts to grow and IT services become critical for company operation, every IT department starts to think about how to make its network infrastructure highly available. In a Windows environment, the first step is usually an additional domain controller. Bringing a second DC up and running is rather simple: the only thing you need to do is run dcpromo and follow the instructions given by the wizard. Then make the additional DC a Global Catalog, so that it will serve authentication requests, by going to Active Directory Sites and Services and, in the NTDS Settings properties on the General tab, checking the Global Catalog option. The Windows File Replication Service (FRS) will do the rest.

However, that’s usually not enough. Computers rely on the DNS service to resolve server names, and in case of a primary DC failure your network will be paralyzed. Dcpromo doesn’t automatically install and configure an additional DNS server; you need to do that manually. Moreover, if you use a DHCP service to provide network settings to client computers and it’s located on the same server, you will also have major issues. The problem here is that you can’t have two active DHCP servers giving out the same addresses. But this problem also has its solution.

In the case of DNS, go to Add or Remove Windows Components and find DNS under Networking Services. Install it as AD-integrated. Then on the primary DNS server, for all your forward and reverse lookup zones, add the secondary DNS server’s IP on the Name Servers tab of the zone properties. After that, DNS will automatically replicate all data. Also don’t forget to add your secondary DNS server to the DHCP configuration, otherwise clients won’t know about it.

When it comes to DHCP, you have the option of using the so-called 80/20 rule to divide the scope between DHCP servers (if you’re on the Windows Server 2008 platform, you can build an HA DHCP cluster). Simply configure your first DHCP server to lease the first 80% of the network’s IP addresses and leave 20% to the second DHCP server. Then, in case of a failure of the first server, most computers will already have their IP addresses and you will still have 20% to distribute. In my case the network is quite small, so I split the scope 50/50. Just make the configurations of the two servers equal (reservations, exclusions, scope options, etc.), but configure the scopes to have non-overlapping ranges. If you use the 80/20 rule, you want your primary server to lease IP addresses under normal circumstances: if both servers lease addresses with equal rights, you will quickly run out of addresses on the 20% server, and in case of a primary server failure you won’t have enough addresses to lease. To achieve that, tweak the Conflict detection attempts option.
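As a hedged illustration of the 80/20 split (the 192.168.1.0/24 scope and the address ranges below are made up), you can create the same scope on both servers and add complementary exclusion ranges from the command line. On the primary server, exclude the top 20%:

netsh dhcp server scope 192.168.1.0 add excluderange 192.168.1.211 192.168.1.250

And on the secondary server, exclude the bottom 80%:

netsh dhcp server scope 192.168.1.0 add excluderange 192.168.1.51 192.168.1.210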

Basically, that’s it. Of course, you will still have many points of failure (network switch, UPS, etc.), but that topic goes beyond this post.

Old share mapping upon reboot

February 25, 2012

When you change a share mapping in users’ AD logon scripts, for example when you move a share to another server, users can still see the old path in their drive mappings after a reboot. Usually this happens if you forget to unmount the drive before mounting it: if the share was already mounted, it won’t be remounted to the new one. To get rid of that, always add something like net use N: /d /y before the mounting line, as in the sketch below.
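A minimal logon script fragment, with hypothetical server and share names:

net use N: /d /y
net use N: \\newserver\share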

However, sometimes it screws up anyway. In case you still see the old share, go to HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2 and remove the cached record for this share. It will look something like ##SERVERNAME#SHARENAME. After that you will hopefully have your share automatically mounted to the correct path.
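If you’d rather script the cleanup than click through Regedit, a one-liner along these lines should do (the key name is the example from above; adjust it to your server and share):

reg delete "HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2\##SERVERNAME#SHARENAME" /f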