Posts Tagged ‘domain’

Joining ESXi to AD in Disjoint Namespace

November 4, 2019

What is Disjoint Namespace?

Typically, when using Microsoft Active Directory you use AD-integrated DNS and your AD domain name matches you DNS domain name, but you don’t have to. This is quite rare, but I’ve seen cases where the two don’t match. For example, you might have a Linux-based DNS, where you register an esx01.example.com DNS record for your ESXi host and then you join it to an Active Directory domain called corp.local.

That’s called a disjoint namespace. You can read this Microsoft article if you want to know more details: Disjoint Namespace.

In my personal opinion, using a disjoint namespace is asking for trouble, but it will still work if you really want to use it.

Problem

If you end up going down that route, there’s one caveat you should be aware of. When you joining a machine to AD, among other things, it needs to populate DNS name field property of the AD computer object. This is an example of ESXi computer object in Active Directory Users and Computers snap-in:

If you configure example.com domain in your ESXi Default TCP/IP stack, like so:

And then you try to, for example, join your ESXi host to corp.local AD domain, it will attempt to use esx-01a.example.com for computer object DNS name field. If you’re using a domain account with privileges restricted only to domain join, this operation will fail.

This is how the problem manifested itself in my case in ESXi host logs:

Failed to run provider specific request (request code = 8, provider = ‘lsa-activedirectory-provider’) -> error = 40315, symbol = LW_ERROR_LDAP_CONSTRAINT_VIOLATION, client pid = 2099303

If you’re using host profiles to join ESXi host to the domain, remediation will fail and you will see the following in /var/log/syslog.log:

WARNING: Domain join failed; retry count 1.

WARNING: Domain join failed; retry count 2.

Likewise (ActiveDirectory) Domain Join operation failed while joining new domain via username and password..

Note: this problem is specific to joining domain using a restricted service account. If you use domain administrator account, it will force the controller to add the computer object with a DNS name, which doesn’t match the AD name.

Solution

Make sure ESXi domain name setting matches the Active Directory domain name, not DNS domain name. You can still use the esx-01a.example.com record to add the ESXi hosts to vCenter, but you have to specify corp.local domain in DNS settings (or leave it blank), because this is what is going to be used to add the host to AD, like so:

This way your domain controller will be happy and ESXi host will successfully join the domain.

Additional Notes

While troubleshooting this issue I saw a few errors in ESXi host logs, which were a distraction, ignore them, as they don’t constitute an error.

This just means that the ESXi host Active Directory service is running, but host is not joined to a domain yet:

lsass: Failed to run provider specific request (request code = 12, provider = ‘lsa-activedirectory-provider’) -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 2111366

IPC is inter-process communication. Likewise consists of multiple services that talk to each other. They open and close connections, this is normal:

lsass-ipc: (assoc:0x8ed7e40) Dropping: Connection closed by peer

I also found this command to be useful for deeper packet inspection between an ESXi host and AD domain controllers:

tcpdump-uw -i vmk0 not port 22 and not arp

References

Issue Joining VNX1 and VNX2 Unisphere Domains

March 21, 2016

no_SSLThe main benefit of using Unisphere Domains is that they give you ability to manage all of your VNXs by connecting to just one array. If you have an old Clariion you’ll have to use a so called Multi-Domain. VNX1 and VNX2 arrays can join a single domain.

Recently I’ve encountered an issue where this didn’t work quite well. When joining VNX1 to a VNX2 I got the following error:

CIMOM Can’t get the VNX hardware class from – ip 172.10.10.10 – Error Connecting SSL. Error details: A system call error (errno=10057).

join_error

As it turned out EMC disabled SSL 3.0 support in recent Block OE versions. As a result it’s broken Unisphere Domain connectivity with arrays running Flare 32 Patch 209 or older, which still use SSL 3.0.

Solution is to upgrade Block OE to a version higher than Flare 32 Patch 209 where SSL 3.0 is disabled. Or as a workaround you can connect arrays in a Multi-Domain. To find out how, read one of my earlier blog posts: How to Configure VNX Unisphere Domains

How to Configure VNX Unisphere Domains

March 7, 2016

unisphere_domainVNX storage arrays have a concept of Unisphere Domains which let you manage multiple arrays from one Unisphere GUI. To manage two or more arrays from a single pain of glass you need to join their storage domains. There are typically two scenarios:

  1. Joining arrays of the same generation, such as VNX to VNX or Clariion to Clariion (VNX1 and VNX2 are considered as one generation)
  2. Joining arrays of different generations, such as Clariion to VNX

Same generation arrays

When joining same generation arrays to a single domain you get the benefit of having consistent domain-wide settings across all arrays in the domain, such as: DNS, NTP, LDAP and Global Users. If you go to Unisphere Home screen and click on the Domain button you will find where all domain-wide settings are configured. Once they are set up these settings propagate to all systems within the domain.

domain_settings

There’s also a concept of Domain Master, which keeps and distributes domain-wide settings. Domain Master can be changed manually if you wish to do so by using the Select Domain Master wizard.

To add a new system to the same domain simply click on Add/Remove Systems and follow the wizard.

Different generation arrays

It’s very uncommon to see a Clariion these days, but if you still have one and want to have a single management interface across both your Clariion and VNX arrays you have to use Multi-Domains. You won’t get the benefit of having the same domain-wide settings, but if you have just 2 or 3 arrays it’s not really that hard to set them up manually.

To add a new domain to Multi-Domain configuration click on Manage Multi-Domain Configurations, specify VNX Service Processor IP and assign a name. System will be added to the list of Selected Domains.

add_vnx

Always add another domain to Multi-Domain configuration on a system which is running the highest release of Unisphere within the Multi-Domain, otherwise you’ll get the following error:

This version of user interface software does not support the management server software versions on the provided system.

add_vnx_error

Once the system is added you will see both arrays in the systems list and will be able to manage both from one Unisphere interface. For the sake of demonstration I used two VNX arrays in the screenshot below. But the same process applies to Clarrion arrays.

two_arrays

Local and global users

Unisphere has two types of user accounts – local and global. Local account can manage the system you have connected to and global account can manage all systems within the same domain.

By default, when array is being installed, global security is initialized and one global user is created. There are no local user accounts on the system by default, which is fine, because each array is created as a member of its own local domain.

In a Multi-Domain configuration you need to make sure you’re logging in to Unisphere using an account, which exists in every domain being managed. Otherwise, each time you log in to Unisphere you will have to manually login to the remote domain on the domain tab, which is quite annoying.

If you have different accounts on each of the arrays, make sure to make them consistent across all systems.

domain_login

Conclusion

In this post in a few simple steps we went through the Unisphere single domain and multi-domain configurations. If you want to know more details about Unisphere Domain management refer to EMC white paper “Domain Management with EMC Unisphere for VNX“.

Troubleshooting Cisco UCS LDAP

December 4, 2015

If you ever configured LDAP integration on a blade chassis or a storage array, you know that troubleshooting authentication is painful on these things. It will accept all your configuration settings and if you’ve made a mistake somewhere all you get when you try to log in is “Authentication Error” message with no clue of what the actual error is.

Committing configuration changes

There three common places where you can make a mistake when setting up LDAP authentication on UCS. Number one is committing configuration changes to the Fabric Interconnects in UCS Manager.

There are four configuration options which you need to set to enable Active Directory authentication to your domain:

  • LDAP Providers – these are your domain controllers
  • LDAP Provider Groups – are used to group multiple domain controllers of the same domain
  • LDAP Group Maps – where you give permissions to your AD groups and users
  • Authentication Domains – final configuration step where you enable authentication via the domain

Now if you decide to delete a LDAP Provider Group which is configured under an Authentication Domain in attempt to change the settings, this may become an issue.

What is confusing here is UCS Manager will let you delete the LDAP Provider Group, save the changes and LDAP Provider Group will disappear from the list. And you may legitimately conclude that it’s deleted from UCS, but it’s actually not. This is what you’ll see in UCS Manager logs:

[FSM:STAGE:STALE-FAIL]: external aaa server configuration to primary(FSM-STAGE:sam:dme:AaaEpUpdateEp:SetEpLocal)
[FSM:STAGE:REMOTE-ERROR]: Result: resource-unavailable Code: ERR-ep-set-error Message: Re-ordering/Deletion of Providers cannot be applied while ldap is used for yourdomain.com(Domain) authentication(sam:dme:AaaEpUpdateEp:SetEpLocal)

The record will stay on the UCS and you may encounter very confusing issues where you change your LDAP Provider settings but changes are not reflected on UCS. So make sure to delete the object from the higher level entity first.

Distinguished Name typos

There are two ways to group Active Directory entities on a domain controller – Security Groups and Organizational Units. When configuring your AD bind account in LDAP Providers section and setting up permissions in LDAP Group Maps, make sure to not confuse the two. The best advice I can give – always use ADSI Edit tool to find the exact DN. Why? As an example let’s say you want to give permissions to the builtin administrator group and you use the following DN:

CN=Administrators,OU=Builtin,DC=yourdomain,DC=com

This won’t work, because even though Builtin container may look like a OU, it’s actually a CN in AD, as well as Users and Computers containers.

adsi_edit.JPG

ADSI Edit will give you the exact Distinguished Name. Make sure to use it to save yourself the hassle.

Group Authorization settings

Last but not least are the following two LDAP Provider configuration settings:

  • Group Authorization – whether UCS searches within groups when authenticating
  • Group Recursion – whether UCS searches groups recursively

If you add an AD group which the user is a part of in LDAP Group Maps and do not enable Group Authorization, UCS simply won’t search within the group. Enable this option unless you give permissions only on a per user basis.

Second option enables recursive search within AD groups. If you have nested groups in AD (which most people have) enable recursive search or UCS won’t look deeper than 1 level.

If you get really stuck

If you’ve set all the settings up and are certain they the are correct, but authentication still doesn’t work, then there is a relatively easy way to localize the issue.

First step is to check whether UCS can bind to your LDAP Providers and authenticate users. Pick a user (LDAP Group Maps don’t matter at this point), SSH to a Fabric Interconnect and type the following:

ucs # connect nxos
ucs(nxos)# test aaa server ldap yourdc.yourdomain.com john password123

yourdc.yourdomain.com – is the domain controller you’ve configured in LDAP Providers section. If authentication doesn’t work, then the issue is in LDAP Provider settings.

If you can authenticate, then the next step is to make sure that UCS searches through the right AD groups. To check that you will need to enable LDAP authentication logging on a Fabric Interconnect:

ucs # connect nxos
ucs(nxos)# debug ldap aaa-request-lowlevel

Now try to authenticate and look through the list of groups which UCS is searching through. If you can’t see the group which your user is a part of, then you most likely using a wrong DN in LDAP Group Maps.

In my case the settings are configured correctly and I can see that UCS is searching in the Builtin Administrators group:

2015 Dec 1 14:12:19.581737 ldap: value: CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581747 ldap: ldap_add_to_groups: Discarding. group map not configured for CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581756 ldap: value: CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581767 ldap: ldap_add_to_groups: successfully added group:CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581777 ldap: value: CN=Exchange Organization Administrators,OU=Microsoft Exchange Security Groups,DC=yourdomain,DC=com

Make sure to disable logging when you’re done:

ucs(nxos)# undebug all

References:

Highly available Windows network infrastructure

February 27, 2012

When number of computers in company starts to grow, IT services become critical for company operation, every IT department starts to think how to make their network infrastructure highly available. If it’s a Windows environment, then the first step is usually an additional domain controller. Bringing second DC up and running is rather simple. The only thing you need to do is to run dcpromo and follow the instructions given by the wizard. Then make additional DC a Global Catalog, so that it will serve authentication requests, by going to Active Directory Sites and Services and in NTDS settings on General tab check Global Catalog option. Windows File Replication Services (FRS) will do the rest.

However, it’s usually not enough. Computers rely on DNS service to resolve servers names and in case of primary DC failure your network will be paralyzed. Dcpromo don’t automatically install and configure additional DNS server. You need to do that manually. Moreover, if you use DHCP service to provide network settings to client computers and it’s located on the same server you will also have major issues. The problem here is that you can’t have two active DHCP servers giving out same addresses. But this problem also have its solution.

In case of DNS you should go to Add or Remove Windows Components and find DNS in Networking Services. Install it as AD integrated. Then on the primary DNS, for all your forward and reverse lookup zones, in properties add secondary DNS IP on Name Servers tab. After that DNS will automatically replicate all data. Don’t also forget to add your secondary DNS to DHCP configuration, otherwise clients won’t know about it.

When it comes to DHCP you have an option to use so called 80/20 rule to divide scope between DHCP servers (if you work on Windows server 2008 platform you can build HA DHCP cluster). Simply configure your first DHCP server to lease first 80% of network IP addresses and leave 20% to the second DHCP server. Then in case of first server failure most of computers will already have their IP addresses and you will still have 20% to distribute. In my case network is quite small and I split scope in 50/50. Just make equal configurations for two servers (reservations, exclusions, scope options, etc), but configure scope to have non-overlapping ranges. Then if you use 80/20 rule, you want your primary server to lease IP address in normal circumstances. If both servers will lease addresses with equal rights then you will quickly run out of addresses on 20% server and in case of primary server failure you won’t have enough addresses to lease. To solve that, tweak Conflict detection attempts option.

Basically, this is it. Of course, you will still have many points of failure, like network switch, UPS, etc. But this topic goes beyond this post.

DB2 fails to start after promoting to DC

February 24, 2012

Our backup database server is now also an additional domain controller. After DC promotion DB2 failed to start with error:

No mapping between account names and security IDs was done.

It’s an expected behavior, since server removes all local users groups during promotion, including DB2ADMNS and DB2USERS. These groups are used for extended security and in case it’s enabled (which is default) you will experience these kinds of problems. If you don’t change these groups before promotion then you won’t be able to use db2extsec to change them gracefully after promotion because database just won’t start and all CLI commands won’t work.

To solve this problem you need to disable extended security by changing DB2_EXTSECURITY registry variable to NO in HKLM\ SOFTWARE\ IBM\ DB2\ GLOBAL_PROFILE and HKLM\ SOFTWARE\ IBM\ DB2\ InstalledCopies\ DB2COPY1\ GLOBAL_PROFILE. Then create DB2ADMNS and DB2USERS active directory groups and point to them using:

db2extsec -u mydom\db2users -a mydom\db2admns

Bear in mind that using domain groups for extended security is supported starting from DB2 version 9 Fix pack 2. If you’re using an older version then you will have to disable this feature.

Postfixadmin. Removing old mailboxes.

September 21, 2011

Our organization mail server hosts several domains with more than 750 mailboxes. We use commercial antispam and antivirus software. License value is calculated from the number of mailboxes. Therefore it’s important to keep mailbox database actual and remove unused accounts.

Mail server setup is based upon postfix mail server daemon and Postfixadmin web-based administration interface which uses MySQL to store configuration data. I wrote simple shell script to find and delete mailboxes which were not used for more than 1 year:

#!/bin/sh

YEAR=31536000
MYSQL="/usr/bin/mysql -D postfix -u postfix -ppassword -e"
DOMAIN=your.domain

cd /var/spool/vmail/$DOMAIN
for i in *; do
    MDATE=`stat --format=%y $i/cur`
    MUTDATE=`stat --format=%Y $i/cur`
    CTIME=`date +%s`
    if [ `expr $CTIME - $MUTDATE` -ge $YEAR ]; then
        echo "Removing $i@$DOMAIN mailbox. Last pop3 delivery was on $MDATE"
        $MYSQL "DELETE FROM alias WHERE address = '$i@$DOMAIN'"
        $MYSQL "DELETE FROM alias WHERE goto = '$i@$DOMAIN'"
        $MYSQL "DELETE FROM mailbox WHERE username = '$i@$DOMAIN'"
        /bin/rm -rf $i
    fi
done

Date comparison is pain in bash scripting. Idea here was to use Unix time to avoid separate comparison of year, month and day. Thank stat can output time in different formats. YEAR is 1 year in seconds.

Each time user receives his mail from mail server letters are moved from new to cur dir. After mail has been received postfix deletes letters from cur. This script determine if user hasn’t been receiving mail for more than 1 year by checking modification time of cur dir of his mailbox. Script removes metadata from MySQL database and actual maildir of old box.

Using this script I found and removed more than 250 abandoned mailboxes.