Posts Tagged ‘Troubleshooting’

Reminder: Disable Firewall on NSX ECMP Edge

October 15, 2019

ECMP and Stateful Services

It’s not new, this topic has already been discussed many times before, examples are here, here, here and here. When NSX Edges are configured in ECMP mode, none of the stateful services like VPN, NAT or Load Balancing are supported.

From NSX Design Guide:

In ECMP mode, only routing service is available. Stateful services cannot be supported due to asymmetric routing inherent in ECMP-based forwarding.

Even if you didn’t read documentation, but have networking skills, you’d know that protocols like NAT need to track network session state and even if you configure the same NAT rule on all of your ECMP-enabled edges, it won’t work, because due to ECMP, traffic can flow through one ESG on ingress and another ESG on egress. Since NAT tables are not synchronized, ESGs won’t be able to find the corresponding network flow in translation table and will drop the traffic.

ECMP and Firewall

But there’s another issue that doesn’t always come across or simply get forgotten about. You can deploy ESGs in ECMP mode, not configure any of the stateful services like VPN, NAT or LB, but still get network communication issues. Why? Because when you deploy an ESG, you always end up with firewall in enabled state. Firewall is also considered a stateful service.

From VVD 5.1 documentation:

SDDC-VISDN-032: For all ESGs deployed as ECMP North-South routers, disable the firewall. Use of ECMP on the ESGs is a requirement. Leaving the firewall enabled, even in allow all traffic mode, results in sporadic network connectivity. Services such as NAT and load balancing cannot be used when the firewall is disabled.

In fact, firewall is what actually tracks sessions and drops packets that don’t match existing network flows, not NAT itself. That’s also the reason why services like NAT and LB don’t work without firewall being enabled.

It often throws people off, because even having no rules in the firewall and setting default policy to accept will not prevent this issue from happening.

Demo

Here is a quick demonstration. I’m trying to establish an SSH session to a VM connected to a DLR behind two ESGs in ECMP mode.

I’m showing packet debug on both ESGs using the following command:

> debug packet display follow interface vNic_1 port_22

As you can see ingress traffic goes through E1 and egress traffic goes through E2:

E1: Packet Capture

E2: Packet Capture

Since session originated on E1, E2 interprets packets as invalid and immediately drops them:

From NSX Troubleshooting Guide:

Check for an incrementing value of a DROP invalid rule in the POST_ROUTING section of the show firewall command. Typical reasons include:

  • Asymmetric routing issues

Conclusion

It’s easy to end up in this situation, because firewall is enabled by default on a newly deployed ESG. And it’s hard to troubleshoot this issue, since it’s not quite obvious what’s actually going on unless you’ve already worked with ECMP before. So the best advice in this case is just to remember, if you want to use ECMP in NSX, make sure to disable firewall on ECMP-enabled ESGs. Use distributed firewall (DFW) instead.

Troubleshooting Cisco UCS LDAP

December 4, 2015

If you ever configured LDAP integration on a blade chassis or a storage array, you know that troubleshooting authentication is painful on these things. It will accept all your configuration settings and if you’ve made a mistake somewhere all you get when you try to log in is “Authentication Error” message with no clue of what the actual error is.

Committing configuration changes

There three common places where you can make a mistake when setting up LDAP authentication on UCS. Number one is committing configuration changes to the Fabric Interconnects in UCS Manager.

There are four configuration options which you need to set to enable Active Directory authentication to your domain:

  • LDAP Providers – these are your domain controllers
  • LDAP Provider Groups – are used to group multiple domain controllers of the same domain
  • LDAP Group Maps – where you give permissions to your AD groups and users
  • Authentication Domains – final configuration step where you enable authentication via the domain

Now if you decide to delete a LDAP Provider Group which is configured under an Authentication Domain in attempt to change the settings, this may become an issue.

What is confusing here is UCS Manager will let you delete the LDAP Provider Group, save the changes and LDAP Provider Group will disappear from the list. And you may legitimately conclude that it’s deleted from UCS, but it’s actually not. This is what you’ll see in UCS Manager logs:

[FSM:STAGE:STALE-FAIL]: external aaa server configuration to primary(FSM-STAGE:sam:dme:AaaEpUpdateEp:SetEpLocal)
[FSM:STAGE:REMOTE-ERROR]: Result: resource-unavailable Code: ERR-ep-set-error Message: Re-ordering/Deletion of Providers cannot be applied while ldap is used for yourdomain.com(Domain) authentication(sam:dme:AaaEpUpdateEp:SetEpLocal)

The record will stay on the UCS and you may encounter very confusing issues where you change your LDAP Provider settings but changes are not reflected on UCS. So make sure to delete the object from the higher level entity first.

Distinguished Name typos

There are two ways to group Active Directory entities on a domain controller – Security Groups and Organizational Units. When configuring your AD bind account in LDAP Providers section and setting up permissions in LDAP Group Maps, make sure to not confuse the two. The best advice I can give – always use ADSI Edit tool to find the exact DN. Why? As an example let’s say you want to give permissions to the builtin administrator group and you use the following DN:

CN=Administrators,OU=Builtin,DC=yourdomain,DC=com

This won’t work, because even though Builtin container may look like a OU, it’s actually a CN in AD, as well as Users and Computers containers.

adsi_edit.JPG

ADSI Edit will give you the exact Distinguished Name. Make sure to use it to save yourself the hassle.

Group Authorization settings

Last but not least are the following two LDAP Provider configuration settings:

  • Group Authorization – whether UCS searches within groups when authenticating
  • Group Recursion – whether UCS searches groups recursively

If you add an AD group which the user is a part of in LDAP Group Maps and do not enable Group Authorization, UCS simply won’t search within the group. Enable this option unless you give permissions only on a per user basis.

Second option enables recursive search within AD groups. If you have nested groups in AD (which most people have) enable recursive search or UCS won’t look deeper than 1 level.

If you get really stuck

If you’ve set all the settings up and are certain they the are correct, but authentication still doesn’t work, then there is a relatively easy way to localize the issue.

First step is to check whether UCS can bind to your LDAP Providers and authenticate users. Pick a user (LDAP Group Maps don’t matter at this point), SSH to a Fabric Interconnect and type the following:

ucs # connect nxos
ucs(nxos)# test aaa server ldap yourdc.yourdomain.com john password123

yourdc.yourdomain.com – is the domain controller you’ve configured in LDAP Providers section. If authentication doesn’t work, then the issue is in LDAP Provider settings.

If you can authenticate, then the next step is to make sure that UCS searches through the right AD groups. To check that you will need to enable LDAP authentication logging on a Fabric Interconnect:

ucs # connect nxos
ucs(nxos)# debug ldap aaa-request-lowlevel

Now try to authenticate and look through the list of groups which UCS is searching through. If you can’t see the group which your user is a part of, then you most likely using a wrong DN in LDAP Group Maps.

In my case the settings are configured correctly and I can see that UCS is searching in the Builtin Administrators group:

2015 Dec 1 14:12:19.581737 ldap: value: CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581747 ldap: ldap_add_to_groups: Discarding. group map not configured for CN=Enterprise Admins,CN=Users,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581756 ldap: value: CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581767 ldap: ldap_add_to_groups: successfully added group:CN=Administrators,CN=Builtin,DC=yourdomain,DC=com
2015 Dec 1 14:12:19.581777 ldap: value: CN=Exchange Organization Administrators,OU=Microsoft Exchange Security Groups,DC=yourdomain,DC=com

Make sure to disable logging when you’re done:

ucs(nxos)# undebug all

References: