Posts Tagged ‘architecture’

Reminder: Disable Firewall on NSX ECMP Edge

October 15, 2019

ECMP and Stateful Services

It’s not new, this topic has already been discussed many times before, examples are here, here, here and here. When NSX Edges are configured in ECMP mode, none of the stateful services like VPN, NAT or Load Balancing are supported.

From NSX Design Guide:

In ECMP mode, only routing service is available. Stateful services cannot be supported due to asymmetric routing inherent in ECMP-based forwarding.

Even if you didn’t read documentation, but have networking skills, you’d know that protocols like NAT need to track network session state and even if you configure the same NAT rule on all of your ECMP-enabled edges, it won’t work, because due to ECMP, traffic can flow through one ESG on ingress and another ESG on egress. Since NAT tables are not synchronized, ESGs won’t be able to find the corresponding network flow in translation table and will drop the traffic.

ECMP and Firewall

But there’s another issue that doesn’t always come across or simply get forgotten about. You can deploy ESGs in ECMP mode, not configure any of the stateful services like VPN, NAT or LB, but still get network communication issues. Why? Because when you deploy an ESG, you always end up with firewall in enabled state. Firewall is also considered a stateful service.

From VVD 5.1 documentation:

SDDC-VISDN-032: For all ESGs deployed as ECMP North-South routers, disable the firewall. Use of ECMP on the ESGs is a requirement. Leaving the firewall enabled, even in allow all traffic mode, results in sporadic network connectivity. Services such as NAT and load balancing cannot be used when the firewall is disabled.

In fact, firewall is what actually tracks sessions and drops packets that don’t match existing network flows, not NAT itself. That’s also the reason why services like NAT and LB don’t work without firewall being enabled.

It often throws people off, because even having no rules in the firewall and setting default policy to accept will not prevent this issue from happening.

Demo

Here is a quick demonstration. I’m trying to establish an SSH session to a VM connected to a DLR behind two ESGs in ECMP mode.

I’m showing packet debug on both ESGs using the following command:

> debug packet display follow interface vNic_1 port_22

As you can see ingress traffic goes through E1 and egress traffic goes through E2:

E1: Packet Capture

E2: Packet Capture

Since session originated on E1, E2 interprets packets as invalid and immediately drops them:

From NSX Troubleshooting Guide:

Check for an incrementing value of a DROP invalid rule in the POST_ROUTING section of the show firewall command. Typical reasons include:

  • Asymmetric routing issues

Conclusion

It’s easy to end up in this situation, because firewall is enabled by default on a newly deployed ESG. And it’s hard to troubleshoot this issue, since it’s not quite obvious what’s actually going on unless you’ve already worked with ECMP before. So the best advice in this case is just to remember, if you want to use ECMP in NSX, make sure to disable firewall on ECMP-enabled ESGs. Use distributed firewall (DFW) instead.

Advertisements

VMware ESXi Core Processes

July 12, 2013

vmware_esxiThere are not much information on VMware ESXi architecture out there. There is an old whitepaper “The Architecture of VMware ESXi” which dates back to 2007. In fact, from the management perspective there are only two OS processes critical two ESXi management. These are: hostd (vmware-hostd in ESXi 4) and vpxa (vmware-vpxa in ESXi 4) which are called “Management Agents”.

hostd process is a communication layer between vmkernel and the outside world. It invokes all management operations on VMs, storage, network, etc by directly talking to the OS kernel. If hostd dies you won’t be able to connect to the host neither from vCenter nor VI client. But you will still have an option to connect directly to the host console via SSH.

vpxa is an intermediate layer between vCenter and hostd and is called “vCenter Server Agent”. If you have communication failure between vCenter and ESXi host, it’s the first thing to blame.

Say you have a storage LUN failure and hostd can’t release the datastore. You can restart hostd or both of the processes using these scripts:

# vmware-hostd restart
# vmware-vpxa restart

Migrating IBM DB2 from 32 to 64-bit platform

December 21, 2011

The best way to move your database from one server to another is a backup/restore procedure. You can also use db2move utility but it’s not much of help here because it moves only the tables.

If you use a built-in compression to reduce size of your backups which is a normal thing to do then if you’ll try to restore a backup made on a 32-bit architecture to a 64-bit platform using a command like this

RESTORE DATABASE mydb FROM “D:\dbbackup” TAKEN AT 20111215030019 TO “D:” WITH 2 BUFFERS BUFFER 1024 PARALLELISM 1 WITHOUT ROLLING FORWARD WITHOUT PROMPTING;

then you will get an error

SQL2570N  An attempt to restore on target OS “NT-64” from a backup created on source OS “NT-32” failed due to the incompatibility of operating systems or an incorrect specification of the restore command.  Reason-code: “2”.

The reason why this happens is a compression library. Each time you make a compressed backup DB2 puts a compression library into a backup itself. When restoring on a 64-bit platform DB2 refuses to use a 32-bit library. There are two solutions to this problem. First is to make a plane uncompressed backup. But if your backup file is quite large then it can be rather painful to move it between servers. Second solution is to add COMRLIB clause into the original query

RESTORE DATABASE mydb FROM “D:\dbbackup” TAKEN AT 20111215030019 TO “D:” WITH 2 BUFFERS BUFFER 1024 PARALLELISM 1 COMPRLIB C:\SQLLIB\BIN\db2compr.dll WITHOUT ROLLING FORWARD WITHOUT PROMPTING;

If you restore to an existing database you will get SQL2539 warning. Which just means that original database files will be deleted.

Use this workaround description at IBM site as a reference: IY71307: SQL2570N when restoring a compressed backup image to another platform or wordsize.