NetApp filers are active/active ALUA arrays, which means that you can access LUNs owned by one controller through the other one. Access to the partner’s LUNs goes over the internal interconnect, however, and is always slower. That’s why paths that reach a LUN through the partner controller are called “unoptimized”. Their main purpose is to serve as backup paths in case of a failover.
Fixed path selection
By default, VMware hosts use the “VMW_SATP_DEFAULT_AA” Storage Array Type Policy (SATP) and the “Fixed” Path Selection Policy (PSP) for active/active arrays. An ESXi host configured with this SATP and PSP accesses each LUN via one particular path, even if you have two FC ports on each of the controllers.
With this configuration a VMware host can’t automatically identify the optimized paths. You can either set the preferred path manually or use the NetApp Virtual Storage Console (VSC) plug-in for VMware: go to the Monitoring and Host Configuration -> Overview section of VSC, right-click the ESXi host and click “Set Recommended Values”. If you don’t do that, the ESXi host will send I/O through an arbitrarily chosen path, which could turn out to be unoptimized. In that case you push heaps of I/O through the partner node and experience higher latencies.
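To make the idea of a preferred path concrete, here is a minimal Python sketch of the choice you make by hand (or that VSC makes for you) under the “Fixed” policy. It is only a model, not VMware or NetApp code: the `Path` class, the path names and the partner name `filer_02` are hypothetical, and a path is treated as optimized simply when it ends on the controller that owns the LUN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str        # hypothetical ESXi runtime name of the path
    controller: str  # controller whose FC port the path ends on


def is_optimized(path, lun_owner):
    """A path is optimized when it ends on the controller that owns the LUN."""
    return path.controller == lun_owner


def pick_fixed_path(paths, lun_owner):
    """Return the first optimized path, or the first path if none is optimized."""
    for path in paths:
        if is_optimized(path, lun_owner):
            return path
    return paths[0]


# Two host HBAs, each zoned to one port on both controllers (assumed layout).
paths = [
    Path("vmhba1:C0:T0:L0", "filer_02"),
    Path("vmhba1:C0:T1:L0", "filer_01"),
    Path("vmhba2:C0:T0:L0", "filer_02"),
    Path("vmhba2:C0:T1:L0", "filer_01"),
]

preferred = pick_fixed_path(paths, lun_owner="filer_01")
print(preferred.name)
```

Note that the first path in the list is an unoptimized one. A host that just takes whatever path it discovered first would end up sending all I/O for this LUN through the partner.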
You can check whether you’re using non-optimized paths by looking for warnings like these on the filers:
filer_01> Mon May 6 10:30:45 EST [filer_01: ems.engine.inputSuppress:error]: Event ‘scsitarget.partnerPath.misconfigured’ suppressed 327 times since Mon May 6 09:30:48 EST 2013.
Mon May 6 10:30:45 EST [filer_01: scsitarget.partnerPath.misconfigured:error]: FCP Partner Path Misconfigured – Host I/O access through a non-primary and non-optimal path was detected.
Or run “lun stats -o” and look for large values under “Partner Ops” and “Partner KBytes”.
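If you collect the filer messages somewhere, the two warnings shown above are easy to count with a script. The following Python sketch is one possible way to do it. The function name and the sample text are illustrative, and it assumes the messages look exactly like the two lines quoted above (the sample uses plain ASCII quotes and a plain hyphen).

```python
import re

EVENT = "scsitarget.partnerPath.misconfigured"

# Summary line written when the event is being suppressed. The quotes around
# the event name may be straight or typographic, so allow any one non-word char.
SUPPRESSED_RE = re.compile(
    r"Event\s+\W?" + re.escape(EVENT) + r"\W?\s+suppressed\s+(\d+)\s+times"
)


def partner_path_warnings(log_text):
    """Return (direct_events, suppressed_events) found in log_text."""
    direct = 0
    suppressed = 0
    for line in log_text.splitlines():
        match = SUPPRESSED_RE.search(line)
        if match:
            suppressed += int(match.group(1))
        elif f"{EVENT}:error]" in line:
            direct += 1
    return direct, suppressed


sample = """\
Mon May 6 10:30:45 EST [filer_01: ems.engine.inputSuppress:error]: Event 'scsitarget.partnerPath.misconfigured' suppressed 327 times since Mon May 6 09:30:48 EST 2013.
Mon May 6 10:30:45 EST [filer_01: scsitarget.partnerPath.misconfigured:error]: FCP Partner Path Misconfigured - Host I/O access through a non-primary and non-optimal path was detected.
"""

direct, suppressed = partner_path_warnings(sample)
print(direct, suppressed)
```

Any non-zero result means that at least one host is reaching a LUN through the partner controller.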
ALUA configuration
If you want to use both links to the owning controller in a round robin fashion, you need to do some additional configuration. First, enable ALUA on the filer for the initiator group of your VMware ESXi hosts:
igroup set <group> alua yes
Now you need to reboot the ESXi host. After the reboot it will see that the storage is ALUA-capable and will change the SATP to VMW_SATP_ALUA and the PSP to “Most Recently Used”. To balance load across the two optimized paths to the owning controller, you need to change the PSP to “Round Robin”. Again, you can do that either manually or via VSC.
Note: Don’t ever use ALUA and VMW_SATP_ALUA if you have Windows Server 2003 MSCS or Windows Server 2008 Failover Cluster with shared RDM LUNs. This configuration is unsupported and can lead to a cluster failure. It’s described in many places:
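The effect of “Round Robin” on top of ALUA can be illustrated with a small Python model. This is a simplified sketch, not the actual VMware implementation: the function name, the path names and the state strings are hypothetical, and it rotates after every single I/O, whereas a real host switches paths after a configurable number of I/Os. The point it shows is that I/O is spread over the optimized paths only, and unoptimized paths are used just as a fallback.

```python
from collections import Counter
from itertools import cycle


def round_robin_io(paths, io_count):
    """Spread io_count I/Os over the optimized paths, one I/O per path in turn."""
    optimized = [name for name, state in paths if state == "active/optimized"]
    # Fall back to every path if no optimized one is left (e.g. after a failover).
    candidates = optimized or [name for name, _ in paths]
    rotation = cycle(candidates)
    return Counter(next(rotation) for _ in range(io_count))


# Hypothetical path states as reported via ALUA for one LUN.
paths = [
    ("vmhba1:C0:T0:L0", "active/unoptimized"),
    ("vmhba1:C0:T1:L0", "active/optimized"),
    ("vmhba2:C0:T0:L0", "active/unoptimized"),
    ("vmhba2:C0:T1:L0", "active/optimized"),
]

counts = round_robin_io(paths, 1000)
for name, ios in sorted(counts.items()):
    print(name, ios)
```

In this model both optimized paths carry half of the I/O each and the partner paths stay idle, which is exactly what you want to see in practice.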
- https://kb.netapp.com/support/index?page=content&id=2013316
- http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010713
- http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=369866
In this case leave the SATP as “VMW_SATP_DEFAULT_AA” and the PSP as “Fixed”, and make sure that the preferred paths you use are optimized ones.