Posts Tagged ‘Fault Domain’

Dell Compellent is not an ALUA Storage Array

May 16, 2016

dell_compellentDell Compellent is Dell’s flagship storage array which competes in the market with such rivals as EMC VNX and NetApp FAS. All these products have slightly different storage architectures. In this blog post I want to discuss what distinguishes Dell Compellent from the aforementioned arrays when it comes to multipathing and failover. This may help you make right decisions when designing and installing a solution based on Dell Compellent in your production environment.

Compellent Array Architecture

In one of my previous posts I showed how Compellent LUNs on vSphere ESXi hosts are claimed by VMW_SATP_DEFAULT_AA instead of VMW_SATP_ALUA SATP, which is the default for all ALUA arrays. This happens because Compellent is not actually an ALUA array and doesn’t have the tpgs_on option enabled. Let’s digress for a minute and talk about what the tpgs_on option actually is.

For a storage array to be claimed by VMW_SATP_ALUA it has to have the tpgs_on option enabled, as indicated by the corresponding SATP claim rule:

# esxcli storage nmp satp rule list

Name                 Transport  Claim Options Description
-------------------  ---------  ------------- -----------------------------------
VMW_SATP_ALUA                   tpgs_on       Any array with ALUA support

This is how Target Port Groups (TPG) are defined in section 5.8.2.1 Introduction to asymmetric logical unit access of SCSI Primary Commands – 3 (SPC-3) standard:

A target port group is defined as a set of target ports that are in the same target port asymmetric access state at all times. A target port group asymmetric access state is defined as the target port asymmetric access state common to the set of target ports in a target port group. The grouping of target ports is vendor specific.

This has to do with how ports on storage controllers are grouped. On an ALUA array even though a LUN can be accessed through either of the controllers, paths only to one of them (controller which owns the LUN) are Active Optimized (AO) and paths to the other controller (non-owner) are Active Non-Optimized (ANO).

Compellent does not present LUNs through the non-owning controller. You can easily see that if you go to the LUN properties. In this example we have four iSCSI ports connected (two per controller) on the Compellent side, but we can see only two paths, which are the paths from the owning controller.

compellent_psp

If Compellent presents each particular LUN only through one controller, then how does it implement failover? Compellent uses a concept of fault domains and control ports to handle LUN failover between controllers.

Compellent Fault Domains

This is Dell’s definition of a Fault Domain:

Fault domains group front-end ports that are connected to the same Fibre Channel fabric or Ethernet network. Ports that belong to the same fault domain can fail over to each other because they have the same connectivity.

So depending on how you decided to go about your iSCSI network configuration you can have one iSCSI subnet / one fault domain / one control port or two iSCSI subnets / two fault domains / two control ports. Either of the designs work fine, this is really is just a matter of preference.

You can think of a Control Port as a Virtual IP (VIP) for the particular iSCSI subnet. When you’re setting up iSCSI connectivity to a Compellent, you specify Control Ports IPs in Dynamic Discovery section of the iSCSI adapter properties. Which then redirects the traffic to the actual controller IPs.

If you go to the Storage Center GUI you will see that Compellent also creates one virtual port for every iSCSI physical port. This is what’s called a Virtual Port Mode and is recommended instead of a Physical Port Mode, which is the default setting during the array initialization.

Failover scenarios

Now that we now what fault domains are, let’s talk about the different failover scenarios. Failover can happen on either a port level when you have a transceiver / cable failure or a controller level, when the whole controller goes down or is rebooted. Let’s discuss all of these scenarios and their variations one by one.

1. One Port Failed / One Fault Domain

If you use one iSCSI subnet and hence one fault domain, when you have a port failure, Compellent will move the failed port to the other port on the same controller within the same fault domain.

port_failed

In this example, 5000D31000B48B0E and 5000D31000B48B0D are physical ports and 5000D31000B48B1D and 5000D31000B48B1C are the corresponding virtual ports on the first controller. Physical port 5000D31000B48B0E fails. Since both ports on the controller are in the same fault domain, controller moves virtual port 5000D31000B48B1D from its original physical port 5000D31000B48B0E to the physical port 5000D31000B48B0D, which still has connection to network. In the background Compellent uses iSCSI redirect command on the Control Port to move the traffic to the new virtual port location.

2. One Port Failed / Two Fault Domains

Two fault domains scenario is slightly different as now on each controller there’s only one port in each fault domain. If any of the ports were to fail, controller would not fail over the port. Port is failed over only within the same controller/domain. Since there’s no second port in the same fault domain, the virtual port stays down.

port_failed_2

A distinction needs to be made between the physical and virtual ports here. Because from the physical perspective you lose one physical link in both One Fault Domain and Two Fault Domains scenarios. The only difference is, since in the latter case the virtual port is not moved, you’ll see one path down when you go to LUN properties on an ESXi host.

3. Two Ports Failed

This is the scenario which you have to be careful with. Compellent does not initiate a controller failover when all front-end ports on a controller fail. The end result – all LUNs owned by this controller become unavailable.

two_ports_failed_2

luns_down

This is the price Compellent pays for not supporting ALUA. However, such scenario is very unlikely to happen in a properly designed solution. If you have two redundant network switches and controllers are cross-connected to both of them, if one switch fails you lose only one link per controller and all LUNs stay accessible through the remaining links/switch.

4. Controller Failed / Rebooted

If the whole controller fails the ports are failed over in a similar fashion. But now, instead of moving ports within the controller, ports are moved across controllers and LUNs come across with them. You can see how all virtual ports have been failed over from the second (failed) to the first (survived) controller:

controller_failed

Once the second controller gets back online, you will need to rebalance the ports or in other words move them back to the original controller. This doesn’t happen automatically. Compellent will either show you a pop up window or you can do that by going to System > Setup > Multi-Controller > Rebalance Local Ports.

Conclusion

Dell Compellent is not an ALUA storage array and falls into the category of Active/Passive arrays from the LUN access perspective. Under such architecture both controller can service I/O, but each particular LUN can be accessed only through one controller. This is different from the ALUA arrays, where LUN can be accessed from both controllers, but paths are active optimized on the owning controller and active non-optimized on the non-owning controller.

From the end user perspective it does not make much of a difference. As we’ve seen, Compellent can handle failover on both port and controller levels. The only exception is, Compellent doesn’t failover a controller if it loses all front-end connectivity, but this issue can be easily avoided by properly designing iSCSI network and making sure that both controllers are connected to two upstream switches in a redundant fashion.

Advertisements

Dell Compellent iSCSI Configuration

November 20, 2015

I haven’t seen too many blog posts on how to configure Compellent for iSCSI. And there seem to be some confusion on what the best practices for iSCSI are. I hope I can shed some light on it by sharing my experience.

In this post I want to talk specifically about the Windows scenario, such as when you want to use it for Hyper-V. I used Windows Server 2012 R2, but the process is similar for other Windows Server versions.

Design Considerations

All iSCSI design considerations revolve around networking configuration. And two questions you need to ask yourself are, what your switch topology is going to look like and how you are going to configure your subnets. And it all typically boils down to two most common scenarios: two stacked switches and one subnet or two standalone switches and two subnets. I could not find a specific recommendation from Dell on whether it should be one or two subnets, so I assume that both scenarios are supported.

Worth mentioning that Compellent uses a concept of Fault Domains to group front-end ports that are connected to the same Ethernet network. Which means that you will have one fault domain in the one subnet scenario and two fault domains in the two subnets scenario.

For iSCSI target ports discovery from the hosts, you need to configure a Control Port on the Compellent. Control Port has its own IP address and one Control Port is configured per Fault Domain. When server targets iSCSI port IP address, it automatically discovers all ports in the fault domain. In other words, instead of using IPs configured on the Compellent iSCSI ports, you’ll need to use Control Port IP for iSCSI target discovery.

Compellent iSCSI Configuration

In my case I had two stacked switches, so I chose to use one iSCSI subnet. This translates into one Fault Domain and one Control Port on the Compellent.

IP settings for iSCSI ports can be configured at Storage Management > System > Setup > Configure iSCSI IO Cards.

iscsi_ports

To create and assign Fault Domains go to Storage Management > System > Setup > Configure Local Ports > Edit Fault Domains. From there select your fault domain and click Edit Fault Domain. On IP Settings tab you will find iSCSI Control Port IP address settings.

local_ports

control_port

Host MPIO Configuration

On the Windows Server start by installing Multipath I/O feature. Then go to MPIO Control Panel and add support for iSCSI devices. After a reboot you will see MSFT2005iSCSIBusType_0x9 in the list of supported devices. This step is important. If you don’t do that, then when you map a Compellent disk to the hosts, instead of one disk you will see multiple copies of the same disk device in Device Manager (one per path).

add_iscsi

iscsi_added

Host iSCSI Configuration

To connect hosts to the storage array, open iSCSI Initiator Properties and add your Control Port to iSCSI targets. On the list of discovered targets you should see four Compellent iSCSI ports.

Next step is to connect initiators to the targets. This is where it is easy to make a mistake. In my scenario I have one iSCSI subnet, which means that each of the two host NICs can talk to all four array iSCSI ports. As a result I should have 2 host ports x 4 array ports = 8 paths. To accomplish that, on the Targets tab I have to connect each initiator IP to each target port, by clicking Connect button twice for each target and selecting one initiator IP and then the other.

iscsi_targets

discovered_targets

connect_targets

Compellent Volume Mapping

Once all hosts are logged in to the array, go back to Storage Manager and add servers to the inventory by clicking on Servers > Create Server. You should see hosts iSCSI adapters in the list already. Make sure to assign correct host type. I chose Windows 2012 Hyper-V.

 

add_servers

It is also a best practice to create a Server Cluster container and add all hosts into it if you are deploying a Hyper-V or a vSphere cluster. This guarantees consistent LUN IDs across all hosts when LUN is mapped to a Server Cluster object.

From here you can create your volumes and map them to the Server Cluster.

Check iSCSI Paths

To make sure that multipathing is configured correctly, use “mpclaim” to show I/O paths. As you can see, even though we have 8 paths to the storage array, we can see only 4 paths to each LUN.

io_paths

Arrays such as EMC VNX and NetApp FAS use Asymmetric Logical Unit Access (ALUA), where LUN is owned by only one controller, but presented through both. Then paths to the owning controller are marked as Active/Optimized and paths to the non-owning controller are marked as Active/Non-Optimized and are used only if owning controller fails.

Compellent is different. Instead of ALUA it uses iSCSI Redirection to move traffic to a surviving controller in a failover situation and does not need to present the LUN through both controllers. This is why you see 4 paths instead of 8, which would be the case if we used an ALUA array.

References