Archive for the ‘Hardware’ Category

Dell Compellent is not an ALUA Storage Array

May 16, 2016

dell_compellent

Dell Compellent is Dell’s flagship storage array, which competes in the market with such rivals as EMC VNX and NetApp FAS. All these products have slightly different storage architectures. In this blog post I want to discuss what distinguishes Dell Compellent from the aforementioned arrays when it comes to multipathing and failover. This may help you make the right decisions when designing and installing a solution based on Dell Compellent in your production environment.

Compellent Array Architecture

In one of my previous posts I showed how Compellent LUNs on vSphere ESXi hosts are claimed by VMW_SATP_DEFAULT_AA instead of VMW_SATP_ALUA SATP, which is the default for all ALUA arrays. This happens because Compellent is not actually an ALUA array and doesn’t have the tpgs_on option enabled. Let’s digress for a minute and talk about what the tpgs_on option actually is.

For a storage array to be claimed by VMW_SATP_ALUA it has to have the tpgs_on option enabled, as indicated by the corresponding SATP claim rule:

# esxcli storage nmp satp rule list

Name                 Transport  Claim Options Description
-------------------  ---------  ------------- -----------------------------------
VMW_SATP_ALUA                   tpgs_on       Any array with ALUA support

This is how Target Port Groups (TPG) are defined in section 5.8.2.1, Introduction to asymmetric logical unit access, of the SCSI Primary Commands – 3 (SPC-3) standard:

A target port group is defined as a set of target ports that are in the same target port asymmetric access state at all times. A target port group asymmetric access state is defined as the target port asymmetric access state common to the set of target ports in a target port group. The grouping of target ports is vendor specific.

This has to do with how ports on storage controllers are grouped. On an ALUA array, even though a LUN can be accessed through either controller, only the paths to the controller which owns the LUN are Active Optimized (AO); paths to the other (non-owning) controller are Active Non-Optimized (ANO).

Compellent does not present LUNs through the non-owning controller. You can easily see that if you go to the LUN properties. In this example we have four iSCSI ports connected (two per controller) on the Compellent side, but we can see only two paths, which are the paths from the owning controller.

compellent_psp
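If you prefer the command line, you can confirm the same thing from the ESXi shell by listing the paths for a Compellent volume. This is just a quick check; the device identifier below is a placeholder for one of your own LUN IDs:

# esxcli storage core path list -d naa.6000d31000b48b000000000000000001

Only the paths going to the owning controller should be listed.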

If Compellent presents each particular LUN only through one controller, then how does it implement failover? Compellent uses a concept of fault domains and control ports to handle LUN failover between controllers.

Compellent Fault Domains

This is Dell’s definition of a Fault Domain:

Fault domains group front-end ports that are connected to the same Fibre Channel fabric or Ethernet network. Ports that belong to the same fault domain can fail over to each other because they have the same connectivity.

So depending on how you decide to go about your iSCSI network configuration, you can have one iSCSI subnet / one fault domain / one control port, or two iSCSI subnets / two fault domains / two control ports. Either design works fine; this really is just a matter of preference.

You can think of a Control Port as a Virtual IP (VIP) for a particular iSCSI subnet. When you’re setting up iSCSI connectivity to a Compellent, you specify the Control Port IPs in the Dynamic Discovery section of the iSCSI adapter properties; the Control Port then redirects the traffic to the actual controller IPs.
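For reference, this is roughly what adding the Control Port IPs as dynamic discovery targets looks like from the ESXi command line; the adapter name vmhba37 and the addresses below are placeholders for illustration:

# esxcli iscsi adapter discovery sendtarget add -A vmhba37 -a 10.10.10.50:3260
# esxcli iscsi adapter discovery sendtarget add -A vmhba37 -a 10.10.20.50:3260

With a single fault domain you would add just one Control Port IP, with two fault domains one per subnet.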

If you go to the Storage Center GUI you will see that Compellent also creates one virtual port for every physical iSCSI port. This is what’s called Virtual Port Mode, and it is recommended over Physical Port Mode, which is the default setting during array initialization.

Failover scenarios

Now that we know what fault domains are, let’s talk about the different failover scenarios. Failover can happen either at the port level, when you have a transceiver or cable failure, or at the controller level, when the whole controller goes down or is rebooted. Let’s discuss each of these scenarios and their variations one by one.

1. One Port Failed / One Fault Domain

If you use one iSCSI subnet and hence one fault domain, then when you have a port failure, Compellent will move the virtual port from the failed port to the other port on the same controller within the same fault domain.

port_failed

In this example, 5000D31000B48B0E and 5000D31000B48B0D are physical ports, and 5000D31000B48B1D and 5000D31000B48B1C are the corresponding virtual ports on the first controller. Physical port 5000D31000B48B0E fails. Since both ports on the controller are in the same fault domain, the controller moves virtual port 5000D31000B48B1D from its original physical port 5000D31000B48B0E to physical port 5000D31000B48B0D, which still has a connection to the network. In the background, Compellent uses an iSCSI redirect on the Control Port to move the traffic to the new virtual port location.

2. One Port Failed / Two Fault Domains

The two fault domain scenario is slightly different, as now each controller has only one port in each fault domain. If any of the ports were to fail, the controller would not fail the port over, because a port is failed over only within the same controller and fault domain. Since there’s no second port in the same fault domain, the virtual port stays down.

port_failed_2

A distinction needs to be made between the physical and virtual ports here, because from the physical perspective you lose one physical link in both the One Fault Domain and Two Fault Domains scenarios. The only difference is that, since in the latter case the virtual port is not moved, you’ll see one path down when you go to the LUN properties on an ESXi host.
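You can see the same thing from the ESXi shell: in the two fault domain case one of the paths to the LUN will show up as dead rather than being redirected. Again, the device identifier is a placeholder:

# esxcli storage nmp path list -d naa.6000d31000b48b000000000000000001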

3. Two Ports Failed

This is the scenario you have to be careful with. Compellent does not initiate a controller failover when all front-end ports on a controller fail. The end result is that all LUNs owned by this controller become unavailable.

two_ports_failed_2

luns_down

This is the price Compellent pays for not supporting ALUA. However, such a scenario is very unlikely to happen in a properly designed solution. If you have two redundant network switches and the controllers are cross-connected to both of them, then if one switch fails you lose only one link per controller, and all LUNs stay accessible through the remaining links and switch.

4. Controller Failed / Rebooted

If the whole controller fails, the ports are failed over in a similar fashion. But now, instead of moving ports within the controller, the ports are moved across controllers and the LUNs come across with them. You can see how all virtual ports have been failed over from the second (failed) controller to the first (surviving) controller:

controller_failed

Once the second controller comes back online, you will need to rebalance the ports, in other words move them back to the original controller. This doesn’t happen automatically. Compellent will either show you a pop-up window, or you can do it by going to System > Setup > Multi-Controller > Rebalance Local Ports.

Conclusion

Dell Compellent is not an ALUA storage array and falls into the category of Active/Passive arrays from the LUN access perspective. Under such an architecture both controllers can service I/O, but each particular LUN can be accessed only through one controller. This is different from ALUA arrays, where a LUN can be accessed through both controllers, but paths are active optimized on the owning controller and active non-optimized on the non-owning controller.

From the end user perspective it does not make much of a difference. As we’ve seen, Compellent can handle failover at both the port and the controller level. The only exception is that Compellent doesn’t fail over a controller if it loses all front-end connectivity, but this issue can easily be avoided by properly designing the iSCSI network and making sure that both controllers are connected to two upstream switches in a redundant fashion.


Changing the Default PSP for Dell Compellent

April 26, 2016

dell_compellent

If you’ve ever worked with Dell Compellent storage arrays you may have noticed that when you initially connect one to a VMware ESXi host, by default the VMware Native Multipathing Plugin (NMP) uses the Fixed Path Selection Policy (PSP) for all connected LUNs. And if you have two ports on each of the controllers connected to your storage area network (be it iSCSI or FC), then you’re wasting half of your bandwidth.

compellent_psp

Why does that happen? Let’s dig deep into VMware’s Pluggable Storage Architecture (PSA) and see how it treats Compellent.

How Compellent is claimed by VMware NMP

If you are familiar with vSphere’s Pluggable Storage Architecture (PSA) and NMP (which is the only PSA plug-in that every ESXi host has installed by default), then you may know that historically it’s always had specific rules for such Asymmetric Logical Unit Access (ALUA) arrays as NetApp FAS and EMC VNX.

Run the following command on an ESXi host and you will see claim rules for NetApp and DGC devices (DGC is Data General Corporation, which built the CLARiiON array that was later re-branded as VNX by EMC):

# esxcli storage nmp satp rule list

Name              Vendor  Default PSP Description
----------------  ------- ----------- -------------------------------
VMW_SATP_ALUA_CX  DGC                 CLARiiON array in ALUA mode
VMW_SATP_ALUA     NETAPP  VMW_PSP_RR  NetApp arrays with ALUA support

This tells NMP to use Round-Robin Path Selection Policy (PSP) for these arrays, which is always preferable if you want to utilize all available active-optimized paths. You may have noticed that there’s no default PSP in the VNX claim rule, but if you look at the default PSP for the VMW_SATP_ALUA_CX Storage Array Type Plug-In (SATP), you’ll see that it’s also Round-Robin:

# esxcli storage nmp satp list

Name              Default PSP  
----------------- -----------
VMW_SATP_ALUA_CX  VMW_PSP_RR

There is, however, no default claim rule for Dell Compellent storage arrays. There is only a handful of the following non-array-specific “catch all” rules:

Name                 Transport  Claim Options Description
-------------------  ---------  ------------- -----------------------------------
VMW_SATP_ALUA                   tpgs_on       Any array with ALUA support
VMW_SATP_DEFAULT_AA  fc                       Fibre Channel Devices
VMW_SATP_DEFAULT_AA  fcoe                     Fibre Channel over Ethernet Devices
VMW_SATP_DEFAULT_AA  iscsi                    iSCSI Devices

As you can see, the default PSP for VMW_SATP_ALUA is Most Recently Used (MRU) and for VMW_SATP_DEFAULT_AA it’s VMW_PSP_FIXED:

Name                Default PSP   Description
------------------- ------------- ------------------------------------------
VMW_SATP_ALUA       VMW_PSP_MRU
VMW_SATP_DEFAULT_AA VMW_PSP_FIXED Supports non-specific active/active arrays

Compellent is not an ALUA storage array and doesn’t have the tpgs_on option enabled. As a result it’s claimed by the VMW_SATP_DEFAULT_AA rule for the iSCSI transport, which is why you end up with the Fixed path selection policy for all LUNs by default.
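You can verify this on a host with Compellent volumes presented by listing the NMP devices and checking the Storage Array Type and Path Selection Policy fields for the Compellent LUNs:

# esxcli storage nmp device list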

Changing the default PSP

Now let’s see how we can change the PSP from Fixed to Round Robin. The first thing you have to do before even attempting to change the PSP is to check the VMware Compatibility List to make sure that the Round Robin PSP is supported for your particular array and vSphere combination.

vmware_hcl

As you can see, the Round Robin path selection policy is supported for Dell Compellent storage arrays in vSphere 6.0u2. So let’s change it to get the benefit of being able to use all paths to the Compellent controllers simultaneously.

For Compellent firmware versions 6.5 and earlier use the following command to change the default PSP:

# esxcli storage nmp satp set -P VMW_PSP_RR -s VMW_SATP_DEFAULT_AA

Note: technically you’re changing the PSP not specifically for the Compellent storage array, but for any array which is claimed by VMW_SATP_DEFAULT_AA and which doesn’t have an individual SATP rule with a PSP set. Make sure that this is not the case, or you may accidentally change the PSP for some other array in your environment.
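If that is a concern, a more targeted alternative is to add a vendor-specific SATP rule instead of changing the SATP-wide default. The sketch below assumes the vendor string COMPELNT, which is what the PowerCLI filter further down matches on; like the default change, it only affects LUNs claimed after the rule is added:

# esxcli storage nmp satp rule add -s VMW_SATP_DEFAULT_AA -P VMW_PSP_RR -V COMPELNT -e "Dell Compellent"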

The above will change the PSP for any newly provisioned and connected LUNs. For any existing LUNs you can change the PSP either manually in each LUN’s properties or by running the following command in PowerCLI:

Get-Cluster ClusterNameHere | Get-VMHost | Get-ScsiLun |
    Where-Object {$_.Vendor -eq "COMPELNT" -and $_.MultipathPolicy -eq "Fixed"} |
    Set-ScsiLun -MultipathPolicy RoundRobin
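If you would rather change individual devices from the ESXi shell instead of PowerCLI, the per-device equivalent looks like this; the naa identifier is a placeholder:

# esxcli storage nmp device set -d naa.6000d31000b48b000000000000000001 -P VMW_PSP_RR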

This is what you should see in LUN properties as a result:

compellent_psp_2

Conclusion

By default any LUN connected from a Dell Compellent storage array is claimed by NMP using the Fixed path selection policy. You can change it to Round Robin using the two simple commands above to make sure you utilize all storage paths available to your ESXi hosts.

RecoverPoint VE: Common Deployment Issues

April 19, 2016

fix

In one of my previous posts I discussed iSCSI connectivity considerations when deploying RecoverPoint VE. In this post I want to describe common issues you may encounter when deploying RecoverPoint clusters, most of which are applicable to both the physical appliance and the virtual edition.

VNX MirrorView ports

I already touched on that briefly in my previous post, but it’s worth mentioning again that you can NOT use MirrorView ports for iSCSI connectivity between RPAs and VNX arrays. When you try to use a MirrorView iSCSI port for RecoverPoint, it gets upset and doesn’t communicate with the array.

If you make the mistake of connecting one port per SP and that port is a MirrorView port, you will have no communication with the array at all and will get the following error in Unisphere for RecoverPoint:

Error Splitter ARRAYNAME-A is down
Error Splitter ARRAYNAME-B is down

splitter_error

If you connect two ports per SP, one of which is a MirrorView port, and use two iSCSI network subnets, you may get the following error when running a SAN connectivity test from the RPA boxmgmt interface. In this case the RPA can communicate with the array over only one subnet:

On array ABCD1234567890, all paths for device with UID=0x1234567890abcdef go through RPA Ethernet port eth2 …

multipathing_issue

The solution is as simple as moving the link from port 0 to port 1 on a 10Gb I/O module, or from port 0 to port 1, 2 or 3 on a 1Gb I/O module.

If you don’t want to lose two iSCSI ports (one per SP), especially if they are 10Gb, and you’re not using MirrorView, you can uninstall the MirrorView enabler from the array. Just keep in mind that this will require an array reboot. The service processors will be rebooted one at a time, so there is no downtime. But if it’s a heavily used storage array, it’s recommended to schedule the uninstallation out of hours to minimize the impact.

Error when redeploying a cluster

If you’ve made configuration mistakes while deploying a RecoverPoint cluster and want to blow the whole thing away and redeploy it from scratch you may encounter the following error when deploying for the second time:

VNX path set with IP 10.10.10.1 already exists in a different path set (RP_0x123abc456def789g_0_iSCSI1)

rpa_redeploy

The cause of the issue is the iSCSI sessions that stayed behind on the VNX after you deleted the RPA VMs. You need to connect to the VNX and delete them manually in Unisphere by right-clicking the storage array name on the dashboard and selecting iSCSI > Connections Between Storage Systems. This is what the duplicate sessions look like:

duplicate_rp

As you can see, there are three sets of RecoverPoint cluster iSCSI connections after three unsuccessful attempts.

You will need to delete old sessions before you are able to proceed with the deployment in RecoverPoint Deployment Manager.

Wrong initiator names

I’ve seen this on multiple occasions when RecoverPoint registers initiators on VNX with inconsistent hostnames.

As you’ve seen on the screenshots above, hostname field of every initiator consists of the cluster ID and RPA ID (not sure what the third field means), such as this:

RP_0x123abc456def789g_1_0

In this example you can see that RPA1 has two hostnames with suffixes _0_0 and _1_0.

wrong_initiators

This issue is purely cosmetic and doesn’t affect RecoverPoint operation, but if you want to fix it you will need to restart the Management Server on the VNX service processors. It’s a non-disruptive procedure and can be performed by opening http://SP_IP/setup and clicking the “Restart Management Server” button.

After the restart, the array will update the hostnames to reflect the actual configuration.

Joining two clusters with the licences already applied

This is just not going to work. Make sure to join the production and DR clusters before applying the RecoverPoint licences, or the Deployment Manager “Connect Cluster” wizard will fail.

It’s one of the prerequisites specified in RecoverPoint “Installation and Deployment Guide”:

If you plan to connect the new cluster immediately after preparing it for connection,
ensure:

  • You do not install a license in, or modify the settings of, the new cluster before
    connecting it to the existing system.

Conclusion

There are always many more things that can potentially go wrong, but if any of the above helped you solve your RecoverPoint deployment issues, make sure to let me know in the comments below!

RecoverPoint VE: iSCSI Network Design

March 29, 2016

recoverpoint

RecoverPoint is a great storage replication product, which supports Continuous Data Protection (CDP) and gives you RPO figures measured in seconds, compared to standard asynchronous storage-based replication solutions, where RPO is measured in minutes or even hours.

RecoverPoint comes in three flavours:

  • RecoverPoint SE/EX/CL – physical appliance for replication between VNX (RecoverPoint/SE), VNX/VMAX/VPLEX (RecoverPoint/EX) or EMC and non-EMC (RecoverPoint CL) storage arrays.
  • RecoverPoint VE – virtual edition of RecoverPoint which is installed as a VM and supports the same SE/EX/CL versions.
  • RecoverPoint for Virtual Machines – also a virtual appliance but is array-agnostic and works at a hypervisor level by replicating VMs instead of LUNs.

In this blog post we will be discussing connectivity options for RecoverPoint VE (the SE edition). Make sure not to confuse RecoverPoint VE with RecoverPoint for Virtual Machines, as they are two completely different products.

VNX MirrorView ports

MirrorView is another EMC replication solution integrated into VNX arrays. If the MirrorView enabler is installed, it will claim the first FC port and the first iSCSI port for itself. When patching VNX iSCSI ports, make sure NOT to use the ports claimed by MirrorView.

mirrorview_ports

If you use 1GbE (4-port) I/O modules, you can use three ports per SP (all except port 0), and if you have 10GbE (2-port) I/O modules, you can use one port per SP. I will talk about workarounds for this in the next blog post.

RPA appliance iSCSI vNICs

Each RecoverPoint appliance has two iSCSI NICs, which can be configured on either one or two subnets. If you use one 10Gb port on each SP, as in the example above, then you’re forced to use one subnet, because you obviously need at least two ports on each SP to have two networks.

If you have 1Gb modules in your VNX array, then you will most likely have two 1Gb iSCSI ports connected on each SP. In that case you can use two iSCSI subnets to reduce the number of iSCSI sessions between the RPAs and the VNX.

On the vSphere side you will need to create one or two iSCSI port groups, depending on how many subnets you’ve decided to allocate, and connect the RPA vNICs accordingly.

rpa_iscsi
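As a rough sketch of the vSphere side, the port groups can also be created from the ESXi command line, assuming standard vSwitches; the vSwitch name, port group names and VLAN IDs below are purely illustrative:

# esxcli network vswitch standard portgroup add -v vSwitch1 -p RPA-iSCSI-A
# esxcli network vswitch standard portgroup set -p RPA-iSCSI-A --vlan-id 46
# esxcli network vswitch standard portgroup add -v vSwitch1 -p RPA-iSCSI-B
# esxcli network vswitch standard portgroup set -p RPA-iSCSI-B --vlan-id 47

With a distributed switch you would create distributed port groups instead, but the idea is the same: one port group per iSCSI subnet for the RPA vNICs.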

VNX iSCSI Connections

RecoverPoint clusters are deployed and connected using a special tool called Deployment Manager. It assigns all IP addresses, connects RecoverPoint clusters to VNX arrays and joins sites together.

Once the deployment is finished you will have iSCSI connections created on the VNX array, configured according to how many iSCSI subnets you’re using.

1. One Subnet Example

Let’s look at the one-subnet topology first. In this example you have one 10Gb port per VNX SP and two ports on each of the two RPAs, all on one subnet. When you right-click the storage array in Unisphere and select iSCSI > Connections Between Storage Systems, you should see something similar to this.

iscsi_connections

As you can see, ports iSCSI1 and iSCSI2 on RPA0 and RPA1 are mapped to two ports on the storage array, A-5 and B-5. Four RPA ports are connected to two VNX ports, which gives you eight iSCSI initiator records on the VNX.

iscsi_initiators

2. Two Subnets Example

If you connect two 1Gb ports per VNX SP and decide to use two subnets, then each SP will have one port on each of the two subnets. Same goes for the RPAs. Each RPA will have one vNIC connected to each subnet.

The iSCSI connections will be set up a little bit differently now, because only the VNX and RPA ports which are on the same subnet should be able to talk to each other.

iscsi_connections2

Every RPA in this example has one IP on the xxx.xxx.46.0/255.255.255.192 subnet (iSCSI A) and one IP on the xxx.xxx.46.64/255.255.255.192 subnet (iSCSI B). Similarly, ports A-10 and B-10 on the VNX are configured on the iSCSI A subnet, and ports A-11 and B-11 are configured on the iSCSI B subnet. Because of that, the iSCSI1 ports are mapped to ports A-10/B-10 and the iSCSI2 ports are mapped to ports A-11/B-11.

As we are using two subnets in this example, instead of 4 RPA ports x 4 VNX ports = 16 iSCSI connections, we will have 2 RPA ports x 2 VNX ports (subnet iSCSI A) + 2 RPA ports x 2 VNX ports (subnet iSCSI B) = 8 iSCSI connections.

iscsi_initiators2

Conclusion

The goal of this post was to discuss the points which are not very well explained in the RecoverPoint documentation. It’s not a comprehensive guide by any means. You can find the full deployment procedure, with prerequisites, installation and configuration steps, in the EMC RecoverPoint Installation and Deployment Guide.

Issue Joining VNX1 and VNX2 Unisphere Domains

March 21, 2016

no_SSL

The main benefit of using Unisphere Domains is that they give you the ability to manage all of your VNXs by connecting to just one array. If you have an old Clariion, you’ll have to use a so-called Multi-Domain. VNX1 and VNX2 arrays can join a single domain.

Recently I encountered an issue where this didn’t work quite so well. When joining a VNX1 to a VNX2 I got the following error:

CIMOM Can’t get the VNX hardware class from – ip 172.10.10.10 – Error Connecting SSL. Error details: A system call error (errno=10057).

join_error

As it turned out, EMC disabled SSL 3.0 support in recent Block OE versions. As a result, Unisphere Domain connectivity is broken with arrays running Flare 32 Patch 209 or older, which still use SSL 3.0.

The solution is to upgrade the Block OE to a version higher than Flare 32 Patch 209, where SSL 3.0 is disabled. Or, as a workaround, you can connect the arrays in a Multi-Domain. To find out how, read one of my earlier blog posts: How to Configure VNX Unisphere Domains

How to Configure VNX Unisphere Domains

March 7, 2016

unisphere_domain

VNX storage arrays have a concept of Unisphere Domains, which let you manage multiple arrays from one Unisphere GUI. To manage two or more arrays from a single pane of glass you need to join their storage domains. There are typically two scenarios:

  1. Joining arrays of the same generation, such as VNX to VNX or Clariion to Clariion (VNX1 and VNX2 are considered as one generation)
  2. Joining arrays of different generations, such as Clariion to VNX

Same generation arrays

When joining same generation arrays to a single domain you get the benefit of having consistent domain-wide settings across all arrays in the domain, such as DNS, NTP, LDAP and Global Users. If you go to the Unisphere Home screen and click the Domain button, you will find where all the domain-wide settings are configured. Once they are set up, these settings propagate to all systems within the domain.

domain_settings

There’s also a concept of a Domain Master, which keeps and distributes the domain-wide settings. The Domain Master can be changed manually, if you wish to do so, by using the Select Domain Master wizard.

To add a new system to the same domain, simply click Add/Remove Systems and follow the wizard.
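If you prefer the command line, domain membership can also be inspected with Navisphere Secure CLI. This is only a sketch, assuming naviseccli is installed on your management host and your Block OE release supports the domain subcommands; the IP address and credentials are placeholders:

> naviseccli -h 10.0.0.10 -User admin -Password password -Scope 0 domain -list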

Different generation arrays

It’s very uncommon to see a Clariion these days, but if you still have one and want a single management interface across both your Clariion and VNX arrays, you have to use Multi-Domains. You won’t get the benefit of having the same domain-wide settings, but if you have just two or three arrays it’s not really that hard to set them up manually.

To add a new domain to a Multi-Domain configuration, click Manage Multi-Domain Configurations, specify the VNX Service Processor IP and assign a name. The system will be added to the list of Selected Domains.

add_vnx

Always add another domain to the Multi-Domain configuration on the system which is running the highest release of Unisphere within the Multi-Domain, otherwise you’ll get the following error:

This version of user interface software does not support the management server software versions on the provided system.

add_vnx_error

Once the system is added you will see both arrays in the systems list and will be able to manage both from one Unisphere interface. For the sake of demonstration I used two VNX arrays in the screenshot below, but the same process applies to Clariion arrays.

two_arrays

Local and global users

Unisphere has two types of user accounts: local and global. A local account can manage only the system you have connected to, and a global account can manage all systems within the same domain.

By default, when the array is installed, global security is initialized and one global user is created. There are no local user accounts on the system by default, which is fine, because each array is created as a member of its own local domain.

In a Multi-Domain configuration you need to make sure you’re logging in to Unisphere with an account which exists in every domain being managed. Otherwise, each time you log in to Unisphere you will have to manually log in to the remote domain on the domain tab, which is quite annoying.

If you have different accounts on each of the arrays, make sure to make them consistent across all systems.

domain_login

Conclusion

In this post we went through the Unisphere single-domain and multi-domain configurations in a few simple steps. If you want to know more about Unisphere Domain management, refer to the EMC white paper “Domain Management with EMC Unisphere for VNX”.

NetApp System Manager TLS Issue

February 29, 2016

lova_java

Yesterday, while working on one of my customers’ NetApp arrays, I hit an issue which looked like an SSL misconfiguration at first.

I needed to run the Network Configuration Checker to check for any inconsistencies between the active and persistent network configuration settings in the /etc/rc file. I used NetApp OnCommand System Manager 3.1.2 with Java 8. When I tried to run a network configuration check, I got this error:

‘netapp.domain.local’ is not configured for secure management with TLS

net_checker

When browsing to controllers management I also got this:

‘netapp.domain.local’ is not configured for secure management with TLS. Sensitive information you supply including passwords will be visible to other computers on the network.

Do you want to continue with non-secure connection ?

The second issue you can ignore by just skipping the warning, but the Network Configuration Checker error you can’t.

Potential Resolution

I googled it and NetApp KB article 2021507, “OnCommand System Manager Java Compatibility issues”, came up, which suggested that all you need to do is enable TLS on the 7-Mode controller (in Cluster Mode it is enabled by default):

options tls.enable on

This did not work for me, though.
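If you want to double-check that the option actually took effect, running the option name without a value on a 7-Mode controller should print its current setting:

options tls.enable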

Alternative Solution

The reason System Manager no longer works with SSL and requires TLS instead is that Java 7u75 (and later) implemented a change that disabled SSLv3 due to the POODLE security vulnerability.

So you either have to enable TLS for Java 7u75 and later (which didn’t work in my case) or downgrade to Java 7u72, which is the release prior to 7u75.

Once that is done, you should no longer get the error either in the Network Configuration Checker or when logging in to controllers in System Manager.

Merging Brocade Fabrics

February 23, 2016

fibre

Recently I needed to merge two pairs of Brocade fibre channel fabrics for one of my customers. When I was doing a bit of my own research I realised that there is very little information on how to do that on the Interwebs. There were a few community posts on the Brocade forums, but there seemed to be some confusion around how zoning should be configured to let the switches merge successfully. I thought I would fill the gap with this post and share my own experience.

Prerequisites

First, make sure you have the right transceivers. Short wave 8Gb FC transceivers are limited to 190m when using OM4 fibre. If you need to connect switches over a longer distance, use long wave SFP+ modules, which have a maximum distance of 10km.

Second, change the default switch Domain IDs. All switches within the same fabric must have unique IDs. By default, Brocade switches come with the Domain ID set to 1. If you’re merging two redundant fabrics, make sure that the second pair of switches has its Domain IDs set to 2.

Third, verify that the switches you’re interconnecting have compatible zoning configurations. Brocade is very specific about how zoning should be configured for two fabrics to merge. There are at least nine different scenarios, but we’ll touch only on the three most common ones. If you want more details, refer to the Brocade Fabric OS Administrator’s Guide, specifically the section called “Zone merging scenarios”.

Zone merging scenarios

Scenario 1: Switch A does not have a defined configuration. Switch B has a defined configuration.

This is the most straightforward scenario, where you are adding a brand new Switch A to an existing fabric. As a result of the merge, the configuration from Switch B propagates to Switch A.

Scenario 2: Switch A and Switch B have different defined configurations. Switch B has an effective configuration.

This is the scenario where you have two individual fabrics, each with their own set of aliases, zones and defined configurations. There is a catch here. If you want to merge such fabrics, you MUST have a unique set of aliases, zones and configurations on each fabric. If this requirement is not met, the fabrics won’t merge and you will end up with two segmented fabrics because of the zoning conflict. You also MUST disable the effective zoning configuration on Switch A.

An outage is not required, because typically you have two redundant fabrics (fabric A and fabric B) in each location, and you can do one switch at a time. If you are still concerned, implement Scenario 3.

Scenario 3: Switch A and Switch B have the same defined and effective configuration.

This is what Brocade calls a “clean merge”. Under this scenario you will have to recreate the same configs on both fabrics beforehand. That means you MUST have completely identical aliases, zones and configs on Switch A and Switch B.

This is the easiest and least disruptive path if you are worried that disabling the effective configuration on the switches may cause issues.

Real world scenario

In my case I went with scenario 2 for two reasons: first, it was a DR site where I could temporarily bring down both fabrics; and second, I didn’t need to manually add aliases/zones/configs to the switches as I would have to in scenario 3. Once the fabrics are merged, the zones from Switch B propagate to Switch A and you can simply combine them into one configuration in the GUI, which is just a few mouse clicks.

site_topology

Here is the step-by-step process. The first step is to change the Domain IDs on the second pair of switches. You can do that from either the GUI or the CLI. Bear in mind that even if you’ve picked scenario 3 as the least disruptive approach for merging zones, changing Domain IDs will still be disruptive, because the switch has to be disabled before making the change.

From the Web Tools go to Switch Administration, disable the switch in the Switch Status section, type in the new Domain ID and re-enable the switch:

domain_id

If you want to take the CLI path, run the following. The switch will ask you a series of questions; you can accept all the defaults, except for the Domain field:

> switchdisable
> configure
> switchenable
> fabricshow

Next, disable the effective configuration on Switch A, either from the GUI or from the CLI:

> cfgdisable
> cfgactvshow

At this point you can interconnect the switches and you should see the following log entry on Switch A:

The effective configuration has changed to SWITCHB_CONFIG

The fabrics are now merged and you should see both switches in Web Tools. If you see a switch in the Segmented Switches section, it means that something went wrong:

merged_fabrics

Clean up steps

Once the fabrics are merged you will see all zones in the Zone Admin interface; however, the effective configuration will be the configuration from Switch B. You will need to create a new configuration which combines all zones to enable connectivity between the devices connected to Switch A.
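If you prefer to do the clean-up from the CLI, it can look something like the following. The configuration and zone names are placeholders; use the zone names from your own fabrics:

> cfgcreate "COMBINED_CFG", "ZONE_FROM_SWITCH_A_1; ZONE_FROM_SWITCH_B_1"
> cfgadd "COMBINED_CFG", "ZONE_FROM_SWITCH_B_2"
> cfgsave
> cfgenable "COMBINED_CFG"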

From the operational perspective you can now manage zoning on either of the switches, and when you save or enable a configuration it will propagate to all switches in the fabric automatically.

If you have redundant fabrics, which you normally do, repeat the steps for the second pair of switches.

Conclusion

The steps described in this post are for a basic switch setup. If you have a non-standard switch configuration or are using some of the advanced features, make sure to check the “Zone Merging” section in the Fabric OS Administrator’s Guide for any additional considerations.

Let me know if this was helpful.

 

VNX Unisphere Java Issues

February 14, 2016

java_forever

If you are a systems engineer, I’m sure you have to deal with Java compatibility issues all the time, as I do. You install a certain Java version to fix one client and it immediately breaks another one.

I reached a new level of ridiculousness when I had a customer with two EMC storage arrays, a VNX1 in production and a VNX2 at DR. One Java version worked for the Unisphere client on the VNX1, but broke the VNX2 client, and vice versa.

This was quite painful until I found a solution on the EMC community forums. There is a Java version which works for both arrays, and it’s Java 7u51. I have tested it on the following Block OE releases and it works like a charm:

  • VNX1 Block OE 05.32.000.5.209
  • VNX2 Block OE 05.33.008.5.119

Let me know if this helped you as well.

Dell Compellent Enterprise Manager: SQL Server Setup

January 25, 2016

Dell Compellent Enterprise Manager is a separate piece of software, which comes with every Compellent storage array deployment and allows you to monitor, manage, and analyze one or multiple arrays from a centralized management console.

Probably the most valuable feature of Enterprise Manager for an average user is its historical performance statistics. From the Storage Center GUI you can see only real-time data, whereas Enterprise Manager is capable of keeping statistics for up to a year. It can also do a multitude of other things, such as assist with capacity planning, allow you to configure replication between production and DR sites, generate reports, or simply serve as a single pane of glass interface to all of your Compellent storage arrays.

The Enterprise Manager installation does not include an embedded database. If you want to deploy the EM database on an existing Microsoft SQL Server instance or install a new dedicated Microsoft SQL Server Express instance, you need to do it manually. The following guide describes how to install Enterprise Manager with Microsoft SQL Server 2012 Express.

SQL Server Authentication

During installation, Enterprise Manager will ask you for authentication credentials to connect to the SQL database. In SQL Server you can use either the Windows Authentication Mode or Mixed Mode. Mixed Mode allows both Windows authentication and SQL Server authentication via local SQL Server accounts.

For a typical SQL Server setup, Microsoft does not recommend enabling SQL Server authentication, especially with the default “sa” account, as it’s seen as insecure. So if you have a centralized Microsoft SQL Server in your network, you’ll most likely be using Windows Authentication. But for a dedicated SQL Server Express database it’s fine to enable Mixed Mode and use SQL Server authentication.

Make sure to enable Mixed Mode during SQL Server Express installation and enter a password for the “sa” account.

sql_authentication

SQL Server Network Configuration

Enterprise Manager uses TCP/IP to connect to the SQL database over port 1433. TCP/IP is not enabled by default in SQL Server Express, so you will need to enable it manually.

sql_network

Use SQL Server Configuration Manager, which is installed with the database, and browse to SQL Server Network Configuration > Protocols for SQLEXPRESS > TCP/IP. Enable TCP/IP on the Protocol tab and assign port 1433 to the TCP Port field in the IPAll section of the IP Addresses tab.

Make sure to restart the SQL Server service to apply the settings, and you should now be able to connect Enterprise Manager to the database.
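If you want to verify the TCP connectivity before running the Enterprise Manager installer, a quick sanity check with sqlcmd from the Enterprise Manager server will do; the server name and password below are placeholders:

> sqlcmd -S tcp:SQLSERVER,1433 -U sa -P YourSaPassword -Q "SELECT @@VERSION"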

If you get stuck, refer to the Dell Compellent Enterprise Manager Installation Guide, specifically the section “Prepare a Microsoft SQL Server Database”. This guide is available in the Dell Compellent Knowledge Center.