Posts Tagged ‘controller’

How to move aggregates between NetApp controllers

September 25, 2013

Stop Sign_91602

 

DISCLAMER: I ACCEPT NO RESPONSIBILITY FOR ANY DAMAGE OR CORRUPTION OF DATA THAT MAY OCCUR AS A RESULT OF CARRYING OUT STEPS DESCRIBED BELOW. YOU DO THIS AT YOUR OWN RISK.

 

We had an issue with high CPU usage on one of the NetApp controllers servicing a couple of NFS datastores to VMware ESX cluster. HA pair of FAS2050 had two shelves, both of them owned by the first controller. The obvious solution for us was to reassign disks from one of the shelves to the other controller to balance the load. But how do you do this non-disruptively? Here is the plan.

In our setup we had two controllers (filer1, filer2), two shelves (shelf1, shelf2) both assigned to filer1. And two aggregates, each on its own shelf (aggr0 on shelf0, aggr1 on shelf1). Say, we want to reassign disks from shelf2 to filer2.

First step is to migrate all of the VMs from the shelf2 to shelf1. Because operation is obviously disruptive to the hosts accessing data from the target shelf. Once all VMs are evacuated, offline all volumes and an aggregate, to prevent any data corruption (you can’t take aggregate offline from online state, so change it to restricted first).

If you prefer to reassign disks in two steps, as described in NetApp Professional Services Tech Note #021: Changing Disk Ownership, don’t forget to disable automatic ownership assignment on both controllers, otherwise disks will be assigned back to the same controller again, right after you unown them:

> options disk.auto_assign off

It’s not necessary if you change ownership in one step as shown below.

Next step is to actually reassign the disks. Since they are already part of an aggregate you will need to force the ownership change:

filer1> disk assign 1b.01.00 -o filer2 -f

filer1> disk assign 1b.01.01 -o filer2 -f

filer1> disk assign 1b.01.nn -o filer2 -f

If you do not force disk reassignment you will get an error:

Assign request failed for disk 1b.01.0. Reason:Disk is part of a failed or offline aggregate or volume. Changing its owner may prevent aggregate or volume from coming back online. Ownership may be changed only by using the appropriate force option.

When all disks are moved across to filer2, new aggregate will show up in the list of aggregates on filer2 and you’ll be able to bring it online. If you can’t see the aggregate, force filer to rescan the drives by running:

filer2> disk show

The old aggregate will still be seen in the list on filer1. You can safely remove it:

filer1> aggr destroy aggr1

Advertisement

Overview of NetApp Replication and HA features

August 9, 2013

NetApp has quite a bit of features related to replication and clustering:

  • HA pairs (including mirrored HA pairs)
  • Aggregate mirroring with SyncMirror
  • MetroCluster (Fabric and Stretched)
  • SnapMirror (Sync, Semi-Sync, Async)

It’s easy to get lost here. So lets try to understand what goes where.

Simple-Metrocluster

SnapMirror

SnapMirror is a volume level replication, which normally works over IP network (SnapMirror can work over FC but only with FC-VI cards and it is not widely used).

Asynchronous version of SnapMirror replicates data according to schedule. SnapMiror Sync uses NVLOGM shipping (described briefly in my previous post) to synchronously replicate data between two storage systems. SnapMirror Semi-Sync is in between and synchronizes writes on Consistency Point (CP) level.

SnapMirror provides protection from data corruption inside a volume. But with SnapMirror you don’t have automatic failover of any sort. You need to break SnapMirror relationship and present data to clients manually. Then resynchronize volumes when problem is fixed.

SyncMirror

SyncMirror mirror aggregates and work on a RAID level. You can configure mirroring between two shelves of the same system and prevent an outage in case of a shelf failure.

SyncMirror uses a concept of plexes to describe mirrored copies of data. You have two plexes: plex0 and plex1. Each plex consists of disks from a separate pool: pool0 or pool1. Disks are assigned to pools depending on cabling. Disks in each of the pools must be in separate shelves to ensure high availability. Once shelves are cabled, you enable SyncMiror and create a mirrored aggregate using the following syntax:

> aggr create aggr_name -m -d disk-list -d disk-list

HA Pair

HA Pair is basically two controllers which both have connection to their own and partner shelves. When one of the controllers fails, the other one takes over. It’s called Cluster Failover (CFO). Controller NVRAMs are mirrored over NVRAM interconnect link. So even the data which hasn’t been committed to disks isn’t lost.

MetroCluster

MetroCluster provides failover on a storage system level. It uses the same SyncMirror feature beneath it to mirror data between two storage systems (instead of two shelves of the same system as in pure SyncMirror implementation). Now even if a storage controller fails together with all of its storage, you are safe. The other system takes over and continues to service requests.

HA Pair can’t failover when disk shelf fails, because partner doesn’t have a copy to service requests from.

Mirrored HA Pair

You can think of a Mirrored HA Pair as HA Pair with SyncMirror between the systems. You can implement almost the same configuration on HA pair with SyncMirror inside (not between) the system. Because the odds of the whole storage system (controller + shelves) going down is highly unlike. But it can give you more peace of mind if it’s mirrored between two system.

It cannot failover like MetroCluster, when one of the storage systems goes down. The whole process is manual. The reasonable question here is why it cannot failover if it has a copy of all the data? Because MetroCluster is a separate functionality, which performs all the checks and carry out a cutover to a mirror. It’s called Cluster Failover on Disaster (CFOD). SyncMirror is only a mirroring facility and doesn’t even know that cluster exists.

Further Reading

Unexpected Deduplication Impact on VMware I/O Latency

May 28, 2013

NetApp deduplication is a postponed process. During normal operation Data ONTAP only calculates hashes for the data blocks. Actual deduplication is carried out off-hours as per configured schedule. Hash calculation doesn’t affect performance in most cases. I talked about that in my previous post. NetApp states in its documentation that deduplication is a low-priority process:

When one deduplication process is running, there is 0% to 15% performance degradation on other applications.

Once I faced a situation when deduplication was configured to be carried out during business hours on one of the volumes. No one noticed that at some point volume run out of space and Data ONTAP wasn’t able to perform deduplication from that time. Situation became worse when Data ONTAP was upgraded from version 7.3.2 to 8.1.0. Now during deduplication filer tried to upgrade the fingerprint metadata to a new version at 15:00 every day with the message: “Fingerprint is being upgraded” and failed. It seems that the metadata upgrade is a very resource-intensive process and heavily affects I/O latency.

This volume was not a VMware datastore, but it sit on the same aggregate together with the several VMFS LUNs. Here what happened to the VMware I/O latency every day at 15:00 (click to enlarge):

dedup_issue_ed

I deleted the host name and the datastores names from the graph. You can see the large latency spike, which won’t turn yourVMs into kernel panic, but it’s not the thing you would want your production environment to experience every day.

The solution was simple. After space was increased on this volume, deduplication metadata upgrade performed successfully and problem went away. Additionally, deduplication was shifted to off-hours.

The simple lesson to learn: don’t schedule deduplication during the day, you never know what could possibly go wrong.

Windows MPIO with IBM storage

September 17, 2012

IBM mid-range storage systems (like DS3950) work in active/passive mode. It means that access to each LUN is given through one controller, in constrast to active/active storage where data between host and two controllers can flow in round-robin fashion. So redundant path here is used only as a failover. Software which provides this failover functionality is called Multipath I/O (MPIO) and has implementations for all operating systems. I’ll desribe how to configure MPIO version for Windows.

Installation

Prior to Windows Server 2008, Microsoft didn’t have its own MPIO implementation and MPIO was distributed with IBM DS Storage Manager product. Now you can install MPIO from “Feautures” sub-menu of Windows Server 2008 Server Manager. After installation is complete you will find MPIO configuration options under Control Panel and in Administrative Tools.

IBM storage works well with default Windows MPIO implementation, however it’s recommended to install IBM MPIO (device-specific module) from Storage Manager installation bundle. In my case MPIO installation file was called SMIA-WSX64-01.03.0305.0608.

Enable multipathing

Initially you will see two hard drives for each LUN in Device Manager. You can enable MPIO for particular hardware ID (in other words, storage system) on Discover Multi-Paths tab of MPIO control panel. You can’t do that with LUN granularity. After you add selected devices and reboot, you will see them on “MPIO Devices” tab. Now each LUN will be seen as a single hard drive in Device Manager.

Configure preferred path

MPIO supports several load-balancing policies, which are configured on a LUN basis from MPIO tab of a hard drive in Device Manager. As a Load Balance Policy select Fail Over Only. Then for each path select which is Active/Optimized and which is a Standby path. Also make active path Preferred, so that after failover it failbacks to it.

Don’t be confused by iSCSI on the figure. It’s the same for pure FC. It’s just for reference.

Check configuration

When you configure active and passive paths you assume that first path listed is to controller A and second path is to controller B. But, in fact, there is no indication of that from the configuration page and you can neither confirm nor deny it. The only ID you see is adapter ports but they don’t even map to the actual ports on HBAs.

To be able to check your configuration you need to install IBM SMdevices utility which comes with IBM DS Storage Manager. Run DS SM installation and go for Custom Installation. There you need to check only the Utilities part. In SMdevices output you can see which path is preferred for this LUN and if it’s configured as active (In Use):

C:\Program Files\IBM_DS\util>SMdevices
IBM System Storage DS Storage Manager Devices
. . .
\\.\PHYSICALDRIVE1 [Storage Subsystem ITSO5300, Logical
Drive 1, LUN 0, Logical Drive ID
<600a0b80002904de000005a246d82e17>, Preferred Path
(Controller-A): In Use]

References

The best reference I found on that topic is IBM Midrange System Storage Hardware Guide (SG24-7676-01), from p.453: DS5000 logical drive representation in Windows Server 2008. As well as Installing and Configuring MPIO guide from Microsoft.