Posts Tagged ‘plex’

Overview of NetApp Replication and HA features

August 9, 2013

NetApp has quite a bit of features related to replication and clustering:

  • HA pairs (including mirrored HA pairs)
  • Aggregate mirroring with SyncMirror
  • MetroCluster (Fabric and Stretched)
  • SnapMirror (Sync, Semi-Sync, Async)

It’s easy to get lost here. So lets try to understand what goes where.

Simple-Metrocluster

SnapMirror

SnapMirror is a volume level replication, which normally works over IP network (SnapMirror can work over FC but only with FC-VI cards and it is not widely used).

Asynchronous version of SnapMirror replicates data according to schedule. SnapMiror Sync uses NVLOGM shipping (described briefly in my previous post) to synchronously replicate data between two storage systems. SnapMirror Semi-Sync is in between and synchronizes writes on Consistency Point (CP) level.

SnapMirror provides protection from data corruption inside a volume. But with SnapMirror you don’t have automatic failover of any sort. You need to break SnapMirror relationship and present data to clients manually. Then resynchronize volumes when problem is fixed.

SyncMirror

SyncMirror mirror aggregates and work on a RAID level. You can configure mirroring between two shelves of the same system and prevent an outage in case of a shelf failure.

SyncMirror uses a concept of plexes to describe mirrored copies of data. You have two plexes: plex0 and plex1. Each plex consists of disks from a separate pool: pool0 or pool1. Disks are assigned to pools depending on cabling. Disks in each of the pools must be in separate shelves to ensure high availability. Once shelves are cabled, you enable SyncMiror and create a mirrored aggregate using the following syntax:

> aggr create aggr_name -m -d disk-list -d disk-list

HA Pair

HA Pair is basically two controllers which both have connection to their own and partner shelves. When one of the controllers fails, the other one takes over. It’s called Cluster Failover (CFO). Controller NVRAMs are mirrored over NVRAM interconnect link. So even the data which hasn’t been committed to disks isn’t lost.

MetroCluster

MetroCluster provides failover on a storage system level. It uses the same SyncMirror feature beneath it to mirror data between two storage systems (instead of two shelves of the same system as in pure SyncMirror implementation). Now even if a storage controller fails together with all of its storage, you are safe. The other system takes over and continues to service requests.

HA Pair can’t failover when disk shelf fails, because partner doesn’t have a copy to service requests from.

Mirrored HA Pair

You can think of a Mirrored HA Pair as HA Pair with SyncMirror between the systems. You can implement almost the same configuration on HA pair with SyncMirror inside (not between) the system. Because the odds of the whole storage system (controller + shelves) going down is highly unlike. But it can give you more peace of mind if it’s mirrored between two system.

It cannot failover like MetroCluster, when one of the storage systems goes down. The whole process is manual. The reasonable question here is why it cannot failover if it has a copy of all the data? Because MetroCluster is a separate functionality, which performs all the checks and carry out a cutover to a mirror. It’s called Cluster Failover on Disaster (CFOD). SyncMirror is only a mirroring facility and doesn’t even know that cluster exists.

Further Reading

Advertisement

NetApp storage architecture

October 9, 2011

All of us are get used to SATA disk drives connected to our workstations and we call it storage. Some organizations has RAID arrays. RAID is one level of logical abstraction which combine several hard drives to form logical drive with greater size/reliability/speed. What would you say if I’d tell you that NetApp has following terms in its storage architecture paradigm: disk, RAID group, plex, aggregate, volume, qtree, LUN, directory, file. Lets try to understand how all this work together.

RAID in NetApp terminology is called RAID group. Unlike ordinary storage systems NetApp works mostly with RAID 4 and RAID-DP. Where RAID 4 has one separate disk for parity and RAID-DP has two. Don’t think that it leads to performance degradation. NetApp has very efficient implementation of these RAID levels.

Plex is collection of RAID groups and is used for RAID level mirroring. For instance if you have two disk shelves and SyncMirror license then you can create plex0 from first shelf drives and plex1 from second shelf.  This will protect you from one disk shelf failure.

Aggregate is simply a highest level of hardware abstraction in NetApp and is used to manage plexes, raid groups, etc.

Volume is a logical file system. It’s a well-known term in Windows/Linux/Unix realms and serves for the same goal. Volume may contain files, directories, qtrees and LUNs. It’s the highest level of abstraction from the logical point of view. Data in volume can be accessed by any of protocols NetApp supports: NFS, CIFS, iSCSI, FCP, WebDav, HTTP.

Qtree can contain files and directories or even LUNs and is used to put security and quota rules on contained objects with user/group granularity.

LUN is necessary to access data via block-level protocols like FCP and iSCSI. Files and directories are used with file-level protocols NFS/CIFS/WebDav/HTTP.