Archive for the ‘Hardware’ Category
December 14, 2015
In my previous post Brocade 300 Initial Setup I briefly went through the firmware upgrade process, which is a part of every new switch installation. Make sure to check that post out for instructions on how to install an FTP server. You will need it to upload firmware to the switch.
I intentionally didn't go into all the details of the firmware upgrade in my previous post, as they're not necessary for a greenfield install. For a production switch the process is different. The reason is that if you upgrade to a Fabric OS version which is two or more major releases apart from the current switch firmware revision, the upgrade will be disruptive and take the FC ports offline. That is fine for a new deployment, but not ideal for production.
Disruptive and Non-Disruptive Upgrades
Brocade Fabric OS major firmware release versions are 6.3.x, 6.4.x, 7.0.x, 7.1.x, 7.2.x, etc. For a non-disruptive upgrade (NDU) the rule of thumb is to apply all major releases consecutively. For example, if my production FC switch is running FOS version 6.3.2b and I want to upgrade to version 7.2.1d, which is the latest recommended version for my hardware platform, then I'll have to upgrade from 6.3.2b to 6.4.x to 7.0.x to 7.1.x and finally to 7.2.1d.
First and foremost, save the current switch config and make a config backup via FTP (give write permissions to your FTP user's home folder). Don't underestimate this step. The last thing you want to do is to recreate all zoning if the switch loses its config during the upgrade:
> cfgSave
> configUpload

In case you need to restore, run the following command to download the backed-up config back to the switch:
> configDownload
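Both commands are interactive by default, but they also accept the FTP details as arguments if you want to avoid the prompts. A minimal sketch, assuming FOS 7.x syntax and a hypothetical FTP server at 192.168.1.10 with user ftpuser (double-check the argument order against the command reference for your FOS release):
> configupload -p ftp 192.168.1.10,ftpuser,switch01_config.txt,ftppassword
> configdownload -p ftp 192.168.1.10,ftpuser,switch01_config.txt,ftppassword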
The next step is to install each intermediate major release consecutively, up to the desired version (the -s flag is not required here):
> firmwaredownload
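firmwareDownload can also be run non-interactively by passing the FTP details on the command line. A hedged example, assuming the firmware was unpacked into a v7.0.2e folder under the FTP user's home directory (again, verify the exact syntax for your FOS release):
> firmwaredownload -p ftp 192.168.1.10,ftpuser,/v7.0.2e,ftppassword
> firmwaredownloadstatus
The firmwareDownloadStatus command shows the progress of the running upgrade.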

A Brocade switch has two firmware partitions – primary and secondary. The primary partition is the one the switch boots from, and the secondary partition is used for firmware upgrades.
After each upgrade the switch does a warm reboot. All FC ports stay up and the switch continues to forward FC frames with no disruption to traffic. To accomplish that, the switch uploads the new firmware to the secondary partition and then quickly swaps the partitions without disrupting FC switching.
At a high level the upgrade process goes as follows:
- The Fabric OS downloads the firmware to the secondary partition.
- The system performs a high availability reboot (haReboot). After the haReboot, the former secondary partition is the primary partition.
- The system replicates the firmware from the primary to the secondary partition.
Each upgrade may take up to 30 minutes to complete, but in my experience it doesn't take more than 10 minutes. Once the first switch is upgraded, log back in and check the firmware version. You will see that the former secondary partition has become the primary and the firmware has been replicated to the new secondary partition.
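To check which version sits on which partition between the steps, run:
> firmwareshow
It lists the firmware versions of both the primary and the secondary partition.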

As a last step, check that the FC paths on all hosts are active and then move on to the second switch. The steps are exactly the same for each upgrade.
Firmware Upload and Commit
Under normal circumstances, when you run the firmwareDownload command the switch does the whole upgrade in an automated fashion. After the upgrade is finished you end up with both the primary and secondary partitions on the same firmware version. But if you're a large enterprise, you may want to test the firmware first and have an option to roll back.
To accomplish that you can use the -s flag, which disables auto-commit:

The switch will upload the firmware to the secondary partition and swap the primary and secondary partitions after a reboot, but it won't replicate the firmware to the new secondary partition. You can use the following command to restore the firmware back to the previous version:
> firmwareRestore
Or if you’re happy with the firmware, commit it to the secondary partition:
> firmwareCommit
The only caveat here is that a non-disruptive upgrade is not supported in this scenario. When the switch reboots, it will be disruptive to FC traffic.
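Putting it together, a test-then-commit run looks roughly like this (log back in after the switch reboots, then run either the commit or the restore, not both):
> firmwaredownload -s
> firmwareshow
> firmwarecommit
Or, to roll back to the previous version:
> firmwarerestore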
Important Notes
When downloading firmware for your switch, make sure to use the switch vendor's website. The EMC Connectrix DS-300B, Brocade 300 and IBM SAN24B-4 are essentially the same switch, but firmware and supported versions for each OEM vendor may vary slightly. Here are the links where you can get FC switch firmware for some of the vendors:
- EMC: sign in to http://support.emc.com > find your switch model under the product section and go to downloads
- Brocade: sign in to http://www.brocade.com > go to Downloads section > enter FOS in the search field
- Dell: http://www.brocadeassist.com/dellsoftware/public/DELLAssist includes a subset of Fabric OS versions, which are tested and approved by Dell
- IBM: http://ibm.brocadeassist.com/public/FabricOSv6xRelease and http://ibm.brocadeassist.com/public/FabricOSv7xRelease are the links where you can download FOS for IBM switches. You can also go to http://support.ibm.com, search for the switch in the Product Finder and find FOS under the “Downloads (drivers, firmware, PTFs)” section
Tags:backup, Brocade, commit, configdownload, configupload, disruptive, fibre, fibre channel, firmware, firmwarecommit, firmwaredownload, firmwarerestore, firmwareshow, FTP, non-disruptive, primary, release revision, restore, secondary, update, upgrade, version
Posted in SAN, Storage | 2 Comments »
December 8, 2015
There are a few steps you need to do on the Brocades before moving on to cabling and zoning. The process is pretty straightforward, but worth documenting especially for those who are doing it for the first time.
After you power on the switch there are two ways of setting it up: GUI or CLI. We'll go hardcore and do all configuration in the CLI, but if you wish you can assign a static IP to your laptop from the 10.77.77.0/24 subnet and browse to https://10.77.77.77. Default credentials are admin/password for both GUI and CLI.
Network Settings
To configure network settings, such as a hostname, management IP address, DNS and NTP use the following commands:
> switchname PRODFCSW01
> ipaddrset
> dnsconfig
> tsclockserver 10.10.10.1
Most of these commands are interactive and prompt for parameters. The only caveat is that if you have multiple switches in the same fabric, make sure to set the NTP server to LOCL on all subordinate switches. This instructs them to synchronize their time with the principal switch.
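For reference, this is what it looks like on a subordinate switch, together with a quick way to check which switch is the principal (in fabricShow output the principal switch is marked with a “>”):
> tsclockserver LOCL
> fabricshow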
Firmware Upgrade
This is the fun part. You can upgrade switch firmware using a USB stick, but the most common way is to upgrade via FTP. This obviously means that you need to install an FTP server. You can use FileZilla Server, which is decent and free.
Download the server and the client parts and install both. Default settings work just fine. Go to Edit > Users and add an anonymous user. Give it a home folder and unpack the downloaded firmware into it. This is what it should look like:

To upgrade the firmware run the following command on the switch (it's also interactive), and then reboot:
> firmwaredownload -s
If you’re running a Fabric OS revision older than 7.0.x, such as 6.3.x or 6.4.x, then you will need to upgrade to version 7.0.x first and then to your target version, such as 7.3.x or 7.4.x.
In the next blog post I will discuss firmware upgrades in more detail, such as how to do a non-disruptive upgrade on a production switch and where to download vendor-specific FOS firmware from.
Tags:Brocade, configuration, disruptive, fiber channel, fibre, firmware, FTP, installation, non-disruptive, principal, subordinate, update, upgrade
Posted in Storage | 1 Comment »
November 20, 2015
I haven't seen too many blog posts on how to configure Compellent for iSCSI. And there seems to be some confusion about what the best practices for iSCSI are. I hope I can shed some light on it by sharing my experience.
In this post I want to talk specifically about the Windows scenario, such as when you want to use it for Hyper-V. I used Windows Server 2012 R2, but the process is similar for other Windows Server versions.
Design Considerations
All iSCSI design considerations revolve around networking configuration. The two questions you need to ask yourself are what your switch topology is going to look like and how you are going to configure your subnets. It all typically boils down to the two most common scenarios: two stacked switches and one subnet, or two standalone switches and two subnets. I could not find a specific recommendation from Dell on whether it should be one or two subnets, so I assume that both scenarios are supported.
It's worth mentioning that Compellent uses the concept of Fault Domains to group front-end ports that are connected to the same Ethernet network. That means you will have one fault domain in the one-subnet scenario and two fault domains in the two-subnet scenario.
For iSCSI target port discovery from the hosts, you need to configure a Control Port on the Compellent. A Control Port has its own IP address, and one Control Port is configured per Fault Domain. When a server targets the Control Port IP address, it automatically discovers all ports in the fault domain. In other words, instead of using the IPs configured on the Compellent iSCSI ports, you use the Control Port IP for iSCSI target discovery.
Compellent iSCSI Configuration
In my case I had two stacked switches, so I chose to use one iSCSI subnet. This translates into one Fault Domain and one Control Port on the Compellent.
IP settings for iSCSI ports can be configured at Storage Management > System > Setup > Configure iSCSI IO Cards.

To create and assign Fault Domains go to Storage Management > System > Setup > Configure Local Ports > Edit Fault Domains. From there select your fault domain and click Edit Fault Domain. On IP Settings tab you will find iSCSI Control Port IP address settings.


Host MPIO Configuration
On the Windows Server side, start by installing the Multipath I/O feature. Then go to the MPIO Control Panel and add support for iSCSI devices. After a reboot you will see MSFT2005iSCSIBusType_0x9 in the list of supported devices. This step is important. If you don't do it, then when you map a Compellent disk to the hosts, instead of one disk you will see multiple copies of the same disk device in Device Manager (one per path).
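If you prefer to script this step, the same thing can be done from PowerShell on Windows Server 2012 R2. A hedged sketch (Enable-MSDSMAutomaticClaim registers the same MSFT2005iSCSIBusType_0x9 device ID that the MPIO Control Panel adds):
PS> Install-WindowsFeature -Name Multipath-IO
PS> Enable-MSDSMAutomaticClaim -BusType iSCSI
PS> Restart-Computer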


Host iSCSI Configuration
To connect the hosts to the storage array, open iSCSI Initiator Properties and add your Control Port IP as a target portal on the Discovery tab. In the list of discovered targets you should see four Compellent iSCSI ports.
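If you'd rather do the discovery from PowerShell, adding the Control Port as a target portal looks something like this (the Control Port IP here is hypothetical):
PS> New-IscsiTargetPortal -TargetPortalAddress 10.20.30.100
PS> Get-IscsiTarget
Get-IscsiTarget should then list the four Compellent target IQNs discovered through the Control Port.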
The next step is to connect the initiators to the targets. This is where it is easy to make a mistake. In my scenario I have one iSCSI subnet, which means that each of the two host NICs can talk to all four array iSCSI ports. As a result I should have 2 host ports x 4 array ports = 8 paths. To accomplish that, on the Targets tab I have to connect each initiator IP to each target port by clicking the Connect button twice for each target and selecting one initiator IP the first time and the other the second time.
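The same connections can also be scripted, which makes it harder to miss a path. A hedged PowerShell sketch, assuming two hypothetical initiator IPs and one session per initiator/target pair:
$initiators = "10.20.30.11", "10.20.30.12"
foreach ($target in Get-IscsiTarget) {
    foreach ($ip in $initiators) {
        # one persistent, multipath-enabled session per initiator/target combination
        Connect-IscsiTarget -NodeAddress $target.NodeAddress -InitiatorPortalAddress $ip -IsMultipathEnabled $true -IsPersistent $true
    }
}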



Compellent Volume Mapping
Once all hosts are logged in to the array, go back to Storage Manager and add the servers to the inventory by clicking Servers > Create Server. You should see the hosts' iSCSI adapters in the list already. Make sure to assign the correct host type. I chose Windows 2012 Hyper-V.

It is also a best practice to create a Server Cluster container and add all hosts to it if you are deploying a Hyper-V or a vSphere cluster. This guarantees consistent LUN IDs across all hosts when a LUN is mapped to the Server Cluster object.
From here you can create your volumes and map them to the Server Cluster.
Check iSCSI Paths
To make sure that multipathing is configured correctly, use "mpclaim" to show the I/O paths. As you can see, even though we have 8 paths to the storage array, we see only 4 paths to each LUN.
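For reference, the commands are (the disk number is just an example; without a number the command lists all MPIO disks, with a number it shows the individual paths and their states):
> mpclaim -s -d
> mpclaim -s -d 1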

Arrays such as EMC VNX and NetApp FAS use Asymmetric Logical Unit Access (ALUA), where a LUN is owned by only one controller but presented through both. Paths to the owning controller are marked as Active/Optimized and paths to the non-owning controller are marked as Active/Non-Optimized and are used only if the owning controller fails.
Compellent is different. Instead of ALUA it uses iSCSI Redirection to move traffic to the surviving controller in a failover situation, so it does not need to present the LUN through both controllers. This is why you see 4 paths instead of the 8 you would see on an ALUA array.
Tags:ALUA, Asymmetric Logical Unit Access, Compellent, Control Port, dell, Design, discovery, failover, Fault Domain, Hyper-V, initiator, iSCSI, iSCSI Redirection, LUN, MPIO, multipathing, non-optimized, optimized, path, SC4020, Subnet, switch, target, windows
Posted in Compellent | 25 Comments »
November 20, 2015
I come across this issue too often. You need to fetch some information for a customer from the My AutoSupport website and can't, because the last AutoSupport message is from half a year ago.
Check AutoSupport State
When you list the AutoSupport history on the target system you see something similar to this:
# autosupport history show

Mail Server Configuration
If AutoSupport is configured to use SMTP, as in this case, then the first place to check is obviously the mail server. The most common cause of the issue is blocked relay.
There are two things you need to make sure are configured: the NetApp controllers' management IPs are whitelisted on the mail server, and authentication is disabled.
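Before blaming the mail server it is also worth double-checking the AutoSupport settings on the filer itself. A quick 7-Mode sketch (verify the option names against your Data ONTAP release):
# options autosupport.enable
# options autosupport.support.transport
# options autosupport.mailhost
# options autosupport.to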
To set this up on an Exchange server, go to Exchange Management Console > Server Configuration > Hub Transport, select a Receive Connector (or create one if you don't already have one for whitelisting), go to its properties and add the NetApp IPs on the Network tab.

Then make sure to enable the Externally Secured authentication type on the Authentication tab.
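If you prefer the Exchange Management Shell over the console, the equivalent change is roughly the following (the connector name and IPs are hypothetical; Externally Secured corresponds to the ExternalAuthoritative mechanism and requires the ExchangeServers permission group, and note that -RemoteIPRanges overwrites the existing list, so include any ranges already on the connector):
Set-ReceiveConnector "EXCH01\NetApp Relay" -RemoteIPRanges 10.0.0.21,10.0.0.22 -AuthMechanism ExternalAuthoritative -PermissionGroups ExchangeServers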

Confirm AutoSupport is Working
To confirm that the issue is fixed, send an AutoSupport message either from OnCommand System Manager or right from the console and make sure that the status shows "sent-successful".
# options autosupport.doit Test
# autosupport history show

Tags:ASUP, authentication, AutoSupport, controllers, Exchange, failed, fix, history, Hub Transport, IP, issue, mail server, message, NetApp, Receive Connector, relay, smtp, troubleshoot, White List
Posted in NetApp | Leave a Comment »
March 23, 2015
Hit an issue today where VNXe array FC ports negotiate to L-port instead of F-port when the Fill Word is set to Mode 3 (ARB/ARB then IDLE/ARB). The result is a loss of connectivity on the affected link.

The recommended FC Fill Word for VNX/VNXe arrays is Mode 3. Generally it's a good idea to set it according to best practice as part of every installation. Apparently, when changing the Fill Word from the legacy Mode 0 (IDLE/IDLE) to Mode 3 (ARB/ARB then IDLE/ARB), the array port might negotiate as L-port and the FC path goes down.
The solution is to statically configure the port as an F-port in the port settings.
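I did it through the port settings in the GUI, but from the FOS CLI the usual way to stop a port from negotiating as a loop port is to lock it as a point-to-point (G_Port) port. A hedged sketch for a hypothetical port 4 (verify the commands against your FOS release; the port needs to be bounced for the change to take effect):
> portcfggport 4, 1
> portdisable 4
> portenable 4
> portcfgshow 4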

Environment:
- Dell M5424 8Gb Fibre Channel Switch: Brocade FOS v7.2.1b
- EMC VNXe 3200: Block OE v3.1.1.4993502
Tags:ARB, ARBff, Brocade, dell, F-Port, Fabric OS, FC, fiber channel, fibre channel, Fill Word, Fillword, firmware, FOS, IDLE, L-Port, M5424, Negotiate, VNX, VNXe
Posted in Storage | Leave a Comment »
February 12, 2015
This has been discussed multiple times, and I run into these issues all the time as well. The first issue is that Java won't let the Brocade management GUI run because of the certificate's RSA key length. The error is: "Failed to validate certificate. The application will not be executed". The solution is to change one line in "C:\Program Files (x86)\Java\jre7\lib\security\java.security" from:
jdk.certpath.disabledAlgorithms=MD2, RSA keySize < 1024
to
jdk.certpath.disabledAlgorithms=MD2, RSA keySize < 256

The second problem is an unsigned or expired certificate. The error reads as follows: "Application Blocked by Java Security. Your security settings have blocked an application signed with an expired or not-yet-valid certificate from running". To work around that, lower the security level from High to Medium in the Java Control Panel on the Security tab.

And the third problem is Java itself. Java 8 is not supported as of Feb 2015. I wasn't able to get this working even with the switch URL added to the exception list on the latest Java release, so I had to downgrade to version 7. It might have changed by the time you're reading this.
Tags:Brocade, Certificate, Grief, java, Pain, Suffering, Web Tools
Posted in SAN | Leave a Comment »
February 11, 2015
Have you ever stumbled upon AD authentication issues on a VNX, even though everything looked configured properly? LDAP integration has always been a PITA on storage arrays and blade chassis, as usually there is no way to troubleshoot what the actual error is.

If the VNX cannot look up the user or group that you're trying to authenticate against in AD, you'll see just this. Now go figure why it's getting upset about it, even though you can clearly see the group configured in "Role Mapping" and there doesn't seem to be any typo.
A common problem is Nested Groups. By default the VNX only checks whether your account is directly under the specified AD group and doesn't traverse the hierarchy. So, for example, if your account is in a group called IT_Admins in AD, IT_Admins is a member of Domain Admins, and Domain Admins is in "Role Mapping" – it's not gonna work.

To make it work, change "Nested Group Level" to something appropriate for your environment. This will resolve the issue and make your life happier.
Tags:AD, authentication, EMC, error, Integration, issue, LDAP, Nested Groups, Problem, VNX
Posted in VNX | 2 Comments »
February 20, 2014
In my previous post “EMC Isilon Overview” I talked about general Isilon configuration. In this post I want to describe some of the performance tuning options.
SSD Acceleration
You can choose from three types of Isilon nodes: S-series, X-series and NL-series. Within a node series you can select the amount of memory, the number/size/type of disk drives and the number of 1Gb/s and 10Gb/s network ports.
S- and X-series nodes can have SSD drives, which you can use for metadata and/or data, depending on how much flash storage you have. Isilon has four SSD strategies: "Metadata Read Acceleration", "Metadata Read/Write Acceleration", "Data on SSDs" and "Avoid SSDs". All strategies are pretty much self-explanatory. With the first strategy Isilon keeps a copy of metadata on SSD disks for read acceleration. With the second strategy it mirrors both metadata reads and writes to SSDs (which requires four to six times more SSD space). The third strategy allows you to place data on SSDs, and the fourth disables SSD acceleration altogether. You can configure the SSD strategy at the file pool policy level, and if you choose "Data on SSDs" you can redirect only particular files (say, those with the most recent access timestamps) to SSDs.
Isilon allows you to have SSD metadata acceleration even on node pools that don't have SSDs in them. It's called Global Namespace Acceleration (GNA) and requires at least 20% of nodes to have one or more SSD drives and 1.5% or more (2% is recommended) of total storage to be SSD-based.
Data Access Patterns
Isilon can be used for different types of workloads. You can have VMware datastores on it connected via iSCSI, serve CIFS file shares, or maybe use it for streaming media. Depending on your data access patterns you can tune Isilon for Random, Concurrency or Streaming content, which affects how Isilon writes data to disks and how it uses its cache.
"Random" read/write workloads are typical for VMware environments. With this setting Isilon disables prefetching of data into the read cache. This setting is the default for iSCSI LUNs.
With "Concurrency" Isilon optimizes the data layout for simultaneous access to many files on the storage array. It uses a moderate level of prefetching in this case, and for every 32MB of a file it tries to hit the same disk within a node. This is the default for all data except iSCSI LUNs.
And "Streaming" is for fast access to big files like media content. It has the most aggressive level of prefetching and tries to use as many disk drives as possible when writing data to the cluster.
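You can check and change the access pattern per file or directory from the OneFS CLI as well. A hedged sketch based on OneFS 7.x as I remember it (double-check isi set --help on your cluster; the path is hypothetical):
# isi get /ifs/data/media
# isi set -R -a streaming /ifs/data/media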
SmartCache
This setting affects write caching. With SmartCache turned on, data that comes into the cluster is cached in the node's memory. This is not NVRAM, and if the node fails, uncommitted data is lost. This might not be that critical for NFS and CIFS data, but losing iSCSI blocks can result in the file system becoming unreadable. So be careful with SmartCache. You can specifically disable write caching on iSCSI LUNs if you want to.
Accelerator Nodes
Isilon provides two types of accelerator nodes: Backup Accelerators and Performance Accelerators. Neither of them contributes disks to the storage pool; they provide additional capabilities to the cluster instead. The Backup Accelerator has four 4Gb/s FC ports for connections to tape libraries and allows you to offload backups from the storage nodes. Performance Accelerators add CPU and memory capacity to the cluster without adding any storage.
Data Protection Overhead
The default data protection level on Isilon is +2:1. It protects the storage system from two disk failures or one node failure. It gives you a sufficient level of protection and doesn't have too much overhead. If you need a higher level of protection, you need to realize that it can introduce significant overhead. Below is a table which shows the amount of overhead depending on the protection level and the number of nodes.

As you can see, for six nodes the +1 and +3:1 protection levels have the same overhead, but +3:1 gives better protection. So you need to understand the impact of changing the protection level and set it according to your needs.
Tags:acceleration, cache, CIFS, concurrency, data protection, datastore, EMC, Global Namespace Acceleration, GNA, iSCSI, Isilon, LUN, metadata, NAS, overhead, pattern, performance, prefetching, random, Scale-Out, share, SmartCache, SSD, vmware, workload
Posted in Isilon | 2 Comments »
February 20, 2014
OneFS Overview
EMC Isilon OneFS is a storage OS which was built from the ground up as a clustered system.
NetApp's Clustered ONTAP, for example, has evolved from an OS for an HA pair of storage controllers into a clustered system as a result of the integration of Spinnaker intellectual property. That's not necessarily bad, because cDOT shows better performance than Isilon on SPECsfs2008, but these systems still have two core architectural differences:
1. Isilon doesn't have RAID and the complexities associated with it. You don't choose a RAID protection level, you don't need to think about RAID groups and even load distribution between them, and you don't even have spare drives per se.
2. All data on an Isilon system is kept on one volume, which is one big distributed file system. cDOT uses the concept of Infinite Volumes, but bear in mind that each NetApp filer has its own file system underneath. If you have 24 NetApp nodes in a cluster, then you have 24 underlying file systems, even though they are viewed as a whole from the client standpoint.
This makes Isilon very easy to configure and operate, but its simplicity comes at the price of flexibility. The Isilon web interface has few options to configure and is not very feature-rich.
Isilon Nodes and Networking
In a nutshell, Isilon is a collection of nodes connected via a 20Gb/s DDR InfiniBand back-end network and either a 1Gb/s or 10Gb/s front-end network for client connections. There are three types of Isilon nodes: S-Series (SAS + SSD drives) for transactional random-access I/O, X-Series (SATA + SSD drives) for high-throughput applications and NL-Series (SATA drives) for archival or infrequently used data.
If you choose to have two IB switches at the back end, then you'll have three subnets configured for the internal network: int-a, int-b and failover. You can think of the failover network as a virtual network in front of int-a and int-b. When a packet comes to a failover network IP address, the actual IB interface that receives the packet is chosen dynamically. That helps to load-balance traffic between the two IB switches and makes this an active/active setup.

On the front end you can have as many subnets as you like. Subnets are split into pools of IP addresses, and you can add particular node interfaces to a pool. Each pool can have its own SmartConnect zone configured. SmartConnect is a way to load-balance connections between the nodes. Basically, SmartConnect is a DNS server which runs on the Isilon side. You can have one SmartConnect service per subnet and one SmartConnect zone (which is simply a domain) on each of the subnet's pools. To set up SmartConnect you assign an IP address to the SmartConnect service and set a SmartConnect zone name at the pool level. Then you create an "A" record in DNS for the SmartConnect service IP address and delegate the SmartConnect DNS zone to this IP. That way, each time you refer to the SmartConnect zone to get access to a file share, you'll be redirected to a dynamically picked node from the pool.
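On the DNS side the setup boils down to two records. A generic BIND-style sketch with hypothetical names and addresses (10.10.10.100 is the SmartConnect service IP, smartconnect.example.com is the zone name set on the pool):
ssip.example.com.          IN A   10.10.10.100
smartconnect.example.com.  IN NS  ssip.example.com.
Clients then access shares via smartconnect.example.com, and every lookup is answered by the Isilon with the IP of a node picked from the pool.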
SmartPools
Each type of node is automatically assigned to what is called a "Node Pool". Nodes are grouped into the same pool if they are of the same series and have the same amount of memory and disks of the same type and size. The Node Pool level is one of the places where you can configure the protection level; we'll talk about that later. Node Pools are grouped into Tiers. So you can group an NL node pool with 1TB drives and an NL node pool with 3TB drives into an archive tier if you wish. And then you have File Pool Policies, which you can use to manage the placement of files within the cluster. For example, you can redirect files with a specific extension, file size or last access time to a specific node pool or tier. File pool policies also allow you to configure data protection and override the default node pool protection setting.
SmartPools is the name Isilon uses for this Tier/Node Pool/File Pool Policy approach. File placement is not applied immediately on every write, as that would cause high I/O overhead; it's implemented as a job on the cluster instead, which runs at 22:00 every day by default.
Data Layout and Protection
Instead of using RAID, Isilon uses FEC (Forward Error Correction), specifically a Reed-Solomon algorithm, to protect data on the cluster. It's similar to RAID5 in how it generates a protection block (or blocks) for each stripe, but it happens at the software level instead of in hardware as in traditional storage arrays. So when a file comes in to a node, Isilon splits the file into stripe units of 128KB each, generates one FEC protection unit and distributes all of them between the nodes using the back-end network. This is what is called the "+1" protection level, where Isilon can sustain one disk or one node failure. Then you have "+2", "+3" and "+4". With "+4" you have four FEC units per stripe and can sustain four disk or node failures. Note, however, that there is a rule that the number of data stripe units in a stripe has to be greater than the number of FEC units. So the minimum requirement for the "+4" protection level is 9 nodes in a cluster.

The second option is to use mirroring. You can have from 2x to 8x mirrors of your data. And the third option is the "+2:1" and "+3:1" protection levels. These protection levels let you balance between data protection and the amount of FEC overhead. For example, the "+2:1" setting, compared to "+2", can sustain two drive failures or one node failure, instead of the two-node-failure protection that "+2" offers. And it makes sense, since a simultaneous two-node failure is unlikely to happen. There is also a difference in how the data is laid out: with "+2" Isilon uses one disk on each node per stripe, while with "+2:1" it uses two disks on each node; the first FEC unit in this case goes to the first subset of disks and the second to the second.
One benefit of not having RAID is that you can set the protection level with folder or even file granularity, which is impossible with conventional RAID. And, quite handily, you can change protection levels without recreating storage volumes, as you might have to do when transitioning between some of the RAID levels. When you change the protection level for any of the targets, Isilon creates a low-priority job which redistributes data within the cluster.
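For example, the requested protection of a single folder can be changed from the CLI roughly like this (a hedged OneFS 7.x sketch with a hypothetical path; check isi set --help for the exact flags on your release):
# isi set -R -p +2:1 /ifs/data/archive
# isi get /ifs/data/archive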
Tags:cDOT, cluster, Data ONTAP, DOT, EMC, FEC, File Pool Policy, Forward Error Correction, IB, InfiniBand, Isilon, NetApp, node, Node Pool, OneFS, RAID, Reed-Solomon, SmartConnect, SmartPools, storage, stripe, tier
Posted in Isilon | 2 Comments »
September 25, 2013

DISCLAIMER: I ACCEPT NO RESPONSIBILITY FOR ANY DAMAGE OR CORRUPTION OF DATA THAT MAY OCCUR AS A RESULT OF CARRYING OUT THE STEPS DESCRIBED BELOW. YOU DO THIS AT YOUR OWN RISK.
We had an issue with high CPU usage on one of the NetApp controllers servicing a couple of NFS datastores for a VMware ESX cluster. The FAS2050 HA pair had two shelves, both of them owned by the first controller. The obvious solution for us was to reassign the disks from one of the shelves to the other controller to balance the load. But how do you do this non-disruptively? Here is the plan.
In our setup we had two controllers (filer1, filer2) and two shelves (shelf1, shelf2), both assigned to filer1, with two aggregates, each on its own shelf (aggr0 on shelf1, aggr1 on shelf2). Say we want to reassign the disks from shelf2 to filer2.
The first step is to migrate all of the VMs from shelf2 to shelf1, because the operation is obviously disruptive to the hosts accessing data from the target shelf. Once all VMs are evacuated, offline all volumes and the aggregate to prevent any data corruption (you can't take an aggregate offline directly from the online state, so change it to restricted first).
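The offlining part looks roughly like this in 7-Mode (the volume name is hypothetical, the aggregate name follows the example above):
filer1> vol offline vol_nfs01
filer1> aggr restrict aggr1
filer1> aggr offline aggr1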
If you prefer to reassign the disks in two steps, as described in NetApp Professional Services Tech Note #021: Changing Disk Ownership, don't forget to disable automatic ownership assignment on both controllers; otherwise the disks will be assigned back to the same controller right after you unown them:
> options disk.auto_assign off
This is not necessary if you change ownership in one step, as shown below.
The next step is to actually reassign the disks. Since they are already part of an aggregate, you will need to force the ownership change:
filer1> disk assign 1b.01.00 -o filer2 -f
filer1> disk assign 1b.01.01 -o filer2 -f
…
filer1> disk assign 1b.01.nn -o filer2 -f
If you do not force disk reassignment you will get an error:
Assign request failed for disk 1b.01.0. Reason:Disk is part of a failed or offline aggregate or volume. Changing its owner may prevent aggregate or volume from coming back online. Ownership may be changed only by using the appropriate force option.
When all disks are moved across to filer2, the new aggregate will show up in the list of aggregates on filer2 and you'll be able to bring it online. If you can't see the aggregate, force the filer to rescan the drives by running:
filer2> disk show
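Once the aggregate shows up on filer2, bring it and its volumes back online (the aggregate may come across under a slightly different name if filer2 already has an aggr1; the volume name is hypothetical):
filer2> aggr status
filer2> aggr online aggr1
filer2> vol online vol_nfs01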
The old aggregate will still be seen in the list on filer1. You can safely remove it:
filer1> aggr destroy aggr1
Tags:aggregate, assignment, controller, corruption, CPU, datastore, disk, ESX, FAS, Filer, force, load balancing, migrate, NetApp, NFS, non-disruptively, offline, online, own, ownership, reassign, restricted, shelf, unown, VM, vmware, volume
Posted in NetApp, VMware | Leave a Comment »