Posts Tagged ‘datastore’

History of vSphere Storage Size Limitations

June 5, 2016

data-storageThis seems like a straightforward topic. If you are on vSphere 6 you can create VMFS datastores and VM disks as big as 64TB (62TB for VM disks to be precise). The reality is, not all customers are running the latest and greatest for various reasons. The most common one is concerns about reliability. vSphere 6 is still at version 6.0. Once vSphere 6.1 comes out we will see wider adoption. At this stage I see various versions of vSphere 5 in the field. And even vSphere 4 at times, which is officially not supported by VMware since May 2015. So it’s not surprising I still get this question, what are the datastore and disk size limits for various vSphere versions?

Datastore size limit

The biggest datastore size for both VMFS3 and VMFS5 is 64TB. What you need to know is, VMFS3 file system uses MBR partition style, which is limited to 2TB. The way VMFS3 overcomes this limitation is by using extents. To extend VMFS3 partition to 64TB you would need 32 x 2TB LUNs on the storage array. VMFS5 file system has GPT partition style and can be extended to 64TB by expanding one underlying LUN without using extents, which is a big plus.

VMFS3 datastores are rare these days, unless you’re still on vSphere 4. The only consideration here is whether your vSphere 5 environment was a greenfield build. If the answer is yes, then all your datastores are VMFS5 already. If environment was upgraded from vSphere 4, you need to make sure all datastores have been upgraded (or better recreated) to VMFS5 as well. If the upgrade wasn’t done properly you may still have some VMFS3 datastores in your environment.

Disk size limit

For .vmdk disks the limitation had been 2TB for a long time, until VMware increased the limit to 62TB in vSphere 5.5. So if all of your datastores are VMFS5, this means you still have 2TB  .vmdk limitation if you’re on 5.0 or 5.1.

For VMFS3 file system you also had an option to choose block size – 1MB, 2MB, 4MB or 8MB. 2TB .vmdk disks were supported only with the 8MB block size. The default was 1MB. So if you chose the default block size during datastore creation you were limited to 256GB .vmdk disks.

The above limits led to proliferation of Raw Device Mapping disks in many pre 5.5 environments. Those customers who needed VM disks bigger than 2TB had to use RDMs, as physical RDMs starting from VMFS5 supported 64TB (pRDMs on VMFS3 were still limited to 2TB).

This table summarises storage configuration maximums for vSphere version 4.0 to 6.0:

vSphere Datastore Size VMDK Size pRDM Size
4.0 64TB 2TB 2TB
4.1 64TB 2TB 2TB
5.0 64TB 2TB 64TB
5.1 64TB 2TB 64TB
5.5 64TB 62TB 64TB
6.0 64TB 64TB 64TB


The bottom line is, if you’re on vSphere 5.5 or 6.0 and all your datastores are VMFS5 you can forget about the legacy .vmdk disk size limitations. And if you’re not on vSphere 5.5 you should consider upgrading as soon as possible, as vSphere 5.0 and 5.1 are coming to an end of support on 24 of August 2016.

Requirements for Unmounting a VMware Datastore

December 30, 2015

I have come across issues unmounting VMware datastores myself multiple times. In recent vSphere versions vCenter shows you a warning if some of the requirements are not fulfilled. It is not the case in the older vSphere versions, which makes it harder to identify the issue.

Interestingly, there are some pre-requisites which even vCenter does not prompt you about. I will discuss all of the requirements in this post.

General Requirements

In this category I combine all requirements which vCenter checks against, such as:

Requirement: No virtual machine resides on the datastore.

Action: You have to make sure that the host you are unmounting the datastore from has no virtual machines (running or stopped) registered on this datastore.  If you are unmounting just one datastore from just one host, you can simply vMotion all VMs residing on the datastore from this host to the remaining hosts. If you are unmounting the datastore from all hosts, you’ll have to either Storage vMotion all VMs to the remaining datastores or shutdown the VMs and unregister them from vCenter.


Requirement: The datastore is not part of a Datastore Cluster.

Requirement: The datastore is not managed by storage DRS.

Action: Drag and drop the datastore from the Datastore Cluster in vCenter to move it out of the Datastore Cluster. Second requirement is redundant, because SDRS is enabled on a datastore which is configured withing a Datastore Cluster. By removing a datastore from a Datastore Cluster you atomatically disable storage DRS on it.

Requirement: Storage I/O control is disabled for this datastore.

Action: Go to the datastore properties and uncheck Storage I/O Control option. On a SIOC-enabled datastore vSphere creates a folder named after the block device ID and keeps a file called “slotsfile” in it. Its size will change to 0.00 KB once SIOC is disabled.

Requirement: The datastore is not used for vSphere HA heartbeat.

Action: vSphere HA automatically selects two VMware datastores, creates .vSphere-HA folders and use them to keep HA heartbeats. If you have more than two datastores in your cluster, you can control which datastores are selected. Go to cluster properties > Datastore Heartbeating (under vSphere HA section) and select preferred datastores from the list. This will work if you are unmounting one datastore. If you need to unmount all datastores, you will have to disable HA on the cluster level altogether.


Additional Requirements

Requirements which fall in this category are not checked by vCenter, but are still have to be satisfied. Otherwise vCenter will not let you unmount the datastore.

Requirement: The datastore is not used for swap.

Action: When VM is powered on by default it creates a swap file in the VM directory with .vswp extension. You can change the default behavior and on a per host basis select a dedicated datastore where host will be creating swap files for virtual machines. This setting is enabled in cluster properties in Swapfile Location section. The datastore is then selected for each host in Virtual Machine Swapfile Location settings on the the host configuration tab.

What host also does when you enable this option is it creates a host local swap file, which is named something like this: sysSwap-hls-55de2f14-6c5d-4d50-5cdf-000c296fc6a7.swp

There are scenarios where you need to unmount the swap datastore, such as when you say need to reconnect all of your storage from FC to iSCSI. Even if you shutdown all of your VMs, datastore unmount will fail because the host swap files are still there and you will see an error such as this:

The resource ‘Datastore Name: iSCSI1 VMFS uuid: 55de473c-7f3ae2b5-f9f8-000c29ba113a’ is in use.

See the error stack for details on the cause of the problem.

Error Stack:

Call “HostStorageSystem.UnmountVmfsVolume” for object “storageSystem-29” on vCenter Server “VC.lab.local” failed.

Cannot unmount volume ‘Datastore Name: iSCSI1 VMFS uuid: 55de473c-7f3ae2b5-f9f8-000c29ba113a’ because file system is busy. Correct the problem to retry the operation.

The workaround is to change the setting on the cluster level to store VM swap file in VM directory and reboot all hosts. After a reboot the host .swp file will disappear.

If rebooting the hosts is not desirable, you can SSH to each host and type the following command:

# esxcli sched swap system set –hostlocalswap-enabled false

To confirm that the change has taken effect run:

# esxcli sched swap system get

Then check the datastore and the .swp files should no longer be there.


If you satisfy all of the above requirements you should have no problems when unmounting VMware datastores. vSphere creates a few additional system folders on each of the datastores, such as .sdd.sf and .dvsData, but I personally have never had issues with them.

Zerto Overview

March 6, 2014

zerto-logoZerto is a VM replication product which works on a hypervisor level. In contrast to array level replication, which SRM has been using for a long time, it eliminates storage array from the equation and all the complexities which used to come along with it (SRAs, splitting the LUNs for replicated and non-replicated VMs, potential incompatibilities between the orchestrated components, etc).

Basic Operation

Zerto consists of two components: ZVM (Zerto Virtual Manger) and VRA (Virtual Replication Appliance). VRAs are VMs that need to be installed on each ESXi host within the vCenter environment (performed in automated fashion from within ZVM console). ZVM manages VRAs and all the replication settings and is installed one per vCenter. VRA mirrors protected VMs I/O operations to the recovery site. VMs are grouped in VPGs (Virtual Protection Groups), which can be used as a consistency group or just a container.

Protected VMs can be preseeded  to DR site. But what Zerto essentially does is it replicates VM disks to any datastore on recovery site where you point it to and then tracks changes in what is called a journal volume. Journal is created for each VM and is kept as a VMDK within the “ZeRTO volumes” folder on a target datastore. Every few seconds Zerto creates checkpoints on a journal, which serve as crash consistent recovery points. So you can recover to any point in time, with a few seconds granularity. You can set the journal length in hours, depending on how far you potentially would want to go back. It can be anywhere between 1 and 120 hours.Data-Replication-over-WAN

VMs are kept unregistered from vCenter on DR site and VM configuration data is kept in Zerto repository. Which essentially means that if an outage happens and something goes really wrong and Zerto fails to bring up VMs on DR site you will need to recreate VMs manually. But since VMDKs themselves are kept in original format you will still be able to attach them to VMs and power them on.

Failover Scenarios

There are four failover scenarios within Zerto:

  • Move Operation – VMs are shut down on production site, unregistered from inventory, powered on at DR site and protection is reversed if you decide to do so. If you choose not to reverse protection, VMs are completely removed from production site and VPG is marked as “Needs Configuration”. This scenario can be seen as a planned migration of VMs between the sites and needs both sites to be healthy and operational.
  • Failover Operation – is used in disaster scenario when production site might be unavailable. In this case Zerto brings up protected VMs on DR site, but it does not try to remove VMs from production site inventory and leave them as is. If production site is still accessible you can optionally select to shutdown VMs. You cannot automatically reverse protection in this scenario, VPG is marked as “Needs Configuration” and can be activated later. And when it is activated, Zerto does all the clean up operations on the former production site: shuts down VMs (if they haven’t been already), unregister them from inventory and move to VRA folder on the datastore.
  • Failover Test Operation – this is for failover testing and brings up VMs on DR site in a configured bubble network which is normally not uplinked to any physical network. VMs continue to run on both sites. Note that VMs disk files in this scenario are not moved to VMs folders (as in two previous scenarios) and are just connected from VRA VM folder. You would also notice that Zerto created second journal volume which is called “scratch” journal. Changes to the VM that is running on DR site are saved to this journal while it’s being tested.
  • Clone Operation – VMs are cloned on DR site and connected to network. VMs are not automatically powered on to prevent potential network conflicts. This can be used for instance in DR site testing, when you want to check actual networking connectivity, instead of connecting VMs to an isolated network. Or for implementing backups, cloned environment for applications testing, etc.

Zerto Journal Sizing

By default journal history is configured as 4 hours and journal size is unlimited. Depending on data change rate within the VM journal can be smaller or larger. 15GB is approximately enough storage to support a virtual machine with 1TB of storage, assuming a 10% change rate per day with four hours of journal history saved. Zerto has a Journal Sizing Tool which helps to size journals. You can create a separate journal datastore as well.

Zerto compared to VMware Replication and SRM

There are several replication products in the market from VMware. Standalone VMware replication, VMware replication + SRM orchestraion and SRM array-based replication. If you want to know more on how they compare to Zerto, you can read the articles mentioned in references below. One apparent Zerto advantage, which I want to mention here, is integration with vCloud Director, which is essential for cloud providers who offer DRaaS solutions. SRM has no vCloud Director support.


How to move aggregates between NetApp controllers

September 25, 2013

Stop Sign_91602




We had an issue with high CPU usage on one of the NetApp controllers servicing a couple of NFS datastores to VMware ESX cluster. HA pair of FAS2050 had two shelves, both of them owned by the first controller. The obvious solution for us was to reassign disks from one of the shelves to the other controller to balance the load. But how do you do this non-disruptively? Here is the plan.

In our setup we had two controllers (filer1, filer2), two shelves (shelf1, shelf2) both assigned to filer1. And two aggregates, each on its own shelf (aggr0 on shelf0, aggr1 on shelf1). Say, we want to reassign disks from shelf2 to filer2.

First step is to migrate all of the VMs from the shelf2 to shelf1. Because operation is obviously disruptive to the hosts accessing data from the target shelf. Once all VMs are evacuated, offline all volumes and an aggregate, to prevent any data corruption (you can’t take aggregate offline from online state, so change it to restricted first).

If you prefer to reassign disks in two steps, as described in NetApp Professional Services Tech Note #021: Changing Disk Ownership, don’t forget to disable automatic ownership assignment on both controllers, otherwise disks will be assigned back to the same controller again, right after you unown them:

> options disk.auto_assign off

It’s not necessary if you change ownership in one step as shown below.

Next step is to actually reassign the disks. Since they are already part of an aggregate you will need to force the ownership change:

filer1> disk assign 1b.01.00 -o filer2 -f

filer1> disk assign 1b.01.01 -o filer2 -f

filer1> disk assign 1b.01.nn -o filer2 -f

If you do not force disk reassignment you will get an error:

Assign request failed for disk 1b.01.0. Reason:Disk is part of a failed or offline aggregate or volume. Changing its owner may prevent aggregate or volume from coming back online. Ownership may be changed only by using the appropriate force option.

When all disks are moved across to filer2, new aggregate will show up in the list of aggregates on filer2 and you’ll be able to bring it online. If you can’t see the aggregate, force filer to rescan the drives by running:

filer2> disk show

The old aggregate will still be seen in the list on filer1. You can safely remove it:

filer1> aggr destroy aggr1

NetApp VSC Single File Restore Explained

August 5, 2013

netapp_dpIn one of my previous posts I spoke about three basic types of NetApp Virtual Storage Console restores: datastore restore, VM restore and backup mount. The last and the least used feature, but very underrated, is the Single File Restore (SFR), which lets you restore single files from VM backups. You can do the same thing by mounting the backup, connecting vmdk to VM and restore files. But SFR is a more convenient way to do this.


SFR is pretty much an out-of-the-box feature and is installed with VSC. When you create an SFR session, you specify an email address, where VSC sends an .sfr file and a link to Restore Agent. Restore Agent is a separate application which you install into VM, where you want restore files to (destination VM). You load the .sfr file into Restore Agent and from there you are able to mount source VM .vmdks and map them to OS.

VSC uses the same LUN cloning feature here. When you click “Mount” in Restore Agent – LUN is cloned, mapped to an ESX host and disk is connected to VM on the fly. You copy all the data you want, then click “Dismount” and LUN clone is destroyed.

Restore Types

There are two types of SFR restores: Self-Service and Limited Self-Service. The only difference between them is that when you create a Self-Service session, user can choose the backup. With Limited Self-Service, backup is chosen by admin during creation of SFR session. The latter one is used when destination VM doesn’t have connection to SMVI server, which means that Remote Agent cannot communicate with SMVI and control the mount process. Similarly, LUN clone is deleted only when you delete the SFR session and not when you dismount all .vmdks.

There is another restore type, mentioned in NetApp documentation, which is called Administartor Assisted restore. It’s hard to say what NetApp means by that. I think its workflow is same as for Self-Service, but administrator sends the .sfr link to himself and do all the job. And it brings a bit of confusion, because there is an “Admin Assisted” column on SFR setup tab. And what it actually does, I believe, is when Port Group is configured as Admin Assisted, it forces SFR to create a Limited Self-Service session every time you create an SFR job. You won’t have an option to choose Self-Assisted at all. So if you have port groups that don’t have connectivity to VSC, check the Admin Assisted option next to them.


Keep in mind that SFR doesn’t support VM’s with IDE drives. If you try to create SFR session for VMs which have IDE virtual hard drives connected, you will see all sorts of errors.

Monitoring ESX Storage Queues

July 30, 2013

6a00d8341c328153ef01774354e2fd970d-500wiQueue Limits

I/O data goes through several storage queues on its way to disk drives. VMware is responsible for VM queue, LUN queue and HBA queue. VM and LUN queues are usually equal to 32 operations. It means that each ESX host at any moment can have no more than 32 active operations to a LUN. Same is true for VMs. Each VM can have as many as 32 active operations to a datastore. And if multiple VMs share the same datastore, their combined I/O flow can’t go over the 32 operations limit (per LUN queue for QLogic HBAs has been increased from 32 to 64 operations in vSphere 5). HBA queue size is much bigger and can hold several thousand operations (4096 for QLogic, however I can see in my config that driver is configured with 1014 operations).

Queue Monitoring

You can monitor storage queues of ESX host from the console. Run “esxtop”, press “d” to view disk adapter stats, then press “f” to open fields selection and add Queue Stats by pressing “d”.

AQLEN column will show the queue depth of the storage adapter. CMDS/s is the real-time number of IOPS. DAVG is the latency which comes from the frame traversing through the “driver – HBA – fabric – array SP” path and should be less than 20ms. Otherwise it means that storage is not coping. KAVG shows the time which operation spent in hypervisor kernel queue and should be less than 2ms.

Press “u” to see disk device statistics. Press “f” to open the add or remove fields dialog and select Queue Stats “f”. Here you’ll see a number of active (ACTV) and queue (QUED) operations per LUN.  %USD is the queue load. If you’re hitting 100 in %USD and see operations under QUED column, then again it means that your storage cannot manage the load an you need to redistribute your workload between spindles.

Some useful documents:

Magic behind NetApp VSC Backup/Restore

June 12, 2013

netapp_dpNetApp Virtual Storage Console is a plug-in for VMware vCenter which provides capabilities to perform instant backup/restore using NetApp snapshots. It uses several underlying NetApp features to accomplish its tasks, which I want to describe here.

Backup Process

When you configure a backup job in VSC, what VSC does, is it simply creates a NetApp snapshot for a target volume on a NetApp filer. Interestingly, if you have two VMFS datastores inside one volume, then both LUNs will be snapshotted, since snapshots are done on the volume level. But during the datastore restore, the second volume will be left intact. You would think that if VSC reverts the volume to the previously made snapshot, then both datastores should be affected, but that’s not the case, because VSC uses Single File SnapRestore to restore the LUN (this will be explained below). Creating several VMFS LUNs inside one volume is not a best practice. But it’s good to know that VSC works correctly in this case.

Same thing for VMs. There is no sense of backing up one VM in a datastore, because VSC will make a volume snapshot anyway. Backup the whole datastore in that case.

Datastore Restore

After a backup is done, you have three restore options. The first and least useful kind is a datastore restore. The only use case for such restore that I can think of is disaster recovery. But usually disaster recovery procedures are separate from backups and are based on replication to a disaster recovery site.

VSC uses NetApp’s Single File SnapRestore (SFSR) feature to restore a datastore. In case of a SAN implementation, SFSR reverts only the required LUN from snapshot to its previous state instead of the whole volume. My guess is that SnapRestore uses LUN clone/split functionality in background, to create new LUN from the snapshot, then swap the old with the new and then delete the old. But I haven’t found a clear answer to that question.

For that functionality to work, you need a SnapRestore license. In fact, you can do the same trick manually by issuing a SnapRestore command:

> snap restore -t file -s nightly.0 /vol/vol_name/vmfs_lun_name

If you have only one LUN in the volume (and you have to), then you can simply restore the whole volume with the same effect:

> snap restore -t vol -s nightly.0 /vol/vol_name

VM Restore

VM restore is also a bit controversial way of restoring data. Because it completely removes the old VM. There is no way to keep the old .vmdks. You can use another datastore for particular virtual hard drives to restore, but it doesn’t keep the old .vmdks even in this case.

VSC uses another mechanism to perform VM restore. It creates a LUN clone (don’t confuse with FlexClone,which is a volume cloning feature) from a snapshot. LUN clone doesn’t use any additional space on the filer, because its data is mapped to the blocks which sit inside the snapshot. Then VSC maps the new LUN to the ESXi host, which you specify in the restore job wizard. When datastore is accessible to the ESXi host, VSC simply removes the old VMDKs and performs a storage vMotion from the clone to the active datastore (or the one you specify in the job). Then clone is removed as part of a clean up process.

The equivalent cli command for that is:

> lun clone create /vol/clone_vol_name -o noreserve -b /vol/vol_name nightly.0

Backup Mount

Probably the most useful way of recovery. VSC allows you to mount the backup to a particular ESXi host and do whatever you want with the .vmdks. After the mount you can connect a virtual disk to the same or another virtual machine and recover the data you need.

If you want to connect the disk to the original VM, make sure you changed the disk UUID, otherwise VM won’t boot. Connect to the ESXi console and run:

# vmkfstools -J setuuid /vmfs/volumes/datastore/VM/vm.vmdk

Backup mount uses the same LUN cloning feature. LUN is cloned from a snapshot and is connected as a datastore. After an unmount LUN clone is destroyed.

Some Notes

VSC doesn’t do a good cleanup after a restore. As part of the LUN mapping to the ESXi hosts, VSC creates new igroups on the NetApp filer, which it doesn’t delete after the restore is completed.

What’s more interesting, when you restore a VM, VSC deletes .vmdks of the old VM, but leaves all the other files: .vmx, .log, .nvram, etc. in place. Instead of completely substituting VM’s folder, it creates a new folder vmname_1 and copies everything into it. So if you use VSC now and then, you will have these old folders left behind.

Limiting the number of concurrent storage vMotions

June 6, 2013

vmw-dgrm-vsphr-087b-diagram1VMware vCenter allows several concurrent storage vMotions on a datastore. But it can negatively impact your production environment, by hammering your underlying storage. If you want to migrate several virtual machines to another datastore, it’s much safer to do that one by one. But it’s too much manual work.

There is a simple way to limit the number of concurrent storage vMotions by configuring vCenter advanced settings. There are a group of resource management parameters for network, host and datastore limits which apply to vMotion and Storage vMotion. They are called limits and costs. For ESXi 4.1 default datastore limit for migration with Storage vMotion is 128. And datastore resource cost for Storage vMotion is 16 (defaults for other versions of ESXi can be found here: Limits on Simultaneous Migrations). It basically means that 8 concurrent storage vMotions is allowed for each datastore. So to allow only one storage vMotion at a time you can either change the limit to 16 or cost to 128.

Lets say we choose to change the cost to 128. There are two ways of doing it. The first one is to edit vCenter vpxd.cfg file and add the following stanza between <vpxd></vpxd> tags:


The second simpler one way is to edit vCenter -> Administration -> vCenter Server Settings -> Advanced Settings and add config.vpxd.ResourceManager.CostPerEsx41SVmotion key with value equal to 128. You will probably need to reboot vCenter after that.

There is one moment, however. If you migrate VMs from say 3 source datastores to 1 destination, then 3 concurrent storage vMotion will kick off. I do not know what is the reason for that, but that’s what I found from the practice.

Unexpected Deduplication Impact on VMware I/O Latency

May 28, 2013

NetApp deduplication is a postponed process. During normal operation Data ONTAP only calculates hashes for the data blocks. Actual deduplication is carried out off-hours as per configured schedule. Hash calculation doesn’t affect performance in most cases. I talked about that in my previous post. NetApp states in its documentation that deduplication is a low-priority process:

When one deduplication process is running, there is 0% to 15% performance degradation on other applications.

Once I faced a situation when deduplication was configured to be carried out during business hours on one of the volumes. No one noticed that at some point volume run out of space and Data ONTAP wasn’t able to perform deduplication from that time. Situation became worse when Data ONTAP was upgraded from version 7.3.2 to 8.1.0. Now during deduplication filer tried to upgrade the fingerprint metadata to a new version at 15:00 every day with the message: “Fingerprint is being upgraded” and failed. It seems that the metadata upgrade is a very resource-intensive process and heavily affects I/O latency.

This volume was not a VMware datastore, but it sit on the same aggregate together with the several VMFS LUNs. Here what happened to the VMware I/O latency every day at 15:00 (click to enlarge):


I deleted the host name and the datastores names from the graph. You can see the large latency spike, which won’t turn yourVMs into kernel panic, but it’s not the thing you would want your production environment to experience every day.

The solution was simple. After space was increased on this volume, deduplication metadata upgrade performed successfully and problem went away. Additionally, deduplication was shifted to off-hours.

The simple lesson to learn: don’t schedule deduplication during the day, you never know what could possibly go wrong.