Posts Tagged ‘VM’

Unexpected Deduplication Impact on VMware I/O Latency

May 28, 2013

NetApp deduplication is a post-process operation. During normal operation Data ONTAP only calculates hashes for the data blocks; the actual deduplication is carried out off-hours, according to the configured schedule. Hash calculation doesn’t affect performance in most cases. I talked about that in my previous post. NetApp states in its documentation that deduplication is a low-priority process:

When one deduplication process is running, there is 0% to 15% performance degradation on other applications.

I once faced a situation where deduplication was configured to run during business hours on one of the volumes. No one noticed that at some point the volume ran out of space and Data ONTAP could no longer perform deduplication. The situation became worse when Data ONTAP was upgraded from version 7.3.2 to 8.1.0. From then on, at 15:00 every day the filer tried to upgrade the fingerprint metadata to the new version with the message “Fingerprint is being upgraded” and failed. It seems that the metadata upgrade is a very resource-intensive process and heavily affects I/O latency.

This volume was not a VMware datastore, but it sat on the same aggregate as several VMFS LUNs. Here is what happened to VMware I/O latency every day at 15:00:

[Graph: VMware datastore I/O latency spiking every day at 15:00]

I removed the host name and the datastore names from the graph. You can see the large latency spike, which won’t send your VMs into kernel panic, but it’s not something you want your production environment to experience every day.

The solution was simple. After space on the volume was increased, the deduplication metadata upgrade completed successfully and the problem went away. Additionally, deduplication was moved to off-hours.

The simple lesson to learn: don’t schedule deduplication during the day; you never know what could possibly go wrong.
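For reference, in 7-Mode the deduplication schedule is set per volume with the sis commands. A minimal sketch, assuming a made-up volume name, that checks the current state and moves the runs to 11 p.m. every day:

> sis status /vol/myvol
> sis config /vol/myvol
> sis config -s sun-sat@23 /vol/myvol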

NetApp thin-provisioning for VMware LUNs

May 22, 2013


LUN and Volume Thin Provisioning

I already described thin provisioning of VMware NFS volumes some time ago here. Now I want to discuss thin provisioning of LUNs.

LUNs are different from the VMFS-on-NFS implementation, because a LUN is an additional container inside a NetApp FlexVol volume. So if you’re using FC, you need to thin provision both the LUN and the volume:

> lun set reservation "/vol/targetvol/targetlun" disable
> vol options "targetvol" guarantee none

In fact, you can make the LUN thin and keep the volume thick. Then the storage space that isn’t used by the LUN is returned to the volume level, but in this case it cannot be used by other volumes as a shared pool of space.
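To double-check that the reservations are really off, something along these lines should work in 7-Mode (names follow the example above; the exact output varies between Data ONTAP versions):

> lun show -v "/vol/targetvol/targetlun"
> vol options "targetvol"
> df -g /vol/targetvol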

As a best practice, NetApp now recommends setting Fractional Reserve and Snap Reserve to 0% on your volumes. Don’t forget about that if you want to save even more storage space:

> vol options "targetvol" fractional_reserve 0
> snap reserve "targetvol" 0

Disable scheduled snapshots if you don’t use them:

> snap sched "targetvol" 0

It’s as easy as that. Now you don’t waste space by reserving it up front, but use it as a shared pool of resources. Just make sure to monitor aggregate free space. If you are starting to run out of storage, plan the purchase of new disks in advance or redistribute data between other aggregates.
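In 7-Mode a quick way to keep an eye on aggregate free space is the aggregate view of df; a small example with a made-up aggregate name:

> df -A -g aggr1
> aggr show_space -g aggr1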

Safety Features

Disabling volume, LUN, and snapshot reservations helps you save storage space. The drawback of this approach is that you don’t have any mechanism in place to prevent volume out-of-space situations. If you enable snapshots on the volume and they consume all the volume space, the volume goes offline, which is a very undesirable consequence. NetApp has two features that can serve as a safety net in thin-provisioned environments: autosize and snap autodelete.

Snap autodelete automatically removes old snapshots when there is no space left inside the volume. Autosize, on the other hand, allows the volume to automatically grow to a specified limit (+20% of the volume size by default) in specified increments (5% of the volume size by default). You can also specify which one to try first, autosize or snapshot autodelete, using the ‘try_first’ option.

> snap autodelete "targetvol" on
> vol autosize "targetvol" on
> vol options "targetvol" try_first volume_grow
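If the defaults don’t suit you, the autosize maximum and increment can be set explicitly, and both features can be checked afterwards. A sketch with made-up values:

> vol autosize "targetvol" -m 1200g -i 50g on
> vol autosize "targetvol"
> snap autodelete "targetvol" show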

SnapMirror Considerations

If you use SnapMirror and switch on autosize on the source volume, the destination volume won’t grow automatically, and SnapMirror will break the relationship if it runs out of space on the smaller destination volume. The trick here is to make the destination volume as big as the autosize limit of the source volume and thin provision the destination volume. By doing that you won’t run out of space on the destination even if the source volume grows to its maximum.
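A minimal sketch of that approach, with made-up names and sizes (the destination size matches the source’s autosize maximum from the example above):

> vol size "targetvol_dst" 1200g
> vol options "targetvol_dst" guarantee none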

Further reading

TR-3965: NetApp Thin Provisioning Deployment and Implementation Guide Data ONTAP 8.1 7-Mode

Storwize V7000 with vSphere 5 storage configuration

December 1, 2012

Information on how to configure Storwize for optimal performance is very scarce. I’ll try to build some understanding of it from bits and pieces gathered throughout the Internet and Redbooks.

Barry Whyte gave many insights into Storwize internals in his blog, particularly in his “Configuring IBM Storwize V7000 and SVC for Optimal Performance” series of posts; I’ll quote him here. The main Storwize Redbook, “Implementing the IBM Storwize V7000 V6.3”, is mostly an administration guide and gives no useful information on the topic. I find “SAN Volume Controller Best Practices and Performance Guidelines” much more helpful (Storwize firmware is built on SVC code).

Total Number of MDisks

That’s what Barry says:

… At the heart of each V7000 controller canister is an Intel Jasper Forrest (Sandy Bridge) based quad core CPU. … When we added the tried and trusted (SSA) DS8000 RAID functionality in 2010 (6.1.0) we therefore assigned RAID processing on a per mdisk basis to a single core. That means you need at least 4 arrays per V7000 to get maximal CPU core performance. …

Number of MDisks per Storage Pool

SVC Redbook:

The capability to stripe across disk arrays is the single most important performance advantage of the SVC; however, striping across more arrays is not necessarily better. The objective here is to only add as many arrays to a single Storage Pool as required to meet the performance objectives.

If the Storage Pool is already meeting its performance objectives, we recommend that, in most cases, you add the new MDisks to new Storage Pools rather than add the new MDisks to existing Storage Pools.

Table 5-1 shows the recommended number of arrays per Storage Pool that is appropriate for general cases.

Controller type       Arrays per Storage Pool
DS4000/DS5000         4 - 24
DS6000/DS8000         4 - 12
IBM Storwize V7000    4 - 12

The development recommendations for Storwize V7000 are summarized below:

  • One MDisk group per storage subsystem
  • One MDisk group per RAID array type (RAID 5 versus RAID 10)
  • One MDisk and MDisk group per disk type (10K versus 15K RPM, or 146 GB versus 300 GB)

There are situations where multiple MDisk groups are desirable:

  • Workload isolation
  • Short-stroking a production MDisk group
  • Managing different workloads in different groups

We recommend that you have at least two MDisk groups, one for key applications, another for everything else.

Number of LUNs per Storage Pool

SVC Redbook:

We generally recommend that you configure LUNs to use the entire array, which is especially true for midrange storage subsystems where multiple LUNs configured to an array have shown to result in a significant performance degradation. The performance degradation is attributed mainly to smaller cache sizes and the inefficient use of available cache, defeating the subsystem’s ability to perform “full stride writes” for Redundant Array of Independent Disks 5 (RAID 5) arrays. Additionally, I/O queues for multiple LUNs directed at the same array can have a tendency to overdrive the array.

Table 5-2 provides our recommended guidelines for array provisioning on IBM storage subsystems.

Controller type                     LUNs per array
IBM System Storage DS4000/DS5000    1
IBM System Storage DS6000/DS8000    1 - 2
IBM Storwize V7000                  1

General considerations

Let’s take a look at a vSphere use case on top of Storwize with 16 x 600GB SAS drives in the control enclosure and 10 x 2TB NL-SAS drives in the expansion enclosure (our particular case).

First of all, we need to decide how many arrays we need. Do we have different workloads? No: all storage will be assigned to virtual machines, which in general have the same random read/write access pattern. Do we need to isolate workloads? Probably yes; it’s generally a good idea to separate highly critical production VMs from everything else. Do we have different drive types? Yes, and obviously we don’t want to mix drive types in one RAID array. Are we going to use different RAID types? Again, yes: RAID 10 is appropriate for SAS and RAID 5 for NL-SAS. So two MDisks, one RAID 10 on SAS and one RAID 5 on NL-SAS, would be enough. Storwize nodes have 4 cores each, so it may seem that you would benefit from 4 MDisks, but in fact you won’t. Here is what Barry says:

In the case where you only have 1 or 2 HDD arrays, then the core stuff doesn’t really come into play. Its only when you get to larger systems, where you are driving more I/O than a single RAID core can handle that you need to spread them.

This is also true if you are running all SSD arrays, so 24x SSD would be best split into 4 arrays to get maximum IOPs, whereas 24x HDD are not going to saturate a single core, so (if you could create a 23+P! [ you can’t 15+P is largest we support ] then it would perform as well as 2x 11+P etc

Now to storage pools. In our example we have two MDisks, so you simply make two storage pools. If you hit a performance limit in the future, you create additional MDisks and then you have two options. If each MDisk on its own can sustain your performance requirements, you make additional storage pools and redistribute the workload between them. If the load on storage is so heavy that even redistributing VMs between the two arrays doesn’t help, then you are better off combining the two MDisks of each type into their own storage pool to stripe across MDisks.

The same goes for the number of LUNs. IBM recommends a one-to-one LUN-to-MDisk relationship, but read carefully: the recommendation comes from the fact that different workloads can clash and degrade array performance. If generally the same I/O pattern hits the array, it’s safe to create several LUNs on it, as long as latency stays within an acceptable range. Moreover, when it comes to vSphere and VMFS, it’s beneficial to have at least two volumes in terms of manageability. With several LUNs you will at least have the ability to move VMs between LUNs for reconfiguration purposes. Also keep in mind that the ESXi 5 hypervisor limits each host to a storage queue depth of 32 per LUN. It means that if you have one big LUN and many VMs running on the host, you can quickly hit the queue limit. On the other hand, do not create too many LUNs or you will oversubscribe the storage processors (SPs).

Sample configuration

IBM recommends constructing both RAID 10 and RAID 5 arrays from 8 drives + 1 spare. But since we have 16 SAS and 10 NL-SAS drives, I would launch the CLI and create two arrays: a 14-drive + 2-spare RAID 10 and an 8-drive + 2-spare RAID 5 (or 9 drives + 1 spare, but it’s not a good idea to create a RAID array with an uneven number of drives). Each array goes into its own pool, with several LUNs in each pool; I would go for 2TB LUNs.
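A rough CLI sketch of that layout, assuming the Storwize ssh CLI and made-up pool names, volume names and drive IDs (check the real drive IDs with lsdrive first; spares are designated separately):

> lsdrive
> chdrive -use spare 14
> chdrive -use spare 15
> mkmdiskgrp -name pool_sas -ext 256
> mkarray -level raid10 -drive 0:1:2:3:4:5:6:7:8:9:10:11:12:13 pool_sas
> mkmdiskgrp -name pool_nlsas -ext 256
> mkarray -level raid5 -drive 16:17:18:19:20:21:22:23 pool_nlsas
> mkvdisk -mdiskgrp pool_sas -iogrp 0 -size 2 -unit tb -name vmfs_sas_01
> mkvdisk -mdiskgrp pool_nlsas -iogrp 0 -size 2 -unit tb -name vmfs_nlsas_01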

Consistent VMware snapshots on NetApp

March 16, 2012

If you use NetApp as storage for your VMware hard drives, it’s wise to utilize NetApp’s powerful snapshot capabilities as an instant backup tool. I briefly mentioned in my previous post that you should disable the default snapshot schedule. Snapshots are taken very quickly on NetApp, but still not instantaneously. If a VM is running, you can get .vmdk files with inconsistent data. Here I’d like to describe how you can take consistent snapshots of VM hard drives that sit on NetApp volumes exported via NFS. Obviously it won’t work for iSCSI LUNs, since you will get LUN snapshots, which are almost useless for backups.

What makes the VMware virtualization platform far superior to other well-known solutions on the market is the VI API. The VI API is a set of Web services hosted on Virtual Center and ESX hosts that provides interfaces to all components and operations. In particular, there is a Perl interface to the VI API called the VMware Infrastructure Perl Toolkit, which you can download and install for free. Using the VI Perl Toolkit you can write a script that every day puts your VMs into so-called hot backup mode and takes NetApp snapshots as well. Practically speaking, hot backup mode is also a snapshot: when you create a VM snapshot, the original VM hard drive is left intact and VMware starts writing the delta to another file. It means the VM hard drive won’t change while the NetApp snapshot is being taken, and you will get consistent .vmdk files. Now let’s move on to the implementation.

I will include only excerpts from the actual script here, because the lines in the script are quite long and would get messed up on the blog page. I uploaded the full script to FileDen. Here is the link. I apologize if you are reading this blog entry long after it was published and my account or the FileDen service itself no longer exists.

The VI Perl Toolkit is effectively a set of Perl scripts that you run as ready-to-use utilities. We will use snapshotmanager.pl, which lets you create VMware VM snapshots. In the first step you take snapshots of all VMs:

\"$perl_path\perl\" -w \"$perl_toolkit_path\snapshotmanager.pl\" --server vc_ip --url https://vc_ip/sdk/vimService --username snapuser --password 123456 --operation create --snapshotname \"Daily Backup Snapshot\"

For the sake of security, I created a Snapshot Manager role and a corresponding user account in Virtual Center with only two allowed operations: Create Snapshot and Remove Snapshot. The run line is self-explanatory; I execute it with the system($run_line) call.

After the VM snapshots are created, you take a NetApp snapshot:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap create vm_sata snap_name

To connect to the NetApp console I use the PuTTY SSH client: putty.exe itself has a GUI, while plink.exe is intended for batch scripting. With this command you create a snapshot of a particular NetApp volume, the one that holds the .vmdk files in our case.

To bring all VMs out of hot backup mode, run:

\"$perl_path\perl\" -w \"$perl_toolkit_path\snapshotmanager.pl\" --server vc_ip --url https://vc_ip/sdk/vimService --username snapuser --password 123456 --operation remove --snapshotname \"Daily Backup Snapshot\" --children 0

With --children 0 we tell it not to remove the child snapshots.

Now that we are familiar with the main commands, let’s move on to the script logic. You will probably want to keep several snapshots, for example seven of them, one for each day of the week. It means that each day, before taking a new snapshot, you need to remove the oldest one and rename the others. Renaming is just for clarity: you can name your snapshots vmsnap.1, vmsnap.2, … , vmsnap.7, where vmsnap.7 is the oldest. Each night you put your VMs into hot backup mode and delete the oldest snapshot:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap delete vm_sata vmsnap.7

Then you rename the other snapshots, shifting each one up by one position:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.6 vmsnap.7
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.5 vmsnap.6
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.4 vmsnap.5
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.3 vmsnap.4
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.2 vmsnap.3
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.1 vmsnap.2

And create the new one:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap create vm_sata vmsnap.1

As the last step, you bring your VMs out of hot backup mode.
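It also doesn’t hurt to verify the rotation afterwards; snap list on the volume shows the current set of snapshots:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap list vm_sata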

Using this technique you can create short-term backups of your virtual infrastructure and use them for long-term retention with the help of standalone backup solutions, for example backing up data from the snapshots to a tape library using Symantec Backup Exec. I’m going to talk about this in later posts.

NetApp thin provisioning for VMware

March 15, 2012

Thin provisioning is a popular buzzword, especially when it comes to NetApp, but it can really save you time and headaches in a number of situations. We use thin provisioning when presenting NFS volumes from NetApp to VI3 ESX hosts. NetApp already lets you change the size of its FlexVol volumes on the fly, but you need to do it manually. Thin provisioning lets you configure volumes so that in case of a space shortage a volume automatically expands without manual intervention. Of course you still need to look after your volumes, otherwise they can fill all your storage space, but it buys you enough time to deal with the data growth. Without thin provisioning in such a situation your applications can easily crash.

NetApp doesn’t support iSCSI thin provisioning for VMware, so NFS is the only option. Don’t be afraid of performance issues: without a doubt NFS is slower than FC, but NetApp is famous for its NFS performance and it is very well suited for mid-level workloads.

To be more specific, using thin provisioning you can create, say, a 300GB virtual hard drive for a particular VM and it will initially use no space; it then grows as you fill it. This can save you a tremendous amount of storage space, because you never know exactly in advance how much space you will need. But be aware that if you migrate a thin-provisioned virtual hard drive using the storage migration plugin for VMware Virtual Center, it will be inflated: the 300GB disk will consume all 300GB even if it is only half full.

The best document to help you integrate NetApp with VMware VI3 is NetApp TR-3428: NetApp and VMware Virtual Infrastructure 3 Storage Best Practices. What I write here is basically a set of excerpts from it.

NetApp Configuration

Let’s start with the NetApp configuration. The first thing to do, as usual, is to disable scheduled snapshots. Generally it’s not a good idea to take snapshots of VMware virtual hard drives on the fly, since they won’t be consistent. I will touch on this topic in later posts.

> snap sched <vol-name> 0 0 0
> snap reserve <vol-name> 0

The next step is to disable access time updates on the volume, which is safe because VMware doesn’t rely on accurate access times for its files. It improves performance, since the filer won’t need to update the access time for files each time they are read or written.

> vol options <vol-name> no_atime_update on

Then configure the thin provisioning feature itself by switching the volume autosize policy on. It has two keys, -m and -i: with -m you set the maximum volume size and with -i you set the increment size.

> vol autosize <vol-name> [-m <size>[k|m|g|t]] [-i <size>[k|m|g|t]] on
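For example, to let a hypothetical 500GB volume named vmware_nfs grow up to 600GB in 20GB steps:

> vol autosize vmware_nfs -m 600g -i 20g on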

NetApp recommends disabling Fractional Reserve for thin-provisioned volumes; it’s simply not needed anymore. Fractional Reserve guarantees successful writes to a volume when you use snapshots. Because of how snapshots work, if you completely overwrite the data captured in a snapshot you consume double the amount of storage space, and that’s where Fractional Reserve comes into play: it reserves 100% of additional space for such cases, so you never run into a situation where you are out of space due to active snapshots. But since we enabled autosize, our volume resizes on demand and Fractional Reserve becomes redundant. Presumably autosize was implemented somewhat later than Fractional Reserve, which is why NetApp has both of them.

> vol options <vol-name> fractional_reserve 0

If you use snapshots as a tool for instant VMware block-level backups, you can also adjust the autodelete policy. I said earlier that you should disable the snapshot schedule; however, you can still create consistent snapshots manually (using scripts). If you want to do that, you can additionally instruct NetApp to delete the oldest snapshots when the filer is out of space and can’t auto-grow the volume.

> snap autodelete <vol-name> commitment try trigger volume target_free_space 5 delete_order oldest_first
> vol options <vol-name> try_first volume_grow

Now we need to create an NFS export on the NetApp filer. This is where the FilerView interface comes in handy. In short, you should give your ESX hosts read-write access and root access, and configure the Unix security style.
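If you prefer the command line over FilerView, the same can be done with exportfs and qtree security; a sketch with made-up host and volume names:

> exportfs -p rw=esx01:esx02,root=esx01:esx02 /vol/vmware_nfs
> qtree security /vol/vmware_nfs unix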

VMware Configuration

VMware configuration is trivial. Go to the VMware Add Storage wizard, select Network File System, then point it at your NetApp filer and specify your volume path. Additionally, NetApp recommends tuning the NFS heartbeat parameters. Go to Host Configuration > Advanced Settings > NFS and for ESX 3.0 hosts change:

NFS.HeartbeatFrequency to 5 from 9
NFS.HeartbeatMaxFailures to 25 from 3

For ESX 3.5 hosts change:

NFS.HeartbeatFrequency to 12
NFS.HeartbeatMaxFailures to 10
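The same values can be set from the ESX service console with esxcfg-advcfg (shown here with the ESX 3.5 numbers; adjust accordingly for 3.0):

# esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
# esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures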

There is much more information, and there are more tuning parameters, that you might want to read about. Find some time to look through TR-3428 if you need clarification or additional details.

VMware Tools update issue

September 20, 2011

Recently I decided to update VMware Tools on my VMs because most of them showed “Out of date” in the VI client. For some reason several Linux VMs didn’t update, even though the VI client showed no error. I tried to update from inside a VM by running /usr/sbin/vmware-tools-upgrade, and it reported that there was not enough space in /tmp. I enlarged /tmp from 128MB to 512MB and the update went fine this time.

Take into account that:

  1. Windows VMs will most likely be rebooted after the update.
  2. On Linux, VMware Tools may not start automatically. If that’s the case, start it manually by calling /etc/init.d/vmware-tools start.
  3. Network interfaces on Linux may go down after the VMware Tools update. Bring them up manually (see the commands below).
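For items 2 and 3, a minimal example of what that looks like on a typical RHEL-style guest (service names and commands vary by distribution):

# /etc/init.d/vmware-tools start
# service network restart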