Posts Tagged ‘storage’

Run CLI Commands on NSX Manager Using REST API

August 29, 2019

Over the last few years I’ve had a chance to work with NSX-V REST APIs in many different shapes and forms. Directly from vRealize Orchestrator and PowerShell/PowerNSX, indirectly from vRealize Automation or simply by making calls from Postman, which is sometimes required during NSX deployment and upgrades.

To date I haven’t been able to find any gaps in the API and can say only good things about it. It is very well documented. You can find detailed descriptions of all requests in the NSX API Guide PDF or browse it interactively in the API Explorer on https://code.vmware.com.

But at the end of the day, the NSX REST API covers only a subset of what you can do from the CLI, and there are situations where it's not sufficient. I'll give you an example. Let's say you want to know how much storage is available on the NSX Manager appliance's log partition. There's a REST API call which will give you a response similar to this:

GET https://nsxm/api/1.0/appliance-management/system/storageinfo

<storageInfo>
  <totalStorage>86G</totalStorage>
  <usedStorage>22G</usedStorage>
  <freeStorage>64G</freeStorage>
  <usedPercentage>25</usedPercentage>
</storageInfo>

As you can see, it answers the question of how much total space is available on the appliance, but doesn't provide the full per-partition breakdown available from the CLI via “show filesystem”:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.6G  1.2G  4.1G  23% /
tmpfs           7.9G  232K  7.9G   1% /run
devtmpfs        7.9G     0  7.9G   0% /dev
/dev/sda6        44G   19G   24G  44% /common
/dev/loop0       16G   45M   15G   1% /common/vdisk_mnt

So what are the options here? What is not widely known is that you can use the NSX central command-line interface to remotely invoke appliance CLI commands through the REST API.

Invoking CLI Commands

The NSX REST API has a handy POST call: https://nsxm/api/1.0/nsx/cli?action=execute. All you need to provide, in addition to authorization credentials (the “Basic Auth” option), is the following body in XML format:

<nsxcli>
  <command>show filesystem</command>
</nsxcli>
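
For illustration, here is a minimal PowerShell sketch of how that POST could be made with Invoke-WebRequest. The hostname and credentials are made up, and certificate validation is not handled:

$nsxManager = "nsxm.lab.local"          # hypothetical NSX Manager FQDN
$credPair = "admin:VMware1!"            # hypothetical credentials
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($credPair)) }

$body = @"
<nsxcli>
  <command>show filesystem</command>
</nsxcli>
"@

$response = Invoke-WebRequest -Uri "https://$nsxManager/api/1.0/nsx/cli?action=execute" `
                              -Method Post -Headers $headers -Body $body -ContentType "application/xml"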

In response you will get a body in “text/plain” format, which is the only drawback of this method: you will need to parse the response in your scripting language of choice. In PowerShell, if you made the original call using the Invoke-WebRequest cmdlet and saved the result into the $response variable, it can look something like this:

$responseRows = $response.Content -split "`n"
foreach($row in $responseRows) {
  if($row -Like "*/dev/sda6*") {
    # Fifth column of the "show filesystem" output is Use%, e.g. "44%"
    $pctUsed = $row.Split(" ",[StringSplitOptions]"RemoveEmptyEntries")[4]
    # Strip the trailing "%" to get the numeric value
    $pctUsedValue = $pctUsed.Substring(0, $pctUsed.Length-1)
    Write-Host "Space utilization on the log partition is $pctUsedValue%."
    break
  }
}

Conclusion

For most use cases the NSX REST API provides all the necessary information about NSX component configuration in structured JSON or XML format. This method is more of an exception than a rule, but it's a nice tool to have in your tool belt when you run out of options.


EMC Isilon Overview

February 20, 2014

OneFS Overview

EMC Isilon OneFS is a storage OS which was built from the ground up as a clustered system.

NetApp’s Clustered ONTAP, for example, has evolved from being an OS for an HA pair of storage controllers into a clustered system as a result of integrating the Spinnaker intellectual property. That's not necessarily bad, because cDOT shows better performance on SPECsfs2008 than Isilon, but these systems still have two core architectural differences:

1. Isilon doesn't have RAID and the complexities associated with it. You don't choose a RAID protection level. You don't need to think about RAID groups or even load distribution between them. You don't even have spare drives per se.

2. All data on an Isilon system is kept on one volume, which is one big distributed file system. cDOT uses the concept of Infinite Volumes, but bear in mind that each NetApp filer has its own file system beneath. If you have 24 NetApp nodes in a cluster, then you have 24 underlying file systems, even though they are viewed as a whole from the client standpoint.

This makes Isilon very easy to configure and operate, but its simplicity comes at the price of flexibility. The Isilon web interface has few options to configure and is not very feature rich.

Isilon Nodes and Networking

In a nutshell, Isilon is a collection of nodes connected via a 20Gb/s DDR InfiniBand back-end network and either a 1Gb/s or 10Gb/s front-end network for client connections. There are three types of Isilon nodes: S-Series (SAS + SSD drives) for transactional, random-access I/O, X-Series (SATA + SSD drives) for high-throughput applications and NL-Series (SATA drives) for archival or infrequently used data.

If you choose to have two IB switches at the back-end, then you'll have three subnets configured for the internal network: int-a, int-b and failover. You can think of the failover network as a virtual network in front of int-a and int-b. When a packet comes to a failover network IP address, the actual IB interface that receives it is chosen dynamically. That helps load-balance traffic between the two IB switches and makes this setup an active/active network.


On the front-end you can have as many subnets as you like. Subnets are split into pools of IP addresses, and you can add particular node interfaces to a pool. Each pool can have its own SmartConnect zone configured. SmartConnect is a way to load-balance connections between the nodes; basically, SmartConnect is a DNS server which runs on the Isilon side. You can have one SmartConnect service per subnet and one SmartConnect zone (which is simply a domain) on each of the subnet's pools. To set up SmartConnect you assign an IP address to the SmartConnect service and set a SmartConnect zone name at the pool level. Then you create an “A” record on your DNS server for the SmartConnect service IP address and delegate the SmartConnect DNS zone to it. That way, each time you refer to the SmartConnect zone to access a file share, you'll be redirected to a dynamically picked node from the pool.
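
For illustration, the DNS side of that delegation might look something like this in a BIND-style zone file (names and the IP address are hypothetical):

; A record for the SmartConnect service IP
ssip.isilon.example.com.      IN  A   10.10.10.100
; delegate the SmartConnect zone to the DNS service running on Isilon
smartconnect.example.com.     IN  NS  ssip.isilon.example.com.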

SmartPools

Each type of node is automatically assigned to what is called a “Node Pool”. Nodes are grouped into the same pool if they are of the same series and have the same amount of memory and disks of the same type and size. The node pool level is one of the places where you can configure the protection level; we'll talk about that later. Node pools are grouped into tiers, so you can group an NL node pool with 1TB drives and an NL node pool with 3TB drives into an archive tier if you wish. And then you have File Pool Policies, which you can use to manage placement of files within the cluster. For example, you can redirect files with a specific extension, file size or last access time to a specific node pool or tier. File pool policies also allow you to configure data protection and override the default node pool protection setting.

SmartPools is the name Isilon uses for this Tier/Node Pool/File Pool Policy approach. File placement is not applied on the fly, as that would cause high I/O overhead. Instead, it's implemented as a job on the cluster, which runs at 22:00 every day by default.

Data Layout and Protection

Instead of using RAID, Isilon uses FEC (Forward Error Correction), and more specifically the Reed-Solomon algorithm, to protect data on a cluster. It's similar to RAID5 in how it generates a protection block (or blocks) for each stripe, but it happens at the software level instead of in hardware as in storage arrays. So when a file comes in to a node, Isilon splits the file into stripe units of 128KB each, generates one FEC protection unit and distributes all of them between the nodes using the back-end network. This is what is called the “+1” protection level, where Isilon can sustain one disk or one node failure. Then you have “+2”, “+3” and “+4”. With “+4” you have four FEC units per stripe and can sustain four disk or node failures. Note, however, that there is a rule that the number of data stripe units in a stripe has to be greater than the number of FEC units. So the minimum requirement for the “+4” protection level is 9 nodes in a cluster.
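
Just to spell out the arithmetic behind that rule (my own sketch, not an Isilon tool): with M FEC units per stripe you need at least M+1 data units, so the smallest cluster is 2M+1 nodes.

# Minimum cluster size for a "+M" FEC protection level, per the rule above
function Get-IsilonMinNodes {
    param([int]$FecUnits)              # M in "+M", e.g. 4 for "+4"
    return (2 * $FecUnits) + 1         # (M+1) data units + M FEC units
}

Get-IsilonMinNodes -FecUnits 4         # returns 9, matching the "+4" example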


The second option is to use mirroring: you can have from 2x to 8x mirrors of your data. The third option is the “+2:1” and “+3:1” protection levels. These protection levels let you balance data protection against the amount of FEC overhead. For example, the “+2:1” setting, compared to “+2”, can sustain two drive failures or one node failure, instead of the two node failure protection that “+2” offers. And it makes sense, since a simultaneous two-node failure is unlikely to happen. There is also a difference in how the data is laid out: in “+2” Isilon uses one disk on each node for each stripe, while in “+2:1” it uses two disks on each node, and the first FEC unit goes to the first subset of disks while the second goes to the second.

One benefit of not having RAID is that you can set the protection level with folder or even file granularity, which is impossible with conventional RAID. And, quite handily, you can change protection levels without recreating storage volumes, as you might have to do when transitioning between some RAID levels. When you change the protection level for any of the targets, Isilon creates a low priority job which redistributes data within the cluster.

Overview of NetApp Replication and HA features

August 9, 2013

NetApp has quite a few features related to replication and clustering:

  • HA pairs (including mirrored HA pairs)
  • Aggregate mirroring with SyncMirror
  • MetroCluster (Fabric and Stretched)
  • SnapMirror (Sync, Semi-Sync, Async)

It's easy to get lost here, so let's try to understand what goes where.


SnapMirror

SnapMirror is volume-level replication, which normally works over an IP network (SnapMirror can work over FC, but only with FC-VI cards, and that is not widely used).

The asynchronous version of SnapMirror replicates data according to a schedule. SnapMirror Sync uses NVLOG shipping (described briefly in my previous post) to synchronously replicate data between two storage systems. SnapMirror Semi-Sync is in between and synchronizes writes at the Consistency Point (CP) level.
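
For illustration, in 7-Mode the replication mode is picked in /etc/snapmirror.conf on the destination system. A hedged sketch with made-up system and volume names, where the last field is either a cron-style schedule or the sync/semi-sync keyword:

# async: replicate every hour at minute 0 (minute hour day-of-month day-of-week)
filer1:vol1  filer2:vol1_dst  -  0 * * *
# synchronous replication
filer1:vol2  filer2:vol2_dst  -  sync
# semi-synchronous replication
filer1:vol3  filer2:vol3_dst  -  semi-sync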

SnapMirror provides protection from data corruption inside a volume. But with SnapMirror you don't have automatic failover of any sort: you need to break the SnapMirror relationship and present the data to clients manually, then resynchronize the volumes when the problem is fixed.
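
A hedged 7-Mode sketch of that manual failover and failback, run on the destination system (system and volume names are made up):

> snapmirror quiesce vol1_dst
> snapmirror break vol1_dst
> snapmirror resync -S filer1:vol1 vol1_dst

Quiesce and break make the destination volume writable for clients; resync re-establishes the mirror once the source is healthy again.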

SyncMirror

SyncMirror mirrors aggregates and works at the RAID level. You can configure mirroring between two shelves of the same system and prevent an outage in case of a shelf failure.

SyncMirror uses the concept of plexes to describe mirrored copies of data. You have two plexes: plex0 and plex1. Each plex consists of disks from a separate pool: pool0 or pool1. Disks are assigned to pools depending on cabling, and disks in each of the pools must be in separate shelves to ensure high availability. Once the shelves are cabled, you enable SyncMirror and create a mirrored aggregate using the following syntax:

> aggr create aggr_name -m -d disk-list -d disk-list
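
For illustration only, a three-disk mirrored aggregate might be created like this (disk names are hypothetical; the first -d list comes from pool0 and the second from pool1):

> aggr create aggr_mir -m -d 0a.16 0a.17 0a.18 -d 0b.16 0b.17 0b.18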

HA Pair

An HA pair is basically two controllers which both have connections to their own and their partner's shelves. When one of the controllers fails, the other one takes over; this is called Cluster Failover (CFO). Controller NVRAMs are mirrored over the NVRAM interconnect link, so even data which hasn't been committed to disk isn't lost.
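
A quick sketch of the relevant 7-Mode commands for checking and exercising controller failover:

> cf status
> cf takeover
> cf giveback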

MetroCluster

MetroCluster provides failover at the storage system level. It uses the same SyncMirror feature underneath to mirror data between two storage systems (instead of two shelves of the same system, as in a pure SyncMirror implementation). Now even if a storage controller fails together with all of its storage, you are safe: the other system takes over and continues to service requests.

An HA pair, by contrast, can't fail over when a disk shelf fails, because the partner doesn't have a copy of the data to service requests from.

Mirrored HA Pair

You can think of a mirrored HA pair as an HA pair with SyncMirror between the systems. You can implement almost the same configuration on an HA pair with SyncMirror inside (rather than between) the systems, because the odds of a whole storage system (controller + shelves) going down are very low. But mirroring between two systems can give you more peace of mind.

It cannot fail over like MetroCluster when one of the storage systems goes down; the whole process is manual. A reasonable question here is: why can't it fail over if it has a copy of all the data? Because MetroCluster is separate functionality which performs all the checks and carries out a cutover to the mirror. It's called Cluster Failover on Disaster (CFOD). SyncMirror is only a mirroring facility and doesn't even know the cluster exists.


Monitoring ESX Storage Queues

July 30, 2013

Queue Limits

I/O data goes through several storage queues on its way to the disk drives. VMware is responsible for the VM queue, the LUN queue and the HBA queue. The VM and LUN queues are usually equal to 32 operations. It means that each ESX host at any moment can have no more than 32 active operations to a LUN. The same is true for VMs: each VM can have at most 32 active operations to a datastore. And if multiple VMs share the same datastore, their combined I/O flow can't go over the 32-operation limit (the per-LUN queue for QLogic HBAs has been increased from 32 to 64 operations in vSphere 5). The HBA queue is much bigger and can hold several thousand operations (4096 for QLogic, although I can see in my configuration that the driver is configured for 1014 operations).
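
If you want to check the configured device limits outside of esxtop on a vSphere 5 host, something like the following should show the maximum queue depth per device (exact output fields may vary by build):

esxcli storage core device list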

Queue Monitoring

You can monitor the storage queues of an ESX host from the console. Run “esxtop”, press “d” to view disk adapter stats, then press “f” to open field selection and add Queue Stats by pressing “d”.

The AQLEN column will show the queue depth of the storage adapter. CMDS/s is the real-time number of IOPS. DAVG is the latency that comes from the frame traversing the “driver – HBA – fabric – array SP” path and should be less than 20ms; otherwise it means the storage is not coping. KAVG shows the time an operation spent in the hypervisor kernel queue and should be less than 2ms.

Press “u” to see disk device statistics. Press “f” to open the add or remove fields dialog and select Queue Stats by pressing “f”. Here you'll see the number of active (ACTV) and queued (QUED) operations per LUN. %USD is the queue load. If you're hitting 100 in %USD and see operations under the QUED column, then again it means that your storage cannot handle the load and you need to redistribute your workload between spindles.


NetApp NVRAM and Write Caching

July 19, 2013

Overview

NetApp storage systems use several types of memory for data caching. Non-volatile battery-backed memory (NVRAM) is used for write caching, whereas main memory and flash memory, in the form of either a PCIe extension card or SSD drives, are used for read caching. Before going to hard drives, all writes are cached in NVRAM. NVRAM is split in half, and each time 50% of NVRAM gets full, writes are cached to the second half while the first half is written to disks. If NVRAM doesn't fill up within a 10-second interval, a system timer forces a flush.

To be more precise, when a data block comes into a NetApp, it's actually written to main memory and then journaled in NVRAM. NVRAM here serves as a backup in case the filer fails. When data has been written to disks as part of a so-called Consistency Point (CP), the write blocks cached in main memory become the first candidates to be evicted and replaced by other data.

Caching Approach

NetApp is frequently criticized for the small amount of write cache. For example, a FAS3140 has only 512MB of NVRAM, and a FAS3220 a bit more: 1.6GB. In mirrored HA or MetroCluster configurations NVRAM is mirrored via the NVRAM interconnect adapter; half of the NVRAM is used for local operations and the other half for the partner's, so the amount of write cache becomes even smaller. In the FAS32xx series, NVRAM has been integrated into main memory and is now called NVMEM. You can check the amount of NVRAM/NVMEM in your filer by running:

> sysconfig -a

There are two answers to the question of why NetApp includes less cache in its controllers. The first one is given in a white paper called “Optimizing Storage Performance and Cost with Intelligent Caching”. It states that NetApp uses a different approach to write caching compared to other vendors. Most often, when a data block comes in, cache is used to keep the 8KB data block, as well as the 8KB inode and the 8KB indirect block for large files. This way, the write cache can be thought of as part of the physical file system, because it mimics its structure. NetApp, on the other hand, uses a journaling approach. When a data block is received by the filer, the 8KB data block is cached along with a 120B header, which contains all the information needed to replay the operation. After each cache flush a Consistency Point (CP) is created, which is a special type of consistent file system snapshot. If the controller fails, the only thing which needs to be done is reverting the file system to the latest consistency point and replaying the log.

But this white paper was written in 2010, and cache journaling is not a feature unique to NetApp; many vendors now use it. The other answer, which makes more sense, was found in one of the toasters mailing list archives here: NVRAM weirdness (UNCLASSIFIED). I'll just quote the answer:

The reason it’s so small compared to most arrays is because of WAFL. We don’t need that much NVRAM because when writes happen, ONTAP writes out single complete RAID stripes and calculates parity in memory. If there was a need to do lots of reads to regenerate parity, then we’d have to increase the NVRAM more to smooth out performance.

NVLOG Shipping

A feature called NVLOG shipping is an integral part of sync and semi-sync SnapMirror. NVLOG shipping is simply a transfer of NVRAM writes from the primary to a secondary storage system. Writes on the primary cannot be transferred directly to the NVRAM of the secondary system because, in contrast to mirrored HA and MetroCluster, SnapMirror doesn't have any hardware implementation of NVRAM mirroring. That's why the stream of data is first written to special files on the volume's parent aggregate on the secondary system and then read into NVRAM.


Documents I found useful:

WP-7107: Optimizing Storage Performance and Cost with Intelligent Caching

TR-3326: 7-Mode SnapMirror Sync and SnapMirror Semi-Sync Overview and Design Considerations

TR-3548: Best Practices for MetroCluster Design and Implementation

United States Patent 7730153: Efficient use of NVRAM during takeover in a node cluster

NetApp thin-provisioning for VMware LUNs

May 22, 2013


LUN and Volume Thin Provisioning

I already described thin provisioning of VMware NFS volumes some time ago here. Now I want to discuss thin provisioning of LUNs.

LUNs are different from the NFS implementation, because a LUN is an additional container inside a NetApp FlexVol. So if you're using FC, you need to thin provision both the LUN and the volume:

> lun set reservation "/vol/targetvol/targetlun" disable
> vol options "targetvol" guarantee none

In fact, you can make the LUN thin and the volume thick. Then storage space that's not used by the LUN is returned to the volume level, but in this case it cannot be used by other volumes as a shared pool of space.

As a best practice, NetApp now recommends setting Fractional Reserve and Snap Reserve for your volumes to 0%. Don't forget about that if you want to save more storage space:

> vol options "targetvol" fractional_reserve 0
> snap reserve "targetvol" 0

Disable snapshots if you don’t use them:

> snap sched "targetvol" 0

It's as easy as that. Now you don't waste space by reserving it up front, but use it as a shared pool of resources. Just make sure to monitor aggregate free space. If you're starting to run out of storage, plan the purchase of new disks in advance or redistribute data between aggregates.

Safety Features

Disabling volume-level, LUN and snapshot reservations helps you save storage space. The drawback of this approach is that you don't have any mechanisms in place to prevent volume out-of-space situations. If you enable snapshots on the volume and they consume all the volume space, the volume goes offline, which is a very undesirable consequence. NetApp has two features that can serve as a safety net in thin-provisioned environments: autosize and snap autodelete.

Snap autodelete automatically removes old snapshots if there is no space left inside the volume. Autosize, on the other hand, allows the volume to automatically grow to a specified limit (+20% of the volume size by default) in specified increments (5% of the volume size by default). You can also specify which to try first, autosize or snapshot autodelete, by using the 'try_first' option.

> snap autodelete "targetvol" on
> vol autosize "targetvol" on
> vol options "targetvol" try_first volume_grow
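
If the defaults don't suit you, the autosize limit and grow increment can also be set explicitly; the sizes below are just an illustration:

> vol autosize "targetvol" -m 1t -i 50g on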

SnapMirror Considerations

If you use SnapMirror and switch on autosize on the source volume, the destination volume won't grow automatically, and SnapMirror will break the relationship if it runs out of space on the smaller destination volume. The trick here is to make the destination volume as big as the source volume's autosize limit and thin provision the destination volume. That way you won't run out of space on the destination even if the source volume grows to its maximum.
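
A minimal sketch of that, assuming the source autosize limit is 1TB and the destination volume is called targetvol_dst (both are made-up values):

> vol size "targetvol_dst" 1t
> vol options "targetvol_dst" guarantee none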

Further reading

TR-3965: NetApp Thin Provisioning Deployment and Implementation Guide Data ONTAP 8.1 7-Mode

IBM DS4700 copyback failed

August 27, 2012

If you have a global hot spare (GHS) drive, then when one of the active hard drives fails, your data is reconstructed to the GHS. Then, when you replace the failed drive, the storage system automatically initiates a copyback, which moves the data from the GHS back to the replacement drive. Sometimes that doesn't happen and the replacement drive stays in an Unassigned state. If that is the case, go to DS Storage Manager, right-click the RAID array and select Replace Drives. There you should see the failed drive. Choose a replacement from the unassigned drives and click Replace Drive. The copyback will start immediately.

Take into consideration that a copyback can take a long time, depending on the array size. If it is a production system and performance is critical, right-click the logical drive and choose Change -> Modification Priority. There you can set how many resources will be allocated to modification operations (such as copyback, reconstruction, etc.) versus performance. Change it to Low for maximum performance.

Disconnect stalled NDMP sessions

March 30, 2012

Once I started installing a Symantec Backup Exec service pack update while a tape library inventory job was running. After the installation completed, I ended up with the library offline and unavailable. It happened because of hung NDMP sessions. To list your media changer and tape drive information, run:

storage show mc
storage show tape

or

sysconfig -m
sysconfig -t

To list and kill particular NDMP sessions run:

ndmpd status
ndmpd kill job_id

Then restart Backup Exec service.

Jumbo Frames justified?

March 27, 2012

When it comes to VMware on NetApp, boosting performance by implementing Jumbo Frames is always taken into consideration. However, it's not clear whether it really has any significant impact on latency and throughput.

Officially, VMware doesn't support Jumbo Frames for NAS and iSCSI. It means that using Jumbo Frames to transfer storage traffic from the VMkernel interface to your storage system is a solution which is not tested by VMware; however, it actually works. To use Jumbo Frames you need to activate them throughout the whole communication path: OS, virtual NIC (change to Enhanced vmxnet from E1000), virtual switch and VMkernel interface, physical Ethernet switch and storage. It's a lot of work to do and it's disruptive at some points, which is not a good idea for a production infrastructure. So I decided to take a look at benchmarks before deciding to spend a great amount of time and effort on it.
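
For illustration, on the ESX side the MTU changes of that era might look something like this (switch and port group names and the IP address are hypothetical; the VMkernel interface typically has to be removed and re-created with the new MTU, and the last line is the NetApp 7-Mode side):

esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vmknic -d "NFS_PortGroup"
esxcfg-vmknic -a -i 10.0.0.21 -n 255.255.255.0 -m 9000 "NFS_PortGroup"
ifconfig e0a mtusize 9000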

VMware and NetApp have a technical report, TR-3808-0110, called “VMware vSphere and ESX 3.5 Multiprotocol Performance Comparison Using FC, iSCSI, and NFS”. Section 2.2 clearly states that:

  • Using NFS with jumbo frames enabled using both Gigabit and 10GbE generated overall performance that was comparable to that observed using NFS without jumbo frames and required approximately 6% to 20% fewer ESX CPU resources compared to using NFS without jumbo frames, depending on the test configuration.
  • Using iSCSI with jumbo frames enabled using both Gigabit and 10GbE generated overall performance that was comparable to or slightly lower than that observed using iSCSI without jumbo frames and required approximately 12% to 20% fewer ESX CPU resources compared to using iSCSI without jumbo frames, depending on the test configuration.
Another important statement here is:
  • Due to the smaller request sizes used in the workloads, it was not expected that enabling jumbo frames would improve overall performance.

I believe that 4K and 8K packet sizes are fair for a virtual infrastructure. Maybe if you move large amounts of data through your virtual machines it will make sense for you, but I feel it's not reasonable to implement Jumbo Frames for virtual infrastructure in general.

Another report finding is that Jumbo Frames decrease CPU load, but if you use TOE NICs, that again makes little difference.

VMware supports jumbo frames with the following NICs: Intel (82546, 82571), Broadcom (5708, 5706, 5709), Netxen (NXB-10GXxR, NXB-10GCX4), and Neterion (Xframe, Xframe II, Xframe E). We use Broadcom NetXtreme II BCM5708 and Intel 82571EB, so Jumbo Frames implementation is not going to be a problem. Maybe I'll try to test it myself when I have some free time.


Consistent VMware snapshots on NetApp

March 16, 2012

If you use NetApp as storage for your VMware hard drives, it's wise to utilize NetApp's powerful snapshot capabilities as an instant backup tool. I briefly mentioned in my previous post that you should disable the default snapshot schedule. A snapshot is taken very quickly on NetApp, but it's still not instantaneous. If a VM is running, you can get .vmdks with inconsistent data. Here I'd like to describe how you can take consistent snapshots of VM hard drives which sit on NetApp volumes exported via NFS. Obviously it won't work for iSCSI LUNs, since you would get LUN snapshots, which are almost useless for backups.

What makes the VMware virtualization platform far superior to other well-known solutions on the market is the VI API. The VI API is a set of web services hosted on Virtual Center and ESX hosts that provides interfaces for all components and operations. In particular, there is a Perl interface for the VI API called the VMware Infrastructure Perl Toolkit, which you can download and install for free. Using the VI Perl Toolkit you can write a script which every day puts your VMs into so-called hot backup mode and makes NetApp snapshots as well. Practically, hot backup mode is also a snapshot: when you create a VM snapshot, the original VM hard drive is left intact and VMware starts writing a delta to another file. That means the VM hard drive won't change while the NetApp snapshot is taken, and you will get consistent .vmdk files. Now let's move to implementation.

I will include excerpts from the actual script here, because the lines in the script are quite long and everything would get messed up on the blog page. I uploaded the full script to FileDen. Here is the link. I apologize if you are reading this blog entry far later than it was published and my account or the FileDen service itself no longer exists.

The VI Perl Toolkit is effectively a set of Perl scripts which you run as ready-to-use utilities. We will use snapshotmanager.pl, which lets you create VMware VM snapshots. In the first step you make snapshots of all VMs:

\"$perl_path\perl\" -w \"$perl_toolkit_path\snapshotmanager.pl\" --server vc_ip --url https://vc_ip/sdk/vimService --username snapuser --password 123456 --operation create --snapshotname \"Daily Backup Snapshot\"

For the sake of security I created a Snapshot Manager role and a corresponding user account in Virtual Center with only two allowed operations: Create Snapshot and Remove Snapshot. The run line is self-explanatory; I execute it using the system($run_line) command.

After VM snapshots are created you make a NetApp snapshot:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap create vm_sata snap_name

To connect to the NetApp terminal I use the PuTTY SSH client. putty.exe itself has a GUI, and plink.exe is meant for batch scripting. Using this command you create a snapshot of a particular NetApp volume, the ones which hold the .vmdks in our case.

To take all VMs out of hot backup mode, run:

\"$perl_path\perl\" -w \"$perl_toolkit_path\snapshotmanager.pl\" --server vc_ip --url https://vc_ip/sdk/vimService --username snapuser --password 123456 --operation remove --snapshotname \"Daily Backup Snapshot\" --children 0

With --children 0 we tell it not to remove child snapshots as well.

Now that we've familiarized ourselves with the main commands, let's move on to the script logic. You will probably want to keep several snapshots, for example seven of them, one for each day of the week. That means each day, before making a new snapshot, you will need to remove the oldest one and rename the others. Renaming is just for clarity. You can name your snapshots vmsnap.1, vmsnap.2, … , vmsnap.7, where vmsnap.7 is the oldest. Each night you put your VMs into hot backup mode and delete the oldest snapshot:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap delete vm_sata vmsnap.7

Then you rename other snapshots:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.6 vmsnap.7
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.5 vmsnap.6
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.4 vmsnap.5
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.3 vmsnap.4
"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap rename vm_sata vmsnap.2 vmsnap.3

And create the new one:

"\$plink_path" -ssh -2 -batch -i \"private_key_path\" -l root netapp_ip snap create vm_sata vmsnap.1

As a last step you bring your VMs out of hot backup mode.

Using this technique you can create short-term backups of your virtual infrastructure and use them for long-term retention with the help of standalone backup solutions, like backing up data from snapshots to a tape library using Symantec Backup Exec. I'll talk about this in later posts.