
AWS Cloud Protection Manager Part 3: Backup and Restore

August 21, 2017

Backup

Backups are created according to the schedule specified in the backup policy. We discussed how to configure backup policies in the previous blog post of the series. The list of backups you see on the Backup Monitor tab is your list of restore points. Backups that are older than the specified retention policy are purged from the list and you will no longer see them there, unless you move them to the “Freezer”.

It is important to understand that, apart from volume snapshots, CPM also creates an AMI for each backed-up instance. Those who have hands-on experience with AWS may already know that AMIs are the only way to create clones of Windows EC2 instances in AWS. If you go to the AWS console and look for a clone action under the instance Actions menu, you won’t find one. You will find “Create Image” instead. It creates an AMI, from which you can then spin up a clone of the instance the image was created from.

CPM does exactly that. For each backup policy the instance is under, it creates one AMI. In our example we have four backup policies, which will result in four AMIs for each of the instances. Every AMI has to have at least one storage volume, so CPM will include the root volume of each instance in the AMI, simply because it has to. But AMIs are required only to restore the EC2 instance configuration. Data is restored from volume snapshots, which can be used to create new volumes and attach them to the instance. You can click on the View button under Snapshots to find the corresponding snapshot and AMI IDs.
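To make the AMI-plus-snapshots relationship more concrete, here is a minimal boto3 sketch of the underlying EC2 calls a tool like CPM drives for each backup: one AMI for the instance configuration plus a snapshot per attached volume. This is an illustration of the AWS API only, not CPM’s actual code; the region and naming scheme are assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption

def backup_instance(instance_id, policy_name):
    """Create an AMI (instance configuration) plus snapshots of all attached volumes."""
    # The AMI captures the instance configuration and its root volume.
    image_id = ec2.create_image(
        InstanceId=instance_id,
        Name=f"{policy_name}-{instance_id}",
        NoReboot=True,  # crash-consistent unless an in-guest agent quiesces I/O
    )["ImageId"]

    # Snapshots of every attached EBS volume are the actual restore points for data.
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )["Volumes"]
    snapshot_ids = [
        ec2.create_snapshot(
            VolumeId=v["VolumeId"],
            Description=f"{policy_name} backup of {instance_id}",
        )["SnapshotId"]
        for v in volumes
    ]
    return image_id, snapshot_ids
```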

There is also a backup log for each job run, which is helpful for troubleshooting.

Restore

To perform a restore, click on the Recover button next to the backup job and you will get the list of instances you can recover. CPM offers you three options: instance recovery, volume recovery and file recovery. Let’s go through them back to front.

File recovery is probably the most used recovery option, as it lets you restore individual files. When you click on the “Explore” button, CPM creates new volumes from the snapshots you are restoring from and mounts them to the CPM instance. You are then presented with a simple file system browser where you can find the file and click on the green down-arrow icon in the Download column to save the file to your computer.
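Under the hood, the “Explore” workflow is just creating a volume from the snapshot and attaching it to the CPM appliance. A minimal sketch of those two EC2 calls, with the region, availability zone and device name as placeholder assumptions:

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption

def mount_snapshot_for_file_recovery(snapshot_id, cpm_instance_id, az="ap-southeast-2a"):
    """Create a volume from a snapshot and attach it to the CPM appliance for browsing."""
    volume_id = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=az)["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    # /dev/sdf is a hypothetical free device slot on the CPM appliance.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=cpm_instance_id, Device="/dev/sdf")
    return volume_id
```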

If you click on “Volume Only”, you can restore particular volumes. Restored volumes are not attached to any instance, unless you specify one under the “Attach to Instance” column. You can then select under “Attach Behaviour” what CPM should do if such a volume is already attached to the instance, or if you want to automatically detach the original volume while the instance is running (you can do that only if the instance is stopped).

And the last option is “Instance”. It will create a clone of the original instance using the pre-generated AMI and volume snapshots, as we discussed in the Backup section of this blog post. You can specify many settings under the Advanced Options section, including recovery to another VPC or a different availability zone. Most importantly, make sure you specify a new IP address for the instance, otherwise you’ll have an address conflict and your restore will fail. Ideally you should also shut down the original EC2 instance before spinning up a restore clone.
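For reference, launching a clone from the pre-generated AMI is a single EC2 call. The sketch below only illustrates the point about supplying a new private IP address; the instance type, subnet and region are placeholder assumptions, not CPM’s actual parameters.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption

def restore_instance_clone(image_id, subnet_id, new_private_ip, instance_type="t3.medium"):
    """Launch a clone from the backup AMI with a NEW private IP to avoid an address conflict."""
    reservation = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        SubnetId=subnet_id,               # can point at a different subnet or VPC
        PrivateIpAddress=new_private_ip,  # must differ from the original instance's IP
        MinCount=1,
        MaxCount=1,
    )
    return reservation["Instances"][0]["InstanceId"]
```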

Advanced Features

There are quite a few advanced features worth mentioning. So far we have looked at a simple EC2 instance restore, but you don’t have to back up whole instances; you can also back up individual volumes. On top of that, CPM supports RDS database, Aurora and Redshift cluster backups.

If you run MS Exchange, SharePoint or SQL Server on your EC2 instances, you can install the CPM backup agent on them to get application-consistent backups via VSS, as opposed to the crash-consistent backups you get if the agent is not used. If you install the agent, you can also run a script on the instance before and after the backup is taken.

Last but not least is DR. Restoring to another availability zone within the region is already supported at the instance recovery level: you can choose the availability zone you want to restore to. It is not possible to recover to another region, though, because AWS snapshots and AMIs are local to the region they are created in. If you want to be able to recover to another region, you can configure DR in CPM, which will use the AWS AMI and snapshot copy functionality to copy backups to another region at the configured frequency.
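The cross-region mechanism CPM relies on here is the standard AWS snapshot and AMI copy functionality. A minimal sketch of those calls, with the source and DR regions as assumptions:

```python
import boto3

def copy_backup_to_dr_region(image_id, snapshot_ids,
                             source_region="ap-southeast-2", dr_region="us-west-2"):
    """Copy an AMI and its data-volume snapshots into another region."""
    dr = boto3.client("ec2", region_name=dr_region)
    dr_image_id = dr.copy_image(
        SourceImageId=image_id,
        SourceRegion=source_region,
        Name=f"dr-copy-{image_id}",
    )["ImageId"]
    dr_snapshot_ids = [
        dr.copy_snapshot(SourceSnapshotId=s, SourceRegion=source_region,
                         Description="DR copy")["SnapshotId"]
        for s in snapshot_ids
    ]
    return dr_image_id, dr_snapshot_ids
```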

Conclusion

Overall, I found Cloud Protection Manager very easy to install, configure and use. If you come from an infrastructure background, at first glance CPM may look like a very basic tool compared to feature-rich solutions such as Veeam or Commvault. But that impression is misleading. CPM is simple because AWS is simple: all infrastructure complexity is hidden under the covers. As a result, all an AWS backup tool needs to do is create snapshots, and CPM does that well.


AWS Cloud Protection Manager Part 2: Configuration

August 14, 2017

Overview

As we discussed in Part 1 of this series, snapshots serve as a good basis for implementing backup in AWS. But AWS does not provide an out-of-the-box tool that can manage snapshots at scale and perform snapshot creation/deletion based on a defined retention. The rich AWS APIs allow you to build such a tool yourself, or you can use an existing backup solution built for AWS. In this blog post we are looking at one such product, called Cloud Protection Manager.

You may be surprised to learn that the first version of Cloud Protection Manager was released back in 2013. The product has matured over the years and, according to the N2W web-site, the current CPM version 2.1 has become quite popular amongst AWS customers.

CPM is offered in four different editions: Standard, Advanced, Enterprise and Enterprise Plus. Functionality across all four editions is mostly the same, with the key difference being the number of instances you can back up, ranging from 20 instances in Standard to $5 per instance in Enterprise Plus.

Installation

CPM offers a very straightforward consumption model. You purchase it from AWS marketplace and pay by the month. Licensing costs are billed directly to your account. There are no additional steps involved.

To install CPM you need to find the version you want to purchase in AWS Marketplace, specify instance settings, such as region, VPC subnet, security group, then accept the terms and click launch. AWS will spin up a new CPM server as an EC2 instance for you. You also have an option to run a 30-day trial if you want to play with the product before making a purchasing decision.

Note that CPM needs to be able to talk to AWS API endpoints to perform snapshots, so make sure that the appliance has Internet access by means of a public IP address, Elastic IP address or a NAT gateway. Similarly, the security group you attach it to should at least have HTTPS out allowed.
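If you need to add the HTTPS egress rule yourself, it looks roughly like this in boto3; the region and security group ID are placeholders, and a default security group may already allow all outbound traffic.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # region is an assumption

# Allow outbound HTTPS so the CPM appliance can reach the AWS API endpoints.
ec2.authorize_security_group_egress(
    GroupId="sg-0123456789abcdef0",  # hypothetical CPM security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS to AWS APIs"}],
    }],
)
```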

Initial Configuration

The appliance is then configured using an initial setup wizard. Find out what private IP address has been assigned to the instance and open a browser session to it. The wizard is reasonably straightforward, but there are two things I want to draw your attention to.

You will be asked to create a data volume. This volume is needed purely to keep CPM configuration and metadata; backups are kept in S3 and do not use this volume. The default size is 5GB, which is enough for roughly 50 instances. If you have a bigger environment, allocate 1GB for every 10 AWS instances.

You will also need to specify AWS credentials for CPM to be able to talk to the AWS APIs. You can use your AWS account access keys, but this is not a security best practice. In AWS you can assign a role to an EC2 instance, which is what you should be using for CPM. You will need to create IAM policies that describe the permissions CPM needs to create backups, perform restores, send notifications via AWS SNS and configure EC2 instances. Just refer to the CPM documentation, copy and paste the configuration for all policies, create a role and specify the role in the setup wizard.
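For illustration, this is roughly what creating such a role looks like with boto3. The permission list below is a deliberately truncated example, not the full set CPM needs; take the actual policy documents from the CPM documentation.

```python
import json
import boto3

iam = boto3.client("iam")

assume_role = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Principal": {"Service": "ec2.amazonaws.com"},
                   "Action": "sts:AssumeRole"}],
}
# Truncated illustration only -- use the policies from the CPM documentation.
backup_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Action": ["ec2:CreateSnapshot", "ec2:CreateImage",
                              "ec2:DescribeInstances", "sns:Publish"],
                   "Resource": "*"}],
}

iam.create_role(RoleName="cpm-backup-role",
                AssumeRolePolicyDocument=json.dumps(assume_role))
iam.put_role_policy(RoleName="cpm-backup-role", PolicyName="cpm-backup",
                    PolicyDocument=json.dumps(backup_policy))

# The role is exposed to the CPM EC2 instance through an instance profile.
iam.create_instance_profile(InstanceProfileName="cpm-backup-profile")
iam.add_role_to_instance_profile(InstanceProfileName="cpm-backup-profile",
                                 RoleName="cpm-backup-role")
```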

Backup Policies

Once you are finished with the initial wizard you will be able to log in to the appliance using the password you specified during installation. As in most backup solutions you start with backup policies, which allow you to specify backup targets, schedule and retention.

One thing that I want to touch on here is backup schedules, which may be a bit confusing at first. It is easier to explain with an example. Say you want to implement a commonly used GFS backup schedule, with 7 daily, 4 weekly, 12 monthly and 7 yearly backups. The daily backup should run every day at 8pm, starting from today. Weekly backups run on Sundays.

This is how you would configure such schedule in CPM:

  • Daily
    • Repeats Every: 1 Days
    • Start Time: Today’s date, 20:00
    • Enabled on: Mon-Sat
  • Weekly
    • Repeats Every: 1 Weeks
    • Start Time: Next Sunday, 20:00
    • Enabled on: Mon-Sun
  • Monthly
    • Repeats Every: 1 Months
    • Start Time: 28th of this month, 21:00
    • Enabled on: Mon-Sun
  • Yearly
    • Repeats Every: 12 Months
    • Start Time: 31st of December, 22:00
    • Enabled on: Mon-Sun

Some of the gotchas here:

  • The “Enabled on” setting is relevant only to the Daily backup; the rest of the schedules are based on the date you specify in the “Start Time” field. For instance, if the date you specify in the Weekly backup Start Time is a Sunday, your weekly backups will run every Sunday.
  • Make sure to run your Monthly backup on the 28th day of every month, to guarantee you have a backup every month, including February.
  • It’s not possible to prevent the Weekly backup from running in the last week of every month. So make sure to adjust the Start Time of the Monthly backup so that Weekly and Monthly backups don’t run at the same time if they happen to fall on the same day.
  • The same considerations apply to the Yearly backup as well.

Then you create your daily, weekly, monthly and yearly backup policies using the corresponding schedules and add the EC2 instances that require protection to every policy. Retention is also specified at the policy level. For our scenario we will have 6 generations for Daily (the Sunday restore point is provided by the Weekly backup), 4 generations for Weekly, 12 generations for Monthly and 7 generations for Yearly.

Notifications

CPM uses the AWS Simple Notification Service (SNS) to send email alerts. If you gave the CPM instance SNS permissions in the IAM role you created previously, you should be able to simply go to the Notification settings, enable Alerts and select the “Create new topic” and “Add user email as recipient” options. CPM will create an SNS topic in AWS for you automatically and use the email address you specified in the setup wizard as the notification recipient. You can change or add more email addresses to the SNS topic in the AWS console later on if you need to.
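What CPM does for you here is roughly equivalent to the following two SNS calls; the topic name and email address are placeholders, and the recipient still has to confirm the subscription email before alerts are delivered.

```python
import boto3

sns = boto3.client("sns", region_name="ap-southeast-2")  # region is an assumption

topic_arn = sns.create_topic(Name="cpm-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn,
              Protocol="email",
              Endpoint="backup-admin@example.com")  # hypothetical recipient
```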

Conclusion

This is all you need to get your Cloud Protection Manager up and running. In the next blog post we will look at how instances are backed up and restored and discuss some of the advanced backup options CPM offers.

vSphere SDRS Design Considerations

June 26, 2016

If your vSphere cluster happens to be licensed with the Enterprise Plus edition, you may be aware of some of the advanced storage management features it includes, such as Storage DRS and Profile-Driven Storage.

These two features work together to let you optimise VM distribution between multiple VMware datastores from a capability, capacity and latency perspective, much like DRS does for memory and compute. But they have some interoperability limitations, which I want to discuss in this post.

Datastore Clusters

In simple terms, a datastore cluster is a collection of multiple datastores that can be seen as a single entity from a VM provisioning perspective.

[Image: datastore cluster]

VMware poses certain requirements for datastore clusters, but in my opinion the most important one is this:

Datastore clusters must contain similar or interchangeable datastores.

In other words, all of the datastores within a datastore cluster should have the same performance properties. You should not mix datastores provisioned on an SSD tier with datastores on SAS and SATA tiers, and vice versa. The reason is simple. Datastore clusters are used by SDRS to load-balance VMs between the datastores of a datastore cluster. SDRS balances VMs based on datastore capacity and I/O latency only and is not storage capability aware. If you had SSD, SAS and SATA datastores all under the same cluster, SDRS would simply move all VMs to the SSD-backed datastores, because they have the lowest latency, and leave SAS and SATA empty, which makes little sense.

Design Decision 1:

  • If you have several datastores with the same performance characteristics, combine them all in a datastore cluster. Do not mix datastores from different arrays or array storage tiers in one datastore cluster. Datastore clusters are not a storage tiering solution.

Storage DRS

As already mentioned, SDRS is a feature which, when enabled at the datastore cluster level, lets you automatically (or manually) distribute VMs between datastores based on datastore storage utilization and I/O latency. VM placement recommendations and datastore maintenance mode are among its other useful features.

[Image: Storage DRS]

Quite often SDRS is perceived as a feature that can work with Profile-Driven Storage to enforce VM Storage Policy compliance. One scenario that is often brought up is a VM with multiple .vmdk disks, each with a certain storage capability. Mistakenly, one of the disks has been Storage vMotion’ed to a datastore which does not meet the storage capability requirements. Can SDRS automatically move the disk back to a compliant datastore or notify you that the VM is not compliant? The answer is no. SDRS does not take storage capabilities into account and makes decisions based only on capacity and latency. This may be implemented in future versions, but is not supported in vSphere 5.

Design Decision 2:

  • Use datastore clusters in conjunction with Storage DRS to get the benefit of VM load-balancing and placement recommendations. SDRS is not storage capability aware and cannot enforce VM Storage Policy compliance.

Profile-Driven Storage

So if SDRS and datastore clusters are not capable of supporting multiple tiers of storage, then what is? Profile-Driven Storage is aimed at exactly that. You can assign user-defined or system-defined storage capabilities to a datastore, then create a VM Storage Policy and assign it to a VM. A VM Storage Policy includes the list of required storage capabilities, and only those datastores that match them will be suggested as a target for a VM assigned to that policy.
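Conceptually, the compliance check is a simple capability match between what the policy requires and what the datastore advertises. The sketch below is just that logic in Python, with made-up datastore names and capabilities; it is not a vSphere API.

```python
# Hypothetical capability tags per datastore (user-defined or VASA-provided).
DATASTORE_CAPABILITIES = {
    "datastore-ssd-01": {"SSD"},
    "datastore-sas-01": {"SAS"},
    "datastore-sata-01": {"SATA"},
}

def compliant_datastores(required_capabilities):
    """Return datastores that satisfy every capability required by the VM Storage Policy."""
    return [name for name, caps in DATASTORE_CAPABILITIES.items()
            if set(required_capabilities) <= caps]

print(compliant_datastores({"SSD"}))   # ['datastore-ssd-01']
```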

You can create storage capabilities manually, such as SSD, SAS and SATA, or more abstract ones, such as Bronze, Silver and Gold, and assign them to the corresponding datastores. Or you can leverage VASA, which assigns the corresponding storage capabilities automatically. Below is an example of a datastore connected from a Dell Compellent storage array.

[Image: datastore capabilities]

You can then use storage capabilities from the VASA provider to create VM Storage Policies and assign them to VMs accordingly.

[Image: VASA]

Design Decision 3:

  • If you have more than one datastore storage type, use Profile-Driven Storage to enforce VM placement based on VM storage requirements. VASA can simplify storage capabilities management.

Conclusion

If all of your datastores have the same performance characteristics, such as a number of LUNs auto-tiered on the storage array side, then one SDRS-enabled datastore cluster is a perfect solution for you.

But if your storage design is slightly more complex and you have datastores with different performance characteristics, such as SSD, SAS and SATA, leverage Profile-Driven Storage to control VM placement and enforce compliance. Just make sure to use a separate cluster for each tier of storage and you will get the most benefit out of vSphere Storage Policy-Based Management.

How Admission Control Really Works

May 2, 2016

There is a moment in every vSphere admin’s life when he faces vSphere Admission Control, and quite often this moment is not the most pleasant one. In one of my previous posts I talked about some of the common issues that Admission Control may cause and how to avoid them. Quite frankly, Admission Control seems to do more harm than good in most vSphere environments.

Admission Control is a vSphere feature that is built to make sure that VMs with reservations can be restarted in a cluster if one of the cluster hosts fails. “Reservations” is the key word here. There is a common belief that Admission Control protects all other VMs as well, but that’s not true.

Let me go through all three vSphere Admission Control policies and explain why you’re better off disabling Admission Control altogether, as all of these policies give you little to no benefit.

Host failures cluster tolerates

This policy is the default when you deploy a vSphere cluster and the policy that causes the most issues. “Host failures cluster tolerates” uses slots to determine whether a VM is allowed to be powered on in a cluster. Depending on whether a VM has CPU and memory reservations configured, it can use one or more slots.

Slot Size

To determine the total number of slots for a cluster, Admission Control uses a slot size. The slot size is either the default 32MHz and 128MB of RAM (for vSphere 6) or, if you have VMs in the cluster configured with reservations, it is calculated based on the maximum CPU/memory reservations. So say you have 100 VMs, 98 of which have no reservations, one VM has 2 vCPUs and 8GB of memory reserved, and another VM has 4 vCPUs and 4GB of memory reserved; the slot size then jumps from 32MHz / 128MB to 4 vCPUs / 8GB of memory. If you have 2.0 GHz CPUs on your hosts, the 4 vCPU reservation is equivalent to 8.0 GHz.

Total Number of Slots

Now that we know the slot size, which happens to be 8.0 GHz and 8GB of memory, we can calculate the total number of slots in the cluster. If you have 2 x 8-core CPUs and 256GB of RAM in each of 4 ESXi hosts, then your total amount of resources is 16 cores x 2.0 GHz x 4 hosts = 128 GHz and 256GB x 4 hosts = 1TB of RAM. With a slot size of 4 vCPUs and 8GB of RAM, you get 64 vCPUs / 4 vCPUs = 16 slots (you would get more slots for memory, but the more restrictive of the two values is used).

[Image: total slots]

Practical Use

Now, if you configure the cluster to tolerate one host failure, you have to subtract four slots from the total. Every VM, even if it has no reservations, takes up one slot. As a result you can power on a maximum of 12 VMs in your cluster. How does that sound?
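The whole calculation fits in a few lines of arithmetic. The sketch below simply reproduces the example above (a 4-host cluster with 16 cores and 256GB per host, and a 4 vCPU / 8GB slot size):

```python
# Slot-based admission arithmetic for the example cluster above.
hosts = 4
cores_per_host = 16          # 2 x 8-core CPUs per host
ram_per_host_gb = 256

slot_cpu_cores = 4           # largest CPU reservation (4 vCPUs)
slot_ram_gb = 8              # largest memory reservation

cpu_slots = hosts * cores_per_host // slot_cpu_cores   # 64 / 4 = 16
ram_slots = hosts * ram_per_host_gb // slot_ram_gb     # 1024 / 8 = 128
total_slots = min(cpu_slots, ram_slots)                # the more restrictive wins: 16

failover_reserve = total_slots // hosts                # one host's worth of slots: 4
usable_slots = total_slots - failover_reserve          # 12 VMs can be powered on
print(total_slots, usable_slots)                       # 16 12
```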

Such incredibly restrictive behaviour is the reason why almost no one uses this policy in production, unless it’s left there by default. You can change the slot size manually, but I’m not aware of a sound approach for choosing the right value. That’s policy number one.

Percentage of cluster resources reserved as failover spare capacity

This is the second policy, and it is the one most people recommend using instead of the restrictive “Host failures cluster tolerates”. This policy uses percentage-based admission instead of slot-based admission.

It’s much more straightforward: you simply specify the percentage of resources you want to reserve. For example, if you have four hosts in a cluster, the common belief is that if you specify 25% of CPU and memory, that capacity will be reserved to restart VMs in case one of the hosts fails. But it won’t. Here’s why.

When calculating the amount of free resources in a cluster, Admission Control takes into account only VM reservations and memory overhead. If you have no VMs with reservations in your cluster, then HA will show close to 99% of resources as free even if you’re running 200 VMs.

[Image: failover capacity]

For instance, if all of your VMs have 4 vCPUs and 8GB of RAM, the memory overhead would be 60.67MB per VM, which for 300 VMs is roughly 18GB. If you have two VMs with reservations, say one VM with 2 vCPUs / 4GB of RAM and another VM with 4 vCPUs / 2GB of RAM reserved, then you need to add up those reservations as well.

So if we consider memory, that’s 18GB + 4GB + 2GB = 24GB. If you have a total of 1TB of RAM in your cluster, Admission Control will consider roughly 97% of your memory resources to be free.
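Again, the arithmetic is trivial; the sketch below just reproduces the numbers from this example (300 VMs, 60.67MB overhead each, two small reservations, 1TB of cluster RAM):

```python
# Percentage-based admission: only overhead and reservations count as "used".
total_ram_gb = 1024                     # 1TB of RAM in the cluster
vm_count = 300
overhead_mb_per_vm = 60.67              # memory overhead of a 4 vCPU / 8GB VM
reserved_gb = 4 + 2                     # the only two VMs with memory reservations

used_gb = vm_count * overhead_mb_per_vm / 1024 + reserved_gb   # ~23.8GB
free_percent = 100 * (1 - used_gb / total_ram_gb)              # ~97.7%
print(round(used_gb, 1), round(free_percent, 1))
```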

For such an approach to work you’d need to configure reservations on 100% of your VMs, which obviously no one does. So that’s policy number two.

Specify failover hosts

This is the third policy, and typically the least recommended, because it dedicates a host (or multiple hosts) purely for failover. You cannot run VMs on such hosts; if you try to vMotion a VM to one, you’ll get an error.

[Image: failover host]

In my opinion, this policy is actually the most useful one for reserving cluster resources. If you want N+1 redundancy, then reserve it. This policy does exactly that.

Conclusion

When it comes to vSphere Admission Control, everyone knows that the “Host failures cluster tolerates” policy uses slot-based admission and is best avoided.

There’s a common misconception, though, that “Percentage of cluster resources reserved as failover spare capacity” is more useful and can reserve CPU and memory capacity for host failover. In reality it will let you run as many VMs as you want and utilize all of your cluster resources, except for the tiny amount of CPU and memory needed for the handful of VMs with reservations you may have in your environment.

If you want to reserve failover capacity in your cluster, either use “Specify failover hosts” policy or simply disable Admission Control and keep an eye on your cluster resource utilization manually (or using vROps) to make sure you always have room for growth.

Implications of Ignoring vSphere Admission Control

April 5, 2016

HA Admission Control has historically been one of the lesser understood vSphere topics. It’s not intuitive how it works and what it does, and as a result it’s left configured with default values in most vSphere environments. But the default Admission Control settings are very restrictive and can often cause issues.

In this blog post I want to share the two most common issues with vSphere Admission Control and solutions to these issues.

Issue #1: Not being able to start a VM

Description

Probably the most common issue everyone encounters with Admission Control is when you suddenly cannot power on VMs any more. There are multiple reasons why that might happen, but most likely you’ve just configured a reservation on one of your VMs or deployed a VM from an OVA template with a pre-configured reservation. This triggers a change in the Admission Control slot size, and based on the new slot size you no longer have enough slots to satisfy the failover requirements.

As a result you get the following alarm in vCenter: “Insufficient vSphere HA failover resources”. And when you try to create and boot a new VM you get: “Insufficient resources to satisfy configured failover level for vSphere HA”.

[Image: admission error]

Cause

So what exactly has happened here? In my example a new VM with a 4GHz / 4GB reservation was deployed. Admission Control was set to its default “Host Failures Cluster Tolerates” policy, which uses slot sizes. The total amount of resources in the cluster is divided by the slot size (4GHz and 4GB in the above case) and then each VM (even if it doesn’t have a reservation) uses at least one slot. Once you configure a VM reservation, depending on the number of VMs in your cluster, more often than not all slots get consumed straight away. As you can see from the calculations, I have 91 slots in the cluster, which have instantly been used up by the 165 running VMs.

[Image: slot calculations]

Solution

You can control the slot size manually and make it much smaller, such as 1GHz and 1GB of RAM; that way you’d have many more slots. The VM from my previous example would use four slots, and all other VMs without reservations would still use one slot each, but out of a much larger pool. This process is manual and prone to error, though.

The better solution is to use “Percentage of Cluster Resources” policy, which is recommended for most environments. We’ll go over the main differences between the three available Admission Control policies after we discuss the second issue.

Issue #2: Not being able to enter Maintenance Mode

Description

It might be a corner case, but I still see it quite often: you have two hosts in a cluster (such as a ROBO site, DR or just a small environment) and try to put one host into maintenance mode.

The first issue you will encounter is that VMs are not automatically vMotioned to the other host by DRS. You have to evacuate the VMs manually.

And then, once you have moved all VMs to the other host and put the first one into maintenance mode, you again can no longer power on VMs and get the same error: “Insufficient resources to satisfy configured failover level for vSphere HA”.

[Image: power-on failure]

Cause

This happens because disconnected hosts and hosts in maintenance mode are not used in Admission Control calculations. And one host is obviously not enough for failover, because if it fails, there are no other hosts to fail over to.

Solution

If you get caught in such a situation, you can temporarily disable Admission Control altogether until you finish the maintenance. This is the reason why it’s often recommended to have at least 3 hosts in a cluster, but that cannot always be justified if you have just a handful of VMs.

Alternatives to Slot Size Admission Control

There are another two Admission Control policies. The first is “Specify a Failover Host”, which dedicates a host (or hosts) for failover. Such a host acts as a hot standby and can run VMs only in a failover situation. This policy is ideal if you want to reserve failover resources.

And the second is “Percentage of Cluster Resources”. Resources under this policy are reserved based on a percentage of total cluster resources. If you have five hosts in your cluster, you can reserve 20% of resources (which is equal to one host) for failover.

This policy uses a percentage of cluster resources instead of slot sizes and hence doesn’t have the issues of the “Host Failures Cluster Tolerates” policy. There is a gotcha, though: if you later add another five hosts to your cluster, you will need to change the reservation to 10%, which is often overlooked.
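A tiny helper makes the gotcha obvious: the reserved percentage has to be recalculated whenever the host count changes. This is only a sketch of the arithmetic, not anything vSphere does for you.

```python
def failover_percentage(host_count, host_failures_to_tolerate=1):
    """Percentage of cluster resources to reserve for N+1 (or N+x) redundancy."""
    return 100.0 * host_failures_to_tolerate / host_count

print(failover_percentage(5))    # 20.0 -- five hosts
print(failover_percentage(10))   # 10.0 -- after adding another five hosts
```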

Conclusion

The “Percentage of Cluster Resources” policy is recommended in most cases to avoid issues with slot sizes. What is important to understand is that the goal of this policy is only to guarantee that VMs with reservations can be restarted in a host failure scenario.

If a VM has no reservations, then the “Percentage of Cluster Resources” policy will use only the memory overhead of that VM in its calculations, which is probably the most confusing part about Admission Control in general. But that’s a topic for the next blog post.

 

GFS backup scheme in Symantec Backup Exec

March 23, 2012

Grandfather-Father-Son is an industry-standard backup scheme where you keep 5 daily backups, 5 weekly backups and as many monthly backups as you need. Symantec Backup Exec has a prebuilt policy for GFS, but before going into configuring the backup scheme itself, let’s talk a little bit about general backup job configuration in Backup Exec.

Basic Terminology

Inside the user interface you see Jobs, Policies, Selection Lists and Media Sets. First of all you need to create a Selection List, which describes what you want to back up. There you select files and folders from your Windows, Unix or NDMP servers. Then you create a Media Set, which is a collection of tapes with particular append and retention periods. The append period specifies how long data can be added to the same tape and the retention period tells for how long data cannot be overwritten. The retention period starts from the time of the last append to the tape. Then you create a Policy. A policy, by means of templates, defines when backup jobs are run, where backups are stored and what the type of backup is – incremental, differential or full. One policy can consist of several templates. In a template you specify the backup date and time, as well as the target tape library.

GFS Implementation

Backup Exec has a template for the GFS backup rotation scheme. Click “New policy using wizard”, choose the GFS scheme and then select the schedule, target backup device and media sets for daily, weekly and monthly backups. By default Backup Exec suggests the following configuration.

Three tape media sets:

  • Daily Media Set – 1 week overwrite, 1 week append
  • Weekly Media Set – 5 weeks overwrite, 5 weeks append
  • Monthly Media Set – 1 year overwrite, 1 year append

Policy with three templates:

  • Daily Backup – Monday to Friday, Incremental
  • Weekly Backup – every Friday, Full
  • Monthly Backup – first Saturday of each month, Full

Backup Exec also automatically creates rules to resolve conflicts. For example, when both the Daily and Weekly backups try to run on a Friday, the jobs do not conflict, because weekly backups always supersede daily ones. The same goes for monthly backups.
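The precedence rules are easy to express in code. The sketch below is only a Python illustration of the default rotation described above (daily Mon–Fri, weekly on Friday, monthly on the first Saturday), not anything Backup Exec actually runs:

```python
from datetime import date

def backup_tier(day):
    """Which job runs on a given day under the default GFS rules described above."""
    if day.weekday() == 5 and day.day <= 7:   # first Saturday of the month
        return "Monthly Full"
    if day.weekday() == 4:                    # Friday: weekly supersedes daily
        return "Weekly Full"
    if day.weekday() < 5:                     # Monday-Thursday
        return "Daily Incremental"
    return "No backup scheduled"

print(backup_tier(date(2012, 3, 23)))         # a Friday -> 'Weekly Full'
```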

I personally prefer another schedule. First of all, if you run your jobs after midnight, you will need to shift your schedules from Mon – Fri to Tue – Sat. Additionally, I run the monthly backup on the first Saturday of the month. Backup Exec by default (taking into consideration my one-day shift) would suggest the first Sunday for the monthly backup. However, it doesn’t make much sense to have a weekly backup on Saturday and then a monthly the next day on Sunday; you would just consume more space without any benefit. You could also schedule the monthly on the last Saturday of the month, but if the last day of the month is a Thursday, for example, then you will lose four business days from your monthly backup.

After the policy is created, you need to create backup jobs using this policy by clicking on “New jobs using policy”. All three jobs will be created automatically according to the Selection List, as well as the Policy Schedule, Target and Backup Type parameters.

I’d also recommend configuring notifications. They can be set up in the general Alerts properties as well as inside each job.

Why I wouldn’t recommend Microsoft Data Protection Manager as a backup solution

February 22, 2012

When evaluating the different backup solutions on the market, MS DPM 2010 looked rather attractive to us. It’s a solution from a leading software vendor and it’s cheap. However, DPM has a number of limitations which forced us to abandon it and rebuild our backup procedures from the ground up. Here I’d like to describe the major points as impartially as I can.

The first thing about DPM is that it uses VSS snapshots as its one and only backup method. The major consequence is that you are very limited in flexibility: you cannot do incremental, differential or full backups, or implement a GFS backup strategy or Progressive Paradigm. The only option you have is to exclude weekends or any other particular days from backups. That means ineffective storage utilization and the inability to have the longer data retention you could get with a flexible backup policy such as GFS, not to mention additional spending on storage.

Another problem of VSS is that it supports only 64 snapshots. Basically, that means that if you exclude weekends from your backup policy, you will be able to keep backups for an 89-day period. That’s clearly not enough if, for example, you work in a financial institution with strict policies for long data retention. DPM assumes that you will use tapes for prolonged data retention. If you already have tapes then you are good to go; if not, then once again it’s an additional expenditure.

Interestingly enough, in DPM you cannot have different retention policies for data which resides on the same volume. Say I want to keep database backups for 3 months and transaction logs for the last week. If the database backups and transaction logs reside on the same volume, then you will have to create a second volume and separate them.

A limitation which I personally find very inconvenient is space reservation. Each time you create a Protection Group you reserve space for it, say 500GB, and you cannot change it. If one folder out of the ten you back up from the server moves to another place and 250GB becomes free, the only option you have is to destroy the Protection Group, lose all backups and recreate it. DPM does help in the situation when you don’t know how your data will grow: you can specify a smaller storage size initially and it will automatically grow as needed. However, it can extend only 32 times. If you hit that limit, then you are in the same situation as before.

Another major issue arises when you change the protected server’s name. If the server name is changed, the only option you have is destroying the protection group, losing all backups and recreating it.

The next limitation I also find inflexible is target storage. In DPM you can only use block devices as target storage for your backups, so it’s either DAS, FC or iSCSI storage. NAS is not supported.

If you work in an SMB, you will probably have issues with installation and support of legacy systems. DPM works only on 64-bit Windows Server 2008 platforms (Windows Server 2003 is not supported), and DPM doesn’t support Bare Metal Recovery (BMR) of Windows Server 2003.

And lastly, DPM keeps all data on a raw volume. Raw volumes are more efficient in terms of disk I/O performance, but while that’s helpful for a DBMS, it doesn’t seem to make any difference for backup software. The downside is a higher risk of losing all data in case of volume damage or a DPM bug. It’s arguable, so I will leave it as my personal opinion.

In conclusion, it’s rather disappointing to see how software with a 7-year history (DPM 2006 was released in 2005) has more limitations than any backup software solution I can think of. Even if you don’t have enough money for Symantec Backup Exec, ARCserve Backup, HP Data Protector or any other software, I would recommend making more effort and searching for some other solution. Otherwise you can fall into the same trap as we did.

Mad IT workdays

February 10, 2010

Today I needed to get Symantec Storage Exec to work with a NetApp filer. This software allows you to enforce file blocking and allocation policies on the filer’s volumes.

I spent the whole day resolving numerous problems while integrating them:

  1. When I was trying to install Symantec, the installer said that it was interrupted and the installation had to be rolled back. I couldn’t find ANY information regarding this issue anywhere on the Internet. I found posts describing similar problems with other Symantec products, but they didn’t help. Then I somehow found the installation log and searched for lines from it. Finally I ran into this solution: http://seer.entsupport.symantec.com/docs/284901.htm. So the Symantec uninstaller, for some damn reason, left keys in the registry, and the installer then couldn’t install for a second time because of its own fault.
  2. Then I got “Can’t connect to host (err=10061)” after adding the filer to the list of managed appliances. This link (http://seer.entsupport.symantec.com/docs/326973.htm) says I need to enable HTTP access. What? We don’t even have an HTTP license. After half an hour of playing with the filer options I found out that it’s not access to the httpd server, it’s access to the magic filer administration area, which is governed by the httpd.admin.enable option (don’t forget to also add the Storage Exec server IP to httpd.admin.access).
  3. The next error was “HTTP POST authorization failed” in Storage Exec and “HTTP XML Authentication failed” on the filer side. It turned out that the filer also needed a user with the same user name and password as the account Storage Exec runs under, and this user has to be in the filer’s Administrators group.

Symantec’s documentation doesn’t have a word about any of this. It says nothing about access to the filer’s administrative area or the required user accounts; you have to find it all out by yourself. I think Symantec’s docs leave too much to be desired, and that’s the mildest way to put it. There is also very little information about Storage Exec on the Internet, so it seems that not many people are actually using it.