Posts Tagged ‘restore’

AWS Cloud Protection Manager Part 3: Backup and Restore

August 21, 2017

Backup

Backups are created according to the schedule specified in the backup policy. We discussed how to configure backup policies in the previous blog post of the series. The list of backups you see on the Backup Monitor tab are your restore points. Backups that are older then the specified retention policy will be purged from the list and you will not see them there, unless you move them to “Freezer”.

It is important to understand that apart from volume snapshots, for each backed up instance CPM also creates an AMI. Those who has hands-on experience with AWS may already know, that AMIs is the only way to create clones of Windows EC2 instances in AWS. If you go to AWS console and try to find a clone action under the instance Action menu, you won’t find any. You will have “Create Image” instead. It creates an AMI, from which you can then spin up a clone of an instance the image was created from.

CPM does exactly that. For each backup policy the instance is under, it creates one AMI. In our example we have four backup policies, that will result in four AMIs for each of the instances. Every AMI has to have at least one storage volume. So CPM will include the root volume of each instance into AMI, just because it has to. But AMIs are required only to restore EC2 instance configuration. Data is restored from volume snapshots, that can be used to create new volumes from them and then attach them to the instance. You can click on the View button under Snapshots to find the corresponding snapshot and AMI IDs.

There is a backup log for each job run as well that is helpful for issue troubleshooting.

Restore

To perform a restore click on the Recover button next to the backup job and you will get the list of the instances you can recover. CPM offers you three options: instance recovery, volume recovery and file recovery. Let’s go back to front.

File recovery is probably the most used recovery option. As it lets you restore individual files. When you click on the “Explore” button, CPM creates new volumes from the snapshots you are restoring from and mount them to the CPM instance. You are then presented with a simple file system browser where you can find the file and click on the green down arrow icon in Download column to save the file to your computer.

If you click on “Volume Only”, you can restore particular volumes. Restored volumes are not attached to any instance, unless you specify it under “Attach to Instance” column. You can then select under “Attach Behaviour” what CPM should do if such volume is already attached to the instance or if you want to automatically detach the original volume, but the instance is running (you can do it only if instance is stopped).

And the last option is “Instance”. It will create a clone of the original instance using the pre-generated AMI and volume snapshots, as we discussed in the Backup section of this blog post. You can specify many options under Advanced Options section, including recovery to another VPC or different availability zone. If anything, make sure you specify a new IP address for the instance, otherwise you’ll have a conflict and your restore will fail. Ideally you should also shut down the original EC2 instance before spinning up a restore clone.

Advanced Features

There are quite a few worth mentioning. So far we have looked at simple EC2 instance restore. But you don’t have to backup whole instances, you can also backup individual volumes. On top of that, CPM supports RDS database, Aurora and Redshift cluster backups.

If you run MS Exchange, Sharepoint or SQL on your EC2 instances, you can install CPM backup agent on them to ensure you have application-consistent backups via VSS, as opposed to crash-consistent backups you get if agent is not used. If you install the agent, you can also run a script on the instance before and after the backup is taken.

Last but not least is DR. Restoring to another availability zone within the region is already supported on instance recovery level. You can choose availability zone you want to restore to. It is not possible to recover to another region, though. Because AWS snapshots and AMIs are local to the region they are created in. If you want to be able to recover to another region, you can configure DR in CPM, which will utilise AWS AMI and snapshot copy functionality to copy backups to another region at configured frequency.

Conclusion

Overall, I found Cloud Protection Manager very easy to install, configure and use. If you come from infrastructure background, at first glance CPM may look to you like a very basic tool, compared to such feature-rich solutions like Veeam or Commvault. But that feeling is misleading. CPM is simple, because AWS simple. All infrastructure complexity is hidden under the covers. As a result, all AWS backup tools need to do is create snapshots and CPM does it well.

Advertisements

AWS Cloud Protection Manager Part 2: Configuration

August 14, 2017

Overview

As we discussed in Part 1 of this series, snapshots serve as a good basis to implement backup in AWS. But AWS does not provide an out-of-the-box tool that can manage snapshots at scale and perform snapshot creation/deletion based on a defined retention. Rich AWS APIs allow you to build such tool yourself or you can use an existing backup solution built for AWS. In this blog post we are looking at one such product, called Cloud Protection Manager.

You will be surprised to know that the first version of Cloud Protection Manager was released back in 2013. The product has matured over the years and the current CPM version 2.1 according to N2W web-site has become quite popular amongst AWS customers.

CPM is offered in four different versions: Standard, Advanced, Enterprise and Enterprise Plus. Functionality across all four versions is mostly the same, with the key difference being the number of instances you can backup. Ranging from 20 instances in Standard and $5 per instance in Enterprise Plus.

Installation

CPM offers a very straightforward consumption model. You purchase it from AWS marketplace and pay by the month. Licensing costs are billed directly to your account. There are no additional steps involved.

To install CPM you need to find the version you want to purchase in AWS Marketplace, specify instance settings, such as region, VPC subnet, security group, then accept the terms and click launch. AWS will spin up a new CPM server as an EC2 instance for you. You also have an option to run a 30-day trial if you want to play with the product before making a purchasing decision.

Note that CPM needs to be able to talk to AWS API endpoints to perform snapshots, so make sure that the appliance has Internet access by means of a public IP address, Elastic IP address or a NAT gateway. Similarly, the security group you attach it to should at least have HTTPS out allowed.

Initial Configuration

Appliance is then configured using an initial setup wizard. Find out what private IP address has been assigned to the instance and open a browser session to it. The wizard is reasonably straightforward, but there are two things I want to draw your attention to.

You will be asked to create a data volume. This volume is needed purely to keep CPM configuration and metadata. Backups are kept in S3 and do not use this volume. The default size is 5GB, which is enough for roughly 50 instances. If you have a bigger environment allocate 1GB per every 10 AWS instances.

You will also need to specify AWS credentials for CPM to be able to talk to AWS APIs. You can use your AWS account, but this is not a security best practice. In AWS you can assign a role to an EC2 instance, which is what you should be using for CPM. You will need to create IAM policies that essentially describe permissions for CPM to create backups, perform restores, send notifications via AWS SNS and configure EC2 instances. Just refer to CPM documentation, copy and paste configuration for all policies, create a role and specify the role in the setup wizard.

Backup Policies

Once you are finished with the initial wizard you will be able to log in to the appliance using the password you specified during installation. As in most backup solutions you start with backup policies, which allow you to specify backup targets, schedule and retention.

One thing that I want to touch on here is backup schedules, that may be a bit confusing at start. It will be easier to explain it in an example. Say you want to implement a commonly used GFS backup schedule, with 7 daily, 4 weekly, 12 monthly and 7 yearly backups. Daily backup should run every day at 8pm and start from today. Weekly backups run on Sundays.

This is how you would configure such schedule in CPM:

  • Daily
    • Repeats Every: 1 Days
    • Start Time: Today Date, 20:00
    • Enabled on: Mon-Sat
  • Weekly
    • Repeats Every: 1 Weeks
    • Start Time: Next Sunday, 20:00
    • Enabled on: Mon-Sun
  • Monthly
    • Repeats Every: 1 Months
    • Start Time: 28th of this month, 21:00
    • Enabled on: Mon-Sun
  • Yearly:
    • Repeats Every: 12 Months
    • Start Time: 31st of December, 22:00
    • Enabled on: Mon-Sun

Some of the gotchas here:

  • “Enabled on” setting is relevant only to the Daily backup, the rest of the schedules are based on the date you specify in “Start Time” field. For instance, if you specify a date in the Weekly backup Start Time that is a Sunday, your weekly backups will run every Sunday.
  • Make sure to run your Monthly backup on 28th day of every month, to guarantee you have a backup every month, including February.
  • It’s not possible to prevent Weekly backup to not run on the last week of every month. So make sure to adjust the Start Time for the Monthly backup so that Weekly and Monthly backups don’t run at the same time if they happen to fall on the same day.
  • Same considerations are true for the Yearly backup as well.

Then you create your daily, weekly, monthly and yearly backup policies using the corresponding schedules and add EC2 instances that require protection to every policy. Retention is also specified at the policy level. According to our scenario we will have 6 generations for Daily, 4 generations for Weekly, 12 generations for Monthly and 7 generations for Yearly.

Notifications

CPM uses AWS Simple Notification Service (SNS) to send email alerts. If you gave CPM instance SNS permissions in IAM role you created previously, you should be able to simply go to Notification settings, enable Alerts and select “Create new topic” and “Add user email as recipient” options. CPM will create a SNS topic in AWS for you automatically and use email address you specified in the setup wizard to send notifications to. You can change or add more email addresses to the SNS topic in AWS console later on if you need to.

Conclusion

This is all you need to get your Cloud Protection Manager up and running. In the next blog post we will look at how instances are backed up and restored and discuss some of the advanced backup options CPM offers.

Brocade 300 Firmware Upgrade

December 14, 2015

In my previous post Brocade 300 Initial Setup I briefly went through the firmware upgrade process, which is a part of every new switch installation. Make sure to check the post out for instructions on how to install a FTP server. You will need it to upload firmware to the switch.

I intentionally didn’t go into all details of firmware upgrade in my previous post, as it’s not necessary for a green field install. For a production switch the process is different. The reason is, if you’re upgrading to a Fabric OS version which is two or more versions apart from the current switch firmware revision, it will be disruptive and take the FC ports offline. Which is fine for a new deployment, but not ideal for production. 

Disruptive and Non-Disruptive Upgrades

Brocade Fabric OS major firmware release versions are 6.3.x, 6.4.x, 7.0.x, 7.1.x, 7.2.x, etc. For a NDU the rule of thumb is to apply all major releases consecutively. For example, if my production FC switch is running FOS version 6.3.2b and I want to upgrade to version 7.2.1d, which is the latest recommended version for my hardware platform, then I’ll have to upgrade from 6.3.2b to 6.4.x to 7.0.x to 7.1.x and finally to 7.2.1d.

First and foremost save the current switch config and make a config backup via FTP (give write permissions to your FTP user’s home folder). Don’t underestimate this step. The last thing you want to do is to recreate all zoning if switch loses config during the upgrade:

> cfgSave
> configUpload

configUpload

In case you need to restore, you can run the following command to download the backed up config back to the switch:

> configDownload

Next step is to install every firmware revision up to the desired major release (-s key is not required):

> firmwaredownload

brocade_ndu.JPG

Brocade switch has two firmware partitions – primary and secondary. Primary is the partition the switch boots from. And the secondary partition is used for firmware upgrades.

After each upgrade switch does a warm reboot. All FC ports stay up and switch continues to forward FC frames with no disruption to FC traffic. To accomplish that, switch uses the secondary partition to upload the new firmware to and then quickly swap them without disrupting FC switching.

At a high level the upgrade process goes as follows:

  • The Fabric OS downloads the firmware to the secondary partition.
  • The system performs a high availability reboot (haReboot). After the haReboot, the former secondary partition is the primary partition.
  • The system replicates the firmware from the primary to the secondary partition.

Each upgrade may take up to 30 minutes to complete, but in my experience it doesn’t take more than 10 minutes. Once the first switch is upgraded, log back in and check the firmware version. And you will see how secondary partition has now become primary and firmware is uploaded to the secondary partition.

brocade_upgrade3

As a last step, check that FC paths on all hosts are active and then move on to the second switch. The steps are exactly the same for each upgrade.

Firmware Upload and Commit

Under normal circumstances when you run the firmwareDownload command, switch does the whole upgrade in an automated fashion. After the upgrade is finished you end up with both primary and secondary partitions on the same firmware version. But if you’re a large enterprise, you may want to test the firmware first and have an option to roll-back.

To accomplish that you can use -s key and disable auto-commit:

single_mode

Switch will upload the firmware to the secondary partition, switch secondary and primary partitions after a reboot, but won’t replicate the firmware to the secondary partition. You can use the following command to restore firmware back to the previous version:

> firmwareRestore

Or if you’re happy with the firmware, commit it to the secondary partition:

>  firmwareCommit

The only caveat here, a non-disruptive upgrade is not supported in this scenario. When switch reboots, it’ll be disruptive to FC traffic.

Important Notes

When downloading firmware for your switch, make sure to use switch’s vendor web-site. EMC Connectrix DS-300B, Brocade 300 and IBM SAN24B-4 are essentially the same switch, but firmware and supported versions for each OEM vendor may slightly vary. Here are the links where you can get FC switch firmware for some of the vendors:

  • EMC: sign in to http://support.emc.com > find your switch model under the product section and go to downloads
  • Brocade: sign in to http://www.brocade.com > go to Downloads section > enter FOS in the search field
  • Dell: http://www.brocadeassist.com/dellsoftware/public/DELLAssist includes a subset of Fabric OS versions, which are tested and approved by Dell
  • IBM: http://ibm.brocadeassist.com/public/FabricOSv6xRelease and http://ibm.brocadeassist.com/public/FabricOSv7xRelease are the links where you can download FOS for IBM switches. You can also go to http://support.ibm.com, search for the switch in the Product Finder and find FOS under the “Downloads (drivers, firmware, PTFs)” section

References

NetApp VSC Single File Restore Explained

August 5, 2013

netapp_dpIn one of my previous posts I spoke about three basic types of NetApp Virtual Storage Console restores: datastore restore, VM restore and backup mount. The last and the least used feature, but very underrated, is the Single File Restore (SFR), which lets you restore single files from VM backups. You can do the same thing by mounting the backup, connecting vmdk to VM and restore files. But SFR is a more convenient way to do this.

Workflow

SFR is pretty much an out-of-the-box feature and is installed with VSC. When you create an SFR session, you specify an email address, where VSC sends an .sfr file and a link to Restore Agent. Restore Agent is a separate application which you install into VM, where you want restore files to (destination VM). You load the .sfr file into Restore Agent and from there you are able to mount source VM .vmdks and map them to OS.

VSC uses the same LUN cloning feature here. When you click “Mount” in Restore Agent – LUN is cloned, mapped to an ESX host and disk is connected to VM on the fly. You copy all the data you want, then click “Dismount” and LUN clone is destroyed.

Restore Types

There are two types of SFR restores: Self-Service and Limited Self-Service. The only difference between them is that when you create a Self-Service session, user can choose the backup. With Limited Self-Service, backup is chosen by admin during creation of SFR session. The latter one is used when destination VM doesn’t have connection to SMVI server, which means that Remote Agent cannot communicate with SMVI and control the mount process. Similarly, LUN clone is deleted only when you delete the SFR session and not when you dismount all .vmdks.

There is another restore type, mentioned in NetApp documentation, which is called Administartor Assisted restore. It’s hard to say what NetApp means by that. I think its workflow is same as for Self-Service, but administrator sends the .sfr link to himself and do all the job. And it brings a bit of confusion, because there is an “Admin Assisted” column on SFR setup tab. And what it actually does, I believe, is when Port Group is configured as Admin Assisted, it forces SFR to create a Limited Self-Service session every time you create an SFR job. You won’t have an option to choose Self-Assisted at all. So if you have port groups that don’t have connectivity to VSC, check the Admin Assisted option next to them.

Notes

Keep in mind that SFR doesn’t support VM’s with IDE drives. If you try to create SFR session for VMs which have IDE virtual hard drives connected, you will see all sorts of errors.

Magic behind NetApp VSC Backup/Restore

June 12, 2013

netapp_dpNetApp Virtual Storage Console is a plug-in for VMware vCenter which provides capabilities to perform instant backup/restore using NetApp snapshots. It uses several underlying NetApp features to accomplish its tasks, which I want to describe here.

Backup Process

When you configure a backup job in VSC, what VSC does, is it simply creates a NetApp snapshot for a target volume on a NetApp filer. Interestingly, if you have two VMFS datastores inside one volume, then both LUNs will be snapshotted, since snapshots are done on the volume level. But during the datastore restore, the second volume will be left intact. You would think that if VSC reverts the volume to the previously made snapshot, then both datastores should be affected, but that’s not the case, because VSC uses Single File SnapRestore to restore the LUN (this will be explained below). Creating several VMFS LUNs inside one volume is not a best practice. But it’s good to know that VSC works correctly in this case.

Same thing for VMs. There is no sense of backing up one VM in a datastore, because VSC will make a volume snapshot anyway. Backup the whole datastore in that case.

Datastore Restore

After a backup is done, you have three restore options. The first and least useful kind is a datastore restore. The only use case for such restore that I can think of is disaster recovery. But usually disaster recovery procedures are separate from backups and are based on replication to a disaster recovery site.

VSC uses NetApp’s Single File SnapRestore (SFSR) feature to restore a datastore. In case of a SAN implementation, SFSR reverts only the required LUN from snapshot to its previous state instead of the whole volume. My guess is that SnapRestore uses LUN clone/split functionality in background, to create new LUN from the snapshot, then swap the old with the new and then delete the old. But I haven’t found a clear answer to that question.

For that functionality to work, you need a SnapRestore license. In fact, you can do the same trick manually by issuing a SnapRestore command:

> snap restore -t file -s nightly.0 /vol/vol_name/vmfs_lun_name

If you have only one LUN in the volume (and you have to), then you can simply restore the whole volume with the same effect:

> snap restore -t vol -s nightly.0 /vol/vol_name

VM Restore

VM restore is also a bit controversial way of restoring data. Because it completely removes the old VM. There is no way to keep the old .vmdks. You can use another datastore for particular virtual hard drives to restore, but it doesn’t keep the old .vmdks even in this case.

VSC uses another mechanism to perform VM restore. It creates a LUN clone (don’t confuse with FlexClone,which is a volume cloning feature) from a snapshot. LUN clone doesn’t use any additional space on the filer, because its data is mapped to the blocks which sit inside the snapshot. Then VSC maps the new LUN to the ESXi host, which you specify in the restore job wizard. When datastore is accessible to the ESXi host, VSC simply removes the old VMDKs and performs a storage vMotion from the clone to the active datastore (or the one you specify in the job). Then clone is removed as part of a clean up process.

The equivalent cli command for that is:

> lun clone create /vol/clone_vol_name -o noreserve -b /vol/vol_name nightly.0

Backup Mount

Probably the most useful way of recovery. VSC allows you to mount the backup to a particular ESXi host and do whatever you want with the .vmdks. After the mount you can connect a virtual disk to the same or another virtual machine and recover the data you need.

If you want to connect the disk to the original VM, make sure you changed the disk UUID, otherwise VM won’t boot. Connect to the ESXi console and run:

# vmkfstools -J setuuid /vmfs/volumes/datastore/VM/vm.vmdk

Backup mount uses the same LUN cloning feature. LUN is cloned from a snapshot and is connected as a datastore. After an unmount LUN clone is destroyed.

Some Notes

VSC doesn’t do a good cleanup after a restore. As part of the LUN mapping to the ESXi hosts, VSC creates new igroups on the NetApp filer, which it doesn’t delete after the restore is completed.

What’s more interesting, when you restore a VM, VSC deletes .vmdks of the old VM, but leaves all the other files: .vmx, .log, .nvram, etc. in place. Instead of completely substituting VM’s folder, it creates a new folder vmname_1 and copies everything into it. So if you use VSC now and then, you will have these old folders left behind.

AIX at first glance

May 19, 2012

Recently I set up an AIX 5.1 on a RS/6000 box. Now, after some time working with the OS, I’d like to share my first impressions and features that distinguishes it from Linux.

FYI: Do not try to run AIX on x86, it won’t work. And it have never done. Only PowerPC and POWER RISC architectures.

System Management Services

The very first thing which may surprise you when you start a PowerPC system is absence of BIOS. PowerPC uses SMS which is an acronym for System Management Services. You enter SMS by pressing F2 during server startup. However, SMS implements same features as conventional server’s BIOS. Like configuring boot sequence, performing simple diagnostics, etc.

AIX default shell

AIX uses KornShell (ksh) by default. Bourne shell (bsh) is also available. But do not confuse it with Bourne-again shell (bash). It was developed two ears earlier (1989) than AIX 5.1 (2001), but wasn’t included. What’s interesting about ksh is that by default it works in vi editing mode. It means that initially you work in an input mode and enter commands by typing and hitting return as usual. Type ESC to enter control mode. For example type CTRL+V in control mode and you will find your ksh version. Mine is M-11/16/88f. If you type backslash (\) in control mode you will complete a file path. ksh88 shortcoming is that it doesn’t support commands completion.

System Management Interface Tool

AIX operating system is configured using the System Management Interface Tool (SMIT). It’s an equivalent of YaST in SuSE, redhat-config-* tools in Red Hat or Windows Control Panel. SMIT is very thorough configuration tool. For example, user add page consists of forty fields! SMIT has several handy functional keys. For instance, F5 sets field to the default value, using F9 you can temporarily invoke command shell, F4 generates a list if field implies it, like list of packages available to install from particular directory. Apart from that, SMIT has weird field hints: ‘-‘ says that field is numerical, ‘+’ means a list, ‘/’ is a path. Everything you do in SMIT is logged in /smit.log.

Web-based System Manager

On top of that, AIX has Web-based System Manager (WebSM) which lets you monitor your system and manage devices, backups, processes and virtually everything in your operating system. You can do that either from inside operating system itself or through standalone client which is available for Windows and Linux. To manage your AIX host via WebSM you need to have equal Manager and Remote Client versions.  To satisfy that you can download Windows version of Web-based System Manager Remote Client right from the AIX host using SCP or FTP from /usr/websm/pc_client/setup.exe. WebSM Client for AIX 5 is incompatible with Windows 7.

 

Object Data Manager

Feature which is unique to AIX is Object Data Manager (ODM) database, which maintains device configuration. ODM consists of Predefined Configuration Database (PCD) and Customized Configuration Database (CCD). Predefined Configuration Database keeps information on supported devices which means devices for which AIX has drivers and Customized Configuration Database hold information of devices which are currently connected to the system. Data in ODM is stored in terms of objects and their attributes. Access to ODM is implemented via special API. User can manage ODM by calling odmshow, odmadd, odmchange and odmdelete utilities. Additionally, AIX uses location codes to identify devices. Location code is effectively a path from a motherboard to a device. For example, location code of a SCSI device is in the form AB-CD-EF-G,H. Here AB is a bus type, CD – slot or adapter number, EF – connector ID, G – Control Unit Address of SCSI Device, H – Logical Unit Address of SCSI Device. I have two SCSI hard drives hdisk0 and hdisk1. For hdisk0 location code is 04-C0-00-5,0. Here 04 means PCI bus (00 – CPU bus, 01 – ISA bus, 05 – PCMCIA bus), C0 – integrated SCSI controller (A0 -ISA bus, B0 – secondary PCI bus), 00 – SCSI bus number, 5 – SCSI ID, 0 – LUN.

Logical Volume Manager

Did you know that LVM was implemented in AIX ten years earlier (1989) than in Linux (1998)? In fact, after AIX version of LVM was developed, its license was bought by HP. And only after that Heinz Mauelshagen developed Linux version with commands similar to the HP version. Windows Server platform still doesn’t have anything similar AFAIK.

Journaling File System

Another AIX achievement is JFS file system which is journaling by design. First JFS version was implemented in 1990 in AIX 3.1 Do you remeber when ext3 was developed? I believe somewhere in 2001. Journaling NTFS v3 was implemented in 2000 with Windows Server 2000. JFS file system in AIX 4.2 supported 64GB file size (it was 1996). With introduction of JFS2 in 2001, AIX 5 began to support 1TB files. Maximum file size for FAT32 was 4GB. All these facts are explainable. AIX was developed far earlier than Linux and Windows. But it’s still interesting how features firstly introduced in AIX (and other flavors of UNIX) migrate to younger OSes.

Full system recovery

Unlike Linux, AIX allows you to create full volume group backups with all logical volumes. Even in present times in Linux you work with antique tar, gzip, cpio and dd (or duplicity and bacula if you want something more sophisticated). In 2001 AIX already had savevg for backing up non-rootvg volume groups and mksysb which lets you backup rootvg along with system related data. mksysb creates installable image for full system recovery. I find these tools invaluable. I do not know of Linux alternative.

User/group administration

Additionally, AIX has several handy user administration features. For example, a user group can be either administrative or standard. If it’s administrative, then only root can add/remove users from it. If it’s standard, it means that ordinary users can administer that group. Feature I sometimes lack in Linux. Groups are configured in /etc/security/group and look like the following:

system:
admin = true

jradmin:
admin = false
adms = pac,xander

Here system is an administrative group and jradmin is standard. admin field identifies group type and adms contains the list of group administartors (pac an xander). Also, in AIX you can assign portions of root authority to non-root users. There are several predefined roles, like ManageAllUsers, ManageShutdown, ManageBackupRestore, etc, defined in /etc/security/roles. Roles consist of a number of authorizations, which is a set of particular tasks that user can perform. For example, ManageAllUsers role consists of the following authorizations: UserAudit, ListAuditClasses, UserAdmin, RoleAdmin, PasswdAdmin, GroupAdmin. You can create your own roles from these authorizations. In AIX 5 Role-Based Access Control (RBAC) is rather primitive and restricted, but it’s better than nothing.

Error logging

And the last thing I’d like to talk about is error logging. In Linux logging is performed by syslogd, AIX has the same daemon. However, AIX error logging facility is augmented by errdemon. It is started as part of system initialization and continuously monitors /dev/error. When information is read from /dev/error errdemon checks its Error Record Template Repository /var/adm/ras/errtmplt and if it has any additional info on this error, demon writes this information into /var/adm/ras/errlog. Log is in binary format. To read it run errpt command:

errpt -a -s 0519000012

This will show you detailed information on log entries starting from 19th of May 2012 00:00 a.m.

Conclusion

My first experience working with AIX (even with such an outdated version) makes me think of it as a sophisticated and very well written operating system. Many major features were developed in AIX much earlier than in Linux and Windows and I believe it’s still true for modern AIX releases. It becomes obvious why Unix is the primary choice for many big organizations with strong IT infrastructure.

Why I wouldn’t recommend Microsoft Data Protection Manager as a backup solution

February 22, 2012

When taking into account different backup solutions which are in the market, MS DPM 2010 was rather attractive for us. It’s a solution from leading software vendor and it’s cheap. However, DPM has a number of limitations which forced us to abandon it and rebuild our backup procedures from ground up. Here I’d like to describe major points as impartial as I can.

The first thing about DPM is that it uses VSS snapshots as one and only backup method. The major consequence is that you are very limited in flexibility and cannot do incremental, differential or full backups, implement GFS backup strategy or Progressive Paradigm. The only option you have is to exclude weekends or any other particular days from backups. That means ineffective storage utilization and inability to have longer data retention as you could have with flexible backup policy as GFS for instance, not to mention additional spendings on storage.

Another problem of VSS is that it supports only 64 snapshots. Basically, that means if you exclude weekends from your backup policy, you will be able to have backups for 89 day period. It’s clearly not enough if, for example, you work in a financial institution where you have strict policies of long data retention. DPM assumes that you will use tapes for prolonged data retention. If you already have tapes then you are good to go, if not then once again it’s additional expenditures.

Interestingly enough in DPM you cannot have different retention policies for data which resides on the same volume. Say I want to keep database backups for 3 months and transaction logs for the last week. If database backups and transaction logs reside on the same volume then you will have to create the second volume and separate them.

Limitation which I personally find very inconvenient is space reservation. Each time you create a Protection Group you reserve space for it. Say 500GB. And you cannot change it. In case one folder from ten, which you backup from the server, moves to another place and 250GB become free, the only option you have is to destroy Protection Group, loose all backups and recreate it. DPM helps you in situation when you don’t know how your data will grow and you can specify smaller storage size initially and it will automatically grow as needed. However, it can extend only 32 times. If you hit that limit, then you are in the same situation as before.

Another major issue arise when you change protected server name. If server name is changed the only option you have is destroying protection group, loosing all backups and recreating it.

The next limitation I also find inflexible is target storage. In DPM you can only use blocked devices as target storage to keep your backups. So it’s either DAS, FC or iSCSI storage. NAS is not supported.

If you work in SMB then you would probably have issues with installation and support of legacy systems. DPM works only on 64-bit Windows Server 2008 platforms (Windows Server 2003 is not supported). DPM doesn’t support Bare Metal Recovery (BMR) of Windows Server 2003.

And lastly, DPM keeps all data on a raw volume. Raw volumes are more efficient in terms of disk I/O performance. But when it’s helpful for DBMS it doesn’t seem to make any difference for backup software. The downside of it is higher risk of loosing all data in case of volume damage or DPM bug. It’s arguable, so I will leave it as my personal opinion.

In conclusion, it’s rather disappointing to see how software with 7 years history (DPM 2006 was released in 2005) has more limitations than any backup software solution I can think of. Even if you don’t have enough money for Symantec Backup Exec, ARCServe Backup, HP Data Protector or any other software, I would recommend to make more effort and search for some other solution. Otherwise you can fall into the same trap as we did.

Rollforward on a backup server to another db name

February 20, 2012

Sometimes it’s handy to restore a production database on a testing server to a particular point in time (PIT). This procedure involves several nuances.

First of all production databases usually have much more memory and larger buffer pools. If you have less memory (as in our case) on a testing server, then your database just won’t rollforward because it won’t be able to activate database which in its turn is due to inability to activate buffer pools. In this situation you also cannot alter buffer pools manually because you cannot connect to the database. Looks like a deadlock. Solution here is to use DB2_OVERRIDE_BPF registry variable with say 5000 pages. After a database restart all buffer pools will be of 5000 pages size. Then you will be able to alter your buffer pools. Don’t forget to unset DB2_OVERRIDE_BPF and restart after you finished all maintenance procedures. I do that inside a batch script:

db2set DB2_OVERRIDE_BPF=5000
db2stop force
db2start

db2cmd -w -c db2 -t -f db_rf_restore.db2 -z db_rf_restore.err

db2set DB2_OVERRIDE_BPF=
db2stop force
db2start

Actual restore is done inside db_rf_restore.db2 DB2 CLP script. Important point here is to use -w flag which will wait for script completion before moving on to the next command. And -c which will automatically close Command Window after script has been run. Otherwise you will need to do it by hand.

Another matter here is how to rollforward a database if you changed its name and moved it to another server. To accomplish that you will need to copy archival (don’t confuse them with active) logs from a production server to your testing server and use OVERFLOW LOG PATH (“D:\backup\tlogs”) clause in ROLLFORWARD DATABASE command to point to directory where you copied them to. Use also AND COMPLETE clause to finish rollforward process and turn off rollforward pending state.

Here is the complete restore script:

DROP DATABASE db_2;

CREATE DATABASE db_2 ON D:;

RESTORE DATABASE db
FROM “D:\backup”
TO D:
INTO db_2
WITH 2 BUFFERS
BUFFER 1024
PARALLELISM 1
COMPRLIB C:\SQLLIB\BIN\db2compr.dll
WITHOUT PROMPTING;

ROLLFORWARD DATABASE db_2
TO 2012-02-16-12.30.00.000000
AND COMPLETE
OVERFLOW LOG PATH (“D:\backup\tlogs”);

CONNECT TO db_2;
ALTER BUFFERPOOL data_pool SIZE 75000;
CONNECT RESET;

Migrating IBM DB2 from 32 to 64-bit platform

December 21, 2011

The best way to move your database from one server to another is a backup/restore procedure. You can also use db2move utility but it’s not much of help here because it moves only the tables.

If you use a built-in compression to reduce size of your backups which is a normal thing to do then if you’ll try to restore a backup made on a 32-bit architecture to a 64-bit platform using a command like this

RESTORE DATABASE mydb FROM “D:\dbbackup” TAKEN AT 20111215030019 TO “D:” WITH 2 BUFFERS BUFFER 1024 PARALLELISM 1 WITHOUT ROLLING FORWARD WITHOUT PROMPTING;

then you will get an error

SQL2570N  An attempt to restore on target OS “NT-64” from a backup created on source OS “NT-32” failed due to the incompatibility of operating systems or an incorrect specification of the restore command.  Reason-code: “2”.

The reason why this happens is a compression library. Each time you make a compressed backup DB2 puts a compression library into a backup itself. When restoring on a 64-bit platform DB2 refuses to use a 32-bit library. There are two solutions to this problem. First is to make a plane uncompressed backup. But if your backup file is quite large then it can be rather painful to move it between servers. Second solution is to add COMRLIB clause into the original query

RESTORE DATABASE mydb FROM “D:\dbbackup” TAKEN AT 20111215030019 TO “D:” WITH 2 BUFFERS BUFFER 1024 PARALLELISM 1 COMPRLIB C:\SQLLIB\BIN\db2compr.dll WITHOUT ROLLING FORWARD WITHOUT PROMPTING;

If you restore to an existing database you will get SQL2539 warning. Which just means that original database files will be deleted.

Use this workaround description at IBM site as a reference: IY71307: SQL2570N when restoring a compressed backup image to another platform or wordsize.