Posts Tagged ‘backup’

Dell Force10 Part 3: VLT Domain Configuration

July 31, 2016

dell-force10In my previous post here I went through VLT basics and how it helps to establish a loop-free network topology in a modern datacenter. Now lets dive deeper and see how VLT is configured from FTOS CLI.

VLT Configuration

The first step is to configure the backup links and VLT interconnect. Dell S4048-ON switches have six 40Gb QSFP+ ports, two of which 1/49 and 1/50 we will use for VLTi. Repeat the same configuration on both switches.

# int range fo 1/49-1/50
# no shutdown

# interface port-channel 127
# description “VLT interconnect”
# channel-member fo 1/49
# channel-member fo 1/50
# no shutdown

Now that we have a VLT interconnect set up, let’s join the first switch to a VLT domain:

# vlt domain 1
# back-up destination
# peer-link port-channel 127
# primary-priority 1

First switch points to the second switch management IP for a backup destination, uses port channel 127 as a VLT interconnect and becomes a primary peer, because it’s given the lowest priority of 1.

Do the same on the second switch, but now point to the first switch management IP for backup and use the highest priority to make this switch a secondary peer:

# vlt domain 1
# back-up destination
# peer-link port-channel 127
# primary-priority 8192

To confirm the VLT state use the following command:

# sh vlt brief


As you can see, the VLTi and backup links are up and the switch can see its peer. For some additional VLT specific information use these commands:

# sh vlt statistics
# sh vlt backup-link

I would also recommend to use the following command to see the port channel state and confirm that both VLTi links are in up state:

# sh int po127



In this part of the Dell Force10 switch configuration series we quickly went through the initial VLT setup. We haven’t touched on VLT LAG configuration yet. We will take a closer look at it in the next blog post.

Brocade 300 Firmware Upgrade

December 14, 2015

In my previous post Brocade 300 Initial Setup I briefly went through the firmware upgrade process, which is a part of every new switch installation. Make sure to check the post out for instructions on how to install a FTP server. You will need it to upload firmware to the switch.

I intentionally didn’t go into all details of firmware upgrade in my previous post, as it’s not necessary for a green field install. For a production switch the process is different. The reason is, if you’re upgrading to a Fabric OS version which is two or more versions apart from the current switch firmware revision, it will be disruptive and take the FC ports offline. Which is fine for a new deployment, but not ideal for production. 

Disruptive and Non-Disruptive Upgrades

Brocade Fabric OS major firmware release versions are 6.3.x, 6.4.x, 7.0.x, 7.1.x, 7.2.x, etc. For a NDU the rule of thumb is to apply all major releases consecutively. For example, if my production FC switch is running FOS version 6.3.2b and I want to upgrade to version 7.2.1d, which is the latest recommended version for my hardware platform, then I’ll have to upgrade from 6.3.2b to 6.4.x to 7.0.x to 7.1.x and finally to 7.2.1d.

First and foremost save the current switch config and make a config backup via FTP (give write permissions to your FTP user’s home folder). Don’t underestimate this step. The last thing you want to do is to recreate all zoning if switch loses config during the upgrade:

> cfgSave
> configUpload


In case you need to restore, you can run the following command to download the backed up config back to the switch:

> configDownload

Next step is to install every firmware revision up to the desired major release (-s key is not required):

> firmwaredownload


Brocade switch has two firmware partitions – primary and secondary. Primary is the partition the switch boots from. And the secondary partition is used for firmware upgrades.

After each upgrade switch does a warm reboot. All FC ports stay up and switch continues to forward FC frames with no disruption to FC traffic. To accomplish that, switch uses the secondary partition to upload the new firmware to and then quickly swap them without disrupting FC switching.

At a high level the upgrade process goes as follows:

  • The Fabric OS downloads the firmware to the secondary partition.
  • The system performs a high availability reboot (haReboot). After the haReboot, the former secondary partition is the primary partition.
  • The system replicates the firmware from the primary to the secondary partition.

Each upgrade may take up to 30 minutes to complete, but in my experience it doesn’t take more than 10 minutes. Once the first switch is upgraded, log back in and check the firmware version. And you will see how secondary partition has now become primary and firmware is uploaded to the secondary partition.


As a last step, check that FC paths on all hosts are active and then move on to the second switch. The steps are exactly the same for each upgrade.

Firmware Upload and Commit

Under normal circumstances when you run the firmwareDownload command, switch does the whole upgrade in an automated fashion. After the upgrade is finished you end up with both primary and secondary partitions on the same firmware version. But if you’re a large enterprise, you may want to test the firmware first and have an option to roll-back.

To accomplish that you can use -s key and disable auto-commit:


Switch will upload the firmware to the secondary partition, switch secondary and primary partitions after a reboot, but won’t replicate the firmware to the secondary partition. You can use the following command to restore firmware back to the previous version:

> firmwareRestore

Or if you’re happy with the firmware, commit it to the secondary partition:

>  firmwareCommit

The only caveat here, a non-disruptive upgrade is not supported in this scenario. When switch reboots, it’ll be disruptive to FC traffic.

Important Notes

When downloading firmware for your switch, make sure to use switch’s vendor web-site. EMC Connectrix DS-300B, Brocade 300 and IBM SAN24B-4 are essentially the same switch, but firmware and supported versions for each OEM vendor may slightly vary. Here are the links where you can get FC switch firmware for some of the vendors:

  • EMC: sign in to > find your switch model under the product section and go to downloads
  • Brocade: sign in to > go to Downloads section > enter FOS in the search field
  • Dell: includes a subset of Fabric OS versions, which are tested and approved by Dell
  • IBM: and are the links where you can download FOS for IBM switches. You can also go to, search for the switch in the Product Finder and find FOS under the “Downloads (drivers, firmware, PTFs)” section


Force10 MXL: Firmware Upgrade

March 19, 2015

Uploading new firmware image

MXL switches keep two firmware images – A and B. You can set either one of them to be active. Use the following command to list firmware version of all stack members and see which one is active:

 # show boot system stack-unit all


To upload firmware you will need a TFTP server. You can use TFTPD64 (also called TFTPD32), which can be downloaded from Philippe Jounin page here .

If the active firmware image is in A, upload new firmware to B. You’ll also be asked if you want to upload firmware to all switches in the stack.

# upgrade system tftp:// b:


At the time of upgrade the latest version was 9.7. This version was 2 weeks old and wasn’t recommended for use in production. Version 9.6 had a major bug. As a result version 9.5SP2 was chosen for the upgrade.

Double-check that new firmware has been successfully distributed to all switches:

 # show boot system stack-unit all


Backing up startup config

Make sure you do not miss this step. If something goes wrong and switch looses its config, you’ll have to recreate all configuration from scratch. Imagine the consequences.

# copy start tftp://

Reloading the stack

Once firmware is uploaded and config is saved, reload the stack. Be mindful that it’s a disruptive procedure and all links connected to the stack will go down. A reboot shouldn’t take more than a couple of minutes, but make sure you do that after hours.

# conf t
# boot system stack-unit all primary system b:
# exit
# copy run start
# reload

Confirm that the active firmware image is now B. And that concludes the switch stack upgrade.



NetApp VSC Single File Restore Explained

August 5, 2013

netapp_dpIn one of my previous posts I spoke about three basic types of NetApp Virtual Storage Console restores: datastore restore, VM restore and backup mount. The last and the least used feature, but very underrated, is the Single File Restore (SFR), which lets you restore single files from VM backups. You can do the same thing by mounting the backup, connecting vmdk to VM and restore files. But SFR is a more convenient way to do this.


SFR is pretty much an out-of-the-box feature and is installed with VSC. When you create an SFR session, you specify an email address, where VSC sends an .sfr file and a link to Restore Agent. Restore Agent is a separate application which you install into VM, where you want restore files to (destination VM). You load the .sfr file into Restore Agent and from there you are able to mount source VM .vmdks and map them to OS.

VSC uses the same LUN cloning feature here. When you click “Mount” in Restore Agent – LUN is cloned, mapped to an ESX host and disk is connected to VM on the fly. You copy all the data you want, then click “Dismount” and LUN clone is destroyed.

Restore Types

There are two types of SFR restores: Self-Service and Limited Self-Service. The only difference between them is that when you create a Self-Service session, user can choose the backup. With Limited Self-Service, backup is chosen by admin during creation of SFR session. The latter one is used when destination VM doesn’t have connection to SMVI server, which means that Remote Agent cannot communicate with SMVI and control the mount process. Similarly, LUN clone is deleted only when you delete the SFR session and not when you dismount all .vmdks.

There is another restore type, mentioned in NetApp documentation, which is called Administartor Assisted restore. It’s hard to say what NetApp means by that. I think its workflow is same as for Self-Service, but administrator sends the .sfr link to himself and do all the job. And it brings a bit of confusion, because there is an “Admin Assisted” column on SFR setup tab. And what it actually does, I believe, is when Port Group is configured as Admin Assisted, it forces SFR to create a Limited Self-Service session every time you create an SFR job. You won’t have an option to choose Self-Assisted at all. So if you have port groups that don’t have connectivity to VSC, check the Admin Assisted option next to them.


Keep in mind that SFR doesn’t support VM’s with IDE drives. If you try to create SFR session for VMs which have IDE virtual hard drives connected, you will see all sorts of errors.

Magic behind NetApp VSC Backup/Restore

June 12, 2013

netapp_dpNetApp Virtual Storage Console is a plug-in for VMware vCenter which provides capabilities to perform instant backup/restore using NetApp snapshots. It uses several underlying NetApp features to accomplish its tasks, which I want to describe here.

Backup Process

When you configure a backup job in VSC, what VSC does, is it simply creates a NetApp snapshot for a target volume on a NetApp filer. Interestingly, if you have two VMFS datastores inside one volume, then both LUNs will be snapshotted, since snapshots are done on the volume level. But during the datastore restore, the second volume will be left intact. You would think that if VSC reverts the volume to the previously made snapshot, then both datastores should be affected, but that’s not the case, because VSC uses Single File SnapRestore to restore the LUN (this will be explained below). Creating several VMFS LUNs inside one volume is not a best practice. But it’s good to know that VSC works correctly in this case.

Same thing for VMs. There is no sense of backing up one VM in a datastore, because VSC will make a volume snapshot anyway. Backup the whole datastore in that case.

Datastore Restore

After a backup is done, you have three restore options. The first and least useful kind is a datastore restore. The only use case for such restore that I can think of is disaster recovery. But usually disaster recovery procedures are separate from backups and are based on replication to a disaster recovery site.

VSC uses NetApp’s Single File SnapRestore (SFSR) feature to restore a datastore. In case of a SAN implementation, SFSR reverts only the required LUN from snapshot to its previous state instead of the whole volume. My guess is that SnapRestore uses LUN clone/split functionality in background, to create new LUN from the snapshot, then swap the old with the new and then delete the old. But I haven’t found a clear answer to that question.

For that functionality to work, you need a SnapRestore license. In fact, you can do the same trick manually by issuing a SnapRestore command:

> snap restore -t file -s nightly.0 /vol/vol_name/vmfs_lun_name

If you have only one LUN in the volume (and you have to), then you can simply restore the whole volume with the same effect:

> snap restore -t vol -s nightly.0 /vol/vol_name

VM Restore

VM restore is also a bit controversial way of restoring data. Because it completely removes the old VM. There is no way to keep the old .vmdks. You can use another datastore for particular virtual hard drives to restore, but it doesn’t keep the old .vmdks even in this case.

VSC uses another mechanism to perform VM restore. It creates a LUN clone (don’t confuse with FlexClone,which is a volume cloning feature) from a snapshot. LUN clone doesn’t use any additional space on the filer, because its data is mapped to the blocks which sit inside the snapshot. Then VSC maps the new LUN to the ESXi host, which you specify in the restore job wizard. When datastore is accessible to the ESXi host, VSC simply removes the old VMDKs and performs a storage vMotion from the clone to the active datastore (or the one you specify in the job). Then clone is removed as part of a clean up process.

The equivalent cli command for that is:

> lun clone create /vol/clone_vol_name -o noreserve -b /vol/vol_name nightly.0

Backup Mount

Probably the most useful way of recovery. VSC allows you to mount the backup to a particular ESXi host and do whatever you want with the .vmdks. After the mount you can connect a virtual disk to the same or another virtual machine and recover the data you need.

If you want to connect the disk to the original VM, make sure you changed the disk UUID, otherwise VM won’t boot. Connect to the ESXi console and run:

# vmkfstools -J setuuid /vmfs/volumes/datastore/VM/vm.vmdk

Backup mount uses the same LUN cloning feature. LUN is cloned from a snapshot and is connected as a datastore. After an unmount LUN clone is destroyed.

Some Notes

VSC doesn’t do a good cleanup after a restore. As part of the LUN mapping to the ESXi hosts, VSC creates new igroups on the NetApp filer, which it doesn’t delete after the restore is completed.

What’s more interesting, when you restore a VM, VSC deletes .vmdks of the old VM, but leaves all the other files: .vmx, .log, .nvram, etc. in place. Instead of completely substituting VM’s folder, it creates a new folder vmname_1 and copies everything into it. So if you use VSC now and then, you will have these old folders left behind.

Mounting VMware Virtual Disks

June 11, 2013

H_Storage04There are millions of posts on that topic all over the Internet. Just another repetition mostly for myself.

VMware has Virtual Disk Development Kit (VDDK) which is more of an API for backup software vendors. But it includes a handy tool called vmware-mount, which gives you an ability to mount VMware virtual disks (.vmdk) from wherever you want.

Download VDDK from VMware site. It’s free. And then run vmware-mount with the following keys:

> vmware-mount driveletter: “[vmfs_datastore] vmname/diskname.vmdk” /i:”datacentername/vm/vmname” /h:vcname /u:username /s:password

Choose drive letter, specify vmdk path, inventory path to VM (put ‘vm’ in lowercase between datacenter and vm name, upper case will give you an error) and vCenter or ESXi host name.

Note however, that you can mount only vmdks from powered off VMs. But there is a workaround. You can mount vmdk from online VMs in read-only mode if you make a VM snapshot. Then the original vmdk won’t be locked by ESXi server and you will be able to mount it.

To unmount a vmdk run:

> vmware-mount diskletter: /d

There are also several GUI tools to mount vmdks. But vmware-mount is enough for me.

NetApp thin-provisioning for VMware LUNs

May 22, 2013


LUN and Volume Thin Provisioning

I already described thin provisioning of VMware NFS volumes some time ago here. Now I want to discuss thin provisioning of LUNs.

LUNs are different from VMFS on top of NFS implementation, because LUN is an additional container inside of NetApp FlexVol. So if you’re using FC, you need to thin provision both LUN and volume:

> lun set reservation “/vol/targetvol/targetlun” disable
> vol options “targetvol” guarantee none

In fact, you can make the LUN thin and the volume thick. Then storage space that’s not used by the LUN, is returned to the volume level. But in this case it cannot be used by other volumes as a shared pool of space.

As the best practice, NetApp now recommends to set Fractional Reserve and Snap Reserve for your volumes to 0%. Don’t forget about that, if you want to save more storage space:

> vol options “targetvol” fractional_reserve 0
> snap reserve “targetvol” 0

Disable snapshots if you don’t use them:

> snap sched “targetvol” 0

It’s easy as that. Now you don’t waste your space by reserving it ahead, but use it as a shared pool of resources. But make sure to monitor aggregate free space. If you starting to run out of storage, plan purchase of new disks in advance or redistribute data between other aggregates.

Safety Features

Disabling volume level, LUN and snapshot reservations helps you to save storage space. The drawback of this approach is that you don’t have any mechanisms in place to prevent volume out-of-space situations. If you enable snapshots on the volume and they consume all the volume space, the volume goes offline. Very undesirable consequence. NetApp has two features that can serve as safety net in thin-provisioned environments: autosize and snap autodelete.

Snap autodelete automatically removes old snapshots if there is no space left inside the volume. Autosize, on the other hand, allows the volume to automatically grow to the specified limit (+20% to the volume size by default) in specified increments (5% of the volume size by the default). You can also specify what to do first autosize or snapshot autodelete by using ‘try_first’ option.

> snap autodelete “targetvol” on
> vol autosize “targetvol” on
> vol options “targetvol” try_first volume_grow

SnapMirror Considerations

If you use SnapMirroring and switch on the autosize on the source volume, then the destination volume won’t grow automatically. And SnapMirror will break the relationship if it runs out of space on the smaller destination volume. The trick here is to make the destination volume as big as the autosize limit for the source volume and thin provision the destination volume. By doing that you won’t run out of space on destination even if the source volume grows to its maximum.

Further reading

TR-3965: NetApp Thin Provisioning Deployment and Implementation Guide Data ONTAP 8.1 7-Mode

AIX at first glance

May 19, 2012

Recently I set up an AIX 5.1 on a RS/6000 box. Now, after some time working with the OS, I’d like to share my first impressions and features that distinguishes it from Linux.

FYI: Do not try to run AIX on x86, it won’t work. And it have never done. Only PowerPC and POWER RISC architectures.

System Management Services

The very first thing which may surprise you when you start a PowerPC system is absence of BIOS. PowerPC uses SMS which is an acronym for System Management Services. You enter SMS by pressing F2 during server startup. However, SMS implements same features as conventional server’s BIOS. Like configuring boot sequence, performing simple diagnostics, etc.

AIX default shell

AIX uses KornShell (ksh) by default. Bourne shell (bsh) is also available. But do not confuse it with Bourne-again shell (bash). It was developed two ears earlier (1989) than AIX 5.1 (2001), but wasn’t included. What’s interesting about ksh is that by default it works in vi editing mode. It means that initially you work in an input mode and enter commands by typing and hitting return as usual. Type ESC to enter control mode. For example type CTRL+V in control mode and you will find your ksh version. Mine is M-11/16/88f. If you type backslash (\) in control mode you will complete a file path. ksh88 shortcoming is that it doesn’t support commands completion.

System Management Interface Tool

AIX operating system is configured using the System Management Interface Tool (SMIT). It’s an equivalent of YaST in SuSE, redhat-config-* tools in Red Hat or Windows Control Panel. SMIT is very thorough configuration tool. For example, user add page consists of forty fields! SMIT has several handy functional keys. For instance, F5 sets field to the default value, using F9 you can temporarily invoke command shell, F4 generates a list if field implies it, like list of packages available to install from particular directory. Apart from that, SMIT has weird field hints: ‘-‘ says that field is numerical, ‘+’ means a list, ‘/’ is a path. Everything you do in SMIT is logged in /smit.log.

Web-based System Manager

On top of that, AIX has Web-based System Manager (WebSM) which lets you monitor your system and manage devices, backups, processes and virtually everything in your operating system. You can do that either from inside operating system itself or through standalone client which is available for Windows and Linux. To manage your AIX host via WebSM you need to have equal Manager and Remote Client versions.  To satisfy that you can download Windows version of Web-based System Manager Remote Client right from the AIX host using SCP or FTP from /usr/websm/pc_client/setup.exe. WebSM Client for AIX 5 is incompatible with Windows 7.


Object Data Manager

Feature which is unique to AIX is Object Data Manager (ODM) database, which maintains device configuration. ODM consists of Predefined Configuration Database (PCD) and Customized Configuration Database (CCD). Predefined Configuration Database keeps information on supported devices which means devices for which AIX has drivers and Customized Configuration Database hold information of devices which are currently connected to the system. Data in ODM is stored in terms of objects and their attributes. Access to ODM is implemented via special API. User can manage ODM by calling odmshow, odmadd, odmchange and odmdelete utilities. Additionally, AIX uses location codes to identify devices. Location code is effectively a path from a motherboard to a device. For example, location code of a SCSI device is in the form AB-CD-EF-G,H. Here AB is a bus type, CD – slot or adapter number, EF – connector ID, G – Control Unit Address of SCSI Device, H – Logical Unit Address of SCSI Device. I have two SCSI hard drives hdisk0 and hdisk1. For hdisk0 location code is 04-C0-00-5,0. Here 04 means PCI bus (00 – CPU bus, 01 – ISA bus, 05 – PCMCIA bus), C0 – integrated SCSI controller (A0 -ISA bus, B0 – secondary PCI bus), 00 – SCSI bus number, 5 – SCSI ID, 0 – LUN.

Logical Volume Manager

Did you know that LVM was implemented in AIX ten years earlier (1989) than in Linux (1998)? In fact, after AIX version of LVM was developed, its license was bought by HP. And only after that Heinz Mauelshagen developed Linux version with commands similar to the HP version. Windows Server platform still doesn’t have anything similar AFAIK.

Journaling File System

Another AIX achievement is JFS file system which is journaling by design. First JFS version was implemented in 1990 in AIX 3.1 Do you remeber when ext3 was developed? I believe somewhere in 2001. Journaling NTFS v3 was implemented in 2000 with Windows Server 2000. JFS file system in AIX 4.2 supported 64GB file size (it was 1996). With introduction of JFS2 in 2001, AIX 5 began to support 1TB files. Maximum file size for FAT32 was 4GB. All these facts are explainable. AIX was developed far earlier than Linux and Windows. But it’s still interesting how features firstly introduced in AIX (and other flavors of UNIX) migrate to younger OSes.

Full system recovery

Unlike Linux, AIX allows you to create full volume group backups with all logical volumes. Even in present times in Linux you work with antique tar, gzip, cpio and dd (or duplicity and bacula if you want something more sophisticated). In 2001 AIX already had savevg for backing up non-rootvg volume groups and mksysb which lets you backup rootvg along with system related data. mksysb creates installable image for full system recovery. I find these tools invaluable. I do not know of Linux alternative.

User/group administration

Additionally, AIX has several handy user administration features. For example, a user group can be either administrative or standard. If it’s administrative, then only root can add/remove users from it. If it’s standard, it means that ordinary users can administer that group. Feature I sometimes lack in Linux. Groups are configured in /etc/security/group and look like the following:

admin = true

admin = false
adms = pac,xander

Here system is an administrative group and jradmin is standard. admin field identifies group type and adms contains the list of group administartors (pac an xander). Also, in AIX you can assign portions of root authority to non-root users. There are several predefined roles, like ManageAllUsers, ManageShutdown, ManageBackupRestore, etc, defined in /etc/security/roles. Roles consist of a number of authorizations, which is a set of particular tasks that user can perform. For example, ManageAllUsers role consists of the following authorizations: UserAudit, ListAuditClasses, UserAdmin, RoleAdmin, PasswdAdmin, GroupAdmin. You can create your own roles from these authorizations. In AIX 5 Role-Based Access Control (RBAC) is rather primitive and restricted, but it’s better than nothing.

Error logging

And the last thing I’d like to talk about is error logging. In Linux logging is performed by syslogd, AIX has the same daemon. However, AIX error logging facility is augmented by errdemon. It is started as part of system initialization and continuously monitors /dev/error. When information is read from /dev/error errdemon checks its Error Record Template Repository /var/adm/ras/errtmplt and if it has any additional info on this error, demon writes this information into /var/adm/ras/errlog. Log is in binary format. To read it run errpt command:

errpt -a -s 0519000012

This will show you detailed information on log entries starting from 19th of May 2012 00:00 a.m.


My first experience working with AIX (even with such an outdated version) makes me think of it as a sophisticated and very well written operating system. Many major features were developed in AIX much earlier than in Linux and Windows and I believe it’s still true for modern AIX releases. It becomes obvious why Unix is the primary choice for many big organizations with strong IT infrastructure.

Migrating physical Linux host to the VMware ESXi

May 2, 2012

Well, perhaps the easiest way to accomplish that is using VMware Converter from the start. I believe there is a Linux version. However, I took another route. I already had an Acronis backup image. So my solution was to use this image as a source, which I fed into Windows version of VMware Converter, which in its turn converted it to VMware format and created VMware virtual machine on ESXi server automatically.

Using this simple procedure you can get a working system. Not in my case. Original OS used a software RAID of two hard drives. So I had to boot from a live CD. Then I changed fstab and GRUB’s menu.lst and set /dev/sda1 (root volume) instead of /dev/md0 and /dev/sda2 (swap) in place of /dev/md1. Additionally, I had to reinject GRUB’s boot files:

grub-install –root-directory=/media/sda1/boot /dev/sda

Then, if it’s SUSE you will have to change “resume” switch in GRUB’s boot menu line to /dev/null. Then after you boot into the system, recreate swap partition and point to it in “resume” switch. If you won’t do that, you will end up with the following error during boot process:

Kernel panic – not syncing: I/O error reading memory image

One tricky issue I had in all this story was related to kernel. As I’ve already mentioned original operating system worked on top of software RAID. And its initrd image won’t detect ordinary virtual SCSI hard drive during boot. So I had to boot from the SUSE installation CD and install standard kernel on top of original system. It solved the issue. Additionally I had to choose Russian language as primary during kernel installation, otherwise I ended up with unreadable symbols inside the system. But it’s not necessary for majority of cases.

I hope my experience will be helpful for other sysadmins.