NetApp uses Write Anywhere File Layout (WAFL) filesystem which is a key for NetApp’s efficient snapshot technology. If you’re already familiar with how snapshots are implemented in Data ONTAP, then understanding underlying mechanisms of deduplication is simple. Filer calculates hash for each data block it receives and preserves this in the form of metadata on the volume level. Then according to deduplication schedule (usually on weekends), filer runs metadata processing and for each duplicate hash changes data pointer to the original block of data.
Since for each data block filer needs to calculate hash on the fly, it has its penalty. On systems with CPU loaded by less than 50% performance impact is negligible. For heavily loaded systems, where CPU is nearly 100% busy, performance impact can be around 15%. For high-end 6000 systems penalty can jump up to 35% for random writes. Heavy sequential read operations can also suffer from deduplication, because read operations can be rerouted in random way across physical storage, depending on where the original data block actually is. In general, deduplication has low impact on system performance. But you can’t use it blindly and should keep in mind that in particular cases it can slow down your storage system.
Deduplication configuration is pretty simple. First of all, you need to activate deduplication on particular volume:
> sis on “targetvol”
If you already have data on the volume, you need to scan it. Otherwise, it won’t be deduped. It’s a common mistake. Deduplication is a low priority task, but keep in mind, however, that it can slightly impact your storage performance when done during business hours, especially if you run deduplication for several volumes simultaneously.
> sis start -s “targetvol”
To show the status of deduplication for particular volume:
> sis status “targetvol”
To see the deduplication schedule:
> sis config “targetvol”
And the most pleasant command to find out how much data you’ve saved:
> df -s “targetvol”
If you want to undo deduplication, first switch it off and then undo it using the following commands:
> sis off “targetvol”
> priv set advanced
*> sis undo “targetvol”
*> priv set
I, personally, was able to achieve 40% deduplication rate for VMware VMFS datastores, which is rather impressive, considering these were mixed OS + application data LUNs.
As a final note, I would like to point out, that deduplication is suitable only for environments with high percentage of similar data. VMware is a good example of it. You won’t get any significant deduplication ratio for swap file volumes, Exchnage mailboxes or Symantec Enterprise Vault which is already deduped.
Further reading:
TR-3958: NetApp Data Compression and Deduplication Deployment and Implementation Guide: Data ONTAP 8.1 Operating in 7-Mode