Posts Tagged ‘log’

NetApp NVRAM and Write Caching

July 19, 2013


NetApp storage systems use several types of memory for data caching. Non-volatile battery-backed memory (NVRAM) is used for write caching (whereas main memory and flash memory in forms of either extension PCIe card or SSD drives is used for read caching). Before going to hard drives all writes are cached in NVRAM. NVRAM memory is split in half and each time 50% of NVRAM gets full, writes are being cached to the second half, while the first half is being written to disks. If during 10 seconds interval NVRAM doesn’t get full, it is forced to flush by a system timer.

To be more precise, when data block comes into NetApp it’s actually written to main memory and then journaled in NVRAM. NVRAM here serves as a backup, in case filer fails. When data has been written to disks as part of so called Consistency Point (CP), write blocks which were cached in main memory become the first target to be evicted and replaced by other data.

Caching Approach

NetApp is frequently criticized for small amounts of write cache. For example FAS3140 has only 512MB of NVRAM, FAS3220 has a bit more 1,6GB. In mirrored HA or MetroCluster configurations NVRAM is mirrored via NVRAM interconnect adapter. Half of the NVRAM is used for local operations and another half for the partner’s. In this case the amount of write cache becomes even smaller. In FAS32xx series NVRAM has been integrated into main memory and is now called NVMEM. You can check the amount of NVRAM/NVMEM in your filer by running:

> sysconfig -a

The are two answers to the question why NetApp includes less cache in their controllers. The first one is given in white paper called “Optimizing Storage Performance and Cost with Intelligent Caching“. It states that NetApp uses different approach to write caching, compared to other vendors. Most often when data block comes in, cache is used to keep the 8KB data block, as well as 8KB inode and 8KB indirect block for large files. This way, write cache can be thought as part of the physical file system, because it mimics its structure. NetApp on the other hand uses journaling approach. When data block is received by the filer, 8KB data block is cached along with 120B header. Header contains all the information needed to replay the operation. After each cache flush Consistency Point (CP) is created, which is a special type of consistent file system snapshot. If controller fails, the only thing which needs to be done is reverting file system to the latest consistency point and replaying the log.

But this white paper was written in 2010. And cache journaling is not a feature unique to NetApp. Many vendors are now using it. The other answer, which makes more sense, was found on one of the toaster mailing list archives here: NVRAM weirdness (UNCLASSIFIED). I’ll just quote the answer:

The reason it’s so small compared to most arrays is because of WAFL. We don’t need that much NVRAM because when writes happen, ONTAP writes out single complete RAID stripes and calculates parity in memory. If there was a need to do lots of reads to regenerate parity, then we’d have to increase the NVRAM more to smooth out performance.

NVLOG Shipping

A feature called NVLOG shipping is an integral part of sync and semi-sync SnapMirror. NVLOG shipping is simply a transfer of NVRAM writes from the primary to a secondary storage system.  Writes on primary cannot be transferred directly to NVRAM of the secondary system, because in contrast to mirrored HA and MetroCluster, SnapMirror doesn’t have any hardware implementation of the NVRAM mirroring. That’s why the stream of data is firstly written to the special files on the volume’s parent aggregate on the secondary system and then are read to the NVRAM.


Documents I found useful:

WP-7107: Optimizing Storage Performance and Cost with Intelligent Caching

TR-3326: 7-Mode SnapMirror Sync and SnapMirror Semi-Sync Overview and Design Considerations

TR-3548: Best Practices for MetroCluster Design and Implementation

United States Patent 7730153: Efficient use of NVRAM during takeover in a node cluster


DFS Replication Troubleshooting

June 25, 2013

conceptual 3d rendered image of arrow isolated on whiteDFS Replication service doesn’t give you much information on how it’s replicating. It’s good to know some general commands to troubleshoot communication and data transfer issues.

Useful Commands

In Windows Server 2008 a new command was introduced to check what DFSR is doing at the moment. You won’t find it in Windows Server 2003:

> dfsrdiag replicationstate

If replication link isn’t feeling well you get lots of files in the backlog. To check if you have a backlog, run:

> dfsrdiag backlog /rgname:rgroup_name /rfname:folder_name /sendingmember:sending_server /receivingmember:receiving_server

If there are heaps of files in the backlog the best way to find the reason for it is to simply check the logs. DFSR logs are located in C:\Windows\debug. To get the most verbose information change the log severity level:

> wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set debuglogseverity=5

DFSR uses GUIDs to identify the replicated files, which look like: AC759213-00AF-4578-9C6E-EA0764FDC9AC. To get the meaningful data from the GUID use:

> dfsrdiag guid2name /guid:guid_identifier /rgname:group_name

There is one more command which allows you to find the exact path to the file in question. You should feed the uid field from the DFSR debug log to this command, which looks like {9EBE0A27-8AA9-4263-B942-DA9A92F30671}-v240880:

> wmic.exe /namespace:\\root\microsoftdfs path dfsridrecordinfo.Uid=”uid_identifier” call getfullfilepath

Sample Errors

1. When replicating between Windows Server 2008 R2 and Windows Server 2003 R2. On the source: “Ghosting is not enabled”. On the destination: “A failure was reported by the remote partner”.

I solved this error by applying the following patch: KB2462352. The reason for the issue is incompatibilities between protocol implementations.

2. The following error pops up in logs: “The system cannot find the file specified”.

Solution is described in KB951010. In Windows Server 2003 ConflictAndDeleted folder sometimes fills up above the 660MB quota and ConflictAndDeletedManifest.xml file may get corrupted. To solve the problem you need to cleanup the folder and delete the file by issuing:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where “replicatedfolderguid='<GUID>'” call cleanupconflictdirectory

To get the GUIDs of replicated folders run:

> wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderconfig get replicatedfolderguid,replicatedfoldername

3. Near 100% CPU usage and the same error is written millions of times in the log files: “Failed to create stage file for GVSN gvsn_identitifer”.

I solved this issue by looking for the file specified by gvsn_identifier, which looks like {2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748 and deleting it. These are files that are located in the staging folder.

Other Hepful Tools

You can create a Health Report from the DFS Management Console to see how many files have been transfered between replication members since the DFS service start. And if there are any DFS errors in the members’ event logs.

You can also use DFSRMon tool. But I personally don’t find it very useful.