In my previous post I gave a high-level overview of how NetApp uses NVRAM for write caching. Now I want to move on to read caching in main memory and the NetApp Flash Cache and Flash Pool features.
Layers of Memory
The first layer of read caching in NetApp is main memory. For example, a FAS3140 has 4GB of ECC main memory, as opposed to 512MB of NVRAM; a FAS3220 has 12GB. You can check the amount of main memory in your filer by running:
> sysconfig -a
But if you have a random-read-intensive environment and main memory is too small for your workloads, instead of buying additional spindles you can consider the Flash Cache or Flash Pool features.
Flash Cache (formerly PAM – Performance Acceleration Module) is a PCIe card with flash memory chips on board. The most recent Flash Cache II modules have 2TB of flash, and you can fit as many as four such cards into the high-end NetApp 6xxx series (or eight 1TB cards).
Flash Pool is basically an SSD RAID group which is combined with HDD RAID groups in the same aggregate to provide caching capabilities. Data is copied (not moved) from HDDs to SSDs to give faster access to more frequently used (hot) data blocks. Both Flash Cache and Flash Pool eject less frequently used (cold) data from cache: Flash Cache with simple FIFO logic, Flash Pool with a temperature-based scanner described below.
Flash Cache is a second level of read cache memory. When the filer decides to evict cached data from main memory, it is actually moved to the Flash Cache card. Similarly, when a client needs to read a data block and the filer doesn't have it in main memory, it now first looks it up in Flash Cache, and only if it's not there is the data retrieved from disk.
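If you want to see how much the card actually helps, 7-Mode systems include a stats preset for Flash Cache (the exact counters vary between ONTAP releases, so treat this as a starting point):
> stats show -p flexscale-access
The output shows cache hits, misses and the resulting hit rate, so you can estimate how many reads are being served from the card instead of from disk.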
Flash Cache can operate in three modes: Metadata Caching, Normal User Data Caching (default) and Low-Priority Data Caching. The first mode caches only metadata, the second caches both metadata and normal data blocks, and the last one additionally lets you cache data which is normally not cached: recently written data and sequential reads.
In fact, when a write request comes into the system, it is actually cached in main memory first and then logged in NVRAM. When a CP occurs, the data is sent to the hard drives and after that becomes a first target for eviction from main memory. If you enable Low-Priority Data Caching, this data goes to the Flash Cache card instead. It's not write caching per se, because the writes have already been sent to disk. But it helps in workloads where data that has just been written may need to be read again within a short period of time. This is called read-write caching.
Caching sequential reads is generally not a good idea, because they flush large amounts of more valuable data out of the cache. But if your environment can benefit from it, you can again use the Low-Priority Data Caching option.
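On 7-Mode systems these three modes map onto the flexscale options roughly like this (a sketch; option names and defaults may differ between ONTAP releases, so check the documentation for your version):
> options flexscale.enable on
> options flexscale.normal_data_blocks on
> options flexscale.lopri_blocks on
With normal_data_blocks set to off you get metadata-only caching, the default combination (on/off) gives Normal User Data Caching, and turning lopri_blocks on adds Low-Priority Data Caching on top of that.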
Flash Pool
Flash Pool has one significant difference from Flash Cache: it works at the aggregate level, not at the system level as Flash Cache does. If you have only one stack of shelves and one aggregate, it makes no difference. But that is almost never the case.
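To turn an existing aggregate into a Flash Pool on a 7-Mode system, you mark it as hybrid and add an SSD RAID group to it. Roughly like this, where aggr1 and the disk count are placeholders (verify the exact syntax for your ONTAP release):
> aggr options aggr1 hybrid_enabled on
> aggr add aggr1 -T SSD 6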
Read Caching
Flash Pool uses essentially the same mechanism for read caching. When data is first accessed, it goes to main memory. When the filer needs to free up space in main memory, blocks are moved to the SSDs as part of a Consistency Point.
NetApp uses a scanner to evict blocks from the SSD cache (see figure above). When the cache gets full, the scanner kicks in and reduces each block's temperature by one level. Blocks with the lowest temperature are evicted. Each time a block is accessed by a client, its temperature is incremented.
Write Caching
Flash Pool can be used for write caching of partial overwrites, in contrast to Flash Cache, which is purely a read cache.
WAFL is optimized for writes, because the filer can place data anywhere in the file system. When new data comes in, the filer performs a so-called “full write”, which writes a complete stripe of data and parity. But when only part of a stripe needs to be overwritten, all the other data blocks of the stripe have to be read from disk to recalculate parity, which is a very expensive operation. Flash Pool can cache these partial overwrites and optimize performance even further.
If write caching is enabled, this data is written to the SSDs instead of the HDDs as part of a Consistency Point. And unlike with read caching, the data temporarily exists only on the SSDs.
After each scanner run, a write-cached block’s temperature is decremented by one. When the block is overwritten, its temperature goes back to normal, but it can’t go any higher than that. When the block is about to be evicted, it is read back into main memory and then written to the HDDs as part of the next CP.
Policies
Flash Pool read/write policies are almost the same as the Flash Cache ones. Read policies are: meta, random-read (default), random-read-write, none. Write policies: random-write, none (default).
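Policies are set per volume. On 7-Mode this is done with the priority command; the volume name below is a placeholder, and clustered ONTAP uses volume caching policies instead, so check the syntax for your release:
> priority hybrid-cache set vol1 read-cache=random-read write-cache=random-write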
Notes
Flash Pool and Flash Cache can be combined in one system and configured on a per-volume level. But Flash Cache can’t be used for volumes which are already being cached by Flash Pool. It’s either-or.
NetApp filers have a Predictive Cache Statistics (PCS) feature built in, which allows you to analyse your workload and predict whether the storage system will benefit from additional cache.
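PCS runs on systems that don’t have a Flash Cache card installed yet: it emulates a cache of a given size and counts the hits it would have served. A rough 7-Mode example, where the emulated size is just an illustration (check the exact option format for your release):
> options flexscale.enable pcs
> options flexscale.pcs_size 512GB
The same flexscale-access stats preset shown earlier can then be used to watch the projected hit rate before you buy the actual card.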
Further Reading
TR-3832: Flash Cache Best Practices Guide
TR-3801: Introduction to Predictive Cache Statistics