A storage controller is a physical processing device that is used to store and retrieve data on behalf of one or more hosts. A network storage controller can be configured (e.g., by hardware, software, firmware, or any combination thereof) to operate as a storage server that serves one or more clients on a network, to store and manage data in a set of mass storage devices, such as magnetic or optical storage-based disks, tapes, or flash memory. Storage of data in the set of mass storage devices can be implemented as one or more storage volumes defining an overall logical arrangement of disk space. Some storage servers are designed to service file-level requests from hosts, as is commonly the case with file servers used in a network attached storage (NAS) environment. Other storage servers are designed to service block-level requests from hosts, as with storage servers used in a storage area network (SAN) environment. Still other storage servers are capable of servicing both file-level requests and block-level requests, as is the case with certain storage servers made by NetApp®, Inc. of Sunnyvale, Calif., employing the Data ONTAP® storage operating system.
Large data farms including multiple storage servers, where each storage server has multiple volumes of data, can be invaluable in environments where many applications and users from multiple locations access data stored on the volumes. However, as these data farms grow larger, system throughput can decrease when a large number of applications or users access the same data set on a particular storage volume (the origin storage volume), because the overall system throughput is limited by the throughput of the storage server hosting the origin storage volume. In addition to limited throughput, overall system performance may be further limited by network latency between a client and the storage server.
One solution to these limitations has been to fully replicate the origin storage volume on other storage systems so that the data set is available in multiple locations. However, full replication of large data sets can be expensive and hard to manage. Another, more practical solution is to use sparse volumes to cache the most frequently or most recently used files on high-performance storage systems. A sparse volume is a volume that appears to users and applications to be a replica of the origin storage volume, but does not contain all of the data from the origin storage volume.
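By way of illustration only, the read behavior of a sparse volume can be sketched as follows. The class names `OriginVolume` and `SparseVolume`, the block-keyed dictionary, and the fetch-on-miss policy are assumptions made for this sketch and are not taken from any particular storage operating system.

```python
class OriginVolume:
    """Hypothetical stand-in for the origin storage volume: holds every block."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0  # number of read requests that reached the origin

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class SparseVolume:
    """Hypothetical sparse volume: looks like the origin, stores only a subset."""

    def __init__(self, origin):
        self.origin = origin
        self.cached = {}  # only the blocks fetched so far (assumed policy)

    def read(self, block_id):
        # On a miss, fetch the block from the origin and keep a local copy.
        if block_id not in self.cached:
            self.cached[block_id] = self.origin.read(block_id)
        return self.cached[block_id]


origin = OriginVolume({0: b"alpha", 1: b"beta", 2: b"gamma"})
sparse = SparseVolume(origin)

first = sparse.read(1)   # miss: served by the origin, then cached
second = sparse.read(1)  # hit: served locally, origin is not contacted
```

In this sketch the client sees the same data on both reads, while only the first read reaches the origin storage volume and only one of the three blocks is held locally.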
In a conventional storage system implementing sparse volumes, each sparse volume is a write-through cache, meaning user data is not written to the sparse volume but directly to the origin storage volume. The write-through sparse volume implementation limits the effectiveness of a sparse volume cache for write-heavy data sets. While sparse volumes are a good solution for read-only or “read-mostly” data sets, high write latencies and low write throughput make sparse volumes less economical for write-heavy data sets. This is particularly true where the sparse volume and the origin storage volume are separated by a high-latency wide area network.
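The write-through penalty described above can be illustrated with a minimal sketch. The class `WriteThroughSparseVolume`, the latency figures (50 ms per wide area network round trip, 1 ms per local access), and the choice to drop a stale cached copy on write are all assumptions made for illustration; the sketch accumulates modeled latency rather than measuring a real system.

```python
class WriteThroughSparseVolume:
    """Hypothetical model of a write-through sparse volume with latency accounting."""

    def __init__(self, origin_blocks, wan_ms, local_ms):
        self.origin_blocks = origin_blocks  # dict standing in for the origin volume
        self.cached = {}                    # blocks held locally
        self.wan_ms = wan_ms                # assumed cost of one trip to the origin
        self.local_ms = local_ms            # assumed cost of one local access
        self.elapsed_ms = 0                 # total modeled latency

    def read(self, block_id):
        if block_id in self.cached:
            self.elapsed_ms += self.local_ms
        else:
            # Miss: fetch from the origin over the WAN and cache the block.
            self.cached[block_id] = self.origin_blocks[block_id]
            self.elapsed_ms += self.wan_ms
        return self.cached[block_id]

    def write(self, block_id, data):
        # Write-through: the data goes directly to the origin, never to the cache.
        self.origin_blocks[block_id] = data
        # Assumption: any stale local copy is dropped so later reads refetch it.
        self.cached.pop(block_id, None)
        self.elapsed_ms += self.wan_ms


def run(workload, wan_ms=50, local_ms=1):
    """Replay a string of 'r' (read) and 'w' (write) operations on block 0."""
    vol = WriteThroughSparseVolume({0: b"init"}, wan_ms, local_ms)
    for op in workload:
        if op == "r":
            vol.read(0)
        else:
            vol.write(0, b"new")
    return vol.elapsed_ms


read_mostly_ms = run("r" * 100)  # one miss, then 99 local hits
write_heavy_ms = run("w" * 100)  # every write crosses the WAN
```

Under these assumed latencies, the 100 reads accumulate 149 ms (one 50 ms miss plus 99 local hits), whereas the 100 writes accumulate 5000 ms because each write pays the full round trip to the origin storage volume.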