A network storage controller is a processing system that stores and retrieves data on behalf of one or more hosts on a network. It stores and manages that data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. Some storage controllers are designed to service file-level requests from hosts, as is commonly the case with file servers used in network attached storage (NAS) environments. Other storage controllers are designed to service extent-level requests from hosts, as with storage controllers used in a storage area network (SAN) environment. Still other storage controllers are capable of servicing both file-level requests and extent-level requests, as is the case with certain storage controllers made by NetApp, Inc. of Sunnyvale, Calif.
In this description, the term “data extent,” or simply “extent,” is henceforth used to refer to the smallest unit of user data that is independently identified and manipulated by a file system in a storage system. For purposes of this description, the term “data extent” or simply “extent” is essentially synonymous with the term “data block” or simply “block.”
One common application of storage controllers is data replication. Mirroring is a form of replication, in which a given data set at a source is replicated “exactly” (at least insofar as its users can see) at a destination, which is often geographically remote from the source. The replica data set created at the destination is called a “mirror” of the original data set. Mirroring typically involves the use of at least two storage controllers, e.g., one at the source and another at the destination, which communicate with each other through a computer network or other type of data interconnect to create the mirror.
When replicating a data set, such as a volume, the replica usually does not need to be an exact copy of the original; however, it should be close enough in its outward appearance to its users so that it is effectively the same as the original. In many storage systems, files and directories are a major part of what a user sees when looking at a volume. However, a volume usually also has other properties that can impact replication, such as how much space it occupies. A storage administrator is often concerned with these other properties, because provisioning adequate storage capacity is one of a storage administrator's main responsibilities.
Storage efficiency techniques such as compression and data extent sharing for deduplication can enable a volume effectively to hold far more data than the space it actually uses. Unless this efficiency is preserved during replication, however, a resulting replica may inflate to an intolerably large size and may require an inordinate amount of time to transfer from source to destination. In extreme but plausible cases, it may not be possible to create a replica at all, due to such data inflation. Yet preserving storage efficiency attributes such as extent sharing and compression across replicas has proven to be a significant challenge.
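The inflation risk described above can be illustrated with a minimal sketch. It is not taken from any particular storage system: the `Volume` class, the 4096-byte extent size, and both replication functions are hypothetical names and values chosen only for illustration. The sketch models a volume as a map from logical extent positions to physical extent identifiers, so that several positions may share one physical extent.

```python
from dataclasses import dataclass
from typing import Dict

EXTENT_SIZE = 4096  # bytes per extent; an assumed value for illustration only


@dataclass
class Volume:
    """Hypothetical model: logical position -> ID of the physical extent holding it."""
    extent_map: Dict[int, int]

    def logical_size(self) -> int:
        # Space the data appears to occupy from the user's point of view.
        return len(self.extent_map) * EXTENT_SIZE

    def physical_size(self) -> int:
        # Space actually consumed: each distinct physical extent counts once.
        return len(set(self.extent_map.values())) * EXTENT_SIZE


def replicate_naive(src: Volume) -> Volume:
    """Write every logical extent as its own physical extent, losing all sharing."""
    return Volume({pos: pos for pos in src.extent_map})


def replicate_preserving_sharing(src: Volume) -> Volume:
    """Copy each distinct physical extent once and keep the shared references."""
    remap: Dict[int, int] = {}     # source physical ID -> destination physical ID
    dest_map: Dict[int, int] = {}
    for pos, phys in src.extent_map.items():
        if phys not in remap:
            remap[phys] = len(remap)
        dest_map[pos] = remap[phys]
    return Volume(dest_map)


# 1000 logical extents that share only 10 distinct physical extents.
source = Volume({pos: pos % 10 for pos in range(1000)})
naive = replicate_naive(source)
efficient = replicate_preserving_sharing(source)
```

In this toy case the naive replica consumes 100 times the physical space of the source, while the sharing-preserving replica consumes the same space as the source. Compression is not modeled here.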
Additionally, the replication systems that are capable of preserving storage efficiency are typically source-driven replication systems. Generally, source-driven replication systems are more efficient when data extents are shared, because the source side can read data from its volumes in a logical manner (i.e., via one or more logical extent pointers). Conversely, the destination side may not have knowledge of the physical layout of the source side, and thus requesting missing data extents can be a challenge.
Missing data extents can occur in various scenarios. For example, if a “Zombie” condition exists on the source side, then some deleted extents are still captured in a snapshot. These deleted extents confuse the replication engine, causing it not to send active (i.e., non-deleted) extents. Unfortunately, obtaining these missing data extents at the destination in a source-driven replication system has proven to be a significant challenge.
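As a rough illustration of what "missing data extents" means from the destination's point of view, the following sketch compares the extents that replicated files reference against the extents that actually arrived. It is an assumption-laden toy, not the mechanism of any real replication engine: the function name `find_missing_extents`, the file names, and the integer extent identifiers are all hypothetical. It also shows only detection; it does not address how the destination would request the missing extents from the source.

```python
from typing import Dict, Iterable, List, Set


def find_missing_extents(
    referenced: Dict[str, List[int]],
    received: Iterable[int],
) -> Set[int]:
    """Return extent IDs that replicated files reference but that never arrived.

    `referenced` maps a (hypothetical) file name to the extent IDs it points to;
    `received` lists the extent IDs the destination actually holds.
    """
    present = set(received)
    missing: Set[int] = set()
    for extent_ids in referenced.values():
        missing.update(eid for eid in extent_ids if eid not in present)
    return missing


# Two files share extent 3; extents 2 and 4 were never sent by the source.
referenced = {"a.txt": [1, 2, 3], "b.txt": [3, 4]}
received = [1, 3]
missing = find_missing_extents(referenced, received)
```

Here `missing` contains extents 2 and 4, the active extents that the destination would still need to obtain.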