A storage system is a computer that provides storage service relating to the organization of information on writable persistent storage devices, such as memories, tapes, or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks (or “data blocks”), configured to store information, such as the actual data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored. As used herein, a file is defined to be any logical storage container that contains a fixed or variable amount of data storage space, and that may be allocated storage out of a larger pool of available data storage space. As such, the term file, as used herein and unless the context otherwise dictates, can also mean a container, object, or any other storage entity that does not correspond directly to a set of fixed data storage devices. A file system is, generally, a computer system for managing such files, including the allocation of fixed storage space to store files on a temporary or permanent basis.
As will be understood by those skilled in the art, many storage systems store a checksum value with each data block, e.g., a count of the number of set bits in the data block. In this manner, when reading the data block, the checksum may be confirmed to ensure that the data block was read correctly, such as where a newly computed checksum based on the read data matches the stored checksum. In addition, certain storage systems are configured to store context information (or “context signatures”) along with the checksum. For instance, certain storage file systems, such as a Write Anywhere File Layout (WAFL®) file system (available from Network Appliance, Inc., of Sunnyvale, Calif.), may implement various techniques to point to physical storage locations for data block access. As such, while the checksum may be used to confirm that the data within a stored data block was read correctly, the context information may be used to confirm that the data block accessed is the correct data block.
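By way of illustration only, the checksum confirmation described above can be sketched as follows. The sketch uses the set-bit count given as the example checksum; the function names (popcount_checksum, store_block, read_is_valid) are hypothetical and are not taken from any particular storage system, and a production system would typically use a stronger checksum.

```python
from typing import Tuple


def popcount_checksum(block: bytes) -> int:
    """Count the set bits in the block (the example checksum named above)."""
    return sum(bin(byte).count("1") for byte in block)


def store_block(data: bytes) -> Tuple[bytes, int]:
    """Return the block together with the checksum computed at write time."""
    return data, popcount_checksum(data)


def read_is_valid(data_read: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum over the data read and compare it to the stored one."""
    return popcount_checksum(data_read) == stored_checksum


# Hypothetical two-byte block: 0x0f has four set bits, 0x01 has one.
data, checksum = store_block(b"\x0f\x01")
# Simulated read error in which a single bit was cleared.
corrupted = b"\x0f\x00"
```

Here read_is_valid(data, checksum) holds, while read_is_valid(corrupted, checksum) does not, because the recomputed bit count no longer equals the stored one. Note that such a check speaks only to whether the data was read correctly, not to whether the block read is the block that was intended.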
For example, context information may comprise a buffer tree identifier (“bufftree ID”) of a volume (or other storage grouping representation, as described herein or as will be understood by those skilled in the art) that wrote/stored the data block, a data ID of the data block (e.g., a pointer to the data block as used by the volume), and a write time (e.g., a “generation count”) indicating when the data block was written. For instance, when reading a block of data, a volume may confirm that the bufftree ID and the data ID of the data block context signature match the expected bufftree ID and data ID. (Currently, the write time is generally only used to confirm valid data, i.e., that the data was not written in the future.) If the context signature does not match, however, then the storage system may determine that the data block has either been incorrectly written (e.g., a “lost write”) or that the data block has been moved (“reallocated”) to a new physical location (e.g., to defragment free space by cleaning segments, etc., as will be understood by those skilled in the art). Accordingly, the storage system may attempt to recover the data block, that is, attempt to locate the physical storage location for the reallocated data block, or reconstruct the data from parity in the case of a “lost write”.
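As a minimal sketch of the context check just described, and assuming simple integer identifiers, the comparison may be expressed as follows. The class and function names (ContextSignature, context_matches) and the integer generation count are assumptions made for illustration; they do not reflect the on-disk format of any actual file system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextSignature:
    bufftree_id: int   # identifies the volume that wrote the block
    data_id: int       # the volume's pointer to the block
    write_time: int    # generation count at the time of the write


def context_matches(stored: ContextSignature,
                    expected_bufftree_id: int,
                    expected_data_id: int,
                    current_generation: int) -> bool:
    """Return True only if the stored signature is the one the reader expects."""
    # The write time is used only as a sanity check: a block cannot
    # have been written in the future.
    if stored.write_time > current_generation:
        return False
    return (stored.bufftree_id == expected_bufftree_id
            and stored.data_id == expected_data_id)


# Hypothetical values: volume 7 wrote data ID 100 at generation 41.
stored = ContextSignature(bufftree_id=7, data_id=100, write_time=41)
ok = context_matches(stored, 7, 100, current_generation=42)
wrong_block = context_matches(stored, 7, 101, current_generation=42)
```

In this sketch ok is True and wrong_block is False. A False result alone does not distinguish a lost write from a reallocated block; that determination is left to the recovery steps described above.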
Occasionally, a copy or “clone” of a storage volume (i.e., of a “parent” volume) may be created, such as for backup purposes, for writeable copies, etc. Clones, generally, may be initially established by sharing the underlying data blocks (and physical storage) of the parent volume, and as each of the volumes is modified, the shared data blocks may begin to diverge into data that specifically belongs to one or the other volume. An example technique that may be used to copy/clone a volume is described in commonly owned, U.S. patent application Ser. No. 10/837,254, entitled CLONING TECHNIQUE FOR EFFICIENTLY CREATING A COPY OF A VOLUME IN A STORAGE SYSTEM, filed Apr. 30, 2004 by John K. Edwards et al., now issued as U.S. Pat. No. 7,409,511 on Aug. 5, 2008, and in commonly owned, U.S. patent application Ser. No. 10/836,112, entitled WRITEABLE CLONE OF READ-ONLY VOLUME, filed Apr. 30, 2004 by Robert L. Fair et al., now issued as U.S. Pat. No. 7,334,095 on Feb. 19, 2008, the contents of both of which are hereby incorporated by reference in their entirety. Currently, clones typically inherit the bufftree ID of their parent volumes in order to be able to read data blocks that are shared with their parents without triggering a context mismatch. In particular, when reading a data block shared with a parent volume, the clone volume still confirms that the bufftree ID and the data ID of the data block context signature match the expected bufftree ID and data ID; that is, the bufftree ID stored with the block must currently match both the parent's and the clone's bufftree ID, or a context mismatch results. (For example, this may be necessary to avoid false positives while reading shared blocks between clones and parents.)
In many situations, using the parent volume's bufftree ID as the clone volume's bufftree ID is a viable solution. However, by having multiple volumes share a bufftree ID, in particular, multiple volumes that are independently able to move, delete, and create data blocks with corresponding data IDs, the potential for overlapping data block context signatures exists (i.e., contexts with the same bufftree ID and data ID, but with different data). Specifically, this potential for overlapping may result in data ID “aliasing”, which may result in incorrectly returned (and confirmed) data. As such, where data ID aliasing has occurred, it may be impossible to detect incorrect data blocks (e.g., for lost write detection), to correct an incorrect data block (e.g., using a reconstruction mechanism from parity), or to correctly locate blocks in the case of reallocation.
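The aliasing problem can be made concrete with a small, purely hypothetical simulation. The in-memory dictionary standing in for physical storage, the block numbers, the identifier values, and the function names below are all assumptions chosen for illustration. For contrast, the second half shows the same misdirected read when the clone is given a distinct identifier; it does not address how such a clone would then read blocks it still shares with its parent.

```python
# Simulated physical storage:
# physical block number -> ((bufftree_id, data_id), data)
disk = {}


def write_block(pbn, bufftree_id, data_id, data):
    """Store the data together with its context signature."""
    disk[pbn] = ((bufftree_id, data_id), data)


def read_block(pbn, expected_bufftree_id, expected_data_id):
    """Return the data only if the stored context signature matches."""
    signature, data = disk[pbn]
    if signature != (expected_bufftree_id, expected_data_id):
        raise LookupError("context mismatch")
    return data


SHARED_ID = 7  # parent and clone share this bufftree ID

# After the clone is created, the parent writes data ID 100 at block 500.
write_block(500, SHARED_ID, 100, b"parent data")
# The clone independently assigns data ID 100 to its own block at 600.
write_block(600, SHARED_ID, 100, b"clone data")

# A misdirected read sends the clone to block 500 instead of block 600.
# The signature (7, 100) matches, so the wrong data is returned and confirmed.
aliased = read_block(500, SHARED_ID, 100)

# Contrast: with a distinct (hypothetical) clone identifier, the same
# misdirected read is detected as a context mismatch.
CLONE_ID = 8
write_block(600, CLONE_ID, 100, b"clone data")
try:
    read_block(500, CLONE_ID, 100)
    detected = False
except LookupError:
    detected = True
```

In the shared-identifier case the clone receives the parent's data with no indication of error, which is the aliasing described above; in the contrasting case the mismatch is raised and recovery can be attempted.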
There remains a need, therefore, for a technique that uniquely identifies block ownership among related storage volumes (e.g., among parents and clones of a storage volume hierarchy, etc.).