A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
Storage of information on the disk array is preferably implemented as one or more storage “volumes” of physical disks, defining an overall logical arrangement of disk space. The disks within a volume are typically organized as one or more groups, wherein each group may be operated as a Redundant Array of Independent (or Inexpensive) Disks (RAID). Most RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of redundant information (parity) with respect to the striped data. The physical disks of each RAID group may include disks configured to store striped data (i.e., data disks) and disks configured to store parity for the data (i.e., parity disks). The parity may thereafter be retrieved to enable recovery of data lost when a disk fails. The term “RAID” and its various implementations are well-known and disclosed in A Case for Redundant Arrays of Inexpensive Disks (RAID), by D. A. Patterson, G. A. Gibson and R. H. Katz, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988.
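By way of illustration only, the parity relationship can be sketched in Python for a single-parity scheme in which the parity block of a stripe is the bytewise XOR of its data blocks. The function names `compute_parity` and `reconstruct` are hypothetical, and the sketch assumes equal-length blocks and at most one failed data disk; it is not drawn from any particular RAID implementation.

```python
from functools import reduce


def compute_parity(stripe: list[bytes]) -> bytes:
    """Return the XOR parity block for equal-length data blocks in one stripe."""
    # zip(*stripe) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*stripe))


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block: XOR of parity and all survivors."""
    return compute_parity(surviving + [parity])


# Usage: three data disks and one parity disk; data disk 1 then "fails".
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(data)
rebuilt = reconstruct([data[0], data[2]], parity)
```

Because XOR is its own inverse, the same routine serves both to compute parity and to recover a lost block.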
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on the disks as a hierarchical structure of directories, files and blocks. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although it need not be, associated with its own file system. The file system typically consists of a contiguous range of vbns from zero to n, for a file system of size n+1 blocks.
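A minimal sketch of the fbn/vbn distinction, assuming a toy volume with a naive sequential allocator; the classes `Volume` and `File` and their methods are hypothetical. Each file numbers its own blocks from fbn 0, while vbns are drawn from the single volume-wide space, so blocks of different files interleave in vbn order.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy logical volume whose vbn space runs from 0 to size - 1."""
    size: int
    next_free: int = 0  # naive allocator: next unused vbn

    def allocate(self) -> int:
        if self.next_free >= self.size:
            raise RuntimeError("volume full")
        vbn = self.next_free
        self.next_free += 1
        return vbn


@dataclass
class File:
    """Per-file map from file block number (fbn) to volume block number (vbn)."""
    block_map: dict[int, int] = field(default_factory=dict)

    def append_block(self, vol: Volume) -> int:
        fbn = len(self.block_map)         # fbns are assigned sequentially per file
        self.block_map[fbn] = vol.allocate()  # vbns come from the shared volume space
        return fbn


vol = Volume(size=8)
a, b = File(), File()
a.append_block(vol)  # file a, fbn 0 -> vbn 0
b.append_block(vol)  # file b, fbn 0 -> vbn 1
a.append_block(vol)  # file a, fbn 1 -> vbn 2
```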
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc., Sunnyvale, Calif.
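The following sketch, under simplifying assumptions, models the write-anywhere behavior described above: rewriting a dirtied block places the new data at a fresh vbn and repoints the file's block map, leaving the old copy untouched on disk. `WriteAnywhereStore` is a hypothetical name, and the immediate recycling of the old vbn is a deliberate simplification (a real system would retain it while any snapshot still references it).

```python
from collections import deque
from typing import Optional


class WriteAnywhereStore:
    """Toy write-anywhere store: a dirtied block goes to a new vbn, never in place."""

    def __init__(self, num_blocks: int):
        self.disk: list[Optional[bytes]] = [None] * num_blocks
        self.free = deque(range(num_blocks))  # free vbns, lowest first
        self.block_map: dict[int, int] = {}   # fbn -> vbn holding the current copy

    def write(self, fbn: int, data: bytes) -> int:
        """Write (or rewrite) fbn to a fresh vbn and return that vbn."""
        new_vbn = self.free.popleft()  # raises IndexError if the store is full
        self.disk[new_vbn] = data
        old_vbn = self.block_map.get(fbn)
        self.block_map[fbn] = new_vbn
        if old_vbn is not None:
            # Simplification: recycle the old vbn at once (its data stays until reused).
            self.free.append(old_vbn)
        return new_vbn

    def read(self, fbn: int) -> bytes:
        return self.disk[self.block_map[fbn]]


store = WriteAnywhereStore(4)
store.write(0, b"v1")  # fbn 0 -> vbn 0
store.write(1, b"x")   # fbn 1 -> vbn 1
store.write(0, b"v2")  # fbn 0 is dirtied and rewritten to vbn 2, not vbn 0
```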
The storage operating system may further implement a storage module, such as a RAID system, that manages the storage and retrieval of the information to and from the disks in accordance with input/output (I/O) operations. The RAID system is also responsible for parity operations in the storage system. Note that the file system only “sees” the data disks within its vbn space; the parity disks are “hidden” from the file system and, thus, are only visible to the RAID system. The RAID system typically organizes the RAID groups into one large “physical” disk (i.e., a physical volume), such that the disk blocks are concatenated across all disks of all RAID groups. The logical volume maintained by the file system is then “disposed over” (spread over) the physical volume maintained by the RAID system.
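To illustrate the concatenation, the sketch below maps a vbn to a (RAID group, data disk, disk block number) triple by laying each group's data disks end to end. Parity disks are counted in each hypothetical `RaidGroup` but never enter the mapping, mirroring the point that they are hidden from the file system. The uniform disk size and the function name `vbn_to_location` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidGroup:
    data_disks: int       # disks visible to the file system
    parity_disks: int     # hidden from the file system; unused by the mapping
    blocks_per_disk: int  # assumption: all disks in a group have the same size


def vbn_to_location(vbn: int, groups: list[RaidGroup]) -> tuple[int, int, int]:
    """Map a vbn to (raid_group, data_disk, disk_block) by concatenating data disks."""
    base = 0
    for g_index, g in enumerate(groups):
        group_blocks = g.data_disks * g.blocks_per_disk
        if vbn < base + group_blocks:
            offset = vbn - base
            return g_index, offset // g.blocks_per_disk, offset % g.blocks_per_disk
        base += group_blocks
    raise ValueError(f"vbn {vbn} is outside the physical volume")


# Two RAID groups: 3 data + 1 parity disk, then 2 data + 1 parity disk.
groups = [RaidGroup(3, 1, 100), RaidGroup(2, 1, 100)]
```

With these groups the physical volume exposes 500 vbns (0 to 499) to the file system, although seven disks are present.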
A file system layout may apportion an underlying physical volume into one or more virtual volumes (vvols) of a storage system. An example of such a file system layout is described in U.S. patent application Ser. No. 10/836,817 titled EXTENSION OF WRITE ANYWHERE FILE SYSTEM LAYOUT, by John K. Edwards et al., now issued as U.S. Pat. No. 7,409,494 on Aug. 5, 2008 and assigned to Network Appliance, Inc. The underlying physical volume is an aggregate comprising one or more groups of disks, such as RAID groups, of the node. The aggregate has its own physical volume block number (pvbn) space and maintains metadata, such as block allocation structures, within that pvbn space. Each vvol has its own virtual volume block number (vvbn) space and maintains metadata, such as block allocation structures, within that vvbn space. Each vvol is a file system that is associated with a container file; the container file is a file in the aggregate that contains all blocks used by the vvol. Moreover, each vvol comprises data blocks and indirect blocks that contain block pointers that point at either other indirect blocks or data blocks.
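A minimal sketch of the two-level addressing, assuming that a block's offset in the container file equals its vvbn (an assumption made here for simplicity, not stated above). Each hypothetical `VVol` numbers its blocks from vvbn 0, while the hypothetical `Aggregate` hands out pvbns from one shared space.

```python
from collections import deque


class Aggregate:
    """Toy aggregate that owns the physical volume block number (pvbn) space."""

    def __init__(self, size: int):
        self.free = deque(range(size))      # unallocated pvbns
        self.blocks: dict[int, bytes] = {}  # pvbn -> data

    def allocate(self, data: bytes) -> int:
        pvbn = self.free.popleft()
        self.blocks[pvbn] = data
        return pvbn


class VVol:
    """Toy virtual volume with its own vvbn space, backed by a container file."""

    def __init__(self, aggr: Aggregate):
        self.aggr = aggr
        # Container file modeled as vvbn -> pvbn (assumed: container offset == vvbn).
        self.container: dict[int, int] = {}

    def write_new_block(self, data: bytes) -> int:
        vvbn = len(self.container)  # each vvol numbers its blocks from 0
        self.container[vvbn] = self.aggr.allocate(data)
        return vvbn

    def read(self, vvbn: int) -> bytes:
        pvbn = self.container[vvbn]  # translate vvbn to pvbn via the container file
        return self.aggr.blocks[pvbn]


aggr = Aggregate(size=16)
v1, v2 = VVol(aggr), VVol(aggr)
v1.write_new_block(b"a")  # v1: vvbn 0 -> pvbn 0
v2.write_new_block(b"b")  # v2: vvbn 0 -> pvbn 1
v1.write_new_block(b"c")  # v1: vvbn 1 -> pvbn 2
```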
File systems may incorporate a cloning technique that enables efficient and substantially instantaneous creation of a clone that is a writable copy of a “parent” virtual volume (vvol) in an aggregate of a storage system. An example of such a cloning technique is described in the above-incorporated U.S. patent application entitled CLONING TECHNIQUE FOR EFFICIENTLY CREATING A COPY OF A VOLUME IN A STORAGE SYSTEM. The clone is instantiated by, e.g., loading a file system associated with the new vvol onto the clone and bringing the clone “online”, with the only blocks owned by the clone comprising its modified volinfo block. The file system executes on the clone as it would on a typical vvol, such as the parent vvol. In fact, the file system within the clone resembles the file system within a base snapshot, since they comprise substantially the same blocks on disk. The resulting clone is thus a “full-fledged” vvol, i.e., it can service storage (read and write) requests and has its own logical properties. As a result, the cloning technique enables the clone and parent vvol to share on-disk blocks of data in a zero-copy fashion, while also allowing for modifications.
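The zero-copy sharing can be sketched, under stated assumptions, as a clone layered over a read-only base snapshot of the parent: reads fall through to the snapshot's block pointers, and writes go to newly allocated blocks owned by the clone. The snapshot is modeled as a read-only view (`types.MappingProxyType`) of a copy of the parent's block map; the classes and names are hypothetical, and the volinfo block is omitted.

```python
from collections import deque
from collections.abc import Mapping
from types import MappingProxyType


class Aggregate:
    """Toy pool of physical blocks shared by the parent and the clone."""

    def __init__(self, size: int):
        self.free = deque(range(size))
        self.blocks: dict[int, bytes] = {}

    def store(self, data: bytes) -> int:
        pvbn = self.free.popleft()
        self.blocks[pvbn] = data
        return pvbn


class Clone:
    """Writable clone layered over a read-only base snapshot of its parent vvol."""

    def __init__(self, aggr: Aggregate, base_snapshot: Mapping[int, int]):
        self.aggr = aggr
        self.base = base_snapshot      # shared with the parent; no blocks copied
        self.own: dict[int, int] = {}  # fbn -> pvbn for blocks written by the clone

    def read(self, fbn: int) -> bytes:
        pvbn = self.own[fbn] if fbn in self.own else self.base[fbn]
        return self.aggr.blocks[pvbn]

    def write(self, fbn: int, data: bytes) -> None:
        # Copy-on-write: new data goes to a fresh pvbn; the shared block is untouched.
        self.own[fbn] = self.aggr.store(data)


aggr = Aggregate(size=16)
parent_map = {0: aggr.store(b"A"), 1: aggr.store(b"B")}  # parent's fbn -> pvbn
snapshot = MappingProxyType(dict(parent_map))             # read-only base snapshot
clone = Clone(aggr, snapshot)  # owns no data blocks yet
clone.write(1, b"B2")
```

After the write, fbn 0 is still served from the shared parent block, while fbn 1 now resolves to a block owned only by the clone.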
As can be appreciated, splitting the blocks shared between a clone and its parent may consume substantial disk storage space. Consequently, an administrator must “manually” estimate the amount of storage space required to perform a clone splitting operation. In known implementations, the administrator may then initiate an online clone splitting operation and must wait until it substantially completes to learn whether it succeeded or failed for lack of disk space. This is a notable disadvantage because the clone splitting operation may take on the order of hours, delaying its result (i.e., success or failure) for that length of time. An alternative, “brute force” technique for determining the amount of space required for a clone splitting operation is to determine, for each block in the clone, whether it is located in the clone's parent (or the parent's parent, etc.) or within the clone's container file. Because a clone may be on the order of gigabytes or terabytes in size, making this determination takes substantial time.
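For concreteness, the brute-force estimate described above can be sketched as follows, assuming each volume records the blocks held in its own container file and a pointer to its parent. Every block visible in the clone is looked up along the parent chain; each block found outside the clone's own container file would have to be copied by a split. The `Vol` class, the helper names and the 4 KB block size are hypothetical. Because every block is visited and each lookup may walk the whole chain, the cost grows with clone size times chain depth.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vol:
    name: str
    own: dict[int, int] = field(default_factory=dict)  # fbn -> pvbn in its container file
    parent: Optional["Vol"] = None


def owner_of(vol: Vol, fbn: int) -> Vol:
    """Walk clone -> parent -> parent's parent ... to find which volume holds fbn."""
    v = vol
    while v is not None:
        if fbn in v.own:
            return v
        v = v.parent
    raise KeyError(fbn)


def visible_fbns(vol: Vol) -> set[int]:
    """All fbns readable through the clone, gathered from the whole chain."""
    fbns: set[int] = set()
    v = vol
    while v is not None:
        fbns.update(v.own)
        v = v.parent
    return fbns


def brute_force_split_estimate(clone: Vol, block_size: int = 4096) -> int:
    """Bytes a split must allocate: one block per fbn still held by an ancestor."""
    shared = sum(1 for fbn in visible_fbns(clone) if owner_of(clone, fbn) is not clone)
    return shared * block_size


grandparent = Vol("grandparent", {0: 10, 1: 11, 2: 12})
parent = Vol("parent", {1: 20}, parent=grandparent)
clone = Vol("clone", {2: 30, 3: 31}, parent=parent)
```

Here fbn 0 resolves to the grandparent and fbn 1 to the parent, so two blocks would need to be copied.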