1. Technical Field
The present invention relates to data storage and retrieval generally and more particularly to a method and system of providing replica files within a fileset.
2. Description of the Related Art
Information drives business. Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Duplicate copies as well as slightly altered versions or editions of such data are commonly made and maintained, for example, to facilitate independent and simultaneous data access by a number of users and/or processes. Until recently, providing such copies or versions required the data to be completely replicated for each copy or version made, often wasting storage space, particularly for data that is infrequently accessed and/or modified. Alternatively, links (e.g., hard links or symbolic links) may be used to reference a single data image. Such links, however, fail to provide each user or process with an exclusive copy of the data. In other words, a change or write operation made by one user or process is immediately visible to all other users or processes, and the prior or original data is lost.
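The shared-image behavior of links can be illustrated with a minimal Python sketch, assuming a file system that supports hard links. The function name `write_through_hard_link` is hypothetical. Because both directory entries refer to the same underlying data image, a write made through the link replaces the data seen through the original name, and the original contents are lost.

```python
import os
import tempfile


def write_through_hard_link() -> tuple[bytes, bytes]:
    """Write through a hard link and return (contents via original, contents via link).

    Both names refer to the same inode, so there is only one data image:
    an edit made through the link is also what the original name now sees.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.dat")
        link = os.path.join(tmp, "link.dat")

        with open(original, "wb") as f:
            f.write(b"original data")

        os.link(original, link)  # second directory entry for the same inode

        # An "independent" edit by a second user or process, made via the link.
        # Opening with "wb" truncates the shared inode; no new copy is created.
        with open(link, "wb") as f:
            f.write(b"edited data")

        with open(original, "rb") as f:
            seen_via_original = f.read()
        with open(link, "rb") as f:
            seen_via_link = f.read()
        return seen_via_original, seen_via_link


if __name__ == "__main__":
    print(write_through_hard_link())  # (b'edited data', b'edited data')
```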
More recently, methods (e.g., storage checkpoints) have become available for providing persistent frozen or "point-in-time" images of data using copy-on-write technology. In a storage checkpoint, data is stored only for those portions of the checkpointed data image that are modified or written to after the storage checkpoint is created. In conventional data storage and/or processing systems, however, such methods have been used primarily to facilitate off-host processing such as data backups. Consequently, storage checkpointing or "cloning" has typically been performed on entire filesets, as this is the most common level at which backups are performed. In this context, a "fileset" is included in a conventional file system such as a Unix File System (UFS), NT File System (NTFS), File Allocation Table (FAT)-based file system, or the like, which resides either within a virtual file system (e.g., the Veritas File System provided by Veritas Software Corporation of Mountain View, Calif.) or, where no such virtual file system is present, as a conventional, independent file system.
FIG. 1 illustrates a file system including a fileset and an associated storage checkpoint according to the prior art. File system 100 of FIG. 1 includes a primary fileset 110 and an associated storage checkpoint 120. Primary fileset 110 in turn includes namespace components (e.g., files, directories, etc.) 112, 114, and 116 and storage checkpoint 120 includes corresponding namespace components 122, 124, and 126. Although namespace components 112-116 are arranged in a hierarchical manner in the file system of FIG. 1, where namespace component 112 references namespace components 114 and 116, any number of namespace components arranged in any number of configurations could just as easily be implemented according to the prior art. In the illustrated prior art file system, namespace component 114 in turn references or is associated with data within data blocks 115A-115D and namespace component 116 references or is associated with data within data blocks 117A-117D.
In the illustrated example, storage checkpoint 120 is a so-called "virtual copy": it is logically identical to primary fileset 110 at the time storage checkpoint 120 is created, but it has no allocated data blocks of its own and initially stores no data. Instead, storage checkpoint 120 initially includes only namespace components 122-126, within a hierarchical directory structure identical to that of primary fileset 110, and associated references (e.g., pointers) to data blocks (e.g., data blocks 115 and 117) associated with primary fileset 110. One of skill in the art will recognize that such references may be implemented as one or more arrays of pointers to individual data blocks associated with primary fileset 110 or as one or more single pointers to a list of pointers to such data blocks. Storage checkpoint 120 is typically created within the free space available to primary fileset 110 and thereby minimizes the use of storage space. The following example describes creation and use of a storage checkpoint such as storage checkpoint 120 in further detail.
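The "virtual copy" relationship can be sketched as a small in-memory model. This is a simplified illustration under stated assumptions, not the Veritas implementation: the classes `Fileset` and `Checkpoint`, and the convention that a `None` blockmap entry means "no block allocated; refer to the primary fileset's block at the same position", are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Fileset:
    """Hypothetical primary fileset: each file name maps to a list of data blocks."""
    files: Dict[str, List[bytes]] = field(default_factory=dict)


@dataclass
class Checkpoint:
    """Hypothetical 'virtual copy' of a fileset.

    Each file gets a blockmap with one entry per primary block. An entry of
    None means no block is allocated in the checkpoint, so reads fall
    through to the primary fileset's block at the same index.
    """
    primary: Fileset
    blockmaps: Dict[str, List[Optional[bytes]]] = field(default_factory=dict)

    @classmethod
    def create(cls, primary: Fileset) -> "Checkpoint":
        # Same namespace as the primary, but no data blocks are allocated.
        maps = {name: [None] * len(blocks) for name, blocks in primary.files.items()}
        return cls(primary, maps)

    def allocated_blocks(self) -> int:
        # Number of blocks the checkpoint actually stores itself.
        return sum(b is not None for m in self.blockmaps.values() for b in m)

    def read(self, name: str) -> bytes:
        # Resolve each entry: own block if allocated, else the primary's block.
        own = self.blockmaps[name]
        shared = self.primary.files[name]
        return b"".join(s if o is None else o for o, s in zip(own, shared))


primary = Fileset({"file114": [b"A", b"B", b"C", b"D"],
                   "file116": [b"E", b"F", b"G", b"H"]})
ckpt = Checkpoint.create(primary)
print(ckpt.allocated_blocks())  # 0
print(ckpt.read("file114"))     # b'ABCD'
```

Immediately after creation the checkpoint reads back exactly what the primary holds while storing no data blocks of its own.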
FIGS. 2A-2B illustrate a write operation within a file system including a fileset and an associated storage checkpoint according to the prior art. File system 200 illustrated in FIG. 2A includes a primary fileset 210 and a storage checkpoint 220 similar to those described with respect to FIG. 1. Primary fileset 210 includes files 212 and 214, which are associated with data blocks 213A-E and 215A-E, respectively, through one or more metadata file system objects (e.g., file indices, i-nodes, etc.), each of which includes one or more data block maps. In the exemplary file system of FIG. 2A, file 214 contains or references a replicated copy of the same data contained or referenced by file 212, such that two distinct copies of the data are maintained. Blocks such as data blocks 213 and 215 may include file data directly, or may otherwise identify or reference (e.g., using disk block numbers) the actual data blocks containing such data.
Storage checkpoint 220 of FIGS. 2A and 2B includes sparse files 222 and 224, which correspond to files 212 and 214 of primary fileset 210, respectively, and each of which in turn comprises one or more references (e.g., pointers) to data blocks associated with primary fileset 210 and/or allocated blocks associated with persistent storage. It should be noted that there is a one-to-one correspondence between files within the primary fileset and sparse files or similar namespace components within storage checkpoint 220. As illustrated, the blockmaps of sparse files 222 and 224 may initially include references 223A-223E and 225A-225E, respectively, each corresponding to a block (e.g., data blocks 213 and 215) of files 212 and 214. In FIG. 2B, the file system of FIG. 2A is illustrated following write operations to data blocks 213B, 213E, and 215C of primary fileset 210. In this example, before the write operations are performed, data blocks are allocated as needed within sparse files 222 and 224 to store the prior contents of data blocks 213B, 213E, and 215C of files 212 and 214.
After the prior data contents have been stored (e.g., "pushed" or "pulled") within the newly allocated data blocks 223B, 223E, and 225C of sparse files 222 and 224, new data may be written to blocks 213B, 213E, and 215C of primary fileset 210 and the write operations may be completed. This process may continue as additional write operations are performed on files 212 and 214 until all of the original data contents of the files have been transferred, such that sparse files 222 and 224 themselves become complete and independent files.
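The copy-on-write sequence of FIGS. 2A-2B can be sketched with a hypothetical in-memory model, assuming a single checkpoint, fixed-length files, and the "push" variant in which the prior contents are copied into the sparse file before the primary block is overwritten. The class `CheckpointedFileset` and its methods are illustrative names only; the usage lines mirror the writes to blocks 213B, 213E, and 215C and then continue writing until the sparse file becomes independent.

```python
from typing import Dict, List, Optional


class CheckpointedFileset:
    """Hypothetical model of a primary fileset plus one storage checkpoint.

    primary[name][i]: current contents of block i of a primary file.
    sparse[name][i]:  None while the checkpoint still references the primary
                      block; otherwise the preserved prior contents.
    """

    def __init__(self, files: Dict[str, List[bytes]]) -> None:
        self.primary = {n: list(blocks) for n, blocks in files.items()}
        self.sparse: Dict[str, List[Optional[bytes]]] = {
            n: [None] * len(blocks) for n, blocks in files.items()
        }

    def write(self, name: str, index: int, data: bytes) -> None:
        # Copy-on-write: on the first overwrite of a block, allocate a block in
        # the sparse file and preserve ("push") the prior contents there.
        if self.sparse[name][index] is None:
            self.sparse[name][index] = self.primary[name][index]
        # Only then complete the write to the primary fileset.
        self.primary[name][index] = data

    def checkpoint_view(self, name: str) -> bytes:
        # Point-in-time image: preserved blocks, else the unchanged primary ones.
        return b"".join(
            p if s is None else s
            for s, p in zip(self.sparse[name], self.primary[name])
        )

    def is_independent(self, name: str) -> bool:
        # True once every original block has been transferred to the sparse file.
        return all(s is not None for s in self.sparse[name])


fs = CheckpointedFileset({"file212": [b"a", b"b", b"c", b"d", b"e"],
                          "file214": [b"a", b"b", b"c", b"d", b"e"]})
fs.write("file212", 1, b"X")  # block 213B
fs.write("file212", 4, b"Y")  # block 213E
fs.write("file214", 2, b"Z")  # block 215C
print(b"".join(fs.primary["file212"]))  # b'aXcdY'
print(fs.checkpoint_view("file212"))    # b'abcde'
print(fs.is_independent("file212"))     # False

for i in range(5):  # keep writing until every original block has been pushed
    fs.write("file212", i, b"N")
print(fs.is_independent("file212"))     # True
print(fs.checkpoint_view("file212"))    # b'abcde'
```

Only the first write to a block after checkpoint creation triggers a copy; later writes to the same block leave the preserved contents untouched, which is why the checkpoint keeps its point-in-time view while the primary continues to change.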