1. Technical Field
The present invention relates to data storage and retrieval generally and more particularly to a method and system of providing a write-accessible storage checkpoint.
2. Description of the Related Art
Information drives business. For businesses that increasingly depend on data and information for their day-to-day operations, unplanned downtime due to data loss or data corruption can hurt their reputations and bottom lines. Data can be corrupted or lost due to hardware and/or software failure, intentional malicious action, and/or user error. To increase data consistency and integrity and minimize the impact of data corruption and loss, a number of techniques have been developed and implemented. One such technique involves the creation of a “storage checkpoint” of a file system or file set, sometimes also referred to as a checkpoint, or file system/set checkpoint.
A storage checkpoint is a disk- and I/O-efficient snapshot technology for creating a consistent, stable, point-in-time view of a file system or file set. Instead of making a physically separate copy or “mirror,” a storage checkpoint identifies and maintains only changed data blocks via a copy-on-write mechanism, thus saving disk space and significantly reducing I/O overhead. Unlike a disk-based mirroring method, checkpoint technology does not require a separate storage pool. Rather, a storage checkpoint uses the free space pool of a file system for storage. Therefore, changed data blocks are maintained using the same underlying disk space. A storage checkpoint may be created based on another storage checkpoint as well as on a primary or “live” file system or file set. According to one technique, such storage checkpoints are created periodically based on a single file system or file set, thus forming a storage checkpoint chain and providing a consistent image of data stored within a file system or file set at different points in time. This storage checkpoint chain may then be utilized to “roll back” the data to any instant in time represented by a storage checkpoint without requiring the storage of a complete copy of the data at each such instant.
A storage checkpoint of a primary or “live” file system or file set is generated by freezing the file system or file set for which the storage checkpoint is to be created, initializing the storage checkpoint's block map and thawing the previously frozen file system or set. A block map structure is used to provide a translation between an offset in a file and a data block on a disk. Freezing temporarily blocks all I/O operations so that current or pending I/O operations may be completed and the file system or file set is synchronized to disk.
After initializing the storage checkpoint's block map to reference data blocks of the file system or file set for which the checkpoint was created, the described file system or set is “thawed” to allow continued access. Typically, this operation is atomic, so that write ordering may be maintained. The storage checkpoint, when first created, does not contain any data blocks. Consequently, a storage checkpoint requires only enough storage initially to store its block map and may be created quickly relative to other volume management and file system operations.
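The freeze, block-map initialization, and thaw sequence described above can be sketched as follows. This is a minimal in-memory illustration only: the names FileSet, Checkpoint, and create_checkpoint are hypothetical and do not appear in the described systems, and a lock stands in for freezing and thawing.

```python
import threading


class FileSet:
    """Hypothetical in-memory stand-in for a primary file set."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)        # block id -> data
        self.io_lock = threading.Lock()   # held while the file set is frozen

    def write(self, block_id, data):
        with self.io_lock:                # writes wait while the set is frozen
            self.blocks[block_id] = data


class Checkpoint:
    """Holds only a block map; no data blocks are copied at creation."""

    def __init__(self, source):
        self.source = source
        # None means "refer to the block of the same id in the source".
        self.block_map = {block_id: None for block_id in source.blocks}


def create_checkpoint(file_set):
    # Freeze: acquire the lock so no write is in flight.
    with file_set.io_lock:
        checkpoint = Checkpoint(file_set)  # initialize the block map
    # Thaw: the lock is released on leaving the with-block.
    return checkpoint


primary = FileSet({"A": "A0", "B": "B0"})
checkpoint = create_checkpoint(primary)
primary.write("A", "A1")                   # allowed again after the thaw
```

Because the new checkpoint stores only its block map, its initial cost in this sketch is proportional to the number of map entries rather than to the amount of data.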
FIG. 1 illustrates a primary file set and an associated storage checkpoint according to the prior art. In the embodiment of FIG. 1, a primary file set 110 including database 112 and an associated storage checkpoint 120 are depicted. Database 112 is shown as an example of a file set, although the invention can also be used for other types of file systems and files. Database 112 includes an emp.db namespace component 114 and a jun.dbf namespace component 116. As shown by arrow 117, data blocks 118A through 118E are stored within primary file set 110. In the accompanying drawing figures a series of blocks may represent a file system, a file set, or data blocks of a file system storage object (e.g., a data or “special” file, a hard or symbolic link, directory, or the like).
In this example, storage checkpoint 120 is logically identical to the primary file set 110 when storage checkpoint 120 is created, but storage checkpoint 120 does not contain actual data blocks. Storage checkpoint 120 includes database 122 having emp.db namespace component 124 and jun.dbf namespace component 126. Rather than containing a copy of the actual data, however, storage checkpoint 120 includes a reference 127 to the primary file set 110 data. One of skill in the art will recognize that reference 127 may be implemented in a variety of ways including as an array of pointers to individual data blocks within primary file set 110 or as a single pointer to a list of pointers to data blocks. Storage checkpoint 120 is created within the free space available to primary file set 110, and thereby minimizes the use of storage space.
FIGS. 2A–2C illustrate the generation of storage checkpoint(s) within a file system according to the prior art. At a first time, t0, represented by FIG. 2A, the illustrated file system includes a primary file set 200 including a plurality of data blocks 202A through 202E storing data A0 through E0, respectively, and a storage checkpoint 204 which in turn includes a plurality of references 206 (e.g., pointers, overlay extents, etc.) corresponding to data blocks 202 of primary file set 200 as shown. At a second time, t1, represented by FIG. 2B, writes of A1 and E1 are performed to data blocks 202A and 202E to update data A0 and E0 of primary file set 200. Before the blocks of data are modified, however, data blocks 208A and 208E are allocated within storage checkpoint 204 and the original data, A0 and E0, are copied into the corresponding newly-allocated blocks as shown. As is illustrated in FIG. 2B, data blocks 208A and 208E then exist independently, without references 206 from storage checkpoint 204 to data blocks 202A and 202E of primary file set 200.
This copy-on-write mechanism allows a storage checkpoint to preserve the image of the primary file set at the point in time when the checkpoint was made. This point-in-time image may then be reconstructed using a combination of data from the primary file set 200 and one or more storage checkpoints. As primary file set 200 continues to be updated, storage checkpoint 204 gradually will be filled with “before image” data blocks. This does not mean every update or write results in copying data to storage checkpoint 204. For example, in the embodiment depicted within FIG. 2B, subsequent updates to block 202E, now containing E1, will not trigger the copy-on-write mechanism because the original block data, E0, has already been saved. The storage checkpoint 204 accumulates these “before image” data blocks until it is removed or the next storage checkpoint is generated.
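The copy-on-write behavior described with respect to FIG. 2B can be illustrated with a short sketch. It assumes a single checkpoint and an in-memory dictionary of blocks; the class and function names are hypothetical and are used for illustration only.

```python
class PrimaryFileSet:
    """Hypothetical primary file set with at most one storage checkpoint."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block id -> current data
        self.before_images = None    # checkpoint storage; None until created

    def create_checkpoint(self):
        # A new checkpoint holds no data blocks, only implicit references.
        self.before_images = {}

    def write(self, block_id, data):
        saved = self.before_images
        # Copy the original data only on the first write after the checkpoint.
        if saved is not None and block_id not in saved:
            saved[block_id] = self.blocks[block_id]
        self.blocks[block_id] = data

    def read_checkpoint(self, block_id):
        # Prefer the saved before-image; otherwise follow the reference.
        if block_id in self.before_images:
            return self.before_images[block_id]
        return self.blocks[block_id]


fs = PrimaryFileSet({"A": "A0", "B": "B0", "C": "C0", "D": "D0", "E": "E0"})
fs.create_checkpoint()
fs.write("A", "A1")
fs.write("E", "E1")
fs.write("E", "E2")   # second update to E: no further copy is made
image = {block_id: fs.read_checkpoint(block_id) for block_id in fs.blocks}
```

In this sketch the second write to block E leaves the saved before-image, E0, untouched, so the checkpoint continues to present the data as it stood when the checkpoint was created.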
After a subsequent storage checkpoint has been generated, the “before images” of data changed in the primary file set are copied to that subsequent storage checkpoint, ensuring that each “before image” is copied only once and to the most recently generated storage checkpoint, without consuming additional I/O operations or disk space. At a third time, t2, represented by FIG. 2C, the illustrated file system includes an additional storage checkpoint 210 of primary file set 200 which in turn includes a plurality of references 212 corresponding to data blocks 202 of primary file set 200. Thereafter any changes to primary file set 200 are reflected in the most recently formed storage checkpoint 210 rather than in storage checkpoint 204. Storage checkpoint 204 and storage checkpoint 210 form a storage checkpoint “chain” representing images of primary file set 200 at each point at which a storage checkpoint was generated.
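A storage checkpoint chain of this kind can be sketched as a list of checkpoints ordered from oldest to newest, where a read of an older checkpoint falls through newer checkpoints and finally to the primary file set. The following is an illustrative sketch under that assumption; the class name and list-based layout are hypothetical.

```python
class ChainedFileSet:
    """Hypothetical primary file set with a chain of storage checkpoints."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block id -> current data
        self.checkpoints = []        # oldest first; each maps id -> before-image

    def create_checkpoint(self):
        self.checkpoints.append({})
        return len(self.checkpoints) - 1   # index of the new checkpoint

    def write(self, block_id, data):
        if self.checkpoints:
            newest = self.checkpoints[-1]
            # Before-images go only to the most recent checkpoint, once each.
            if block_id not in newest:
                newest[block_id] = self.blocks[block_id]
        self.blocks[block_id] = data

    def read_checkpoint(self, index, block_id):
        # Search this checkpoint, then each newer one, then the primary.
        for saved in self.checkpoints[index:]:
            if block_id in saved:
                return saved[block_id]
        return self.blocks[block_id]

    def image(self, index):
        return {b: self.read_checkpoint(index, b) for b in self.blocks}


fs = ChainedFileSet({"A": "A0", "B": "B0", "C": "C0", "D": "D0", "E": "E0"})
older = fs.create_checkpoint()
fs.write("A", "A1")
fs.write("E", "E1")
newer = fs.create_checkpoint()
fs.write("D", "D1")
fs.write("E", "E2")
```

Each image is reconstructed from a combination of checkpoint blocks and primary blocks, so no checkpoint in the sketch ever holds a complete copy of the data.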
FIGS. 3A and 3B illustrate a storage checkpoint write operation according to a first prior art technique. At a first time, t0, represented by FIG. 3A, the illustrated file system includes a primary file set 300 including a plurality of data blocks 302A through 302E storing data A1, B0, C0, D1, and E3, respectively; a first storage checkpoint 304 including data blocks 306D and 306E storing data D0 and E1 and a plurality of references 308 corresponding to data blocks 302A through 302C; and a second storage checkpoint 310 including data blocks 312A and 312E storing data A0 and E0 and a plurality of references 314 corresponding to references 308 and data block 306D of storage checkpoint 304.
At a second time, t1, represented by FIG. 3B, a write of B1* is performed to the first storage checkpoint 304. Before the described write operation may be performed, however, data blocks 306B and 312B must be allocated within storage checkpoints 304 and 310, respectively, and the original data, B0, must be requested or “pulled” to storage checkpoint 310 and subsequently provided or “pushed” to storage checkpoint 310 from primary file set 300. Thus, a write to a target storage checkpoint (e.g., storage checkpoint 304) which is referenced by another storage checkpoint (e.g., storage checkpoint 310) in a conventional storage checkpoint chain suffers from a number of significant drawbacks. For example, each such write operation requires a read of previously-stored data (e.g., B0 of data block 302B of primary file set 300), a write of that previously-stored data to the referring storage checkpoint, and a write of the actual data to the target storage checkpoint. Write ordering or “serialization” must also be maintained between storage checkpoint writes and writes to the file system's primary file set, creating additional administrative overhead. Multiple copies of data must be simultaneously stored (e.g., data B0 within data blocks 302B and 312B), requiring additional storage resources. Additionally, any write directly to a storage checkpoint such as illustrated in FIG. 3B results in a loss of the point-in-time image of the primary file set at the time that storage checkpoint was created.
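The I/O cost of the write described with respect to FIG. 3B can be made concrete with a short sketch. It assumes the state described for FIG. 3A, models the chain as a list ordered from the referring (older) checkpoint to the target (newer) checkpoint, and records each operation in a log. The function names and the log format are hypothetical.

```python
def resolve(primary, chain, index, block_id):
    """Return the data a checkpoint presents for a block (hypothetical helper)."""
    for saved in chain[index:]:
        if block_id in saved:
            return saved[block_id]
    return primary[block_id]


def write_to_checkpoint(primary, chain, target, block_id, data):
    """Write directly to chain[target], preserving the referring checkpoint."""
    io_log = []
    referrer = target - 1   # the checkpoint that references the target
    if referrer >= 0 and block_id not in chain[referrer]:
        # Read the previously stored data the referrer currently sees.
        original = resolve(primary, chain, target, block_id)
        io_log.append(("read", "source", original))
        # Write that data into the referring checkpoint.
        chain[referrer][block_id] = original
        io_log.append(("write", f"checkpoint[{referrer}]", original))
    # Finally write the new data into the target checkpoint.
    chain[target][block_id] = data
    io_log.append(("write", f"checkpoint[{target}]", data))
    return io_log


# State described for FIG. 3A: index 0 models checkpoint 310, index 1 models 304.
primary = {"A": "A1", "B": "B0", "C": "C0", "D": "D1", "E": "E3"}
chain = [{"A": "A0", "E": "E0"}, {"D": "D0", "E": "E1"}]
log = write_to_checkpoint(primary, chain, 1, "B", "B1*")
```

In this sketch a single logical write produces one read and two writes, B0 ends up stored both in the primary file set and in the referring checkpoint, and the target checkpoint no longer presents B0, which corresponds to the drawbacks enumerated above.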
FIGS. 3C and 3D illustrate a storage checkpoint write operation according to a second prior art technique. Using the file system depicted in FIG. 3A and its accompanying description above as a reference, at an alternate second time, t1, represented by FIG. 3C an additional storage checkpoint 316 is generated based on, and includes a plurality of references 318 to, storage checkpoint 304. At a time t2, represented by FIG. 3D, a write of B1* is performed to storage checkpoint 316, rather than to storage checkpoint 304 as described with respect to FIG. 3B. While the alternative prior art technique illustrated in FIGS. 3C and 3D preserves the point-in-time image of the primary file set at the time storage checkpoint 304 was created, unlike the technique described with respect to FIGS. 3A and 3B, it nevertheless suffers from all of that technique's other described drawbacks.