The continued increase in data storage has been accompanied by an increasing need to have an accurate record of the state of particular data stores at specified times. A snapshot is a point-in-time image of a given data store. Snapshots may be created to effect recovery of data upon a catastrophic failure or to maintain a record of the state of the data at given times. Typical data storage systems may have a capacity of a terabyte (TB) or more. Such storage may be organized as a number of storage units of more practical size known as virtual logical units (VLUs). VLUs have their own well-defined virtual block address (VBA) space, and typically range in size upward from several hundred megabytes (MB). A snapshot may be created for an original VLU (parent VLU) at a user-specified time. The snapshot VLU (child VLU) then contains an exact copy of the parent VLU at the specified time. This child VLU can be accessed and modified just like any other VLU.
A basic approach to creating a snapshot is to make an actual copy of the entire VLU. For example, upon receiving a command to snapshot a VLU, all new data access requests (i.e., I/O requests such as READs and WRITEs) to that VLU are halted, a child VLU of the same size is created, and the entire content of the parent VLU is copied into the child VLU. Both VLUs are then available to the user. Copying the contents of one VLU to another to create a snapshot is both time-consuming and wasteful of storage space. For example, a 1 TB VLU may require several hours or even days to copy completely, during which time the parent VLU is unavailable for data access. Moreover, the storage space required for the child VLU is equal to the size of the parent VLU.
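To give a rough sense of the copy times mentioned above, the following sketch estimates how long a full copy of a 1 TB VLU would take at a sustained copy rate. The throughput figures (50 MB/s and 5 MB/s) and the decimal definition of a terabyte are illustrative assumptions and do not come from the text.

```python
# Back-of-the-envelope estimate of full-copy snapshot time.
# All throughput values are assumed for illustration only.

TB = 10**12  # bytes in a decimal terabyte (assumption)


def copy_hours(size_bytes, throughput_mb_per_s):
    """Return the hours needed to copy size_bytes at a sustained rate in MB/s."""
    seconds = size_bytes / (throughput_mb_per_s * 10**6)
    return seconds / 3600


fast = copy_hours(TB, 50)  # assumed lightly loaded array
slow = copy_hours(TB, 5)   # assumed heavily loaded array
print(f"{fast:.1f} h at 50 MB/s, {slow / 24:.1f} days at 5 MB/s")
```

Under these assumed rates the copy takes roughly 5.6 hours in the first case and about 2.3 days in the second, which is consistent with the "several hours or even days" range described above.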
Another typical approach is the “copy-on-write” approach, wherein data is not copied immediately when the snapshot command is received. Rather, a new VLU is created without actually allocating to it a full amount of storage space (i.e., an amount of storage space that is equivalent to the size of the parent VLU). In such a system, when a WRITE operation is received, the system first checks to see if the requested data block has already been copied into the child VLU. If the block has not yet been copied to the child VLU, the system explicitly makes the copy before allowing the requested operation to be serviced. A bitmap may be used to keep track of the data blocks that have been copied. A variant of this approach is for the system to initiate a background copying operation when the snapshot command is received without stopping the processing of new data access requests. This approach alleviates the problem of the VLU being inaccessible for long periods, but is still space inefficient.
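The copy-on-write behavior described above can be sketched as follows. This is a minimal in-memory illustration, not an actual storage-system implementation: the class name `CowSnapshot`, its method names, and the simplification that every WRITE replaces a whole block are all assumptions made for the example.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot of an in-memory parent VLU (hypothetical)."""

    def __init__(self, parent_blocks):
        self.parent = parent_blocks                  # list indexed by VBA
        self.child = {}                              # VBA -> block; grows on demand
        self.copied = [False] * len(parent_blocks)   # the "copied" bitmap

    def write_parent(self, vba, data):
        # Preserve the point-in-time block before the parent is overwritten.
        if not self.copied[vba]:
            self.child[vba] = self.parent[vba]
            self.copied[vba] = True
        self.parent[vba] = data

    def write_child(self, vba, data):
        # Whole-block write (assumption): the child now owns its own block.
        self.child[vba] = data
        self.copied[vba] = True

    def read_child(self, vba):
        # Blocks not yet copied are still served from the parent.
        return self.child[vba] if self.copied[vba] else self.parent[vba]


snap = CowSnapshot([b"a", b"b", b"c"])
snap.write_parent(1, b"B")
print(snap.read_child(1), snap.parent[1], snap.copied)
```

In this sketch the child consumes storage only for blocks that have been written since the snapshot was taken. The background-copy variant would additionally walk the bitmap and copy every block whose bit is still clear.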
A typical data storage system contains an array of disk drives, a controller for controlling access to the disk array, and a cache memory for storing recently accessed data so as to provide quick access to data that is likely to be accessed in the near-term without having to access the disk on every occasion. Since a particular file or block of data may be located on the disk or in the cache, the storage device typically includes metadata (MD) that registers all data blocks currently in the cache and, therefore, indicates whether a data block is on the disk or stored in cache. If the data block is in the cache, the MD indicates where the data block is stored in the cache. The MD may also indicate the current state of the data block (e.g., whether or not it has been “flushed” to disk). For such a system, another typical approach to creating a snapshot is to create a copy of the MD of the parent VLU when the snapshot command is received. The new copy of the MD is then assigned to the child VLU. With this approach, data access to the parent VLU is interrupted only long enough to make a copy of the MD. That is, because both copies of the MD point to the same data, the child VLU presents an image that is identical to the parent VLU immediately after the MD is copied. Thus, both the parent VLU and the child VLU can be made available to the user as soon as the MD is copied. Subsequently, if a WRITE is received for either VLU, the system checks to see if the MD of the child VLU and the MD of the parent VLU for the corresponding VBA are still pointing to the same data blocks. If not, the WRITE operation proceeds normally. Otherwise, a copy of the data block involved is made, and linked into the metadata for the child VLU before the WRITE operation is permitted to proceed. A bitmap or scoreboard may be used to keep track of the blocks that have been copied.
Alternatively, the MD need not be entirely copied when the snapshot command is received. Instead, space for the MD and the bitmap is allocated, but left empty. A cleared “copied” bit implicitly indicates that a corresponding MD entry in the child VLU is identical to that in the parent VLU. An MD entry for the child VLU is filled in after the corresponding data block is copied. With such an approach, the time during which data access is interrupted is reduced because a relatively small amount of information (i.e., the MD) is copied before the VLUs are made available to the user again. Copying only the MD also has the advantage of needing only as much new disk storage space as the amount of changes made to the VLUs after the snapshot is created.
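The metadata-copy approach can be illustrated with a small sketch. Here the MD is modeled simply as a list mapping each VBA to an index in a shared block store. The names `BlockStore`, `Vlu`, `snapshot`, `write`, and `read` are hypothetical, the cache is not modeled, and the example assumes whole-block writes and exactly one child per parent.

```python
class BlockStore:
    """Shared pool of data blocks (hypothetical stand-in for disk and cache)."""

    def __init__(self):
        self.blocks = []

    def alloc(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1


class Vlu:
    def __init__(self, store, md):
        self.store = store
        self.md = md  # metadata: VBA -> index into store.blocks


def snapshot(parent):
    # Only the MD is copied; no data block is duplicated at snapshot time.
    return Vlu(parent.store, list(parent.md))


def write(target, parent, child, vba, data):
    store = target.store
    if parent.md[vba] == child.md[vba]:
        # Both MDs still point at the same block: copy it and link the
        # copy into the child's MD before the WRITE proceeds.
        child.md[vba] = store.alloc(store.blocks[parent.md[vba]])
    store.blocks[target.md[vba]] = data


def read(vlu, vba):
    return vlu.store.blocks[vlu.md[vba]]


store = BlockStore()
parent = Vlu(store, [store.alloc(b"a"), store.alloc(b"b")])
child = snapshot(parent)
write(parent, parent, child, 0, b"A")   # parent changes, child keeps b"a"
write(child, parent, child, 1, b"Z")    # child changes, parent keeps b"b"
print(read(parent, 0), read(child, 0), read(parent, 1), read(child, 1))
```

Comparing the two MD entries plays the role of the bitmap in this sketch. In the lazily filled variant, the child's MD would start empty and a cleared "copied" bit would stand in for "same entry as the parent".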
These solutions are quite efficient when there is only a small number of snapshots in the system, but less so when multiple READ-WRITE-enabled snapshots are taken. This is frequently the case for cascaded snapshots of an original VLU. Cascaded snapshots are a succession of snapshots where each subsequent snapshot is a point-in-time copy of the preceding snapshot. FIG. 1 illustrates an example of a succession of cascaded snapshots in accordance with the prior art. In FIG. 1, VLU1 is a snapshot of VLU0 created at 9:00 A.M. VLU3 is a snapshot of VLU1 created at 2:00 P.M. Note that VLU3 is a snapshot of a snapshot (i.e., VLU1), as opposed to VLU2, which is a later snapshot of the original VLU (i.e., VLU0). A subsequent snapshot VLU created from the same parent VLU is referred to as a sibling VLU. For example, VLU1 and VLU2 are siblings. To continue, VLU4 is a snapshot of VLU3 created at 4:00 P.M. VLU0, VLU1, VLU3, and VLU4 form a snapshot cascade.
Cascaded snapshots are often employed in particular data-use situations that share common characteristics. For example, consider a complex computation or simulation application that takes a long time (e.g., 10 days) to run to completion. At some point it may be desirable to have the simulation branch in several different directions with small variations in one or more parameters. Subsequently it may be desirable to alter different parameters in each of the branches. For these applications, the subsequent child snapshots (i.e., cascaded snapshots) may often be used as original VLUs, on which more simulations are run. In such cases the data access pattern will likely be a mix of READs and WRITEs, and the performance of the active VLU and that of its children become equally important. The method of creating snapshots should avoid penalizing one or the other. That is, since each VLU may be actively used (i.e., may receive both READs and WRITEs), the performance of both the parent VLU and its child VLUs should be optimized, while optimizing capacity is not as critical since there will be changes to the data of the VLUs anyway.
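The parent, child, and sibling relationships described for FIG. 1 can be captured in a small lineage table. The sketch below is illustrative only: the dictionary `PARENT` and the helper functions `cascade` and `siblings` are hypothetical names, and the table encodes just the relationships stated above.

```python
# Snapshot lineage from FIG. 1: each child VLU maps to its parent VLU.
PARENT = {
    "VLU1": "VLU0",  # snapshot of VLU0, 9:00 A.M.
    "VLU2": "VLU0",  # later snapshot of VLU0
    "VLU3": "VLU1",  # snapshot of VLU1, 2:00 P.M.
    "VLU4": "VLU3",  # snapshot of VLU3, 4:00 P.M.
}


def cascade(vlu):
    """Return the chain of VLUs from the original VLU down to vlu."""
    chain = [vlu]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]


def siblings(vlu):
    """Return the other snapshots created from the same parent VLU."""
    parent = PARENT.get(vlu)
    if parent is None:
        return []
    return sorted(v for v, p in PARENT.items() if p == parent and v != vlu)


print(cascade("VLU4"))   # ['VLU0', 'VLU1', 'VLU3', 'VLU4']
print(siblings("VLU1"))  # ['VLU2']
```

The length of the chain returned by `cascade` gives a feel for why deep cascades are costly under the approaches above: a lookup for an unmodified block in VLU4 may have to be resolved through each ancestor in turn.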