This invention relates to data storage in a computerized storage unit, such as a storage array in a storage area network (SAN). More particularly, the present invention relates to management of stored data in the storage unit using “snapshot” or “checkpoint” copies of the data with multiple images of the data contained in a single snapshot repository.
Current high-capacity computerized data storage systems typically involve a storage area network (SAN) within which one or more storage arrays store data on behalf of one or more host devices, which in turn typically service data storage requirements of several client devices. Within such a storage system, various techniques are employed to make an image or copy of the data. One such technique involves the making of “snapshot” copies of volumes of data within the storage arrays without taking the original data “offline,” or making the data temporarily unavailable. Generally, a snapshot volume represents the state of the original, or base, volume at a particular point in time. Thus, the snapshot volume is said to contain a copy or picture, i.e. “snapshot,” of the base volume.
Snapshot volumes are formed to preserve the state of the base volume for various purposes. For example, daily snapshot volumes may be formed in order to show and compare daily changes to the data. Also, a business or enterprise may want to upgrade its software that uses the base volume from an old version of the software to a new version. Before making the upgrade, however, the user, or operator, of the software can form a snapshot volume of the base volume and concurrently run the new untested version of the software on the snapshot volume and the older known stable version of the software on the base volume. The user can then compare the results of both versions, thereby testing the new version for errors and efficiency before actually switching to using the new version of the software with the base volume. Also, the user can make a snapshot volume from the base volume in order to run the data in the snapshot volume through various different scenarios (e.g. financial data manipulated according to various different economic scenarios) without changing or corrupting the original data in the base volume. Additionally, backup volumes (e.g. tape backups) of the base volume can be formed from a snapshot volume of the base volume, so that the base volume does not have to be taken offline, or made unavailable, for an extended period of time to perform the backup, since the formation of the snapshot volume takes considerably less time than does the formation of the backup volume.
Whereas a backup volume of the base volume contains a complete copy of the data in the base volume, the snapshot volume does not actually require a separate complete copy of the data. Instead, the snapshot volume maintains a “repository” (i.e. volume of data storage space in the storage array) that contains only those blocks of the original data that have been changed in the base volume since the point in time at which the snapshot volume was formed. Those data blocks that have not been changed are not copied to the snapshot repository, but remain in the base volume. The snapshot volume, therefore, does not contain any data, but rather relies on the relevant data blocks in the base volume and the snapshot repository to contain the data. Thus, at the moment that the snapshot volume is created, and before any of the data blocks in the base volume have been changed, all of the data for the snapshot volume is in the base volume. On the other hand, after the snapshot volume has been in existence for a while, and if all of the data blocks have been changed in one way or another in the base volume, then all of the data for the snapshot volume is in the snapshot repository. Most likely, however, at any given time after the formation of the snapshot volume, some of the data for the snapshot volume is in the base volume and the remainder of the data is in the snapshot repository.
The first time that data is written to a data block in the base volume after forming a snapshot volume, a copy-on-write procedure is performed to copy the original data block from the base volume to the snapshot repository before writing the new data to the base volume. Afterwards, it is not necessary to copy the data block to the snapshot repository upon subsequent writes to the same data block in the base volume.
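The copy-on-write procedure described above can be illustrated with a minimal sketch. The class and method names (`Snapshot`, `write_base`, `read_snapshot`) and the use of an in-memory list and dictionary are assumptions made for illustration only; they do not represent the storage array's actual implementation.

```python
class Snapshot:
    """Hypothetical model: one base volume and one snapshot repository."""

    def __init__(self, base):
        self.base = base        # list of block contents (the live base volume)
        self.repository = {}    # block index -> original data preserved for the snapshot

    def write_base(self, block, data):
        # Copy-on-write: preserve the original data only on the first
        # write to this block after the snapshot was formed.
        if block not in self.repository:
            self.repository[block] = self.base[block]
        self.base[block] = data

    def read_snapshot(self, block):
        # Changed blocks are read from the repository; unchanged blocks
        # are still read from the base volume.
        if block in self.repository:
            return self.repository[block]
        return self.base[block]
```

In this sketch, a second write to the same block skips the copy because the repository already holds the data as it existed when the snapshot was formed.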
Data may also sometimes be written to the repository of the snapshot volume, such as when testing a new version of software or developing scenarios, as described above. Some snapshot volumes, however, are write-protected, so the data in their repositories cannot be changed. Such write-protected snapshot volumes include those used for the limited purpose of serving as a known stable state of the base volume to which the base volume can be restored if the base volume becomes corrupted or invalid. The point at which the known stable state is formed is referred to herein as a “checkpoint,” and the known stable state snapshot volume is referred to herein as a “checkpoint volume.” By forming multiple checkpoint volumes at periodic intervals, the base volume can be restored, or “rolled back,” to one of the known stable states represented by the snapshot or checkpoint volume that is considered to have the best valid data.
When a new snapshot volume (including a new checkpoint volume) is formed, a new repository volume must also be formed. When multiple snapshot volumes have been formed, with every write procedure to a previously unchanged data block of the base volume, a copy-on-write procedure must occur for every affected snapshot volume to copy the prior data from the base volume to each of the repository volumes. Therefore, with several snapshot volumes, the copying process can take up a considerable amount of the storage array's processing time, and the snapshot volumes can take up a considerable amount of the storage array's storage capacity.
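The cost of maintaining one repository per snapshot volume can be shown with a short sketch that counts the copy-on-write operations triggered by a single write. The function name `write_with_separate_repositories` and the dictionary-based repositories are hypothetical and serve only to illustrate the point.

```python
def write_with_separate_repositories(base, repositories, block, data):
    """Write to the base volume; return the number of copy-on-write copies made."""
    copies = 0
    for repo in repositories:          # one repository (dict) per snapshot volume
        if block not in repo:
            repo[block] = base[block]  # preserve the prior data for this snapshot
            copies += 1
    base[block] = data
    return copies
```

With three snapshot volumes that have not yet preserved a given block, one write to that block produces three copies and three stored duplicates of the same data.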
It is with respect to these and other background considerations that the present invention has evolved.
An improvement of the present invention is that a single snapshot repository can contain multiple “point-in-time images” of the data from the base volume from which the snapshot volume or checkpoint volume is formed. Each image, in combination with relevant data blocks in any later-created images, represents the state of the data of the base volume at the point in time at which the image was formed. When a data block in the base volume is changed, the previous data in the data block is copied only to the most recently created image in the repository. Therefore, rather than including duplicate copies of data blocks, each image includes only the data blocks that were copied up until the time at which the next image was created. When performing a data write function to the base volume or the snapshot volume, only the newest image needs to be searched to determine whether the data block is present in the snapshot repository. When performing a data read function on the snapshot volume, the first image is searched and if the data block is not found, then the next-created image is searched, and so on until the last-created image. Additionally, only one copy-on-write procedure is performed to copy a data block for multiple checkpoint volumes that use a single repository. Therefore, the present invention reduces the average amount of processing time required to copy data for a snapshot or checkpoint volume and reduces the amount of storage space needed to form the snapshot repositories.
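The search order described above can be sketched as follows. This is a simplified illustration under stated assumptions: the names `Repository`, `create_checkpoint`, `write_base`, and `read_checkpoint` are hypothetical, images are modeled as in-memory dictionaries, and each checkpoint is identified by the index of the image started when it was formed.

```python
class Repository:
    """Hypothetical repository holding an ordered list of point-in-time images."""

    def __init__(self, base):
        self.base = base    # list of block contents (the live base volume)
        self.images = []    # each image: dict of block index -> preserved data

    def create_checkpoint(self):
        # Starting a new image implicitly closes the previous one,
        # because writes only ever go to the last image in the list.
        self.images.append({})
        return len(self.images) - 1

    def write_base(self, block, data):
        # Only the newest image is searched, and at most one copy is made,
        # regardless of how many checkpoints share the repository.
        if self.images and block not in self.images[-1]:
            self.images[-1][block] = self.base[block]
        self.base[block] = data

    def read_checkpoint(self, checkpoint, block):
        # Search the checkpoint's own image first, then each later-created
        # image in order; fall back to the base volume if never copied.
        for image in self.images[checkpoint:]:
            if block in image:
                return image[block]
        return self.base[block]
```

Because a read stops at the first image that contains the block, a later copy of the same block in a newer image is never returned for an older checkpoint.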
The present invention also reduces the complexity of managing the data blocks in the repository. The data blocks in one image are written sequentially into contiguous storage space in the repository. The data blocks are also usually used as “read-only” data blocks, particularly for the checkpoint volumes, so the data blocks are rarely changed. Therefore, it is not necessary for storage management software to maintain a detailed accounting of each data block. Instead, the storage management software keeps track of each image, which typically occupies a larger amount of storage space than an individual data block and therefore requires less management overhead, while the individual data blocks are maintained more simply in a sequential manner.
Each snapshot or checkpoint volume corresponds to one or more of the images in the snapshot repository. Since the checkpoint volume is a simplified special case snapshot volume (which cannot be written to and is saved for the sole purpose of maintaining a known stable state to which the base volume can be returned when necessary), more than one checkpoint volume can typically use the same repository and the images therein. However, for the snapshot volumes for which the data can be written or changed, it is preferable to limit the snapshot volumes to one snapshot volume per snapshot repository to avoid complications that may arise when using and changing the data. These snapshot volumes, however, still use more than one image in the snapshot repository.
When a checkpoint volume is formed, a new image is started in the repository. When the next checkpoint volume is formed, another new image is started in the same repository, and the first image is “stopped” or “closed,” so no more data can be placed in the first image. The data for the second checkpoint volume, in this situation, is formed only by the data in the second image. The data for the first checkpoint volume, however, is formed by the data in both images. Some of the data in the second image, though, may not relate to the first checkpoint volume. For example, when a data block has been copied to the first image, then after the creation of the second image, if the same data block in the base volume is changed again, then the data block is copied to the second image as well. The second time the data block is copied, however, it will have different data that is not relevant to the first checkpoint volume. The first checkpoint volume, thus, relates to the data block in the first image, but not in the second image. When a data block that was not copied to the first image is copied to the second image, however, the data block relates to both checkpoint volumes, since the change to the data block in the base volume, in this case, is the first change to the data block after both checkpoint volumes were created.
A more complete appreciation of the present invention and its scope, and the manner in which it achieves the above noted improvements, can be obtained by reference to the following detailed description of presently preferred embodiments of the invention taken in connection with the accompanying drawings, which are briefly summarized below, and the appended claims.