1. Field of the Invention
This invention relates to computer systems and, more particularly, to backup management within computer systems.
2. Description of the Related Art
Many business organizations and governmental entities rely upon mission-critical applications that access large amounts of data, often exceeding many terabytes. Numerous different types of storage devices, potentially from multiple storage vendors, with varying functionality, performance and availability characteristics, may be employed in such environments.
Any one of a variety of failures, such as system crashes, hardware storage device failures, and software defects, may potentially lead to a corruption or a loss of critical data in such environments. In order to recover from such failures, various kinds of backup techniques may be employed. Traditionally, for example, backup images of critical data may have been created periodically (e.g., once a day) and stored on tape devices. As prices for random access media such as disk devices have continued to fall, some information technology (IT) organizations have begun to use random access media for storing backup images as well. In some storage environments, multiple layers of storage may be dedicated to storing backup images: e.g., backup images may be stored on disk or on a particular type of tape device initially, and staged periodically to a second type of tape device or other secondary media for long-term storage.
Backup solution vendors may allow users to create different types of backup images for a given data source (e.g., one or more file systems or logical volumes), such as full images and incremental images. A full image may include a complete copy of the data source, e.g., a copy of all the files within one or more file systems, and so may be used to restore the state of the data source as of the time the full image was created, without a need to access any other backup image. An incremental image may include only the changes that occurred at the data source over a period of time (e.g., the period since a previous backup image was created), rather than the full contents of the data source, and may therefore typically require less storage than a full image. Backup images may typically be created and managed as part of a chain or sequence, where the chain includes at least one full backup image, and incremental images may be used in combination with a previously created full image within the chain to restore the state of the data source.
For example, in one environment, a full image (“F-Sun”) of a data source may be created every Sunday, and an incremental image (e.g., “I-Mon”, “I-Tue”, “I-Wed”) may be created on each of the other days of the week. In this example, information contained within “I-Wed” may have to be combined not only with information contained within “F-Sun”, but also with information contained within all intermediate incremental images (i.e., “I-Mon” and “I-Tue”), in order to restore the state of the data source as of Wednesday. That is, an incremental image may only contain information on the changes that occurred at the data source since the immediately previous backup image of any kind was created. The immediately previous image may be another incremental image or a full image. Restoration using a given incremental image may therefore typically require processing several backup images.
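The chain-based restoration described above may be illustrated by a minimal sketch. The sketch assumes, purely for illustration, that each backup image can be modeled as a mapping from block number to block contents; the names `Image`, `restore`, `f_sun`, `i_mon`, `i_tue` and `i_wed` are hypothetical and do not correspond to any particular backup product.

```python
from typing import Dict, List

# Hypothetical model: an image maps a block number to that block's contents.
Image = Dict[int, bytes]


def restore(full_image: Image, incrementals: List[Image]) -> Image:
    """Rebuild the data source state from a full image plus every
    incremental image that follows it, applied oldest first."""
    state = dict(full_image)      # start from the complete copy
    for inc in incrementals:      # later images overwrite earlier blocks
        state.update(inc)
    return state


# Illustrative data only: a three-block data source.
f_sun: Image = {0: b"a0", 1: b"b0", 2: b"c0"}   # full image, Sunday
i_mon: Image = {1: b"b1"}                        # block 1 changed Monday
i_tue: Image = {2: b"c2"}                        # block 2 changed Tuesday
i_wed: Image = {1: b"b3"}                        # block 1 changed again Wednesday

wednesday_state = restore(f_sun, [i_mon, i_tue, i_wed])
incomplete_state = restore(f_sun, [i_wed])       # skips I-Mon and I-Tue
```

In this sketch, omitting the intermediate images yields a state that lacks the Tuesday change to block 2, which is why every image in the chain up to the desired point may have to be processed.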
The growing size of data sources may lead to an increased reliance on incremental images rather than frequent creation of full images in many storage environments, especially where random-access media are used for the backup images. For example, as a result of the large storage costs associated with creating traditional full images, backup management staff may reduce the frequency at which full images are created, and create relatively long backup image chains consisting largely of incremental images. As the length of a given chain of backup images increases, however, storage costs for obsolete or redundant data blocks within incremental images themselves may become significant, and the total time required for restoration may also increase. Frequently changed blocks of the source data set may have to be included within several incremental images in a given chain. For example, if data block “B” of a data source happens to be modified on Monday, Tuesday, and Wednesday in a storage environment where incremental images are created every day, the incremental images for each of the three days may include a copy of data block B. Even though the versions of data block B stored in the incremental images for Monday and Tuesday in this example may be obsolete once the incremental image for Wednesday is created (i.e., even though the Monday and Tuesday versions may no longer be needed to restore the latest backed-up state of the data source after the Wednesday incremental image is created), the obsolete versions may still occupy storage space.
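The storage overhead of obsolete block versions may be illustrated by the following sketch, which assumes that each incremental image can be modeled as a mapping from a block identifier to block contents. The function name `obsolete_copies` and the sample data are hypothetical, and a version is treated as obsolete only with respect to restoring the latest backed-up state.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an incremental image maps a block identifier to contents.
Incremental = Dict[str, bytes]


def obsolete_copies(incrementals: List[Incremental]) -> List[Tuple[int, str]]:
    """Return (image index, block id) pairs for block versions that a later
    incremental image in the same chain has superseded."""
    latest: Dict[str, int] = {}
    for idx, inc in enumerate(incrementals):
        for block in inc:
            latest[block] = idx          # remember the newest image per block
    return [
        (idx, block)
        for idx, inc in enumerate(incrementals)
        for block in inc
        if latest[block] != idx          # an older copy of the block
    ]


# Illustrative data only: block "B" is modified on three consecutive days.
monday: Incremental = {"B": b"v1", "C": b"c1"}
tuesday: Incremental = {"B": b"v2"}
wednesday: Incremental = {"B": b"v3"}

stale = obsolete_copies([monday, tuesday, wednesday])
```

Here the Monday and Tuesday versions of block "B" are reported as obsolete, while block "C" and the Wednesday version of "B" are still needed; the obsolete versions nevertheless remain stored within their images.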
In order to reduce total backup storage requirements (e.g., by eliminating obsolete copies of data blocks within old incremental images), and to simplify management of backup image chains in general, techniques to create consolidated full images from an existing chain of images may be employed. Traditionally, such techniques have required extensive data copying, for example by first duplicating an existing full image from the chain, and then applying changes from succeeding incremental images in sequence to the duplicated full image. The time and storage required for such consolidations that involve copying data blocks from existing backup images may become prohibitive, especially for large data sources and long backup image chains.
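The copy-based consolidation described above may be sketched as follows, again assuming that images can be modeled as mappings from block number to block contents. The function `consolidate_by_copying` and its copy counter are hypothetical and are included only to make the copying cost visible; the sketch does not represent any specific vendor's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an image maps a block number to that block's contents.
Image = Dict[int, bytes]


def consolidate_by_copying(
    full_image: Image, incrementals: List[Image]
) -> Tuple[Image, int]:
    """Build a consolidated full image by duplicating the existing full image
    and then applying each incremental image in sequence.
    Returns the new image and the number of block copies performed."""
    consolidated: Image = {}
    blocks_copied = 0
    for block, data in full_image.items():   # step 1: duplicate the full image
        consolidated[block] = data
        blocks_copied += 1
    for inc in incrementals:                 # step 2: apply changes in order
        for block, data in inc.items():
            consolidated[block] = data
            blocks_copied += 1
    return consolidated, blocks_copied


# Illustrative data only: three blocks, with block 1 rewritten in every image.
full: Image = {0: b"a0", 1: b"b0", 2: b"c0"}
chain: List[Image] = [{1: b"b1"}, {1: b"b2"}, {1: b"b3", 2: b"c3"}]

new_full, copies = consolidate_by_copying(full, chain)
```

In this sketch seven block copies are performed to produce a three-block consolidated image, since every block of the full image and every block of every incremental image is copied, including versions that are immediately overwritten. The number of copies grows with both the size of the data source and the length of the chain.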