1. Field of the Invention
This invention relates to computer systems and, more particularly, to backup management within computer systems.
2. Description of the Related Art
Many business organizations and governmental entities rely upon mission-critical applications that access large amounts of data, often exceeding many terabytes. Numerous different types of storage devices, potentially from multiple storage vendors, with varying functionality, performance and availability characteristics, may be employed in such environments.
Any one of a variety of failures, such as system crashes, hardware storage device failures, and software defects, may potentially lead to a corruption or a loss of critical data in such environments. In order to recover from such failures, various kinds of backup techniques may be employed. Traditionally, for example, backup images of critical data may have been created periodically (e.g., once a day) and stored on tape devices. As prices for random access media such as disk devices have continued to fall, some information technology (IT) organizations have begun to use random access media for storing backup images as well. In some storage environments, multiple layers of storage may be dedicated to storing backup images: e.g., backup images may be stored on disk or on a particular type of tape device initially, and staged periodically to a second type of tape device or other secondary media for long-term storage.
Backup solution vendors may allow users to create several different types of backup images for a given data source (e.g., one or more file systems), such as full images, differential images, and incremental images. A full image may include a complete copy of the data source, e.g., a copy of all the files within one or more file systems, and so may be used to restore the state of the data source as of the time the full image was created, without a need to access any other backup image. Differential and incremental images may include changes that may have occurred at the data source over a period of time, rather than the full contents of the data source, and may therefore typically require less storage than full images. Differential images and incremental images may typically be created and managed as part of a sequence of backup images, where the sequence includes at least one full image, and may be used in combination with a previously created full image within the sequence to restore the state of the data source. Such a sequence of backup images for a data source may also be referred to as a backup set. Differential and incremental images may differ from each other in the number of backup images that may need to be analyzed or processed during restoration of the state of the data source.
For example, in one environment, a full image (“F-Sun”) of a data source may be created every Sunday, and a differential image (e.g., “D-Mon”, “D-Tue”, “D-Wed”, etc.) may be created on each of the other days of the week. In such an example, a differential backup image “D-Wed” created on a Wednesday may include sufficient information that, when combined with the information stored in the previous full image “F-Sun”, allows the state of the data source as of Wednesday to be restored. Thus, a differential image may contain information on all the changes affecting backup that may have occurred at the data source since a previous full image was created.
In a second example, a full image (“F-Sun”) of a data source may also be created every Sunday, but incremental images (e.g., “I-Mon”, “I-Tue”, “I-Wed”) may be created on each of the other days of the week. In this second example, information contained within “I-Wed” may have to be combined not only with information contained within “F-Sun”, but also with information contained within all intermediate incremental images (i.e., “I-Mon” and “I-Tue”), in order to restore the state of the data source as of Wednesday. That is, an incremental image may contain information only on the changes affecting backup that may have occurred at the data source since an immediately previous image of any kind was created. The immediately previous image may be another incremental image, a differential image, or a full image. Restoration using a given incremental image may therefore typically require processing more backup images than restoration using a differential image, especially as the number of intermediate incremental images between the last full image and the given incremental image increases.
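The difference in restoration effort between the two examples may be illustrated with a minimal Python sketch. The function name restore_chain, the (name, kind) tuple representation, and the kind labels are hypothetical and chosen only for illustration; they do not correspond to any particular backup product. The sketch assumes the backup set is given in creation order and returns the names of the images that may need to be processed, oldest first, to restore the state captured by a target image.

```python
from typing import List, Tuple

# Hypothetical representation: (image name, image kind), in creation order.
# Kind is assumed to be "full", "differential", or "incremental".
Image = Tuple[str, str]


def restore_chain(sequence: List[Image], target: str) -> List[str]:
    """Return the images needed to restore the state as of `target`, oldest first."""
    names = [name for name, _ in sequence]
    idx = names.index(target)  # raises ValueError if the target is not in the set
    chain: List[str] = []
    while True:
        name, kind = sequence[idx]
        chain.append(name)
        if kind == "full":
            break  # a full image needs no other image
        if kind == "differential":
            # A differential image depends only on the most recent full image.
            idx -= 1
            while idx >= 0 and sequence[idx][1] != "full":
                idx -= 1
        elif kind == "incremental":
            # An incremental image depends on the immediately previous image,
            # whatever its kind.
            idx -= 1
        else:
            raise ValueError(f"unknown image kind: {kind!r}")
        if idx < 0:
            raise ValueError("no full image precedes the target image")
    return list(reversed(chain))


differential_week = [("F-Sun", "full"), ("D-Mon", "differential"),
                     ("D-Tue", "differential"), ("D-Wed", "differential")]
incremental_week = [("F-Sun", "full"), ("I-Mon", "incremental"),
                    ("I-Tue", "incremental"), ("I-Wed", "incremental")]

print(restore_chain(differential_week, "D-Wed"))  # two images
print(restore_chain(incremental_week, "I-Wed"))   # four images
```

Under these assumptions, restoring from “D-Wed” touches two images, while restoring from “I-Wed” touches four, and the incremental chain grows by one image for each additional day since the last full image.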
Both incremental images and differential images may usually require less storage space than full images, and an incremental image may often require less storage space than a differential image created at about the same time for the same data source. For large data sources, such as file systems or volumes that collectively occupy terabytes of storage space, the difference in storage space requirements for the different backup image types may be substantial. A technique or method that reduces the amount of storage space needed to store backup sequences while retaining the ability to restore data source state as of desired points in time may therefore be desirable.
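The reason an incremental image may often be smaller than a differential image created at about the same time can be sketched as follows. This is an illustrative Python sketch only: it assumes file-level change tracking, one image per day after a full image, and hypothetical file names and change sets; the function captured_files is not part of any real backup interface.

```python
from typing import List, Set


def captured_files(changes_by_day: List[Set[str]], day: int, kind: str) -> Set[str]:
    """Return the files an image created at the end of `day` would capture.

    changes_by_day[i] is the (hypothetical) set of files modified on day i,
    where day 0 is the first day after the full image.
    """
    if kind == "incremental":
        # Only changes since the immediately previous image (the prior day).
        return set(changes_by_day[day])
    if kind == "differential":
        # All changes accumulated since the full image.
        captured: Set[str] = set()
        for changed in changes_by_day[: day + 1]:
            captured |= changed
        return captured
    raise ValueError(f"unknown image kind: {kind!r}")


# Hypothetical change sets for Monday, Tuesday, and Wednesday.
changes = [{"a.txt"}, {"b.txt"}, {"a.txt", "c.txt"}]

wednesday = 2
print(sorted(captured_files(changes, wednesday, "differential")))
print(sorted(captured_files(changes, wednesday, "incremental")))
```

In this hypothetical week, the Wednesday differential image would capture three files while the Wednesday incremental image would capture two, because the differential image also carries Tuesday's change to b.txt.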