1. Field of the Invention
The present invention relates to a computer program product, system, and method for restoring a restore set of files from backup objects stored in sequential backup devices.
2. Description of the Related Art
In a network backup environment, client systems back up their data in backup objects to a backup server over a network. The backup server maintains a database providing information on the stored backup objects. The client systems may restore files from the backup objects maintained by the backup server. The backup objects for a volume include a full volume backup object as of an initial point-in-time and delta backups that capture, at different subsequent points-in-time, changes made to the volume since the initial point-in-time. A full volume backup may comprise one object that represents the entire volume or multiple objects. The delta backups may comprise incremental backups or differential backups. An “incremental backup” at a point-in-time comprises a backup object having files or blocks that have changed between the point-in-time of the last taken incremental backup or full volume backup, whichever is more recent, and the point-in-time of the incremental backup. A “differential backup” at a point-in-time comprises a backup object having all files or blocks that have changed between the point-in-time of the last full volume backup and the point-in-time of the differential backup.
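The difference between the two delta types can be illustrated with a minimal sketch. The block-map model, the function names, and the sample data below are assumptions made only for illustration and do not describe any particular backup product: restoring from incremental backups applies every delta up to the target point-in-time, whereas restoring from differential backups applies only the most recent one.

```python
from typing import Dict, List, Tuple

Blocks = Dict[int, bytes]    # block number -> contents (hypothetical model)
Delta = Tuple[int, Blocks]   # (point-in-time, blocks changed in that delta)


def restore_incremental(full: Blocks, incrementals: List[Delta], target: int) -> Blocks:
    """Apply the full backup, then every incremental up to `target`, oldest first."""
    volume = dict(full)
    for point_in_time, changed in sorted(incrementals, key=lambda d: d[0]):
        if point_in_time <= target:
            volume.update(changed)
    return volume


def restore_differential(full: Blocks, differentials: List[Delta], target: int) -> Blocks:
    """Apply the full backup, then only the latest differential at or before `target`."""
    volume = dict(full)
    eligible = [d for d in differentials if d[0] <= target]
    if eligible:
        _, changed = max(eligible, key=lambda d: d[0])
        volume.update(changed)
    return volume


# Full backup at time 0; block 1 changes at time 1, block 2 changes at time 2.
full = {0: b"A0", 1: b"B0", 2: b"C0"}
incrementals = [(1, {1: b"B1"}), (2, {2: b"C2"})]
# Each differential repeats everything changed since the full backup.
differentials = [(1, {1: b"B1"}), (2, {1: b"B1", 2: b"C2"})]

restored_inc = restore_incremental(full, incrementals, target=2)
restored_diff = restore_differential(full, differentials, target=2)
```

In this model both restores yield the same volume contents, but the incremental path must read every delta object in the chain while the differential path reads a single, larger delta object.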
Tivoli Storage Manager (TSM) FastBack® is an example of a system that performs block level incremental backups. Other examples include image backup of a file system by a TSM client and FlashCopy® Manager, in which a local hardware snapshot is created and later backed up to a Tivoli Storage Manager server. (Tivoli Storage Manager FastBack and FlashCopy are registered trademarks of International Business Machines Corp. in the United States and other countries.) In addition to incremental and differential backups, deduplication can be applied to further reduce the storage requirements of the backup repository.
The backup client and server may implement data deduplication, which removes redundant data during a backup operation to optimize storage space and conserve network bandwidth. The backup operation may back up data in chunks or extents, such that if multiple backup objects share the same extent, then only one instance of the extent is stored in backup storage.
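Extent-level deduplication of this kind can be sketched as follows. The fixed extent size, the use of a SHA-256 digest to identify an extent, and the class and method names are assumptions chosen for the example, not a description of how any particular backup server is implemented.

```python
import hashlib
from typing import Dict, List

EXTENT_SIZE = 4  # bytes; unrealistically small, chosen only for the example


class ExtentStore:
    """Hypothetical backup storage that keeps one instance of each extent."""

    def __init__(self) -> None:
        self.extents: Dict[str, bytes] = {}      # digest -> extent data
        self.objects: Dict[str, List[str]] = {}  # object name -> ordered digests

    def backup(self, name: str, data: bytes) -> int:
        """Store `data` as extents and return how many new extents were written."""
        written = 0
        digests: List[str] = []
        for offset in range(0, len(data), EXTENT_SIZE):
            extent = data[offset:offset + EXTENT_SIZE]
            digest = hashlib.sha256(extent).hexdigest()
            if digest not in self.extents:
                # First time this extent is seen: store its single instance.
                self.extents[digest] = extent
                written += 1
            digests.append(digest)
        self.objects[name] = digests
        return written

    def restore(self, name: str) -> bytes:
        """Reassemble a backup object from its extents, in order."""
        return b"".join(self.extents[d] for d in self.objects[name])


store = ExtentStore()
new_for_obj1 = store.backup("obj1", b"AAAABBBBCCCC")
new_for_obj2 = store.backup("obj2", b"AAAAXXXXCCCC")  # shares two extents with obj1
```

Here the second object adds only one new extent to storage, because the extents `AAAA` and `CCCC` are already present from the first object.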
Deduplication during storage backup activities can be performed at the data source (client), at the data target (server), or on a deduplication appliance connected to the backup server. The restoration of deduplicated data from the server to the client involves reconstruction of the data from deduplicated chunks or extents. In current systems, the deduplicated data is stored on disk, and the backup server accesses from the disk the extents of the backup objects to restore and then returns full backup objects to the client, including objects that have common extents. Even if the same extent is found in many backup objects (or even multiple times in the same object) selected for restore, that extent will be restored and transmitted from the server to the client multiple times. The backup server may access the extents from disk in any order due to the random access nature of disk-based storage.
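The redundancy described above can be quantified with a small sketch. The restore set, the extent identifiers, and the function names are hypothetical; the count of distinct extents is shown only as a point of comparison and is not attributed to any existing system.

```python
from typing import Dict, List


def extents_sent_per_object(restore_set: Dict[str, List[str]]) -> int:
    """Count extent transmissions when each full object is returned separately."""
    return sum(len(extent_ids) for extent_ids in restore_set.values())


def distinct_extents(restore_set: Dict[str, List[str]]) -> int:
    """Count the distinct extents referenced anywhere in the restore set."""
    return len({eid for extent_ids in restore_set.values() for eid in extent_ids})


# Hypothetical restore set: object name -> ordered extent identifiers.
restore_set = {
    "obj1": ["e1", "e2", "e3"],
    "obj2": ["e1", "e4", "e3"],
    "obj3": ["e1", "e1", "e5"],  # the same extent can repeat within one object
}

sent = extents_sent_per_object(restore_set)
distinct = distinct_extents(restore_set)
```

For this restore set, nine extents are transmitted although only five distinct extents are involved, so nearly half of the transfer repeats data the client has already received.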
An operation to restore data from the backup objects may require applying data from a full backup and associated incremental or differential backups (which may have been deduplicated). The restore process involves reconstruction of the client image, which can become fragmented in the backup repository due to a number of possible data transformations and placement locations in the backup repository. For instance, the source client image can be broken into multiple objects in the backup repository (e.g., breaking a 0.5 TB volume into 1 GB backup objects). These objects could span multiple volumes (disk or tape) in the backup repository. For incremental or differential point-in-time backups, each point-in-time backup could be on a different volume (disk or tape) in the backup repository. Further, performing deduplication of the source client image may result in deduplicated chunks of the source image existing on multiple volumes (disk or tape) in the backup repository. The server backup program may also have management operations that move, expire, or reclaim data. These operations could change the order of objects or extents on sequential-access media.
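The effect of such fragmentation on sequential-access media can be illustrated with a minimal sketch. The volume labels, positions, and function name are assumptions for the example; the sorted ordering is included only as a baseline for comparison.

```python
from typing import List, Tuple

Location = Tuple[str, int]  # (volume label, position on that volume), hypothetical


def count_volume_switches(read_order: List[Location]) -> int:
    """Count how often consecutive reads land on different volumes."""
    switches = 0
    for previous, current in zip(read_order, read_order[1:]):
        if previous[0] != current[0]:
            switches += 1
    return switches


# Extents of one client image listed in the order the image needs them.
logical_order: List[Location] = [
    ("tapeA", 10), ("tapeB", 3), ("tapeA", 2), ("tapeB", 7), ("tapeA", 5),
]

switches_logical = count_volume_switches(logical_order)

# Baseline: the same extents grouped by volume and ordered by position.
grouped_order = sorted(logical_order)
switches_grouped = count_volume_switches(grouped_order)
```

Reading the five extents in the order the image requires them switches volumes four times, whereas the grouped ordering switches only once; on tape, each switch may involve a mount or a long seek.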
There is a need in the art for improved techniques for handling the restoration of backup objects in different storage environments.