1. Field of the Invention
This invention relates to computer systems and, more particularly, to replication and restoration of backup files within computer systems.
2. Description of the Related Art
There is an increasing need for organizations to protect data that resides on a variety of client devices via some type of backup mechanism. For example, numerous client devices may be coupled to a network to which one or more backup servers are also coupled. The backup servers may, in turn, be coupled to one or more tape drives or other backup storage devices. A backup agent on each client device may convey data files to a backup server for storage on backup media according to a variety of schedules, policies, etc. In this way, large backup datasets may be moved from a client device to a media server configured to store data for later retrieval, thereby protecting data from loss due to user error, system failure, outages, disasters, and so on. Additionally, such backup procedures may be used for purposes such as regulatory compliance and workflow tracking.
In order to minimize the size of the storage pools required to store backup data, Single Instance Storage (SIS) techniques are sometimes employed at each backup location. In some SIS techniques, data is stored in segments, each segment having a fingerprint that may be used to unambiguously identify it. For example, a data file may be segmented and a fingerprint calculated for each segment. Duplicate copies of a data segment are then replaced by a single instance of the segment and a set of references to that single instance. To retrieve a backup file, a set of fingerprints is sent to a backup server, where the fingerprints are compared to those of the data stored in an associated storage pool. For each matching fingerprint, the corresponding data segment is retrieved. The retrieved segments are then re-assembled to produce the desired file.
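The segment, fingerprint, and re-assemble cycle described above may be easier to follow as a small model. The following is a minimal sketch only, assuming fixed-size segments and SHA-256 fingerprints; SIS systems may instead use variable-size segmentation or other fingerprint functions. The names `StoragePool`, `backup`, `restore`, and `SEGMENT_SIZE` are hypothetical and chosen purely for illustration.

```python
import hashlib

# Hypothetical segment size, kept tiny so the deduplication is easy to see.
SEGMENT_SIZE = 4


class StoragePool:
    """Hypothetical single-instance storage pool: one copy per unique segment."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def backup(self, data: bytes) -> list[str]:
        """Segment data, fingerprint each segment, store only new segments,
        and return the file's ordered list of fingerprints (its references)."""
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            # setdefault stores the segment only if this fingerprint is new,
            # so duplicates become references to the single stored instance.
            self.segments.setdefault(fp, segment)
            recipe.append(fp)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Look up the segment for each fingerprint and re-assemble the file."""
        return b"".join(self.segments[fp] for fp in recipe)


pool = StoragePool()
r1 = pool.backup(b"ABCDABCDWXYZ")  # three segments, two of them identical
r2 = pool.backup(b"ABCDQRST")      # "ABCD" is already stored
print(len(pool.segments))          # 3 unique segments stored
print(pool.restore(r1))            # b'ABCDABCDWXYZ'
```

Twenty bytes of input are represented here by three stored segments (twelve bytes) plus per-file fingerprint lists, which is the space saving SIS aims for.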
In order to make data more readily available, it may be desirable to replicate portions of a storage pool. For example, the contents of a storage pool may be replicated and stored at a remote location from which they may be retrieved (e.g., to recover from a disastrous data loss). Alternatively, a multinational enterprise may replicate a storage pool, or a portion thereof, during off hours to make data more easily retrievable from a variety of locations, perhaps on different continents, without the need to transmit large amounts of information on demand. In conventional systems, replication typically involves re-assembling the files to be replicated from their respective data segments stored in a source storage pool and sending them to a target storage pool, where SIS techniques may be re-applied. Unfortunately, this process may cause a data segment that is referenced by multiple files to be re-assembled multiple times, once for each reference. In addition, transmitting the resulting large datasets is costly in terms of time and bandwidth. These issues also arise when data must be reverse replicated back to its original source storage pool, such as in the event of a server failure. In view of the above, an effective system and method for replicating single-instance storage pools that addresses these issues is desired.
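The cost of the conventional replication path can be illustrated with a toy model that counts how many bytes cross the network and how many segment re-assemblies occur. This is a minimal sketch, assuming fixed-size segments, SHA-256 fingerprints, and in-memory dictionaries standing in for the source and target storage pools; `segment_and_store`, `conventional_replicate`, and the sample file contents are hypothetical and used only for illustration.

```python
import hashlib

# Hypothetical segment size, kept tiny for illustration.
SEGMENT_SIZE = 4


def segment_and_store(pool: dict[str, bytes], data: bytes) -> list[str]:
    """Apply SIS: store each unique segment once and return the file's fingerprints."""
    recipe = []
    for i in range(0, len(data), SEGMENT_SIZE):
        seg = data[i:i + SEGMENT_SIZE]
        fp = hashlib.sha256(seg).hexdigest()
        pool.setdefault(fp, seg)
        recipe.append(fp)
    return recipe


def conventional_replicate(src_pool, src_catalog, dst_pool):
    """Re-assemble every file from the source pool, 'send' it whole, and
    re-apply SIS at the target. Returns the target catalog plus counters
    showing how many bytes were sent and how many segments were re-assembled."""
    bytes_sent = 0
    reassembled = 0
    dst_catalog = {}
    for name, recipe in src_catalog.items():
        data = b"".join(src_pool[fp] for fp in recipe)  # re-assembly at the source
        reassembled += len(recipe)                      # shared segments counted once per reference
        bytes_sent += len(data)                         # the whole file is transmitted
        dst_catalog[name] = segment_and_store(dst_pool, data)  # SIS re-applied at the target
    return dst_catalog, bytes_sent, reassembled


src_pool: dict[str, bytes] = {}
src_catalog = {
    "a.txt": segment_and_store(src_pool, b"ABCDABCDWXYZ"),
    "b.txt": segment_and_store(src_pool, b"ABCDQRST"),
}
dst_pool: dict[str, bytes] = {}
dst_catalog, sent, reassembled = conventional_replicate(src_pool, src_catalog, dst_pool)
unique_bytes = sum(len(s) for s in src_pool.values())
print(sent, unique_bytes, reassembled)  # 20 12 5
```

In this toy run, 20 bytes are transmitted even though the source pool holds only 12 bytes of unique data, and the shared "ABCD" segment is re-assembled three times, once per reference. The target pool ends up identical to the source pool, so the extra transfer and re-assembly work bought nothing beyond what the unique segments already contained.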