1. Field of the Invention
This invention relates to computer systems, and more particularly, to efficient methods and mechanisms for backing up data in computer systems.
2. Description of the Relevant Art
Organizations have an increasing need to protect data that resides on a variety of client devices by means of some type of backup mechanism. For example, numerous client devices may be coupled to a network that is coupled to a backup store. The backup store may comprise one or more backup servers, which may be further coupled to a disk storage unit, one or more tape drives, or other backup media. A backup agent on each client device may convey data files to a media server for storage according to a variety of schedules, policies, etc. For example, large backup datasets may be moved from a client device to the media server, which is configured to store data for later retrieval. Such backups protect data from loss due to user error, system failure, outages, disasters, etc., and also archive information for regulatory compliance, workflow tracking, etc.
In order to make data more readily available and to reduce the storage capacity required, single-instance storage techniques may be used. In a single-instance storage system, data is typically stored in segments, with each segment having a fingerprint that may be used to unambiguously identify it. For example, a data file may be segmented, and a fingerprint calculated for each segment. Duplicate copies of data segments are replaced by a single instance of the segment and a set of references to the segment, one for each copy. In order to retrieve a backup file, a set of identifiers (e.g., fingerprints) is sent to the single-instance storage system, where it is compared to the fingerprints of data stored in a storage pool. For each matching fingerprint, a data segment is retrieved. The resulting segments may be re-assembled to produce the desired file.
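The segment, fingerprint, and reference scheme described above can be illustrated with a minimal sketch. It is not taken from any particular product. The class name SingleInstanceStore, the fixed 4-byte segment size, and the choice of SHA-256 as the fingerprint function are all assumptions made for illustration only; a practical system would likely use much larger segments and might use variable-length segmentation.

```python
import hashlib
from typing import Dict, List

SEGMENT_SIZE = 4  # hypothetical, tiny on purpose so duplicates are easy to see


class SingleInstanceStore:
    """Illustrative storage pool that keeps one copy of each unique segment."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}   # fingerprint -> segment data
        self.references: Dict[str, int] = {}   # fingerprint -> reference count

    def backup(self, data: bytes) -> List[str]:
        """Segment the data, store unseen segments, return the fingerprint list."""
        fingerprints = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()  # assumed fingerprint function
            if fp not in self.segments:
                self.segments[fp] = segment  # first instance: store the data
            # Every copy, duplicate or not, adds one reference to the segment.
            self.references[fp] = self.references.get(fp, 0) + 1
            fingerprints.append(fp)
        return fingerprints

    def restore(self, fingerprints: List[str]) -> bytes:
        """Look up each fingerprint and re-assemble the segments in order."""
        return b"".join(self.segments[fp] for fp in fingerprints)


store = SingleInstanceStore()
file_a = b"AAAABBBBAAAA"   # three segments, two of them identical
file_b = b"BBBBCCCC"       # shares the "BBBB" segment with file_a
ids_a = store.backup(file_a)
ids_b = store.backup(file_b)
```

In this sketch the two files contain five segments in total, but the pool holds only three unique segments; calling store.restore(ids_a) re-assembles the first file from its fingerprint list.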
When a new client device requires an initial backup of its data, the new client may utilize the single-instance backup mechanism described above. An initial backup requires that all unique data segments from the new client, along with all the related metadata, whether unique or not, be conveyed to the data store via the network, which results in heavy network traffic. As a consequence, performance of the available network may be reduced.
In view of the above, efficient backup methods and mechanisms are desired.