Data deduplication encompasses a number of techniques for removing redundant data during a backup operation, thereby reducing the required storage and potentially conserving network bandwidth. In a typical configuration, a disk-based storage system, such as a storage-management server or a VTL (virtual tape library), can detect redundant data “extents” (also known as “chunks”) and reduce duplication by avoiding redundant storage of those chunks. For example, the deduplicating storage system could divide file A into chunks a-h, detect that chunks b and e are redundant, and store the redundant chunks only once. The redundancy may occur within file A itself or across other files stored in the storage system.
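The chunk-level mechanism described above can be illustrated with a minimal sketch of a content-addressed chunk store. The sketch assumes fixed-size chunking and SHA-256 fingerprints purely for illustration; many deduplicating systems instead use content-defined (variable-size) chunking, and the class name `DedupStore`, its methods, and the tiny chunk size are all hypothetical. Each file is recorded as an ordered list of chunk fingerprints (a "recipe"), while the chunk bytes themselves are stored only once per unique fingerprint.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, deliberately tiny so the example is easy to follow


class DedupStore:
    """Hypothetical content-addressed store that keeps each unique chunk once."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}   # fingerprint -> chunk bytes (stored once)
        self.recipes = {}  # file name -> ordered list of fingerprints

    def store(self, name: str, data: bytes) -> list:
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk bytes only the first time this fingerprint is seen;
            # a redundant chunk adds just a reference to the file's recipe.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe
        return recipe

    def stored_bytes(self) -> int:
        # Physical bytes actually held, counting each unique chunk once.
        return sum(len(c) for c in self.chunks.values())

    def reconstruct(self, name: str) -> bytes:
        # Rebuild the original file by concatenating its chunks in recipe order.
        return b"".join(self.chunks[d] for d in self.recipes[name])
```

For example, storing b"AAAABBBBAAAACCCC" as file A keeps only three unique chunks (12 bytes instead of 16), because the chunk AAAA repeats within the file; storing b"BBBBDDDD" as a second file then adds only the new chunk DDDD, since BBBB is already held.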
Known techniques exist for deduplicating data objects. In a typical client-server software system, deduplication during storage activities can be performed at the data source (client), at the data target (server), or on a deduplication appliance connected to the server.
Restoring deduplicated data from the server to the client requires reconstructing the data from its deduplicated chunks. In current deduplication systems, this reconstruction takes place on the server, which then sends every chunk of the fully reconstructed data to the client. Moreover, even if the same chunk appears in many of the files selected for restore (or multiple times within a single file), the server restores and transmits that chunk to the client each time it occurs.
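The redundant transmission described above can be made concrete with a minimal sketch of a conventional server-side restore. The helper names `fingerprint`, `build_store`, and `naive_restore`, the fixed 4-byte chunks, and the use of SHA-256 are hypothetical choices for illustration only. The sketch counts how many times each chunk is sent while the server reconstructs every selected file in full.

```python
import hashlib
from collections import Counter


def fingerprint(chunk: bytes) -> str:
    # Hypothetical chunk identifier; SHA-256 is an illustrative choice.
    return hashlib.sha256(chunk).hexdigest()


def build_store(files: dict, chunk_size: int = 4):
    """Deduplicate files into a chunk dictionary plus per-file recipes."""
    chunks, recipes = {}, {}
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), chunk_size):
            piece = data[i:i + chunk_size]
            digest = fingerprint(piece)
            chunks.setdefault(digest, piece)  # each unique chunk stored once
            recipe.append(digest)
        recipes[name] = recipe
    return chunks, recipes


def naive_restore(chunks: dict, recipes: dict, names: list):
    """Server-side reconstruction: every chunk of every selected file is sent,
    even if the same chunk was already sent earlier in this restore."""
    sent = Counter()  # fingerprint -> number of times transmitted
    restored = {}
    for name in names:
        parts = []
        for digest in recipes[name]:
            parts.append(chunks[digest])  # server reads and transmits the chunk
            sent[digest] += 1
        restored[name] = b"".join(parts)
    return restored, sent
```

Restoring files A = b"AAAABBBBAAAACCCC" and B = b"BBBBDDDD" with this sketch transmits six chunks (24 bytes), although only four unique chunks (16 bytes) exist: AAAA is sent twice because it repeats within file A, and BBBB is sent twice because it appears in both files.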
Although deduplication benefits the backup and storage of client data, it can degrade restore performance because of the time needed to reconstruct the original data from numerous chunks, especially when those chunks are fragmented on the storage server (for example, when they are spread across multiple volumes or have been migrated to a tape volume). Further inefficiency arises because existing deduplication systems resend all data to the client regardless of whether that data already exists on the client. Techniques are therefore needed to optimize the restore of deduplicated data to clients within data storage systems.