Client systems may store duplicate data for a variety of reasons. For example, client systems may keep redundant copies of data to guard against accidental data removal. Client systems may also store multiple versions of a file to preserve the file's modification history, which typically results in storing duplicate data. As another example, database applications (e.g., ORACLE, SQL SERVER) may pre-allocate space for data files. In such situations, the unused data file space may be filled with a single repeating data pattern (e.g., zeros). Thus, database files (and backups of database files) may include a significant amount of duplicate data. As yet another example, multiple virtual machines running on the same physical system may result in a significant amount of duplicate data being stored on the physical system.
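The effect of pre-allocated, zero-filled space can be illustrated with a minimal sketch. It assumes fixed-size segmentation and SHA-256 fingerprints, neither of which is specified above; the segment size, the `segment_fingerprints` helper, and the sample data file are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import List

SEGMENT_SIZE = 4096  # hypothetical fixed segment size, in bytes


def segment_fingerprints(data: bytes, segment_size: int = SEGMENT_SIZE) -> List[str]:
    """Split data into fixed-size segments and fingerprint each one."""
    return [
        hashlib.sha256(data[i:i + segment_size]).hexdigest()
        for i in range(0, len(data), segment_size)
    ]


# Hypothetical data file: two segments holding distinct rows, followed by
# six pre-allocated segments that the database filled with zeros.
used_space = (
    b"row-a".ljust(SEGMENT_SIZE, b"\x01")
    + b"row-b".ljust(SEGMENT_SIZE, b"\x02")
)
data_file = used_space + bytes(6 * SEGMENT_SIZE)

fingerprints = segment_fingerprints(data_file)
total_segments = len(fingerprints)
unique_segments = len(set(fingerprints))

print(f"{total_segments} segments, {unique_segments} unique")
```

In this sketch the six zero-filled segments all share one fingerprint, so the eight-segment file contains only three distinct segments.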
Backup and archiving systems may implement deduplication to conserve storage space when backing up or archiving data from a client system. In such situations, the backed-up or archived data may be stored on a deduplication server. In a traditional deduplication system, a client system may retrieve all data from the deduplication server during a restoration process, including every duplicate. For example, a segment that occurs ten times on the client system may be stored as a single segment on the deduplication server but may be retrieved ten times from the deduplication server to restore client backup data. Retrieving the same data multiple times from the deduplication server may increase restoration time, consume extra network bandwidth, and create extra workload for the deduplication server. What is needed, therefore, is a more efficient process for restoring deduplicated data.
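The cost of the traditional restoration process described above can be sketched as follows. This is a toy model under stated assumptions, not an actual deduplication product: the `DedupServer` class, the `backup` and `traditional_restore` functions, the SHA-256 fingerprints, and the 4096-byte segments are all hypothetical.

```python
import hashlib
from collections import Counter


class DedupServer:
    """Toy in-memory stand-in for a deduplication server (hypothetical)."""

    def __init__(self):
        self.segments = {}             # fingerprint -> segment bytes
        self.fetch_counts = Counter()  # fingerprint -> number of retrievals

    def store(self, segment: bytes) -> str:
        """Store a segment once per fingerprint and return the fingerprint."""
        fingerprint = hashlib.sha256(segment).hexdigest()
        self.segments.setdefault(fingerprint, segment)
        return fingerprint

    def fetch(self, fingerprint: str) -> bytes:
        """Return a stored segment and record that it was retrieved."""
        self.fetch_counts[fingerprint] += 1
        return self.segments[fingerprint]


def backup(server, segments):
    """Back up segments; the returned list of fingerprints describes the backup."""
    return [server.store(segment) for segment in segments]


def traditional_restore(server, recipe):
    """Retrieve every referenced segment from the server, duplicates included."""
    return [server.fetch(fingerprint) for fingerprint in recipe]


# A client backup in which one segment occurs ten times, plus one other segment.
client_segments = [b"A" * 4096] * 10 + [b"B" * 4096]

server = DedupServer()
recipe = backup(server, client_segments)
restored = traditional_restore(server, recipe)

stored_segments = len(server.segments)
total_fetches = sum(server.fetch_counts.values())
repeated_segment_fetches = server.fetch_counts[recipe[0]]

print(f"stored: {stored_segments}, fetched: {total_fetches}")
```

In this sketch the server holds two segments, yet the restore issues eleven retrievals, ten of them for the same segment, which is the repeated transfer that adds restoration time, network traffic, and server load.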