Some computing systems back up data in multiple locations in order to increase the safety of the data. Backing up data in multiple locations can be accomplished through the use of a primary backup system and a replica backup system. Each time a backup is performed, the state of the computing system is determined, and all information in the current state is recorded into a backup in the primary backup system. After the backup in the primary backup system is created, it is replicated, e.g., copied, to the replica backup system. Some backup storage systems store both incremental backups, each comprising only the changes in state since the previous backup, and full backups, each comprising the complete state of the storage system at the time the backup was made. Other backup storage systems store only full backups.
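The difference between a full backup and an incremental backup can be illustrated with a minimal sketch. It assumes, purely for illustration, that the system state is a mapping from file names to contents; the function names `full_backup`, `incremental_backup`, and `restore` are hypothetical and do not come from any particular backup product.

```python
def full_backup(state):
    """Record the complete state (assumed here to be a name -> content mapping)."""
    return dict(state)


def incremental_backup(previous, current):
    """Record only what changed since the previous backup."""
    changed = {
        name: content
        for name, content in current.items()
        if name not in previous or previous[name] != content
    }
    deleted = [name for name in previous if name not in current]
    return {"changed": changed, "deleted": deleted}


def restore(full, incrementals):
    """Rebuild a state by applying incrementals, in order, to a full backup."""
    state = dict(full)
    for inc in incrementals:
        state.update(inc["changed"])
        for name in inc["deleted"]:
            state.pop(name, None)
    return state


# Hypothetical example data.
day1 = {"a.txt": "1", "b.txt": "2"}
day2 = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}

base = full_backup(day1)
delta = incremental_backup(day1, day2)
```

In this sketch the incremental backup holds only the two entries that differ between the two states, while a full backup of the second state would hold all three.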
Some data storage systems compress data using deduplication, e.g., by breaking data into chunks and storing each chunk only once, regardless of how many times it occurs in the original data. Replicating data in a deduplicating storage system can be accomplished by transmitting identifying information for each chunk from the primary backup system to the replica backup system, using the identifying information to determine which chunks are already stored on the replica, and transmitting only the data chunks determined not to be already stored on the replica. A chunk that is already stored on the replica does not need to be copied to the replica a second time.
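The replication exchange described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any particular system: chunks are fixed-size, the identifying information is assumed to be a SHA-256 digest of the chunk, and both systems are modeled as in-memory objects. The names `DedupStore`, `missing`, and `replicate` are hypothetical.

```python
import hashlib


def split_chunks(data, size=4):
    """Fixed-size chunking (an assumption; real systems may chunk differently)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk):
    """Identifying information for a chunk, assumed here to be its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


class DedupStore:
    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes, each stored once
        self.recipes = {}  # backup name -> ordered list of fingerprints

    def write(self, name, data):
        recipe = []
        for chunk in split_chunks(data):
            fp = fingerprint(chunk)
            self.chunks.setdefault(fp, chunk)  # store only if not seen before
            recipe.append(fp)
        self.recipes[name] = recipe

    def missing(self, fingerprints):
        """Return the fingerprints whose chunks are not stored here."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def read(self, name):
        return b"".join(self.chunks[fp] for fp in self.recipes[name])


def replicate(primary, replica, name):
    """Copy one backup to the replica, sending only chunks it lacks."""
    recipe = primary.recipes[name]
    needed = replica.missing(recipe)          # fingerprints are sent for every chunk
    for fp in needed:
        replica.chunks[fp] = primary.chunks[fp]  # chunk data is sent only if absent
    replica.recipes[name] = list(recipe)
    return len(recipe), len(needed)


primary, replica = DedupStore(), DedupStore()
primary.write("backup1", b"AAAABBBBAAAACCCC")
sent1 = replicate(primary, replica, "backup1")
primary.write("backup2", b"AAAABBBBDDDDCCCC")
sent2 = replicate(primary, replica, "backup2")
```

Each call to `replicate` returns the number of fingerprints transmitted and the number of chunks actually copied. In this sketch a fingerprint is still sent for every chunk of the backup, even when few or none of the chunks themselves need to be transferred.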
Since a full backup captures the complete state of a computing system, it is typically a very large file, and replicating a full backup requires a great deal of information to be transmitted from the primary backup system to the replica backup system. In a deduplicating system, transmitting only the identifying information for each data chunk of a full backup is more efficient than transmitting the entire full backup, but even this can incur substantial overhead and require excessive time and bandwidth.