Large amounts of documents, files and other forms of data are produced and managed on computer systems worldwide every day. Backup systems, backup storage and backup algorithms are in use in many of these computer systems, at consumer, commercial and institutional levels. Backups allow recovery from crashes in which data would be lost were it not for the existence of backup copies of the data. Data deduplication improves the efficiency of many aspects of backing up by eliminating redundant copies of data in the backup storage. Storage efficiency is thus improved, as space that would be occupied by redundant copies of the data can be used for storing additional data. In addition, backup time is reduced, as the time that would have been spent storing redundant copies of the data is eliminated. Data deduplication can be performed as a post-processing operation that eliminates redundant copies through selective deletion after the data is stored or, in the alternative, data deduplication can be performed prior to storage.
Whether performed as a pre-processing or post-processing operation, many, if not most, deduplication systems and algorithms make use of fingerprints of data units. Stored fingerprints can be compared with a newly generated fingerprint of a newly arriving data unit. The fingerprint comparison proceeds much more quickly than one-to-one comparison of the data units themselves. However, corruption in a fingerprint database or disruption of communication between a processor and a fingerprint database can cause a backup system to malfunction and even become inoperable. An incomplete backup can leave a computer system vulnerable to irrecoverable failure.
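By way of illustration only, fingerprint-based deduplication performed prior to storage can be sketched as follows. The sketch assumes that SHA-256 serves as the fingerprint function and that an in-memory dictionary stands in for the fingerprint database; the names `fingerprint`, `DedupStore`, `write` and `restore` are hypothetical and are not drawn from any particular backup system.

```python
import hashlib


def fingerprint(unit: bytes) -> str:
    """Return a fingerprint of a data unit (SHA-256 is assumed here)."""
    return hashlib.sha256(unit).hexdigest()


class DedupStore:
    """Hypothetical in-memory stand-in for backup storage with a fingerprint database."""

    def __init__(self) -> None:
        self.index: dict[str, bytes] = {}  # fingerprint -> single stored copy
        self.recipe: list[str] = []        # ordered fingerprints of the backup stream

    def write(self, unit: bytes) -> bool:
        """Store a data unit only if its fingerprint is new; return True if stored."""
        fp = fingerprint(unit)
        is_new = fp not in self.index      # fingerprint lookup, no data comparison
        if is_new:
            self.index[fp] = unit
        self.recipe.append(fp)             # a reference is recorded either way
        return is_new

    def restore(self) -> bytes:
        """Rebuild the original stream from the recorded fingerprints."""
        return b"".join(self.index[fp] for fp in self.recipe)
```

In this sketch, a data unit that arrives a second time adds only a reference to the recipe, and no second copy is stored. The sketch also shows why the fingerprint database is critical: if an entry in `index` were lost or corrupted, `restore` could not rebuild the data.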
It is within this context that the embodiments arise.