Some existing storage systems support deduplication, usually in backup storage, where much of the data transmitted to the storage system is duplicate or only slightly modified. Some existing deduplication solutions maintain a record of data written to storage. Some of those records are organized as one or more key-value tables. In such a table, the records are indexed by a hash of a block of the data (e.g., the hash of the block of data is the key), and the value associated with that hash is the reference count for the block of data together with its address in storage (e.g., HashOfData is a key into <ReferenceCount, AddressOfData>).
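As an illustration only, the key-value organization described above can be sketched as follows. The class and method names (`DedupTable`, `Entry`, `record_write`) are hypothetical, and SHA-256 is an assumed hash function; the description does not name a particular hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    """Value stored per key: <ReferenceCount, AddressOfData>."""
    reference_count: int
    address_of_data: int


def hash_of_data(block: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash a real system uses.
    return hashlib.sha256(block).hexdigest()


class DedupTable:
    """Hypothetical in-memory key-value table keyed by HashOfData."""

    def __init__(self):
        self.entries = {}  # HashOfData -> Entry

    def record_write(self, block: bytes, address: int) -> Entry:
        key = hash_of_data(block)
        entry = self.entries.get(key)
        if entry is None:
            # First copy of this content: remember where it lives.
            entry = Entry(reference_count=1, address_of_data=address)
            self.entries[key] = entry
        else:
            # Duplicate content: only the reference count changes.
            entry.reference_count += 1
        return entry
```

In this sketch, recording the same block content twice yields a single entry whose reference count is 2 and whose address is that of the first copy.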
In existing asynchronous deduplication systems, data are first written out to storage media without checking for duplicates. Subsequently, the data are read back from storage to calculate the hash and look for duplicates. If no duplicates are found, the data are inserted into the key-value table. Later, when the data are overwritten, they are removed from the key-value table. However, updating the key-value table and reading back from storage for deduplication may each have a high resource cost.
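The asynchronous flow described above can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions, not an implementation of any particular system: `AsyncDedupStore`, its method names, the logical-to-physical map, and the use of SHA-256 are all hypothetical. The sketch also assumes that an overwrite decrements the reference count and removes the table entry only when the count reaches zero.

```python
import hashlib


class AsyncDedupStore:
    """Hypothetical model of write-first, deduplicate-later storage."""

    def __init__(self):
        self.media = {}       # physical address -> block bytes
        self.logical = {}     # logical block number -> physical address
        self.table = {}       # hash -> [reference_count, physical address]
        self.pending = set()  # logical blocks not yet deduplicated
        self.next_addr = 0

    def write(self, lbn: int, block: bytes) -> None:
        # Step 1: write to media without checking for duplicates.
        self._release(lbn)  # an overwrite drops the old contents first
        addr = self.next_addr
        self.next_addr += 1
        self.media[addr] = block
        self.logical[lbn] = addr
        self.pending.add(lbn)

    def dedup_pass(self) -> None:
        # Step 2: read data back, hash it, and look for duplicates.
        for lbn in sorted(self.pending):
            addr = self.logical[lbn]
            block = self.media[addr]  # read-back from storage
            key = hashlib.sha256(block).hexdigest()
            entry = self.table.get(key)
            if entry is None:
                # No duplicate found: insert into the key-value table.
                self.table[key] = [1, addr]
            else:
                # Duplicate: share the existing copy, free the new one.
                entry[0] += 1
                self.logical[lbn] = entry[1]
                del self.media[addr]
        self.pending.clear()

    def _release(self, lbn: int) -> None:
        addr = self.logical.pop(lbn, None)
        if addr is None:
            return
        key = hashlib.sha256(self.media[addr]).hexdigest()
        entry = self.table.get(key)
        if entry is not None and entry[1] == addr:
            # Assumption: remove the entry once nothing references it.
            entry[0] -= 1
            if entry[0] == 0:
                del self.table[key]
                del self.media[addr]
        else:
            # Block was never deduplicated; it is private to this lbn.
            del self.media[addr]
```

Both costs mentioned above are visible in the sketch: `dedup_pass` reads every newly written block back from `media`, and both `dedup_pass` and overwrites update `table`.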
Corresponding reference characters indicate corresponding parts throughout the drawings.