Block deduplication is the process of (i) finding block mappings that map to separate instances of identical data, and (ii) updating those block mappings to refer to a single instance of that data. Using block deduplication, data storage systems are able to eliminate storage of redundant copies of host data.
One conventional approach to performing block deduplication in a data storage system involves closely evaluating each block of host data stored by the data storage system for possible deduplication. In particular, the data storage system applies a hash algorithm to each block of host data stored by the data storage system. After the data storage system computes a hash result from a particular block of host data, the data storage system compares that hash result to a database of stored hash results previously computed from other blocks of host data. If the data storage system finds a matching hash result in the database, the data storage system performs a bit-by-bit comparison to determine whether the blocks of host data are identical. If the blocks are identical, the data storage system shares a single instance of the block of host data among block mappings. Otherwise (i.e., when no identical block is found), the data storage system adds a new record to the database, namely the hash result computed from the particular block of host data, for possible matching in the future.
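By way of illustration only, the conventional hash-based approach described above may be sketched as follows. The sketch is a minimal in-memory model and not an actual data storage system implementation. The class name `DedupStore`, its fields, the choice of SHA-256 as the hash algorithm, and the decision to evaluate each block at write time are all assumptions made for the example. The handling of a hash match that fails the bit-by-bit comparison (storing the block as a new instance under the same hash result) is likewise an assumption.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory model of hash-based block deduplication."""

    def __init__(self):
        self.instances = {}  # instance id -> block contents (bytes)
        self.hash_db = {}    # hash result -> list of instance ids
        self.mappings = {}   # logical block address -> instance id
        self._next_id = 0

    def write(self, lba, data):
        """Store a block and return the id of the instance it maps to."""
        # Step 1: compute a hash result from the block (SHA-256 is assumed).
        digest = hashlib.sha256(data).digest()

        # Step 2: look for a matching hash result in the database.
        for instance_id in self.hash_db.get(digest, []):
            # Step 3: a hash match alone is not proof of identity, so
            # compare the full contents before sharing the instance.
            if self.instances[instance_id] == data:
                self.mappings[lba] = instance_id
                return instance_id

        # No identical block found: store a new instance and record its
        # hash result for possible matching in the future.
        instance_id = self._next_id
        self._next_id += 1
        self.instances[instance_id] = data
        self.hash_db.setdefault(digest, []).append(instance_id)
        self.mappings[lba] = instance_id
        return instance_id


store = DedupStore()
first = store.write(0, b"A" * 4096)
second = store.write(1, b"A" * 4096)  # identical data, shares the instance
third = store.write(2, b"B" * 4096)   # different data, new instance
```

In this sketch, two block mappings that carry identical data resolve to the same instance id, so only one copy of the data is held. For brevity the sketch does not reclaim an instance when a block mapping is later pointed elsewhere.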
When a host modifies a block of host data that has been deduplicated, the data storage system splits that shared block of host data into separate instances. Along these lines, suppose that a data storage system maintains a first block mapping and a second block mapping to a single instance of host data. Further suppose that a host issues an IO command to modify the block of host data as referenced by the second block mapping, while the first block mapping is intended to continue to reference the original block of host data. The data storage system responds by maintaining the original instance of the host data on behalf of the first block mapping, and creating a new instance which includes the modification on behalf of the second block mapping.
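By way of illustration only, the splitting of a shared block upon modification may be sketched as follows. The class name `SharedBlockStore`, its methods, and the use of a per-instance reference count to detect sharing are assumptions made for the example; the description above does not specify how a data storage system tracks which instances are shared.

```python
class SharedBlockStore:
    """Hypothetical model of splitting a deduplicated block on modification."""

    def __init__(self):
        self.instances = {}  # instance id -> block contents (bytes)
        self.refcount = {}   # instance id -> number of block mappings
        self.mappings = {}   # block mapping name -> instance id
        self._next_id = 0

    def _allocate(self, data):
        instance_id = self._next_id
        self._next_id += 1
        self.instances[instance_id] = bytes(data)
        self.refcount[instance_id] = 1
        return instance_id

    def map_new(self, mapping, data):
        """Create a block mapping that refers to a fresh instance."""
        self.mappings[mapping] = self._allocate(data)

    def share(self, mapping, existing_mapping):
        """Point a new block mapping at an already stored instance."""
        instance_id = self.mappings[existing_mapping]
        self.mappings[mapping] = instance_id
        self.refcount[instance_id] += 1

    def read(self, mapping):
        return self.instances[self.mappings[mapping]]

    def modify(self, mapping, offset, new_bytes):
        """Apply a write through one mapping without disturbing the others.

        Assumes the write fits within the existing block.
        """
        instance_id = self.mappings[mapping]
        updated = bytearray(self.instances[instance_id])
        updated[offset:offset + len(new_bytes)] = new_bytes

        if self.refcount[instance_id] > 1:
            # Shared instance: keep the original for the other mappings
            # and create a new instance that includes the modification.
            self.refcount[instance_id] -= 1
            self.mappings[mapping] = self._allocate(updated)
        else:
            # Sole owner: the instance can be updated in place.
            self.instances[instance_id] = bytes(updated)


blocks = SharedBlockStore()
blocks.map_new("first", b"A" * 8)
blocks.share("second", "first")        # both mappings share one instance
blocks.modify("second", 0, b"ZZ")      # triggers the split
```

After the modification, the first block mapping continues to reference the original data, while the second block mapping references a new instance containing the change.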