Data block deduplication for secondary storage is a process of finding multiple instances of the same data block and storing just a single instance of that data block in secondary storage. The objects that contain that data block (e.g., a logical unit of storage (LUN), a file, etc.) are then adjusted to refer to that single instance. Such a process conserves space within secondary storage because the redundant copies no longer need to be kept.
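As an illustration only, the adjustment of objects to refer to a single instance might be sketched as follows. The names `BlockStore`, `remap_duplicate`, and the per-object pointer lists are hypothetical and are not taken from any particular storage system; the sketch assumes that the two block addresses have already been identified as holding the same data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockStore:
    """Hypothetical model of secondary storage: address -> block contents."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    # Number of object pointers that refer to each stored block.
    refcount: Dict[int, int] = field(default_factory=dict)


def remap_duplicate(store: BlockStore,
                    objects: Dict[str, List[int]],
                    keep: int,
                    drop: int) -> None:
    """Point every reference to `drop` at `keep`, then release `drop`.

    `objects` maps an object name (e.g., a LUN or a file) to the list of
    block addresses that make up that object.
    """
    if keep == drop:
        return
    # Refuse to merge blocks whose contents differ.
    if store.blocks[keep] != store.blocks[drop]:
        raise ValueError("blocks differ; refusing to deduplicate")
    for pointers in objects.values():
        for i, addr in enumerate(pointers):
            if addr == drop:
                pointers[i] = keep
                store.refcount[keep] += 1
    # The redundant copy is no longer referenced and can be freed.
    del store.blocks[drop]
    del store.refcount[drop]


store = BlockStore(
    blocks={0: b"AAAA", 1: b"BBBB", 2: b"AAAA"},
    refcount={0: 1, 1: 1, 2: 1},
)
objects = {"lun0": [0, 1], "file.txt": [2]}
remap_duplicate(store, objects, keep=0, drop=2)
```

After the call in this sketch, both `lun0` and `file.txt` refer to the block at address 0, and only two blocks remain stored instead of three.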
One conventional approach to finding multiple instances of the same data block involves calculating a hash of each data block. The hashes are then scanned for duplicates. If duplicate hashes are found, the data blocks corresponding to the duplicate hashes are then compared to confirm that the data blocks are the same, since different data blocks can, in principle, produce the same hash.
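A minimal sketch of the hash, scan, and compare steps described above is given below. The choice of SHA-256 and the function name `find_duplicate_blocks` are assumptions made for illustration; the conventional approach is not tied to any particular hash function or interface.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_blocks(blocks: Dict[int, bytes]) -> List[Tuple[int, int]]:
    """Return (kept_address, duplicate_address) pairs for identical blocks.

    `blocks` maps a block address to its contents (hypothetical layout).
    """
    # Step 1: calculate a hash of each data block and group addresses by hash.
    by_hash: Dict[bytes, List[int]] = defaultdict(list)
    for addr, data in blocks.items():
        by_hash[hashlib.sha256(data).digest()].append(addr)

    confirmed: List[Tuple[int, int]] = []
    # Step 2: scan for hashes shared by more than one block.
    for addrs in by_hash.values():
        if len(addrs) < 2:
            continue
        # Step 3: compare the actual contents, because equal hashes alone
        # do not prove that the blocks are the same.
        representatives: List[int] = []
        for addr in addrs:
            for rep in representatives:
                if blocks[rep] == blocks[addr]:
                    confirmed.append((rep, addr))
                    break
            else:
                # No earlier block with this hash has the same contents.
                representatives.append(addr)
    return confirmed


sample = {0: b"aaaa", 1: b"bbbb", 2: b"aaaa", 3: b"aaaa"}
pairs = find_duplicate_blocks(sample)
```

For the sample input, `pairs` reports the blocks at addresses 2 and 3 as duplicates of the block at address 0, while the block at address 1 is left alone.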