Data storage systems are arrangements of hardware and software that typically include multiple storage processors coupled to arrays of non-volatile data storage devices, such as magnetic disk drives, electronic flash drives, and/or optical drives. The storage processors service host I/O requests received from host machines. The received host I/O requests specify one or more data objects (e.g. logical disks or “LUNs”) and indicate host data that is to be written to or read from the data objects. The storage processors include specialized hardware and software that process the incoming host I/O requests and perform various data storage tasks that organize and secure the host data that is received from the host machines and stored on the non-volatile data storage devices of the data storage system.
Previous data storage systems have performed deduplication on the host data that they store. Some previous data storage systems have performed block-level deduplication. In block-level deduplication, duplicate copies of blocks of data are eliminated in order to improve overall storage utilization. To accomplish in-line block-level deduplication, prior to storing a block of data in non-volatile storage of the data storage system, the block is compared to previously stored blocks in order to determine whether the block is a duplicate. In order to facilitate the comparison process, a crypto-digest may be generated for each block of data using a cryptographic hash function (e.g. SHA-1, SHA-2, etc.), and then compared with crypto-digests that were previously generated for previously stored blocks. If a crypto-digest for a new block to be stored matches a crypto-digest that was generated for a previously stored block, then a relatively small pointer to the previously stored block may be stored instead of the new block, thus reducing overall non-volatile storage requirements. When the pointer is later retrieved while processing an I/O request that is directed to the location of the new block, the pointer can simply be replaced with a copy of the previously stored block, e.g. as retrieved from the non-volatile storage of the data storage system.
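The in-line block-level deduplication described above can be illustrated with a minimal sketch. It is not taken from any particular prior system: the class name `DedupStore`, its fields, the `BLOCK_SIZE` value, and the use of in-memory Python structures as a stand-in for non-volatile storage are all assumptions made for illustration. SHA-256 (a member of the SHA-2 family) is chosen as the cryptographic hash function, and, as in the description above, a matching crypto-digest alone is treated as indicating a duplicate block.

```python
import hashlib

BLOCK_SIZE = 8192  # hypothetical block size in bytes, chosen for illustration


class DedupStore:
    """Illustrative in-memory model of in-line block-level deduplication."""

    def __init__(self):
        self.blocks = []        # stand-in for blocks held in non-volatile storage
        self.digest_index = {}  # crypto-digest -> index of a stored block
        self.pointers = {}      # logical block address -> index of a stored block

    def write(self, address, block):
        # Generate a crypto-digest for the block before it is stored.
        digest = hashlib.sha256(block).digest()
        index = self.digest_index.get(digest)
        if index is None:
            # No matching digest: store the new block and record its digest.
            self.blocks.append(block)
            index = len(self.blocks) - 1
            self.digest_index[digest] = index
        # Either way, only a small pointer is recorded for this address.
        self.pointers[address] = index
        return index

    def read(self, address):
        # Resolve the pointer to a copy of the previously stored block.
        return self.blocks[self.pointers[address]]


store = DedupStore()
block_a = b"A" * BLOCK_SIZE
block_b = b"B" * BLOCK_SIZE
store.write(0, block_a)
store.write(1, block_b)
store.write(2, block_a)  # duplicate of the block written to address 0
stored_count = len(store.blocks)
```

In this sketch, the third write stores no new block: address 2 receives a pointer to the block already stored for address 0, and a later read of address 2 returns a copy of that block.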