Data storage systems are arrangements of hardware and software in which storage processors are coupled to arrays of non-volatile storage devices, such as magnetic disk drives, electronic flash drives, and/or optical drives. The storage processors service storage requests that arrive from host machines (“hosts”) and specify blocks, files, and/or other data elements to be written, read, created, deleted, and so forth. Software running on the storage processors manages incoming storage requests and performs various data processing tasks to organize and secure the data elements on the non-volatile storage devices.
Some data storage systems support data compression and/or deduplication to promote storage efficiency. Compression works by reducing the size of data sets so that they consume less storage space on disk. Storage systems may perform compression in hardware, in software, or via a mixed hardware/software approach. Deduplication works by replacing redundant data with pointers to a single retained copy (or a smaller number of such copies). In a typical deduplication scheme, a data storage system maintains a database that associates digests (e.g., hash values) of stored data blocks with pointers to the locations of those data blocks in the system. When a new data block arrives or is fetched from disk, the storage system computes a digest of the new data block and attempts to match the new digest to one already stored in the database. If a match is found, the data storage system configures pointer metadata for the new data block so that it points to the previously stored data referenced by the matching database entry. Both compression and deduplication may be performed in the background, e.g., by a background process that works on already-stored data, or inline with data writes, such that newly arriving data blocks are compressed and/or deduplicated upon arrival.
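The digest-matching deduplication scheme, combined with inline compression, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular storage system: the class name `DedupStore`, SHA-256 as the digest, zlib as the compressor, an in-memory dict standing in for the digest database, and integer logical block addresses are all hypothetical choices made for this example. Each write is processed inline: the block's digest is looked up, and on a match the new block's pointer metadata is set to the retained copy; otherwise the block is compressed, stored, and its digest recorded.

```python
import hashlib
import zlib


class DedupStore:
    """Hypothetical toy store illustrating inline deduplication and compression."""

    def __init__(self):
        # Digest database: digest -> physical location (index into self.physical).
        self.digest_db = {}
        # Physical storage: list of compressed block payloads.
        self.physical = []
        # Pointer metadata: logical block address -> physical location.
        self.pointers = {}

    def write(self, lba: int, block: bytes) -> bool:
        """Write a block inline; return True if it was deduplicated."""
        digest = hashlib.sha256(block).digest()
        location = self.digest_db.get(digest)
        # Verify the candidate byte-for-byte rather than trusting the digest alone
        # (an assumption of this sketch; systems differ on whether they verify).
        if location is not None and zlib.decompress(self.physical[location]) == block:
            # Match found: point the new block at the previously stored copy.
            self.pointers[lba] = location
            return True
        # No match: compress the block, store it, and record its digest.
        self.physical.append(zlib.compress(block))
        location = len(self.physical) - 1
        self.digest_db[digest] = location
        self.pointers[lba] = location
        return False

    def read(self, lba: int) -> bytes:
        """Follow the pointer metadata and decompress the stored copy."""
        return zlib.decompress(self.physical[self.pointers[lba]])
```

For example, writing the same 4096-byte block to two logical addresses stores one compressed physical copy, and both addresses' pointers resolve to it. The sketch deliberately omits reference counting and garbage collection, so overwriting a logical address leaves its old physical copy in place; a background variant would instead scan `self.physical` after the fact rather than acting in `write`.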