As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system (IHS) generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, IHSs may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in IHSs allow for IHSs to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, an IHS may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
An IHS can be arranged in several different configurations, ranging from a single, stand-alone computer system, to a distributed, multi-device computer system, to a networked computer system with remote or cloud storage systems.
IHSs that receive and store significant amounts of data from external sources, generally referred to herein as user data, may include data deduplication features to reduce the amount of storage space required. Data deduplication applications may translate comparatively large amounts of data, referred to herein as data blocks, into comparatively small representations, referred to herein as block hashes or, more simply, hashes. Data deduplication applications may operate on data at its source or at its ultimate destination or target and may process variable-size or fixed-size data blocks. As an example, a fixed block data deduplication application may translate or “hash” a 4 KB data block into a 32-byte (256-bit) block hash.
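The fixed-block hashing described above can be illustrated with a minimal sketch. SHA-256 is an assumption chosen only because it produces a 32-byte (256-bit) digest; the text does not name a hash algorithm. The names `BLOCK_SIZE`, `split_into_blocks`, and `block_hash` are hypothetical.

```python
import hashlib

# Assumed fixed block size of 4 KB, matching the example in the text.
BLOCK_SIZE = 4096


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split user data into fixed-size data blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> bytes:
    """Translate a data block into a 32-byte block hash.

    SHA-256 is an illustrative choice, not an algorithm named by the text.
    """
    return hashlib.sha256(block).digest()


# 10,000 zero bytes yield two full 4 KB blocks and one shorter trailing block.
blocks = split_into_blocks(bytes(10_000))
hashes = [block_hash(b) for b in blocks]
```

In this sketch the two full blocks have identical content and therefore identical block hashes, which is the property a data deduplication application relies on.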
Block hashes for previously received data blocks may be stored in a data structure referred to herein as a “data dictionary” or, more simply, “dictionary” that maps a block hash to a storage location where the data block is or will be stored. When a data storage device receives a new data block, a data deduplication application may generate a block hash for the data block and use the block hash to query the data dictionary for any matching block hashes.
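As a rough illustration of the data dictionary described above, the following sketch maps block hashes to storage locations using an in-memory Python `dict`. The class name `DataDictionary`, its methods, the integer storage locations, and the use of SHA-256 are all assumptions made for illustration; an actual dictionary may be organized quite differently.

```python
import hashlib
from typing import Optional


class DataDictionary:
    """Hypothetical dictionary mapping a block hash to a storage location."""

    def __init__(self) -> None:
        self._entries: dict[bytes, int] = {}

    def insert(self, block_hash: bytes, location: int) -> None:
        """Record where the data block with this hash is (or will be) stored."""
        self._entries[block_hash] = location

    def query(self, block_hash: bytes) -> Optional[int]:
        """Return the storage location on a hit, or None on a miss."""
        return self._entries.get(block_hash)


dictionary = DataDictionary()

# A previously received data block, stored at assumed location 0.
dictionary.insert(hashlib.sha256(b"a" * 4096).digest(), 0)

# A newly received block with the same content hits; a different block misses.
hit = dictionary.query(hashlib.sha256(b"a" * 4096).digest())
miss = dictionary.query(hashlib.sha256(b"b" * 4096).digest())
```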
The query “hits” in the dictionary if the block hash of the newly received data block matches a block hash previously stored in the dictionary. The data deduplication application may verify a query hit by determining whether the matching block hashes indicate duplicate data blocks or instead represent a false positive, which can occur when two data blocks that are not duplicates produce the same block hash.
Verification of a query hit may include a read and compare of the two applicable data blocks. In some cases, verification may be assumed or omitted when, as an example, the algorithm used to generate the block hashes is sufficiently “collision resistant.”
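The read-and-compare verification can be sketched as follows. To make a false positive easy to reproduce, the sketch uses a deliberately weak hash (one byte of a SHA-256 digest); this weak hash, and the names `weak_hash`, `verify_hit`, and `find_collision`, are assumptions for demonstration only and would not be suitable for a real deduplication application.

```python
import hashlib


def weak_hash(block: bytes) -> bytes:
    """Deliberately weak 1-byte hash, used here only to force a collision."""
    return hashlib.sha256(block).digest()[:1]


def verify_hit(new_block: bytes, stored_block: bytes) -> bool:
    """Read and compare: a hit is genuine only if the blocks are byte-identical."""
    return new_block == stored_block


def find_collision() -> tuple[bytes, bytes]:
    """Find two distinct blocks with the same weak hash.

    With only 256 possible hash values, a collision is guaranteed
    within 257 distinct inputs.
    """
    seen: dict[bytes, bytes] = {}
    i = 0
    while True:
        block = i.to_bytes(4, "big")
        h = weak_hash(block)
        if h in seen:
            return seen[h], block
        seen[h] = block
        i += 1


stored, incoming = find_collision()
false_positive = weak_hash(stored) == weak_hash(incoming) and not verify_hit(incoming, stored)
```

With a sufficiently collision-resistant hash, a search like `find_collision` would not be expected to succeed in practice, which is why verification may be assumed or omitted in such cases.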
If the data deduplication application verifies a query hit or otherwise concludes that a query hit corresponds to duplicate data blocks, the data deduplication application may generate and store a reference or pointer to the previously stored data block in lieu of storing the newly received data block. In this manner, the amount of unique data that the data storage device is able to contain may be increased.
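The overall write path (hash, query, verify, then store either the block or a reference) might look like the following sketch. The `DedupStore` class, its in-memory lists, the use of SHA-256, and the policy of storing a colliding block without updating the dictionary are all assumptions made for illustration, not details taken from the text.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store illustrating a deduplicating write path."""

    def __init__(self, verify: bool = True) -> None:
        self.blocks: list[bytes] = []          # unique data blocks; index = location
        self.dictionary: dict[bytes, int] = {}  # block hash -> storage location
        self.references: list[int] = []         # one reference per write request
        self.verify = verify                    # False = trust the hash, skip compare

    def write(self, block: bytes) -> int:
        """Store a block or, for a duplicate, a reference to the earlier block."""
        h = hashlib.sha256(block).digest()
        location = self.dictionary.get(h)

        if location is not None:
            # Query hit: optionally verify with a read and compare.
            if not self.verify or self.blocks[location] == block:
                self.references.append(location)
                return location

        # Miss, or a hit that verification showed to be a false positive.
        new_location = len(self.blocks)
        self.blocks.append(block)
        if location is None:
            # Assumed policy: only a miss adds a dictionary entry.
            self.dictionary[h] = new_location
        self.references.append(new_location)
        return new_location


store = DedupStore()
for incoming in (b"x" * 4096, b"y" * 4096, b"x" * 4096):
    store.write(incoming)
```

Here three 4 KB writes consume the space of two data blocks plus one small reference, since the third write duplicates the first.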