As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, an information handling system may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
An information handling system can take any of several configurations, ranging from a single, stand-alone computer system, to a distributed, multi-device computer system, to a networked computer system with remote or cloud-based information handling resources.
Information handling systems for managing large and frequently accessed and modified databases may employ techniques, features, and data structures to achieve various data storage efficiencies. These efficiencies may include, as non-limiting examples, minimizing or reducing the amount and/or cost of storage capacity required to store and manage a dataset, increasing the size of the dataset that can be accommodated in a given amount of physical storage, reducing the time required to search for and/or access a particular record, reducing the risk of lost data, and reducing the risk of lost compute cycles that may result when, for example, a power failure occurs while data is being archived or otherwise managed.
Data deduplication is an example of a technique used to reduce the amount of storage required to store a dataset. Deduplication aims to achieve data storage efficiencies by detecting and eliminating or reducing the storage of duplicated data blocks or data patterns. While data deduplication may achieve an increase in data density, i.e., the ratio of data represented to storage required, deduplication introduces its own complexities.
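As a non-limiting illustration of the deduplication concept described above, the following minimal sketch stores each distinct data pattern once and records an ordered list of references to the stored patterns. The fixed block size, the use of SHA-256 fingerprints to detect duplicates, and the names `deduplicate`, `reconstruct`, and `data_density` are assumptions made for illustration only and are not drawn from any particular system.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and store each distinct pattern once."""
    patterns = {}    # fingerprint -> block content (each pattern stored once)
    references = []  # one fingerprint per logical block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Assumption: a SHA-256 digest serves as the duplicate detector.
        fingerprint = hashlib.sha256(block).hexdigest()
        patterns.setdefault(fingerprint, block)
        references.append(fingerprint)
    return patterns, references


def reconstruct(patterns, references) -> bytes:
    """Rebuild the original data by following each reference to its pattern."""
    return b"".join(patterns[fingerprint] for fingerprint in references)


def data_density(patterns, references) -> float:
    """Ratio of data represented to storage required for the patterns."""
    represented = sum(len(patterns[fingerprint]) for fingerprint in references)
    stored = sum(len(block) for block in patterns.values())
    return represented / stored if stored else 0.0
```

In this sketch, a 16-byte input in which one 4-byte pattern appears three times and another appears once would yield four references to two stored patterns. The density calculation ignores the storage consumed by the references themselves, which a real system would need to account for.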
For example, whereas a storage system without deduplication exhibits a 1:1 ratio between stored data and references to the stored data, a deduplicated database may be characterized as exhibiting an N:1 ratio between data references and data patterns, where a data pattern refers to a data block's content, i.e., the block's pattern of 1's and 0's. Accordingly, when a reference to a deduplicated data pattern is removed, the corresponding data pattern cannot be removed unless the dataset includes no other valid or active references to the data pattern. Similarly, if a particular data pattern becomes corrupted, all references to the data pattern must be identified. In the absence of reverse mapping information, i.e., mappings from each data pattern to each of its references, identifying all references to a data pattern is, as a general rule, infeasible or inefficient, since one would have to scan every data reference associated with the dataset every time a reference to a data pattern is removed or a data pattern is found to be corrupted.
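The role of reverse mapping information can be illustrated with the following minimal sketch, which maintains a forward map from each reference to its data pattern together with a reverse map from each data pattern to its set of references. The class name `DedupIndex`, its method names, and the use of in-memory dictionaries are hypothetical choices made for illustration, not a description of any particular implementation.

```python
from collections import defaultdict


class DedupIndex:
    """Hypothetical index holding forward and reverse mappings."""

    def __init__(self):
        self.ref_to_pattern = {}                 # forward: reference -> pattern
        self.pattern_to_refs = defaultdict(set)  # reverse: pattern -> references

    def add_reference(self, ref_id, pattern_id):
        """Record that ref_id points at pattern_id (N references : 1 pattern)."""
        if ref_id in self.ref_to_pattern:
            raise ValueError(f"reference {ref_id!r} already exists")
        self.ref_to_pattern[ref_id] = pattern_id
        self.pattern_to_refs[pattern_id].add(ref_id)

    def remove_reference(self, ref_id) -> bool:
        """Remove a reference; return True if its pattern may now be reclaimed."""
        pattern_id = self.ref_to_pattern.pop(ref_id)
        remaining = self.pattern_to_refs[pattern_id]
        remaining.discard(ref_id)
        if not remaining:
            # No valid references remain, so the pattern itself can be removed.
            del self.pattern_to_refs[pattern_id]
            return True
        return False

    def references_to(self, pattern_id):
        """Return every reference to a pattern, e.g. after detecting corruption."""
        return set(self.pattern_to_refs.get(pattern_id, ()))
```

With the reverse map in place, both determining whether a pattern can be reclaimed and enumerating the references affected by a corrupted pattern become lookups keyed by the pattern, rather than scans over every reference in the dataset. The cost is the additional storage and bookkeeping needed to keep the two maps consistent.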
More generally, large and/or frequently accessed and updated databases may need to maintain supporting data structures to ensure reasonable performance for basic operations, including inserting, deleting, querying, and archiving data records, as well as more advanced operations for summarizing one or more aspects of a dataset.