Many businesses live or die based on the efficiency and accuracy by which they can store, retrieve, process, and/or analyze data. “Data,” as used herein, is digital information that is electronically stored on storage device(s). Data may be maintained on an individual storage device, such as a local hard disk or solid state drive, a CD-ROM, or a memory module. Alternatively, data may be distributed over multiple storage devices, such as storage devices that are working together to provide a cloud storage service or storage devices that are operating separately to store subsets of the data. One or more database servers may operate in parallel to provide read and/or write access to the data. Large sets of data, whether stored on one device or distributed among many devices, may consume a significant amount of storage space and/or processor time to store, retrieve, process, and/or analyze.
Data may be described in terms of fields and values. “Fields,” as used herein, refer to containers or labels that provide contexts. “Values,” as used herein, refer to information that is stored according to or in association with the contexts. For example, a single table may have many different columns that provide contexts for the values that are stored in the columns. The different columns may store different sets of data having different contexts, and the different sets of data may or may not be of different data types. In another example, a single document may have many attributes and/or elements that provide contexts for the values that are nested in the attributes and/or elements. Elements or attributes that share the same name, path, or other context may collectively store a set of data that shares the context. Different elements or attributes may store different sets of data having different contexts, and the different sets of data may or may not be of different data types.
To alleviate some of the overhead for storing, retrieving, processing, and/or analyzing large sets of data, some computer systems utilize metadata that is created and stored in association with the sets of data. “Metadata,” as used herein, is data that describes other data. Metadata may describe data in a manner that allows the described data to be stored, retrieved, processed, and/or analyzed more efficiently or more accurately. For example, metadata for a given set of data may include a mean, median, mode, minimum, and/or maximum of the given set of data, such that these value(s) may be quickly retrieved without being recalculated each time the set of data is accessed. The metadata may be used to plan for data processing such that a data processor can effectively allocate resources for the data processing.
General statistics such as the mean, median, mode, minimum, or maximum value(s) may be helpful when storing, retrieving, processing, and/or analyzing a set of data. However, these general statistics are not always helpful for predicting whether a non-uniform set of data has specific values.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.