The present invention relates generally to the field of data management, and more particularly to building data indexes.
A data set is a collection of data, commonly organized as a table (e.g., a database table), in which every column represents a particular variable and each row corresponds to a given member of the data set. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Commonly, a data set corresponds to the contents of a single database table or a single statistical matrix. The values in a data set may be numbers, such as real numbers or integers (e.g., representing a person's height in centimeters), but may also be nominal data (i.e., not consisting of numerical values), for example, representing a characteristic of a person. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values are normally all of the same kind. However, there may also be missing values.
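By way of illustration only, a data set of the kind described above can be sketched as a list of rows, one per member, with one key per variable. The variable names (`height_cm`, `weight_kg`, `eye_color`), the sample values, and the `column` helper are hypothetical and chosen solely for this example; a missing value is represented here by `None`.

```python
from typing import Any, Dict, List, Optional

# Each dict is one row (one member of the data set); each key is one variable.
# All names and values below are illustrative assumptions.
Row = Dict[str, Optional[Any]]

data_set: List[Row] = [
    {"name": "member_1", "height_cm": 172, "weight_kg": 68.5, "eye_color": "brown"},
    {"name": "member_2", "height_cm": 165, "weight_kg": None, "eye_color": "green"},  # missing weight
    {"name": "member_3", "height_cm": 180, "weight_kg": 81.0, "eye_color": "brown"},
]


def column(rows: List[Row], variable: str) -> List[Optional[Any]]:
    """Return every member's value for one variable, i.e. one column of the table."""
    return [row[variable] for row in rows]


heights = column(data_set, "height_cm")    # numeric values, all of the same kind
eye_colors = column(data_set, "eye_color")  # nominal values
weights = column(data_set, "weight_kg")    # contains one missing value
```

In this sketch the numeric column holds integers only and the nominal column holds strings only, consistent with the values of a variable normally being of the same kind.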
A database index is a data structure that can improve the speed of data retrieval operations in a database table (e.g., a data set). Maintaining the index data structure requires additional writes and storage space. Indexes can be utilized to quickly locate data without having to search every row in a database table each time the database table is accessed. Further, an index can be a copy of select columns of data from a database table that can be searched very efficiently and that also includes a low-level disk block address or direct link to the complete row of data from which it was copied. Database indexes can be implemented utilizing a variety of different data structures, such as balanced trees, B-trees, and hash tables.
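The trade-off described above can be sketched in a few lines. The following is a minimal, in-memory illustration and not an implementation of any particular database: the table contents and function names are hypothetical, a row's list position stands in for the low-level disk block address, and a sorted list searched with binary search stands in for a balanced tree or B-tree.

```python
import bisect
from collections import defaultdict

# Hypothetical table: (id, surname, height_cm). List position plays the role
# of the disk block address or direct link to the complete row.
rows = [
    (101, "Ng", 172),
    (102, "Diaz", 165),
    (103, "Ng", 180),
    (104, "Okoye", 158),
]


def full_scan(rows, surname):
    """Without an index: examine every row on every query."""
    return [pos for pos, row in enumerate(rows) if row[1] == surname]


def build_hash_index(rows, col):
    """Hash-style index: copy one column and map each value to row positions.
    Building and storing it is the extra write and storage cost."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[col]].append(pos)
    return index


def build_sorted_index(rows, col):
    """Ordered index: sorted (value, position) pairs, a stand-in for a B-tree."""
    return sorted((row[col], pos) for pos, row in enumerate(rows))


def range_lookup(sorted_index, low, high):
    """Find positions of rows with low <= value <= high using binary search."""
    start = bisect.bisect_left(sorted_index, (low, -1))
    stop = bisect.bisect_right(sorted_index, (high, len(sorted_index)))
    return [pos for _, pos in sorted_index[start:stop]]


surname_index = build_hash_index(rows, 1)
height_index = build_sorted_index(rows, 2)

matches = surname_index["Ng"]                      # same rows as full_scan(rows, "Ng")
in_range = range_lookup(height_index, 160, 175)    # rows with height 160..175 cm
```

The hash-style index answers equality lookups without visiting every row, while the ordered index also supports range queries. Either index would have to be updated whenever a row is inserted, changed, or deleted, which is the source of the additional writes noted above.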