Relational databases store information in tables that are organized into rows and columns. In non-relational terminology, a row is a record and a column is a field. A user retrieves information from the tables by entering a request that is converted to queries by a database application, which then submits the queries to a database server. In response to the queries, the database server accesses the table(s) specified by the query to determine which information within the table(s) satisfies the queries. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately to the user.
FIG. 1 depicts a logical layout of an exemplary table 100 within a relational database. Table 100 comprises three user columns, column A 104, column B 106, and column C 108, and eleven rows 110-128. Table 100 also contains an internal column or pseudocolumn 102, referred to as a rowid. A table's rowid is retrievable by query and uniquely identifies a row in the table, but is not normally displayed when the structure of the table is listed. For example, a rowid of 221 uniquely identifies row 110, which contains the values of 3 in column A 104, 5 in column B 106, and 2 in column C 108. In this example, the values of the columns A 104, B 106, and C 108 are integers, but it is to be understood that columns of a database table can hold values of any of a variety of types including floating point numbers and variable length strings of characters, dates, etc.
For any given database application, the queries to retrieve information from a table must conform to the rules of a particular query language. Most query languages provide users with a variety of ways to specify the information to be retrieved. For example, in the Structured Query Language (SQL), the query, select A from T where A<5, requests the retrieval of the information contained in column A of specified rows of table T that satisfies a specified condition. The conditions in the where clause specify one or more predicates, in this example A<5, which must be satisfied by matching rows. In the example, rows 110, 114, 116, 118, 120, 124, and 128 of table 100 satisfy this query because the corresponding values of column A 104 are 3, 3, 2, 1, 4, 2, and 4, respectively. On the other hand, rows 112, 122, and 126 of table 100 do not satisfy this query because the corresponding values of column A 104 are 6, 7, and 8, respectively.
In order to process the exemplary query, a full table scan may be implemented that reads all the rows from table 100. Thus, the full table scan reads the data for all the columns in table 100, even though only the information from column A 104 was necessary to process the query. If the amount of the rows in a table and/or the data for the columns not used in the query is very large, then the full table scan methodology becomes very inefficient because of the unnecessary amount of disk input/output.
Accordingly, many database systems provide indexes to increase the speed of the data retrieval process. An index is a body of entries ordered by key values of a key. An index key may be, for example, a column of a table. Such a key is referred to herein as a key column. Each index entry identifies a record (e.g. row).
One implementation of a database index is a B-tree, whose logical layout is illustrated in FIG. 2. A B-tree index is a hierarchical arrangement of two types of nodes: leaf nodes and branch nodes. Leaf nodes reside at the lowest level of the B-tree hierarchy and contain values from the actual column or columns upon which the index is built and the rowid of the corresponding rows. Leaf nodes may comprise multiple entries that contain data for many rows, e.g. 100 entries corresponding to 100 rows, but, for purposes of example, leaf nodes are illustrated as containing a single entry. For example, B-tree index 200, being built upon column A 104 of table 100, has leaf nodes 210-228 collectively holding the values of column A 104. Specifically, leaf node 210 holds the index value 1 from column A 104 and the rowid 118, which identifies row 118 of table 100. As another example, leaf node 228 contains an index value of 8 from column A 104 and a rowid of 123, identifying row 126 of table 100. In the depicted example, each leaf node contains a pointer or other link to a previous and subsequent leaf node if such a leaf node exists. For example, leaf node 226, which contains an index value of 7, points to leaf node 228, which contains an index value of 8, and to leaf node 224, which contains an index value of 6.
The non-leaf nodes of a B-tree index are branch nodes. Branch nodes contain information that indicates a range of values. In the illustrated B-tree index 200, nodes 202, 204, 206, and 208 are branch nodes and, therefore, each corresponds to a range of values. The range of values indicated by each branch node is such that all nodes that reside below a given branch node correspond to the values that fall within the range of values for the given branch node. For example, node 206 is a branch node that corresponds to the numerical range from 4 to 6. Consequently, leaf nodes 220, 222, and 224, all of which reside below branch node 206 in the hierarchy, correspond to values that fall within the range from 4 to 6.
It is not uncommon for a table to have multiple indexes that reference rows of that table. Also, while a typical example of an index indexes a single column of a table, an index that indexes multiple columns of a table is common. As a result of these two factors, a set of indexes for a single table may require more storage space than the actual table. In fact, total index storage for some database systems exceeds 100 GB in size. Due to the size required to store indexes, it becomes imperative to efficiently compress indexes.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.