In a typical relational database system, users store, update, and retrieve information by interacting with user applications. The user applications respond to a user's interaction by submitting commands to a database application, or server, responsible for maintaining the database. The database server responds to commands by performing the specified actions on the database. To be correctly processed, the commands must comply with the database language that the database server supports. One popular database language is known as Structured Query Language (SQL).
Various access methods may be utilized to retrieve data from a database. The access methods used to retrieve data may significantly affect the speed of the retrieval and the amount of resources consumed during the retrieval process. Many information retrieval applications make use of indices when performing content-based searches on the database data. Examples of database indices include R−trees, quadtrees, and B−trees.
Database indices provide organization and reference to the data in a database to permit a user to find particular items of data in the database or determine relationships among the data in the database. Database indices can also permit relationships between the data in a database and data not included in the database to be determined.
Typically, such indices are implemented using a B−tree index. The index is typically built on the columns of the table, each index entry in the index having the form <column_value, rowid>, where “rowid” uniquely identifies the row within the table that corresponds to the index entry.
Many applications involve data sets in which individual rows are identified by a primary key. The primary key uniquely identifies each row within the table and may be formed from a single column, such as social security number for an employee, or may be formed from multiple columns, such as an area code plus a seven digit local phone number. For such a table, a primary key index can be utilized.
Entries within a primary key index may have the form <primary key value(s), rowid>. A primary key index typically provides good query performance when only indexed columns are referenced in the query, in so-called index-only scans. If non-indexed columns are referenced, an index-based scan may be utilized. In an index-based scan, the primary key index structure may be traversed to obtain a rowid, which may be then utilized to access the rows within the table structure.
To avoid duplication of the primary key values as well as to speed up primary key-based retrieval efforts, a primary B+tree structure has been introduced, which may include not only the indexed column values, but also all the remaining column values of the table in the leaves of the index structure, thus rendering maintenance of a separate table unnecessary. Each row, therefore, may include both key and non-key columns. The non-key columns may be stored along with the key columns. As a result, primary B+tree structures do not incur additional input/output overhead to access the non-key columns as with a conventional table's index-based scan.
As with a conventional heap-organized table/index pair, primary B+tree structure provides good query performance for data included in the column or columns forming the primary key. To aid in the retrieval efforts of queries based upon non-primary key columns, secondary index structures may be provided. Entries within the secondary structure may have the form <secondary key value, primary key value>.
For index-only scans, data may be extracted directly from the secondary index structure. For index-based scans, the secondary index structure may first be traversed to obtain the corresponding primary key value. The primary key value may then be utilized in a full primary key index scan of the primary B+tree structure to obtain values from other columns that are of interest.
Primary B+trees may themselves be indexed. Typically, primary key based logical row identifiers are used in secondary indexes for primary B+tree structures. However, if primary key is large, then the size of secondary index will be large leading to high storage costs, and degradation in secondary index scan performance. No known solutions exist that can address this issue.