The invention relates generally to relational database systems, and more particularly to indexing in relational database systems.
Users perceive a relational database as a collection of tables, each of which arranges data for items and item attributes in rows and columns: a table row corresponds to an item (referred to as a record); a table column corresponds to an attribute of the item (referred to as a field). A table record can generally be identified by a record identifier (RID), indicating the page and offset within the page where the record is stored.
FIGS. 1-3 illustrate examples of tables in a relational database. Data stored in the tables can be accessed using indexes such as those illustrated in FIGS. 4-6, which aid in accessing data by providing a logical ordering of the table records. FIG. 4 illustrates a conventional index 400 on the CUST table 100 of FIG. 1, using the customer name 410a as the index key. Index 400 provides an alphabetical ordering of the records of CUST table 100 by customer name 120, and associates each name 410a with record identifiers 410b for table records having the customer name in their cust.sub.-- name field 120.
A query for phone numbers of customers named Smith would normally be processed by searching index 400 for an entry having the key value "Smith" 440a, using the associated record identifier 440b to retrieve a record in CUST table 100, and extracting data in the cust.sub.-- phone field 140 of the retrieved record. Because index 400 is ordered by customer name, the index search is faster than the linear search of CUST table 100 that would be required without an index. Indexes may be implemented using various methods such as B-Trees and hash functions.
As exemplified by index 400, index entries generally have two attributes: an index key value, and record identifiers of table records. An index key generally is a field or a combination of fields from a table, and each index entry associates a key value with record identifiers of table records.
Conventional indexes are single-table, associating key values based on table fields with record identifiers of records of that same table. Index 400 of FIG. 4 is an example of a single-table index. Various types of single-table indexes exist. FIG. 7 illustrates an example of a "virtual column" index, created by defining the index key as an expression using one or more fields of a table record. For example, to order the records of the order table ("ORD table") of FIG. 2 by price including discount, a virtual column index is created using the index key (ORD.price-ORD.discount).
Another type of single-table index is a "partial index," which indexes only those table records satisfying a specified condition. For example, FIG. 8 illustrates an index associating values for ORD.date only with record identifiers of table records having a price of at least $50,000.00.
In relational database systems, an index may also be multi-table, using fields from a first table in its index key, and associating index key values with record identifiers of records of a second table. Multi-table indexes are generally referred to as join indexes.
FIGS. 4-6 together illustrate an implementation of an index for retrieving index records based on customer name, created on the ITEM table 300 (FIG. 3), using the name field 120 from the CUST table 100 as an index key (CUST.cust.sub.-- name). As shown, this index has three levels. The first level (index 400, FIG. 4) associates values for CUST.cust.sub.-- name 410a with record identifiers of records of the CUST table 410b; the second level (index 500, FIG. 5) associates values for CUST.cust.sub.-- no 510a with record identifiers of records of the ORD table 510b having the same value for ORD.cust.sub.-- no; and the third level (index 600, FIG. 6) associates values for ORD.ord.sub.-- no 610a with record identifiers of records of the ITEM table 610b having the same value for the ITEM.ord.sub.-- no.
Indexes 400-600 aid in retrieving item records corresponding to a customer name. For example, to retrieve item records for a customer Smith, index 400 is used to retrieve the CUST.cust.sub.-- no (1002) of the CUST records corresponding to Smith (002); index 500 is used to retrieve the ORD.ord.sub.-- no (002, 003) of the ORD records (101, 102) corresponding to the CUST.cust.sub.-- no (1002); and finally, index 600 is used to retrieve the ITEM records (201, 202, 203) corresponding to the ORD.ord.sub.-- no (002, 003).
Although such multi-table indexes do simplify accessing data in relational database systems, conventional indexing for relational databases has limitations. For example, a conventional index associating key values of fields of a first table with record identifiers of records of a second table generally has more than one level, as illustrated by FIGS. 4-6.
A consequence of having multiple levels in multi-table indexes is that such indexes require more memory space than single-table indexes. Another consequence of having multiple levels is that more processing is required to handle queries. For example, retrieving item records corresponding to a customer name "Smith" requires multiple processing steps.