In a relational database, records are stored as rows (tuples) within tables (relations), with each data field in a record represented as a column in the table. The table data is commonly stored in a balanced tree structure (B-tree), and can be associated with other B-tree structures used to sort the data on selected fields of the record.
Primary key B-trees, which are also called "base tables," are B-trees sorted according to a "primary key" field. For any given table of data, one field or group of fields of the record is selected as the primary key. The primary key must contain data unique to that record. For example, in a record defined to contain a person's name, social security number, and data of birth, the social security number could be selected as the primary key field since this will contain unique data for each record. In a primary key B-tree structure, the top (root) and intermediate nodes of the B-tree are index pages which contain index records that determine how the tree is to be traversed to find the requested data record(s). Each index record contains a primary key value and a page number. The page number indicates the page in the next level of the B-tree that contains records with a primary key value which is less than or equal to the primary key value in the current index record. The bottom (leaf) nodes, or "data pages," contain the data records.
The user may also decide that it is valuable to create a B-tree on a non-unique "secondary index" field of the table data. The resulting B-tree structure is called a "secondary index B-tree". Although only a single primary key B-tree exists for a given table, multiple secondary index B-trees may be created as the user deems necessary. Like a primary key B-tree, a secondary index B-tree contains root and intermediate nodes that indicate how the secondary index B-tree is to be traversed to find the requested secondary index value. In a common implementation of a secondary index B-tree, the secondary index B-tree does not contain data records at the leaf nodes. Instead, the leaf nodes contain records which correlate secondary index values to records in the primary key B-tree. The records of the secondary index B-tree records store the primary key values of one or more records that have the designated secondary key values. The primary key B-tree is searched using these primary key values to retrieve the associated data records.
A "select" query command retrieves records having columnar values that satisfy certain selection criteria. A select query is commonly a boolean operation specifying multiple data fields that must meet a certain criteria. A select query is most efficient when all the data fields in the selection criteria are keys for a primary or secondary index B-tree. A sample query is used to illustrate select query processing for a table that contains employee records when the fill name of the employee is used as the primary key of the employee table and the employee's salary is used as a secondary key: "Select employee from employee table where last-name=Smith and salary &gt;$50,000."
The select query process searches the secondary index B-tree for the salary field and creates a set of secondary index records for employees paid more than $50,000. The select query process uses the primary key values stored in the secondary index records in the set to find the corresponding primary index records in the primary index B-tree for the employees paid more than $50,000. The select query process searches these primary index records for employees with the last name of Smith, thus completing the query. This select query process must perform a minimum of two file I/Os to retrieve the leaf node page containing the secondary index records, a minimum of two I/Os to retrieve the index page containing the primary index records that correspond to the secondary index records, and a minimum of one I/O to retrieve the employee data records themselves. When retrieving records from a large database, the number of file I/O to find records that satisfy the selection criteria can become very large and costly in terms of performance.
A select query may require joining records from more than one table into a set of records that satisfy the criteria. However, all select operations begin by searching for records from a single table that meet the selection criteria; records from other tables are then searched for as required. Because select operations are the basic building blocks of all queries issued against a relational database in a transactional environment, increasing the speed of the search process results in direct improvement in the overall performance of the database system.
Therefore, there is a need for an search process that reduces the number of file I/O's necessary to find records satisfying selection criteria for a database query and thus increases the performance of the database system.