A database system is a computerized record keeping system which allows a user of a database to manipulate information. One type of database which has gained popularity because of its ease-of-use and simple structure is a relational database. Consider the ubiquitous spread-sheet.
In a relational database, the relationships of the data are usually maintained in a table having rows and columns. The rows organize instances of related data, and the columns group data having common characteristics or attributes.
The operation of locating data in the database is called a query. Typically, queries are expressed in a query language, for example a standard query language (SQL). The statements of the query language set out the search criteria in logical or "Boolean" expressions. A simple query for a census database could be paraphrased as: locate all persons having an income in the range of $20,000 to $30,000. In this statement, the attribute of the subject persons to be examined is "income," the logical condition to be satisfied is "between", and the values which must satisfy the logical condition are in the range of 20000 to 30000.
In a query, the expressions which determine the search criteria are called predicates. Most predicates in queries are phrased in terms of a range of values, for example greater than, less than, between, etc. These type of predicate relationships are called RANGE predicates. Other type of query specifications, for example mathematical relationships, are called non-RANGE predicates.
In a query, range, and non-RANGE predicates can be related by logical operators such as AND, OR, and NOT. During processing of the query, if a particular value falls within the specified range, a logical true condition is returned for the predicate, otherwise a false condition is returned. The true and false conditions of all query predicates can be combined to determine which data are selected from the database.
Processing queries is a frequent, and often time consuming task since the database system can include a large amount of data, and a given query may need to retrieve data from different portions of the database. A recurring problem in the design of database system is to make the processing of queries more efficient.
One way to locate data is to sequentially scan the rows of the table, e.g., the records of the database, for the attribute values which satisfy the search criteria. This is known as a table scan, because the relationship table is scanned from beginning to end. Although the table may imply that the rows storing related data are in sequential order, in fact, the dam may be spread-out randomly over multiple storage devices. Therefore, table scans are generally inefficient and time consuming.
In most database systems, a separately maintained index structure is used to facilitate the locating of records of data. The index includes keys having values and addresses. The keys, corresponding to records, are usually stored in a selected order of their values, and the addresses correspond to physical locations of the disk blocks and memory pages where the data associated with the key are stored. If the keys of the index are constructed from several database attributes or columns, the key is called a compound key, and the individual portions of the key corresponding to the several attributes are called key "segments."
One method of organizing an index is a B-tree with a "root," intermediate "branches" and interior nodes depending from the root, and "leaves" depending from the end of branches. The term B-tree is short for "balanced tree," and refers to the roughly equal number of branches which depend from the root and intermediate branches, and the uniform or balanced distribution of the leaves among the branches. The leaves correspond to the actual index entries storing key and address data.
A particular index entry can be found by following the branches from the root to the desired leaf. For example, to scan the rows of a relational table in a selected index order, the leaves are scanned sequentially, and the index entries at the leaves are used to locate the rows of the database.
When an index scan is used to find keys, the search for data which satisfy the query conditions can be performed more efficiently than for a table scan. For example, prior to scanning the B-tree, it could be determined that certain branches do not include any leaves which satisfy the search criteria. The searching of the leafs of such branches can be skipped in their entirety to speed up the querying process.
Most prior art database systems have included a component called a query optimizer. The query optimizers transforms a given query into an equivalent optimized query which may be processed more efficiently than the given query. The design of a query optimizer is complex, and is made more so by the fact that certain factors which influence the optimizing transformation are not known until the query is actually processed.
Range optimizers extract true and false ranges of key values from the query expression. Only those branches which include keys within the true range are scanned. Branches with key values in the false ranges are skipped as if they were "pruned" from the B-tree. However, if the query includes multiple conditions, the true and false ranges may overlap so that the scan of the index is less than optimal. Also, the values of the predicates of the queries are frequently expressed in terms of variables which are only resolvable when the query is processed.
In another type of query optimizer, the query is rearranged according to the frequency of true evaluations of the underlying query predicates in order to decrease the average length of time required for Boolean evaluation and consequently for scanning the index. However, range optimization is usually substantially more efficient than scanning according to frequencies because it reduces the number of physical accesses to the dam, and because most queries are phrased in terms of range conditions.
Understandably, there is a need for a query optimizer which can process a query having multiple ranges more efficiently. Furthermore, the optimizer should be able to optimize queries which include values which are expressed as variables. Furthermore, the optimizer should further reduce processing time by eliminating unnecessary physical accesses of the data while executing of the query. The invention satisfies these objectives.