1. Field of the Invention.
This invention relates in general to database management systems performed by computers, and in particular to a method and apparatus for generating dynamic and hybrid sparse indices for workfiles used in SQL queries in a relational database management system.
2. Description of Related Art.
Relational DataBase Management System (RDBMS) products using a Structured Query Language (SQL) interface are well known in the art. In RDBMS products, all data is externally structured into tables. The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages such as C, COBOL, etc. Operators are provided in SQL that allow the user to manipulate the data, wherein each operator operates on either one or two tables and produces a new table as a result.
An SQL query often generates workfiles to store temporary result tables while performing the query. Such workfiles are generally created to store intermediate relations from the query. However, there are numerous problems in the use of workfiles. For example, workfiles containing millions of tuples are not uncommon in large databases. As a result, it is important to employ efficient techniques when manipulating workfiles.
One approach may linearly scan workfiles in order to find matching tuples. For every tuple in an outer probing sequence (outer relation), a workfile corresponding to an inner relation is scanned from the beginning (except in a sort-merge join) until a match is found, or until a value greater than the probing value (in a workfile sorted in descending order) is found, or until an end-of-file is encountered. This is an inefficient technique for searching large workfiles, especially when the workfile and outer, probing sequence contain a large number of tuples. For example, if there are n.sub.1 tuples in the workfile and n.sub.2 tuples in the outer probing sequence, then the required number of comparisons is of the order of n.sub.1 n.sub.2.
Another approach may employ a bi-directional linear search technique that is similar to the above linear scan except that workfiles are employed only if it is cost beneficial to do so. In the bi-directional linear search technique, the position of the previous probe is used as the starting location for the next search. This position is also used to partition the workfile into two intervals. Depending upon the next value, either of the two partitions are linearly scanned. Although the average number of comparisons in this techniques is equal to half the average number of comparisons in the previous technique, the required number of comparisons in the worst case is still of the order of n.sub.1 n.sub.2.
Thus, there is a need in the art for techniques for optimizing the performance of workfiles used in SQL queries.