A common form of information retrieval system for managing computerized records contained in a database is a relational database management system. Between an actual database (that is, data stored for use by a computer) and users of the contents of that database is a software layer known as the relational database management system (RDBMS or DBMS). The DBMS is responsible for handling all requests for access to the database and shielding the users from the details of any specific hardware and/or software implementation. Using relational techniques, the DBMS stores, manipulates and retrieves data in table form. Typically, these relationships are defined by a set of columns and a set of rows. The columns are also referred to as attributes, or data types. The rows are also referred to as records or tuples, of data.
A standard language for dealing with DBMSs is the Structured Query Language (SQL). SQL comprises both data definition operations and data manipulation operations. To maintain data independence, a set of SQL commands, referred to as a query, provides the DBMS with instructions to perform. However, the query does not describe directions for performing the instructions. Thus, the DBMS comprises a query processor for generating various query plans of execution and choosing a preferred plan. Due to the high-level nature of relational expressions and a variety of implementation techniques, automatic query optimization is possible and often necessary to ensure efficient query processing.
In accordance with well-known query translation processes, an SQL query is processed in stages. The initial stage casts the source query into an internal form such as the Query Graph Model (QGM) following the preliminary steps of lexing, parsing and semantic checking. The goal of the QGM is to provide a more powerful and conceptually more manageable representation of queries to reduce the complexity of query compilation and optimization. The internal QGM is a data structure for providing the semantic relationships of the query for use by query translator and optimizer components for rewriting the query in a canonical form. In a next phase, a plan optimizer produces a query execution plan such as by generating alternate plans and choosing a best plan based on estimated execution costs. A plan refinement stage may be employed to refine the optimum execution plan in accordance with run-time requirements.
Accessing a table of data for carrying out a query typically requires transferring all or part of the table to random access memory (RAM) of the computer. When information is needed from a plurality of tables, the tables may be joined by the DBMS. Join operations, referred to hereafter as joins, allow information to be obtained across tables in a meaningful way.
One such join is a nested loop join (NLJN). The NLJN selects one table as an outer table and the other as the inner table. When executing a nested-loop join, the outer table is scanned row by row. For each row in the outer table, the inner table is scanned looking for matching rows. A NLJN is efficient if one of the tables is small, or has been filtered such that it has only a few qualifying rows, and the other table has an index on the column that joins the tables. Typically, the smaller table is selected as the outer table because it is the fastest method of executing the NLJN.
FIG. 1a illustrates an exemplary first table T1 and a second table T2 respectively. The first table T1 comprises four (4) columns C1, C2, C3, and C4 and six (6) rows, R1, R2, R3, R4, R5, and R6. Similarly, the second table T2 comprises four (4) columns D1, D2, D3, and D4 and eight (8) rows, P1, P2, P3, P4, P5, P6, P7, and P8. A query 102 requires combining the rows of tables T1 and T2 as follows:Select * from T1, T2 where T1.C1=T2.D1 and T2.D2=7 and T2.D3=9   (1)
FIG. 1b illustrates exemplary table indices for the second table T2. A first index T2.idx.D1 represents the second table T2 indexed by column D1. A second index T2.idx.D2 represents the second table T2 indexed by column D2. A third index T2.idx.D3 represents the second table T2 indexed by column D3. Indices are often provided for facilitating access to a table in a predetermined order. Each index comprises a sorted column and a row identifier (RID) for identifying rows having a value indicated by the sorted column. In the present example, table T1 is small and table T2 has an index on the join column D1. Thus, table T1 is selected as the outer table and table T2 is selected as the inner table.
FIG. 1c illustrates an exemplary output table TO of the NLJN for the tables shown in FIG. 1a. For the NLJN, table T1 is selected as the outer and table T2 is selected as the inner table. Thus, for each row R1, R2, R3, R4, R5, R6 in table T1, every row P1, P2, P3, P4, P5, P6, P7, P8 in table T2 is scanned for a match on C1 and D1 respectively. Thus, for example, the value of C1 for the row R1 of table T1 is a ‘7’. Table T2 is scanned for determining which rows have a value of ‘7’ in column D1. As a result of the scan, rows P1, P3, and P7 are returned. Each of these rows P1, P3, P7 are evaluated for determining whether they satisfy the predicate ‘T2.D2=7 and T2.D3=9’. Only the row P7 satisfies the predicate, and thus it is joined with row R1 of table T1 and returned as an output as illustrated by row S1 of the output table TO.
However, evaluating the predicates for each join outer is both impractical and expensive. This is especially true since most database tables can be very large and processing such tables can be expensive in terms of computer resources. Therefore, it is important that techniques for joining tables be as efficient as possible. Accordingly, a solution that addresses, at least in part, this and other shortcomings is desired. The need for such a solution has heretofore remained unsatisfied.