1. Field of the Invention
The present invention relates generally to computer database systems, and more particularly to systems and methods for executing multi-table joins in response to a query to a computer database for information.
2. Description of the Related Art
When a user requests data from a database by submitting a query to the database, it is often necessary that the database management system (dbms) that is associated with the database combine, i.e., join, data from two or more tables that contain data. For example, it might be desired to know the names and salaries of employees in a company, by department. Using the database query language known as "SQL", two tables could be created as follows:
"create table DEPT (DNO, NAME, BUDGET)", where DNO is an integer representing the department number, NAME is a variable character alpha-numeric name of the department, and BUDGET is a floating point variable representing the department's budget; and PA1 "create table EMP (DNO integer, NAME varchar, SALARY float)", where DNO is an individual employee's department number, NAME is the employee's name, and SALARY is the employee's salary. PA1 select DEPT.NAME // department name PA1 from DEPT, EMP // the two tables discussed above PA1 where DEPT.DNO=EMP.DNO//such that the employee department number equals a department number in the "department" table
Thus, in the above example, the relevant data is arranged in a "department" table and in an "employee" table. In the "department" table, each row has the above-listed three department entries, and in the "employee" table, each row has entries corresponding to a particular employee's department number, name, and salary. To return the employee names, individual salaries, and department names requires joining at least one column from the "department" table with at least one column from the "employee" table.
In the particular example discussed above, the department number entry from the "department" table is commonly referred to as a key (it defines a department), whereas the individual employees' department numbers are referred to as foreign keys (they define the respective departments in which the employees work). The columns DEPT.DNO and EMP.DNO establish a join key, and a row in the "department" table is joined to a row in the "employee" table only if the rows match a query-defined join predicate that in turn essentially defines the join key, that is, only if DEPT.DNO=EMP.DNO for that row. The above-described join can be undertaken to associate employees by department using the following join query syntax:
EMP.NAME // employee name PA2 EMP.SALARY // employee salary
The result of the above join is a third table in which employee names and salaries have been associated with department names by joining a row in the department table with a corresponding row in the employee table.
A method known to those skilled in the art for effecting the above-described join is referred to as "hash loops join". Using a hash loops join, one of the of two tables to be joined is designated a build table, and the other table is designated a probe table. Either table can be a base table, i.e., a table stored in the database, or a derived table, i.e., a table that is the result of a previous, intermediate join, or other query operations. In any case, if the entire build table can be stored in the main memory of the dbms computer, it is written into main memory. Each row is entered into a hash table on a hash of the join columns. The probe table is then hashed on its join columns and checked against the build table for matches on the join key. The hash is undertaken on the join key by mathematically operating on the value of the join columns at each of the rows of the build/probe table in accordance with a hash function to yield respective hash values.
Sometimes, however, the entire build table is too large to be written into the main memory of the dbms computer. Under these circumstances, as much of the build table that can fit into main memory is written into memory and hashed. The probe table is processed against the chunk of the build table in memory. After processing the chunk of the build table in memory, the next build table chunk is written into memory, hashed, and the probe table rows are again checked against the build table rows as summarized above, and so on until the entire build table has been processed and the probe phase completed. When the probe phase is complete, the table join has been executed in response to the query.
As recognized by the present invention, however, it is sometimes possible that, while not all rows in the probe table have a corresponding build table match and, thus, will not be output as part of the join, it might nevertheless be desirable to output the unmatched probe table rows. For example, if one or more employees in the employee table happen to be unassigned to a department, it might nevertheless be desired to output their names and salaries with an "unassigned" department name for completeness of a salary report. This operation of preserving rows of a table that do not join to or match rows in the other table is known as a left (or right) outer join. Hash loops join as described above does not support left (or right) outer join, particularly when the build table does not fit in memory in its entirety.
Moreover, the present invention recognizes that for some queries, it is not necessary to output a complete list of all matching table rows. Consequently, processing and outputting all matches under such circumstances unnecessarily prolongs query processing. As an example, suppose it is desired to know just the names of departments that have one or more employees. In this case, the name of each department having one or more employees need be joined and output only once. This operation, known as early-out join, is usefull in many circumstances to improve the efficiency of query processing, see U.S. Pat. No. 5,548,754, incorporated herein by reference. Unfortunately, conventional hash loops join does not support early-out query processing when the build table is too large for memory. The present invention, however, understands that it is possible to use hash loops join in combination with an early-out join and/or a left (or right) outer join.