1. Field of the Invention
The present invention relates to the performance of queries in database processing systems. More particularly, the invention concerns a method, article of manufacture, and apparatus to more efficiently evaluate and execute star/join type queries.
2. Description of the Related Art
Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on random access storage devices (DASDs) such as magnetic or optical disk drives. Data may be retrieved from this type of database by a variety of methods. For example, a computer program can extract information from the database without human intervention or a user can interact with a query system program which serves as a front-end to the database.
"Accessing a table" is used in the art to mean reading information from a table. Since the tables are normally stored on DASD, accessing a table requires transferring all or part of the table from DASD into the random access memory (RAM) of the computer system. When information is needed from a plurality of tables, the tables may be joined by the database software or firmware. "Joins" allow additional information to be obtained across tables in a meaningful way. In a simple example, an Employee table lists an employee's department number as `76`, but the definition of department `76` requires reference to another table, the Department table, which lists the full department title associated with each department number. In the Department table, the row for department `76` also contains a column with the department title "Information Systems Department." Thus, a user desiring to generate a report containing a list of all employees including their department titles may want to establish a join relationship between the department number column in the Employee table and the department title column in the Department table, so that the employee's department can be printed in a symbolic form instead of the numerical form.
Ways of specifying and performing joins are the subject of substantial effort. Database tables can be very large and processing tables can be expensive in terms of computer resources. Therefore, it is important that methods for joining tables be efficient. In this respect, queries are often "optimized" to improve their efficiency of execution.
One type of query that is especially difficult to optimize and execute efficiently is the "star/join" query. FIG. 1 shows a graphical representation of a typical star/join query 100. The query 100 includes a fact or "base" table 102 and several dimension tables 104-107, where all tables are shown as "nodes". Arcs, such as the arc 110, interconnect the tables and represent join predicates involving columns of the linked tables. The query 100 may also be represented by query statements, as shown in Table 1, below.
TABLE 1 ______________________________________ select D1.C1, D2.C2, D3.C3, sum (F.M1), max (F.M2) from D1, D2, D3, F where D1.K1 = F.DIM1 and D1.A1 = "New Jersey" and D2.K2 = F.DIM2 and D2.A2 = "45" and D3.K3 = F.DIM3 and D3.A3 = "1995" ______________________________________
The typically huge size of the fact table is one reason that star/join queries are so difficult to execute efficiently. With this fact table, even a restrictive join predicate between fact and dimension tables result in unwieldy answer sets. A typical fact table may easily include one billion rows. Thus, with this fact table, even a restrictive join predicate that yields one-thousandth of the fact table's rows would still yield an answer set with one million rows.
Engineers and scientists have developed a number of different approaches to address the difficulties of star/join queries. One approach is the "Cartesian product" technique, which is employed by the DB2 database product of International Business Machines Corp. This technique applies the query to the dimension tables by taking their Cartesian product, and then applies this result to the fact table using composite indexes matching the join columns of the fact table. Query execution, though, can still be lengthy if the dimension table predicates are not sufficiently restrictive due to the large size of the Cartesian product of the dimension tables.
Another known approach to optimize star/join queries is the "bit map index" technique. This technique creates one bit map index for each fact table column that is joined with a dimension table column. Bit map index entries corresponding to qualifying dimension table join column values are OR-ed together within each dimension, and the results from each dimension are then AND-ed. Rows from the fact table identified by the final bit vector can then be accessed. This approach is described in O'Neil et al., Multi-Table Joins Through Bitmapped Join Indices, SIGMOD Record, Vol. 24, No. 3, September 1995.
Although query optimizing techniques such as these constitute a significant advance and enjoy widespread use today, International Business Machines Corp. has continually sought to improve the performance and efficiency of query optimizing techniques. Chiefly, improvement is still sought in the area of query optimization and execution speed.