1. Field of the Invention
The present invention relates generally to relational database systems, and more particularly to systems and methods for undertaking star joins of data tables in relational databases.
2. Description of the Related Art
Relational database systems store large amounts of data, including business data that can be analyzed to support business decisions. For example, a relational database system can be used to store sales data including information on sales by time period, product, product line, geographic area, and so on, which can be usefully presented in response to a query to support a particular business decision. It will readily be appreciated that data can be presented in numerous ways to support a decision, depending on the particular decision (and, hence, query) being made.
To respond to a user query, relational database systems often undertake table joins, i.e., operations that relate data in two or more tables. As an example, suppose it is desired to know about sales of a particular product in a particular region during a particular quarter. To respond to such a query, the exemplary relational database system mentioned above could create four tables, namely, a PRODUCT table identifying products by name and identification, a REGION table identifying sales regions by name and identification, a TIME table defining fiscal quarters, and a relatively large SALES table identifying all sales by product identification, region identification, and quarter. To respond to the query, these four tables could be joined using a "star join", so called because of the appearance of the graphical representation used to depict the join.
More specifically, in a star join, a fact table is represented at being at the center node of a star, and lines (referred to as "edges") to so-called dimension tables that are represented by nodes in a graph radiate outwardly from the fact table. Usually, the fact table is an order of magnitude larger or more than the dimension tables, and the edges between the fact table and dimension tables represent the join predicates, i.e., the desired relationships between the tables as defined by the query. It further happens that a dimension can itself include multiple tables, with such a multi-table dimension being referred to as a "snowflake" because of the snowflake-like appearance of its graph. To simplify discussion, every dimension can be regarded as a snowflake, because single table dimensions are essentially degenerate snowflakes with one node. Accordingly, "snowflake" is synonymous with "dimension". Each snowflake has a root which corresponds to the snowflake node that is connected to the fact table F. In the above example, the SALES table would be the fact table of a star join, and the PRODUCT, REGION, and TIME tables would be the snowflakes.
As recognized herein, star joins typically process large amounts of data, and consequently special execution plans based on bitmap ANDing might be invoked by the query optimizer of a relational database system to improve the performance of the system by minimizing the number of fact table pages that must be read from disk. In bitmap ANDing, each bit corresponds to a fact table row. Continuing with the above example to illustrate, the PRODUCT table is accessed by the desired product name to determine the desired product identification, which is then used to probe a SALES table index on product identification. As intended by the present invention, the particular index used can be a so-called "B-tree", or bitmap index, or other appropriate index.
In any case, the index probes are used to produce a bitmap, wherein each bit corresponds to a row in the SALES table. Rows in the SALES table for sales of the desired product have their corresponding bits in the bitmap set to "1", while the remainder of the rows would have their bits set to "0". Similarly, second and third bitmaps are produced using the REGION and TIME tables, and then the three bitmaps are ANDED together, with the desired result--rows having all three corresponding bits in the three bitmaps equal to one--being fetched from the SALES table. While the present disclosure focusses on bitmaps for ease of disclosure, it is to be understood that the principles set forth herein can apply to bloom filters as described in the present assignee's U.S. patent application Ser. No. 08/919,739 now U.S. Pat. No. 5,960,428, incorporated herein by reference.
In light of the above, it may now be appreciated that efficiently and effectively identifying and executing star joins pursuant to a query can be an important tool for supporting decision making. As further recognized herein, however, a star join can be embedded within a complex query, and it may not always be obvious which parts of a query qualify as "snowflakes" and indeed which table should be regarded as the fact table. For example, U.S. Pat. No. 5,848,408 is directed to how to process snowflakes under certain limited conditions, but not how to find or identify snowflakes. Nonetheless, for the reasons set forth above a query optimizer must be able to identify not only which part of a query qualifies as a star join, but it must also break the star join portion of the query into a form suitable for bitmap ANDing.
The above difficulty in knowing what parts of a query qualify for bitmap ANDing and how to render an appropriate star join is particularly acute when predicates between tables in different snowflakes (or within a snowflake) cause cycles in the join graph. A cycle can be thought of as a closed polygon in the star join graph that is established by edges representing predicates between not only the fact table and tables in the snowflakes, but also between tables in different snowflakes. Such cycles can be introduced into the query either directly or by the addition of transitive predicates added by the query optimizer. In any case, the present invention recognizes the above-noted problems and provides the solutions disclosed herein.