The present invention relates generally to database systems, and more particularly to methods and apparatus for the processing of queries having join predicates by database systems.
A database is a collection of information. A relational database is typically illustrated as one or more two-dimensional tables. Each table arranges items and attributes of the items in rows and columns. Each table row corresponds to an item (referred to as a record), and each table column corresponds to an attribute of the item (referred to as a field). In a relational database a collection of tables can be related to each other through a common attribute or "key". The common key enables information in one table to be automatically cross-referenced to corresponding information in another table.
A complex search may be performed on a database with a "query". A query includes one or more predicates to specify the information for a database system to retrieve from the database. A join query is a query which requests information from more than one table. For example, in a database which stores customer information in one table and credit card transactions in another table, a join query may request the transactions in which customers made a purchase in the same state as their residence. A join query must include at least one join predicate to specify the criteria to select records from the two tables (e.g., that the state of residence of the customer be the same as the state in which the transaction occurred). A join query may also include one or more single-table predicates to select records from the individual tables.
To perform a join query, a conventional database system examines every record in the second table for each record in the first table to determine whether any records satisfy the join predicate. Such records may be said to "match." The database system then constructs a query table from the matching records.
In many circumstances, conventional database operations may be unacceptably slow when performing a join query. Several techniques have been developed to reduce the time required to process join queries. One technique to improve the performance of a join query is to reduce the amount of data searched by the database system. In particular, records that cannot satisfy the join predicate should be eliminated from both tables of the query.
An example of such a technique may be illustrated with the following query: ##EQU1##
This query attempts to find all records that satisfy the join predicate R.r=S.s.
Since records from Table R must also satisfy the predicate R.r&lt;10 and since R.r must equal S.s, by applying the algebraic rule of transitivity, the database system can determine that matching records of Table S must also satisfy the condition S.s&lt;10.
After deducing the predicate on S, the database system may apply the predicate to eliminate records from the scan of S that cannot possibly contribute to the result of the join.
Fragmentation is another technique used to increase the processing speed of queries. Fragmentation breaks a table into horizontal sections called fragments. After fragmentation, the table consists of a collection of fragments.
A fragment contains a set of records. The criteria for storing a record in a fragment is defined by the user and is known as the `fragmentation scheme.` It is possible for a fragment to be empty if none of the records satisfy the criteria.
A `scan` is the process of reading a fragment of a table. Fragments may be stored independently on separate disks or on separate nodes in a cluster or network architecture. Logically, all fragments may be scanned simultaneously, thereby increasing the overall rate at which the complete table can be read. By definition, to read more than one fragment simultaneously, the database system must use multiple scan operators.
`Fragment elimination` is a process by which the database system can identify fragments from a table that cannot participate in the result of the query and remove those fragments from consideration. An example of fragment elimination may be illustrated with the following query: EQU SELECT*FROM R where R.month&gt;"September" [Example 2]
Assume that Table R has 12 fragments--one for each month of the year--and that the R.month column identifies the fragment for each record in Table R. For example, fragment #1 contains all records whose R.month value is "January", fragment #2 contains all records whose R.month value is "February," and so on.
Using the query in Example 2 and the fragmentation scheme, the database system is able to eliminate all but three fragments, namely fragments #10-#12, corresponding to records whose R.month value is "October", "November" or "December". The eliminated fragments #1-#9 cannot possibly return records satisfying the query and need not be scanned. Thus, fragment elimination allows the database system to scan three fragments instead of all twelve.