The invention relates to query optimization in database systems, and more particularly, to join order optimization in relational database systems.
A database is a collection of information. A relational database is a database that is perceived by its users as a collection of tables. Each table arranges items and attributes of the items in rows and columns respectively. Each table row corresponds to an item (also referred to as a record or tuple), and each table column corresponds to an attribute of the item (referred to as a field or more correctly, as an attribute type or field type). The cardinality of a table T is the number of records it contains, and is denoted .vertline.T.vertline..
A "primary key" for a table is a simple or composite attribute that uniquely identifies records of the table. The key must be inherently unique, and not merely unique at a particular point in time. It is possible, but not typical, to have a table where the only unique identifier is the composite attribute consisting of all the attributes of the table. It is also possible, but not typical, for a table to have more than one unique identifier. In such a case, the table may be said to have multiple candidate keys, one of which could be chosen and designated as the primary key; the remaining candidates then would be said to be alternate keys. Primary and alternate keys must satisfy two time-independent properties. First, at no time may two records of the table have the same value for the key. Second, if the key is composite, then no component of the key can be eliminated without destroying the uniqueness property.
A "foreign key" is a possibly composite attribute of a table whose values are required to match those of the primary key of some table, which is typically but not necessarily distinct from the table for which the foreign key is defined. A foreign key value represents a reference to the record containing the matching primary key value, which may be called the referenced record or the target record. The table that contains the foreign key may be referred to as the referencing or foreign table and the table that contains the corresponding primary key, as the referenced or target or primary table.
Searching the data in a relational database is done by means of a query. A query contains one or more predicates which specify the information which the query should retrieve from the database. A join query is a query which requests information from more than one table. For example, in a database which stores telephone directory information in one table and employment information in another table, a join query might request the names of all people who live and work in the same city.
A join query must contain at least one "join predicate" which specifies the criteria which are used to select records from the two tables. A join query may also include one or more single-table predicates to select records from the individual tables (e.g., employees whose home phone numbers are in the 876 exchange). The join selectivity of a join query is the ratio of the number of matches which result from the join (i.e. the join cardinality) to the product of the cardinalities of the joined tables.
A primary key of a table P is denoted P.Pk. A foreign key of a table S with respect to some primary key is denoted S.Fk. A primary key--foreign key join is denoted a Pk-Fk join. A join of two keys which are both foreign keys with respect to the same primary key is denoted an Fk-Fk join.
A naive way of performing a join query is to examine every record in the second table for each record in the first table in order to determine whether any records satisfy the join predicate. Such records may be said to match. The database system may then construct an intermediate (or final) result containing the matching records joined together.
Performing multiple join queries on large databases can be time consuming and resource intensive. One way to make a join query with multiple joins more efficient is by optimizing the order in which tables are joined. A good join order can reduce the number of comparisons made or the size of intermediate results, thereby decreasing the resources expended and the total time necessary to perform the entire query. Obtaining a good join order, however, is generally difficult. Conventional join order optimizers rely heavily on estimates of the cardinalities of intermediate results. Estimating these cardinalities is also generally difficult. The methods conventionally used are approximate at best and can be fairly inaccurate in some cases. A more efficient and accurate method for estimating optimal join ordering is therefore desirable.