When a relational database has very large tables, potentially consisting of millions of rows, it is desirable to divide the tables into subtables ("partitions") of a more manageable size. The operation of dividing a table into partitions is typically accomplished with a partitioning key. A partitioning key is a key that is used to determine to which partition a particular record (row) belongs. The partitioning key can be defined in terms of one or more attributes (columns) of the table. The resulting partitioned table consists of several partitions, each containing the same columns as the partitioned table. However, each partition contains only a subset of the rows of the partitioned table.
Partitioning a table can have a positive effect on query processing. By using information regarding the partitioning scheme of a table, the amount of time required to execute a query that accesses the table may be reduced. As will be illustrated further with reference to FIG. 1, partitioning allows a query to be processed with a partial table scan rather than a full table scan if the query involves a predicate containing the partitioning key. In that case, the number of partitions that need to be searched may be reduced prior to executing the query. For example, the query optimizer can generate a query plan that excludes partitions that cannot possibly contain rows that satisfy the user-specified conditions.
FIG. 1 shows a simple table with four columns: order #, ship date, receive date, and items. The table is partitioned by the ship date column. More specifically, the table is partitioned by the year of the ship date. Therefore, records having the same ship date year are stored in the same partition. The table contains n partitions. Partition 1 contains records having a ship date year of 1968, partition 2 contains records with a ship date year of 1969, and partition 3 contains records with a ship date year of 1970. Partitions 4 through n-1 (not shown) contain records with ship date years 1971 through 1995, and partition n contains records with a ship date year of 1996.
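The partitioning scheme of FIG. 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `partition_by_ship_year`, the field names, and the sample rows are assumptions made for the example and do not come from FIG. 1 itself.

```python
from collections import defaultdict
from datetime import date


def partition_by_ship_year(rows):
    """Group rows into partitions keyed by the year of the ship date."""
    partitions = defaultdict(list)
    for row in rows:
        # The partitioning key is derived from a single column: ship_date.
        partitions[row["ship_date"].year].append(row)
    return dict(partitions)


# Made-up sample rows with the four columns named in FIG. 1.
orders = [
    {"order_no": 1, "ship_date": date(1968, 3, 14),
     "receive_date": date(1968, 4, 2), "items": 12},
    {"order_no": 2, "ship_date": date(1969, 7, 1),
     "receive_date": date(1969, 7, 20), "items": 75},
    {"order_no": 3, "ship_date": date(1969, 12, 28),
     "receive_date": date(1970, 1, 9), "items": 60},
    {"order_no": 4, "ship_date": date(1970, 5, 5),
     "receive_date": date(1970, 5, 30), "items": 8},
]

partitions = partition_by_ship_year(orders)
```

Each partition holds rows with the same columns as the original table, but only those rows whose ship date falls in that partition's year.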
To illustrate how partitioning can be used to reduce query execution time, assume the user has requested a list of records that have 50 or more items and were shipped in the year 1969. In this example, the first partition can be skipped altogether because it contains only records with ship dates in the year 1968. Partitions 3 through n can also be skipped because none contain records having a ship date year of 1969. The query plan generated by the query optimizer accordingly can be limited to a search of partition 2.
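The pruning step described above might look roughly like the following sketch. It assumes the partitions are held in a dictionary keyed by ship date year; the function name `query_shipped_in` and the sample rows are hypothetical and serve only to illustrate the idea.

```python
from datetime import date

# Hypothetical partitions keyed by ship date year (the partitioning key).
partitions_by_ship_year = {
    1968: [{"order_no": 1, "ship_date": date(1968, 3, 14), "items": 12}],
    1969: [{"order_no": 2, "ship_date": date(1969, 7, 1), "items": 75},
           {"order_no": 3, "ship_date": date(1969, 12, 28), "items": 60},
           {"order_no": 5, "ship_date": date(1969, 2, 2), "items": 10}],
    1970: [{"order_no": 4, "ship_date": date(1970, 5, 5), "items": 8}],
}


def query_shipped_in(partitions, year, min_items):
    """Return matching rows and the list of partitions actually scanned."""
    # The predicate is on the partitioning key, so every partition other
    # than the one for `year` is excluded before any row is read.
    scanned = [year] if year in partitions else []
    matches = [row
               for key in scanned
               for row in partitions[key]
               if row["items"] >= min_items]
    return matches, scanned


shipped_matches, shipped_scanned = query_shipped_in(
    partitions_by_ship_year, 1969, 50)
```

In this sketch only the 1969 partition is read; the remaining partitions are never touched.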
As illustrated above, partitioning is useful for reducing the amount of time required to execute a query. However, the advantages of this technique are realized only when the query contains a predicate directly involving the columns in the partitioning key. Referring again to FIG. 1, if the user had requested a list of records having 50 or more items and a receive date in 1970, then the query optimizer would be unable to eliminate any of the partitions from the query plan. As a result, all of the partitions would have to be searched. Therefore, in this example, partitioning provides no benefit at all. This example illustrates the limitation inherent in this partitioning method: the value of partitioning to query optimization is limited to situations where the query uses the partitioning key directly.
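The limitation can be made concrete with a short sketch. As before, the dictionary layout, the function name `query_received_in`, and the sample rows are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical partitions keyed by ship date year. The receive date is an
# ordinary column and plays no part in the partitioning key.
partitions_by_year = {
    1968: [{"order_no": 1, "receive_date": date(1968, 4, 2), "items": 12}],
    1969: [{"order_no": 2, "receive_date": date(1969, 7, 20), "items": 75},
           {"order_no": 3, "receive_date": date(1970, 1, 9), "items": 60}],
    1970: [{"order_no": 4, "receive_date": date(1970, 5, 30), "items": 80}],
}


def query_received_in(partitions, year, min_items):
    """Return matching rows and the list of partitions actually scanned."""
    # The partition key says nothing about receive dates, so no partition
    # can be ruled out in advance: all of them must be scanned.
    scanned = list(partitions)
    matches = [row
               for key in scanned
               for row in partitions[key]
               if row["receive_date"].year == year
               and row["items"] >= min_items]
    return matches, scanned


received_matches, received_scanned = query_received_in(
    partitions_by_year, 1970, 50)
```

Here the matching rows happen to sit in two different partitions, and nothing in the partitioning scheme tells the planner which partitions those are, so every partition is read.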
It is desirable, therefore, to provide reduced query execution time even when the partitioning key is not directly involved in the query. It is also desirable to maintain and recognize per-partition predicates on something other than the partitioning key that can be used by the query optimizer to treat individual partitions differently. When values in a set of one or more columns are correlated to the columns of the partitioning key, it is desirable to take advantage of this correlation to optimize queries.