The present invention relates to computer systems and, more particularly, to techniques for performing joins between objects within computer systems.
In conventional relational database tables, rows are inserted into the table without regard to any type of ordering. Consequently, when a user submits a query that selects data from the table based on a particular value or range of values, the entire table has to be scanned to ensure that all rows that satisfy the criteria are identified. Partitioning is a technique that, in certain situations, avoids the need to search an entire table (or other database object).
With partitioning, an object, such as a database table, is divided up into sub-tables, referred to as xe2x80x9cpartitionsxe2x80x9d. The most common form of partitioning is referred to range partitioning. With range partitioning, each individual partition corresponds to a particular range of values for one or more columns of the table. For example, one column of a table may store date values that fall within a particular year, and the table may be divided into twelve partitions, each of which corresponds to a month of that year. All rows that have a particular month in the date column would then be inserted into the partition that corresponds to that month. In this example, partitioning the table will increase the efficiency of processing queries that select rows based on the month contained in the date column. For example, if a particular query selected all rows where months equals January, then only the partition associated with the month of January would have to be scanned.
Typically, the criteria used to partition a database object is specified in the statement that creates the database object. For example, the following Structured Query Language SQL) statement creates a table xe2x80x9csalesxe2x80x9d that is range partitioned based on date values contained in a column named xe2x80x9csaledatexe2x80x9d:
create table sales
(saledate DATE,
productid NUMBER, . . . )
partition by range (saledate)
partition sal94Q1 values less than to_ate (yy-mm-dd, xe2x80x9894-04-01xe2x80x99)
partition sal94Q2 values less than to_ate (yy-mm-dd, xe2x80x9894-07-01xe2x80x99)
partition sal94Q3 values less than to_ate (yy-mm-dd, xe2x80x9894-10-01xe2x80x99)
partition sal94Q4 values less than to_ate (yy-mm-dd, xe2x80x9895-01-01xe2x80x99)
Execution of this statement creates a table named xe2x80x9csalesxe2x80x9d that includes four partitions: sal94Q1, sal94Q2, sal94Q3, and sal94Q4. The partition named sal94Q1 includes all rows that have a date less than 94-04-01 in their saledate column. The partition named sal94Q2 includes all rows that have a date greater than or equal to 94-04-01 but less than 94-07-01 in their saledate column. The partition named sal94Q3 includes all rows that have a date greater than or equal to 94-07-01 but less than 94-10-01 in their saledate column. The partition named sal94Q4 includes all rows that have a date greater than or equal to 94-10-01 but less than 95-01-01 in their saledate column.
When a database server receives a request to perform an operation, the database server makes a plan of how to execute the query. If the operation involves accessing a partitioned object, part of making the plan involves determining which partitions of the partitioned object, if any, can be excluded from the plan (i.e. which partitions need not be accessed to execute the query). The process of excluding partitions from the execution plan of a query that accesses a partitioned object is referred to as xe2x80x9cpartition pruningxe2x80x9d.
Unfortunately, conventional pruning techniques can only be applied to a limited set of statements. For example, the database server can perform partition pruning when the statement received by the database server explicitly limits itself to a partition or set of partitions. Thus, the database server can exclude from the execution plan of the statement xe2x80x9cselect * from sales PARTITION(sal94Q1)xe2x80x9d all partitions of the sales table other than the sal94Q1 partition.
The database server can also perform partition pruning on statements that do not explicitly limit themselves to particular partitions, but which select data based on the same criteria that was used to partition the partitioned object. For example, the statement:
select * from sales where saledate between (94-04-01) and (94-07-01) does not explicitly limit itself to particular partitions. However, because the statement limits itself based on the same criteria (saledate values) as was used to partition the sales table, the database server is able to determine, based on the selection criteria of the statement and the partition definitions of the table, which partitions need not be accessed during execution of the statement. In the present example, the database server would be able to perform partition pruning that limits the execution plan of the statement to sal94Q2.
Similarly, database servers can perform partition pruning for queries with WHERE clauses that (1) specify equalities that involve the partition key (e.g. where saledate=94-02-05), (2) include IN lists that specify partition key values (e.g. where saledate IN (94-02-05, 94-03-06)), and (3) include IN subqueries that involve the partition key (e.g. where salesdate in (select datevalue from T)).
Another form of partitioning is referred to as hash partitioning. According to hash partitioning, one or more values from each record are applied to a hash function to produce a hash value. A separate partition is established for each possible hash value produced by the hash function, and rows that hash to a particular value are stored within the partition that is associated with that hash value. Similar to range based partitioning, hash partitioning increases the efficiency of processing certain types of queries. For example, when a query selects all rows that contain a particular value in the column that is used to perform the hash partitioning, the database server can apply the value in the query to the hash function to produce a hash value, and then limit the scan of the table to the partition that corresponds to the hash value thus produced.
A table that is hash partitioned into four partitions may be created by the following statement:
create table sales
(saledate DATE,
productid NUMBER, . . . )
partition by hash (saledate)
partitions 4;
Similar to range partitions, hash partitions may be used for queries with WHERE clauses that (1) specify equalities that involve the partition key, (2) include IN lists that specify partition key values, and (3) include IN subqueries that involve the partition key. However, unlike range-based partitioning, partition pruning cannot be performed for statements with predicates that specify ranges of partition key values. Consequently, hash-based partitioning is often used when the nature of the partition key is such that range-based queries are unlikely, such as when the partition key is xe2x80x9csocial security numberxe2x80x9d, xe2x80x9carea codexe2x80x9d or xe2x80x9czip codexe2x80x9d.
Due to the benefits that result from partition pruning, it is clearly desirable to provide techniques for performing partition pruning for a wider variety of statements.
Techniques are provided to expand the concept of partitioning in variety of ways. For example, both hash partitioning and range partitioning can be characterized as single-dimension partitioning because they use a single criteria to divide up the partitioned objects. One aspect of the invention is to perform multiple-dimension partitioning. In multiple-dimension partitioning, a database object is divided into partitions based on one criteria, and each of those resulting partitions is divided into sub-partitions based on a second criteria. The process of partitioning partitions based on different criteria may be repeated across any number of dimensions. In addition, entirely different partitioning techniques may be used for each level of partitioning. For example, database objects may be partitioned across one dimension using range-based partitioning, and each of those range-based partitions may be partitioned across another dimension using hash based partitioning techniques.
Another aspect of this invention relates to how the database server takes advantage of partitions when processing queries that include joins. In particular, techniques are provided for performing a full parallel partition-wise join, and a partial parallel partition-wise join. In a partial parallel partition-wise join, one of the join tables is statically partitioned on the join key and another join table is dynamically partitioned in a way that corresponds to the partitioning criteria of the statically partitioned table. In a full parallel partition-wise join, both of the tables involved in the join have already been statically partitioned on the join key based on the same criteria. The join operation is performed in parallel in a way that takes advantage of that static partitioning.