In conventional relational database tables, rows are inserted into the table without regard to any type of ordering. Consequently, when a user submits a query that selects data from the table based on a particular value or range of values, the entire table has to be scanned to ensure that all rows that satisfy the criteria are identified. Partitioning is a technique that, in certain situations, avoids the need to search an entire table (or other database object).
With partitioning, an object, such as a database table, is divided up into sub-tables, referred to as “partitions”. The most common form of partitioning is referred to as range partitioning. With range partitioning, each individual partition corresponds to a particular range of values for one or more columns of the table. For example, one column of a table may store date values that fall within a particular year, and the table may be divided into twelve partitions, each of which corresponds to a month of that year. All rows that have a particular month in the date column would then be inserted into the partition that corresponds to that month. In this example, partitioning the table will increase the efficiency of processing queries that select rows based on the month contained in the date column. For example, if a particular query selected all rows where month equals January, then only the partition associated with the month of January would have to be scanned.
Typically, the criteria used to partition a database object is specified in the statement that creates the database object. For example, the following Structured Query Language (SQL) statement creates a table “sales” that is range partitioned based on date values contained in a column named “saledate”:
create table sales  (saledate DATE,  productid NUMBER, . . . )  partition by range (saledate)   partition sal94Q1 values less than to_date (yy-mm-dd, ‘94-04-01’)   partition sal94Q2 values less than to_date (yy-mm-dd, ‘94-07-01’)   partition sal94Q3 values less than to_date (yy-mm-dd, ‘94-10-01’)   partition sal94Q4 values less than to_date (yy-mm-dd, ‘95-01-01’)
Execution of this statement creates a table named “sales” that includes four partitions: sal94Q1, sal94Q2, sal94Q3, and sal94Q4. The partition named sal94Q1 includes all rows that have a date less than 94-04-01 in their saledate column. The partition named sal94Q2 includes all rows that have a date greater than or equal to 94-04-01 but less than 94-07-01 in their saledate column. The partition named sal94Q3 includes all rows that have a date greater than or equal to 94-07-01 but less than 94-10-01 in their saledate column. The partition named sal94Q4 includes all rows that have a date greater than or equal to 94-10-01 but less than 95-01-01 in their saledate column.
When a database server receives a request to perform an operation, the database server makes a plan of how to execute the query. If the operation involves accessing a partitioned object, part of making the plan involves determining which partitions of the partitioned object, if any, can be excluded from the plan (i.e. which partitions need not be accessed to execute the query). The process of excluding partitions from the execution plan of a query that accesses a partitioned object is referred to as “partition pruning”.
Unfortunately, conventional pruning techniques can only be applied to a limited set of statements. For example, the database server can perform partition pruning when the statement received by the database server explicitly limits itself to a partition or set of partitions. Thus, the database server can exclude from the execution plan of the statement “select * from sales PARTITION(sal94Q1)” all partitions of the sales table other than the sal94Q1 partition.
The database server can also perform partition pruning on statements that do not explicitly limit themselves to particular partitions, but which select data based on the same criteria that was used to partition the partitioned object. For example, the statement:
select * from sales where saledate between (94-04-01) and (94-06-15)
does not explicitly limit itself to particular partitions. However, because the statement limits itself based on the same criteria (saledate values) as was used to partition the sales table, the database server is able to determine, based on the selection criteria of the statement and the partition definitions of the table, which partitions need not be accessed during execution of the statement. In the present example, the database server would be able to perform partition pruning that limits the execution plan of the statement to sal94Q2.
Similarly, database servers can perform partition pruning for queries with WHERE clauses that (1) specify equalities that involve the partition key (e.g. where saledate=94-02-05), (2) include IN lists that specify partition key values (e.g. where saledate IN (94-02-05, 94-03-06)), and (3) include IN subqueries that involve the partition key (e.g. where salesdate in (select datevalue from T)).
Another form of partitioning is referred to as hash partitioning. According to hash partitioning, one or more values from each record are applied to a hash function to produce a hash value. A separate partition is established for each possible hash value produced by the hash function, and rows that hash to a particular value are stored within the partition that is associated with that hash value. Similar to range based partitioning, hash partitioning increases the efficiency of processing certain types of queries. For example, when a query selects all rows that contain a particular value in the column that is used to perform the hash partitioning, the database server can apply the value in the query to the hash function to produce a hash value, and then limit the scan of the table to the partition that corresponds to the hash value thus produced.
A table that is hash partitioned into four partitions may be created by the following statement:
create table sales(saledate DATE,productid NUMBER, ...)partition by hash (saledate)partitions 4;
Similar to range partitions, hash partitions may be used for queries with WHERE clauses that (1) specify equalities that involve the partition key, (2) include IN lists that specify partition key values, and (3) include IN subqueries that involve the partition key. However, unlike range-based partitioning, partition pruning cannot be performed for statements with predicates that specify ranges of partition key values. Consequently, hash-based partitioning is often used when the nature of the partition key is such that range-based queries are unlikely, such as when the partition key is “social security number”, “area code” or “zip code”.
Another form of partitioning is referred to as list-based partitioning. Similar to hash-based partitioning, list-based partitioning can be used with data values that are discrete and cannot be logically ordered. List-based partitioning gives users precise control over which data belongs in each partition. For each partition, the user can specify a list of key values that map to that partition. The set of key values that are mapped to any given partition are referred to herein as the “key value groups” for that partition.
List partitioning often complements the functionality of range partitioning when used in data warehouses. Range partitioning is useful for segmenting a table along a continuous domain, such as time. Tables in data warehouses are often range-partitioned by time, so that each range partition contains the data for a given range of time values such as one partition per month or per week. In contrast, list-based partitioning is useful for segmenting a table along a discrete domain, such as products. Each partition in a list-partitioned table contains data matching a list of discrete partition-key values.
For example, assume that a data warehouse contains data for many different countries. The user may choose to list partition a large table by the regions europe, north-america, south-america and asia:
create table sales_history (...)partition by list (country)(partition europe values (′United Kingdom′, ′Germany′, ′France′),partition north_america values (′United States′, ′Canada′, ′Mexico′),partition south_america values (′Brazil′, ′Argentina′),partition asia values (′Japan′, ′Korea′)Rows for any of the countries listed above will automatically be placed in the appropriate regional partitions.
A user may use the partitioning scheme above if data is accessed or modified according to region or country. Regional partitions enable data centers to take advantage of the different time zones of different regions. For example, Europe's night hours overlap American business hours, so the European partition can be maintained while American partitions remain in full operation.
Due to the benefits that result from partition pruning, it is clearly desirable to provide techniques for performing partition pruning for a wider variety of statements.