1. Field of the Invention
The present invention is related generally to optimizing database query execution and, in particular, to optimizing database query execution using query parallelism and a data partitioned secondary index.
2. Description of the Related Art
In a relational data model, a table has rows made up of columns of data, and the table is created within a tablespace. If the tablespace is partitioned, the table is partitioned across the partitions of the tablespace. A tablespace may be stored in one or more physical partitions on one or more storage devices, and a single table partition is stored on a single physical partition. A physical partition is a fixed size division of storage. Statement (1) is an example of a SQL statement that may be used to create a tablespace named “TS1” with three partitions (represented by “NUMPARTS 3” in the SQL statement), referred to as partition 1, partition 2, and partition 3 for this example.CREATE TABLESPACE TS1 NUMPARTS 3;  (1)
Statement (2) is an example of a SQL statement that may be used to create a table named “Q1” with at least four columns, DATE, CUSTOMER_NUMBER, STATE, AND PURCHASE_AMOUNT. The ellipses in statement (2) indicate that additional columns may be included. Table “Q1” is created in tablespace “TS1”, with a partitioning key on the DATE column. The partitioning key on the DATE column indicates that dates up to and including ‘2002-01-31’ are in partition 1, dates greater than ‘2002-01-31’ up to and including ‘2002-02-28’ are in partition 2, and dates greater than ‘2002-02-28’ up to and including ‘2002-03-31’ are in partition 3. In other words, partition 1 is for January 2002, partition 2 is for February 2002, and partition 3 is for March 2002. In particular, if a DATE column value falls in January 2002 for a row, then that row is stored in partition 1 of the tablespace. If a DATE column value falls in February 2002 for a row, then that row is stored in partition 2 of the tablespace. If a DATE column value falls in March 2002 for a row, then that row is stored in partition three of the tablespace.
CREATE TABLE Q1 (DATE, ...,(2)CUSTOMER_NUMBER, ...,STATE, ...,PURCHASE_AMOUNT...,)IN TS1PARTITIONING KEY (DATE)(PART 1 VALUES (‘2002-01-31’),PART 2 VALUES (‘2002-02-28’),PART 3 VALUES (‘2002-03-31’));
A partitioning index is used to direct the placement of records into physical partitions based on index key values. An index key is a set of one or more columns in a table used to determine the order of index entries. Typically, one partitioning index is associated with one tablespace, which may be stored across multiple physical partitions. Statement (3) is an example of a SQL statement that may be used to create a partitioning index on table “Q1” with a key on the DATE column.
CREATE INDEX DATE_IX ON Q1 (DATE)(3)PARTITIONED CLUSTER;
A database query may be submitted against one or more tables. The database query includes one or more predicates for selection of data. A predicate is an element of a search condition that expresses or implies a comparison operation. For example, for a table storing employee data and named “emp”, the predicate (emp.lastname=“Smith”) is used to determine whether a last name stored in an employee table matches “Smith”. Additionally, an index may be used to access data. An index is set of pointers that are logically ordered by the values of a key. Indexes provide quick access to data and can enforce uniqueness on the rows in the table. An index has an index key. For example, for the “emp” table, an index may have a key of last name. If a predicate in the database query references a column of a table that is also a key column of an index, the index may be used to access the data associated with rows of data having the requested column value.
One type of index is a data partitioned secondary index (DPSI), which is a physically partitioned secondary index. That is, the data partitioned secondary index is itself partitioned. Each data partitioned secondary index partition is associated with a physical partition of a partitioned table. The data partitioned secondary index is created on one or more columns of a table. Unlike a partitioning index, a given key value may occur in multiple partitions of the data partitioned secondary index. Statement (4) is an example of a SQL statement that may be used to create a data partitioned secondary index on table “Q1” with a key on the STATE column.CREATE INDEX STATE_IX ON Q1 (STATE) PARTITIONED;  (4)
Statement (5) is an example of a SQL query against table “Q1” in which a predicate (WHERE DATE BETWEEN ‘2002-01-01’ AND ‘2002-02-28’ AND STATE=‘CA’) references a key of the partitioning index (via reference to DATE) and references a key of the data partitioning secondary index (via reference to STATE).
SELECT CUSTOMER_NUMBER, PURCHASE_AMOUNT(5)FROM Q1WHERE DATE BETWEEN ‘2002-01-01’ AND ‘2002-02-28’ AND   STATE = ‘CA’ORDER BY CUSTOMER_NUMBER;
One relational data model is a Relational DataBase Management System (RDBMS) using a Structured Query Language (SQL) interface. In certain RDBMSs, query parallelism may be enabled when the RDBMS is set up. Query parallelism refers to the use of parallel tasks to execute a database query. Query parallelism is accomplished by triggering multiple concurrent execution requests within a single database query. The term “execution requests” refers to both input/output (“I/O”) requests and central processing unit (“CPU”) processing requests.
Existing techniques for index access query parallelism involve key range partitioning. That is, the index includes one or more key columns whose values identify one or more rows. For example, for a table storing employee data, there may be a last name column. Then, the index for the table may include a last name key column. The index may be partitioned according to key ranges. For example, one key range may include values for the last name key column that start with A–M, while a second key range may include values for the last name key column that start with N–Z. Thus, “key range partitioning” refers to assigning different key ranges of an index to different parallel tasks. Each of the parallel tasks then executes the query for its assigned key range of the index, and the results of the parallel tasks are combined to form a single result from execution of the database query.
The techniques for index access query parallelism involving key range partitioning have disadvantages when extended to data partitioned secondary index access. With a data partitioned secondary index with a key of last name, last names beginning with any letter (i.e., A–Z) may be found in any or all physical partitions. Using the A–M and N–Z key range partitioning, each parallel task may need to search all of the data partitioned secondary index partitions, leading to possible I/O contention at the index level. That is, if multiple parallel tasks try to access the same data partitioned secondary index partition, contention may arise in the event that an I/O subsystem that manages access to the data partitioned secondary index partition cannot satisfy all requests to access the data partitioned secondary index partition concurrently. Additionally, there may be I/O contention at the data page level due to multiple parallel tasks attempting to access the same data page concurrently. This leads to increased elapsed time (i.e., the time from the start of execution of a database query to the time results of the database query are returned).
Therefore, there is a need in the art for improved database query execution.