1. Field of the Invention
The present invention relates to a process for optimizing the execution of a database query. More specifically, the present invention relates to a method and an apparatus for optimizing the execution of a database query that uses the partitioning schema of a partitioned database-object to select a subset of partitions in a partitioned table.
2. Related Art
Today, many companies are storing company-wide data in large centralized databases, often called data warehouses, so that they can leverage sophisticated analytical tools to process the data to glean insights that will give them a competitive edge in the marketplace. The increasing popularity of data warehousing and other similar applications that require large databases has resulted in an explosive growth in database sizes, which has created a strong demand for technologies that can improve the manageability and performance of large databases.
Partitioning is one such key technology for building and managing large databases. In partitioning, a database-object is subdivided into smaller units, called partitions, which enables the database administrator to simplify the management of large databases. For example, partitioning can be used to support a “rolling window” load process, in which each week's sales data can be loaded by simply adding a partition to the database. Adding a partition to the database is much more efficient and easier to manage than loading the data into a non-partitioned table. Moreover, partitioning can also be used to improve database performance by limiting the amount of data that needs to be examined, and by enabling parallel execution of queries on multiple partitions.
In a partitioned database, it is useful, and sometimes necessary, to access a set of partitions. For example, if we want to compute statistics (e.g., number of distinct keys, number of leaf blocks, etc.) for a given index partition, then we need to access data only from that index partition. Moreover, it is very convenient to use the same construct to access data from a set of partitions, regardless of the type of the partitioned database-object, or the partitioning technique, or the partition definitions.
Typically, database user-interfaces, e.g., standard SQL, do not have constructs to access a set of partitions. Note that while it is possible to generate a query in standard SQL to access a set of partitions of a range-partitioned or list-partitioned database-object, such a query is dependent on the type of the partitioned database-object, the partitioning technique, and the partition definitions. Moreover, it is impossible to access a set of partitions of a hash-partitioned database-object by using standard SQL syntax.
Incidentally, some databases implement a partition-mapping function that uses the partitioning schema of a partitioned database-object and a list of values (usually columns from a partitioned table) to map rows to partitions of the partitioned database-object. The partition-mapping function can be used to identify a partition for a given set of column values.
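By way of illustration only, a partition-mapping function of the kind described above can be sketched as follows. The sketch is not drawn from any particular database: the names RangeSchema, HashSchema, and partition_of are hypothetical, the range schema is assumed to use exclusive upper bounds on a single integer column, and the hash schema is assumed to use a CRC-32 checksum purely for determinism.

```python
import bisect
import zlib
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class RangeSchema:
    # Hypothetical range schema: sorted, exclusive upper bounds.
    # Partition i (1-based) holds keys strictly below upper_bounds[i - 1].
    upper_bounds: Sequence[int]


@dataclass(frozen=True)
class HashSchema:
    # Hypothetical hash schema: a fixed number of partitions.
    num_partitions: int


Schema = Union[RangeSchema, HashSchema]


def partition_of(schema: Schema, values: Sequence[object]) -> int:
    """Map a list of column values to a 1-based partition number."""
    if isinstance(schema, RangeSchema):
        key = values[0]  # assumption: the first value is the partitioning key
        idx = bisect.bisect_right(schema.upper_bounds, key)
        if idx == len(schema.upper_bounds):
            raise ValueError(f"key {key!r} is above the highest partition bound")
        return idx + 1
    if isinstance(schema, HashSchema):
        # Assumption: hash the textual form of all values with CRC-32.
        digest = zlib.crc32(repr(tuple(values)).encode("utf-8"))
        return digest % schema.num_partitions + 1
    raise TypeError(f"unsupported partitioning schema: {schema!r}")
```

Note that the caller supplies only the schema and the column values; the same call works for either partitioning technique, which is the property that makes such a function convenient for identifying a partition.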
Furthermore, some databases expose the partition-mapping function at the user level by extending standard SQL. When used in a predicate, this form of the partition-mapping function allows the user to specify a set of partitions. An example of a database query that has a predicate that uses the partition-mapping function is shown below:
select * from X where OPT(Y, X.C1, X.C2, X.C3)=N.
In this query, “OPT(Y, X.C1, X.C2, X.C3)=N” is the predicate, wherein “OPT” is the partition-mapping function, “X” is a table, “Y” is the partitioned database-object, and “X.C1, X.C2, X.C3” is a list of columns in the table. In this example, if X is partitioned and the partitioning schema of X is identical to the partitioning schema of Y, then the predicate specifies a set of partitions containing only one partition: the Nth partition. Note that this method for specifying a set of partitions works regardless of the type of the partitioned database-object, the partitioning technique, or the partition definitions. Moreover, note that this method can be used for specifying a set of partitions for a hash-partitioned database-object.
The problem with executing a database query that has a predicate that uses a partition-mapping function is that it results in poor performance. Regardless of the set of partitions that the user wants to access, the system has to scan through all of the partitions in the partitioned database-object and use the predicate in the query to filter out the rows that do not belong to the given set of partitions.
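The cost of this behavior can be illustrated with a minimal sketch. The sketch assumes a hash-partitioned table held in memory as a dictionary of row lists; the names hash_partition, load, and naive_select are hypothetical, and the modulo-based mapping stands in for whatever partition-mapping function a real system would use.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (partitioning key, payload)


def hash_partition(key: int, num_partitions: int) -> int:
    """Hypothetical partition-mapping function: 1-based partition number."""
    return key % num_partitions + 1


def load(rows: List[Row], num_partitions: int) -> Dict[int, List[Row]]:
    """Place each row in the partition chosen by the mapping function."""
    partitions: Dict[int, List[Row]] = {
        p: [] for p in range(1, num_partitions + 1)
    }
    for row in rows:
        partitions[hash_partition(row[0], num_partitions)].append(row)
    return partitions


def naive_select(
    partitions: Dict[int, List[Row]], num_partitions: int, wanted: int
) -> Tuple[List[Row], int]:
    """Scan every partition and apply the predicate to every row.

    Returns the matching rows and the number of rows examined.
    """
    examined = 0
    result: List[Row] = []
    for rows in partitions.values():  # all partitions are visited
        for row in rows:
            examined += 1
            # The predicate is evaluated as an ordinary per-row filter.
            if hash_partition(row[0], num_partitions) == wanted:
                result.append(row)
    return result, examined
```

With twelve rows spread over four partitions, selecting one partition in this manner returns three rows but examines all twelve, so the work grows with the size of the whole partitioned database-object rather than with the size of the requested set of partitions.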
Thus, what is needed is a method and apparatus to execute a database query that has a predicate that uses a partition-mapping function without having to scan through all of the partitions in a partitioned database-object.