A “query” is a statement or collection of statements that is used to access a database. Specialized query languages, such as the structured query language (“SQL”) are often used to interrogate and access a database. The query will contain the identity of the database object(s) being accessed to execute the query (e.g., one or more named database tables). If the query accesses two or more database objects, the query also specifies the link between the objects (e.g., a join condition or column). The typical query also defines selection criteria, which is often referred to as a matching condition, filter, or predicate. A query may define which fields in the database object are to be displayed or printed in the result. For purpose of illustration, this document will be explained with reference to SQL statements and relational database structures such as tables and rows. It is noted, however, that the inventive concepts disclosed herein are applicable to other types of languages, objects, structures, and operations in a database.
Optimization is the process of choosing an efficient way to execute a query. Many different ways are often available to execute a query, e.g., by varying the order or procedure in which database objects are accessed to execute the query. The exact execution plan or access path that is employed to execute the query can greatly affect how quickly or efficiently the query statement executes.
For example, consider a very large database table that is used to store every sales record for a company. If an index does not exist for the table, then a query on the table will necessarily need to perform a scan operation to scan each row of the table to execute the query. However, if an index exists for the table that is relevant to a given query, then it is possible that using the index will result in a more efficient way to specifically identify and access the relevant rows of the table that are needed to execute the query, especially if the selectivity of the query predicate lends itself to an index-based approach to optimize the query execution plan.
Conventional optimizers take a “one size fits all” approach to determine the appropriate access path for a query. In other words, for a table that is accessed by a query, the optimizer will determine a selected access path for the table, e.g., the execution plan for the query will use either an index or a sequential scan (also referred to as a table scan) to execute the query for the entire table.
The problem is that a first portion of the table being queried may actually be more efficiently queried using an access path that is different from the access path that would be efficient for a second portion of the same table. For example, consider a large database table that has been partitioned, where the contents of that large table has been decomposed into a number of smaller and more manageable portions or “partitions.” Assume that a first partition of the table is associated with an index but a second partition is not associated with an index, and further assume that a given query would be more efficiently executed against the first partition if the index is used as the access path in the execution plan. However, since the second partition does not correspond to an index, the “one size fits all” approach of conventional optimizers would necessarily select the sequential scan approach to execute the entirety of query, including the querying of the first partition (which is indexed).
The present invention provides an improved method, system, and computer program product that is suitable to address these issues with the conventional approaches. According to some embodiments, a hybrid approach is provided that allows different subsets of data accessed by a query to be optimized with different optimizer decisions, execution plans, and/or execution approaches. For example, embodiments of the present invention can be used to optimize a first subset of data with a different access path, join order, or join method than is used to optimize a second subset of data. Transformations may be performed to re-write the query, which restructures the query in a way that facilitates the hybrid optimization process. Multiple transformations may be interleaved to produce an efficient re-written query.
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.