The present invention relates to databases, and more particularly to transparently caching and reusing database query execution plans.
Today""s object-oriented database environments are typically used as front-ends to more simplistic but efficient data models, such as flat files and relational databases. In a relational database, data is perceived by its users as a collection of tables. The tables in a relational database include a row of column names specifying one or more attribute fields, and zero or more data rows containing one scalar value for each of the column fields.
In operation, a client submits a semantic-rich query at the object-level to the database system. Because the object-oriented model is much more sophisticated and complicated than a relational model, it is usually more expensive to process a query in the object space. Therefore, the database system converts the object-oriented query into a relational query that queries the relational database. This is accomplished by generating an execution plan specifying what queries to run against the relational databases, and how to combine those results to compute the final result.
As an example, assume that a user has the following query: xe2x80x9cFind all employees having a salary greater than $100,000.xe2x80x9d This query, which is expressed in English, could be represented as the following object space query (depending on the computer language): xe2x80x9cSelect e from EmpClassMOHome e where e.salary greater than 100000xe2x80x9d. In the example, xe2x80x9cEmpClassMOHome,xe2x80x9d identifies a class of objects similar to a table in a relational space, and xe2x80x9cexe2x80x9d represents a specific instance in that class.
This query would then be converted into a corresponding relational query in the execution plan. Relational queries to a relational database are carried out through high-level commands or statements, such as SELECT, INSERT, UPDATE and DELETE, which are examples of statements from standard SQL (Structured Query Language). The object space query from the example above would be translated into a query execution plan containing the following query:
SELECT *
FROM Employee
WHERE salary greater than 100,000
The SELECT command specifies the desired attributes to be returned FROM a specified table WHERE some specified condition is true. In this example, a xe2x80x9c*xe2x80x9d in the SELECT clause means that all attributes from the xe2x80x9cEmployeexe2x80x9d table should be returned for all records meeting the condition in the WHERE clause. The WHERE clause includes one or more literals, each of which includes at least one attribute (e.g., salary), an operator (e.g.,  greater than ), and either a constant value (e.g., 100,000) or another attribute.
Unfortunately, generating an execution plan from an object query is CPU intensive and may take significantly longer time than even executing the plan against the relational database. Most time in generating the execution plan is spent parsing and analyzing the complex metadata that maps the object attributes to relations.
To minimize this overhead, some systems cache the queries and their corresponding plans. If a new user query matches a cached query, the cached execution plan is reused for the new query. There are generally two methods for caching queries and plans.
One method for caching queries is manual in that it requires the database user to explicitly request the database system to prepare a query (also generate and save the execution plan internally), and then return a handle to it. The user may then reuse that query several times by submitting the same handle to the database system each time the user desires to issue that same query.
Another method replaces all constant values in the query with host variables. An example is:
SELECT id
FROM Employee
WHERE name=:hostv1
The user then specifies a constant value for the host variable (e.g., :hostv1=xe2x80x9cVaderxe2x80x9d) and submits the query. The host-variable query can then be reused several times, each time specifying different constant values for host variables.
Although the manual and host-variable techniques cache and reuse certain queries, the caching and reuse of the execution plans is non-transparent to the user because the user is aware of, and is required to explicitly facilitate reuse of the execution plans. In addition, some systems that do not support host-variables are limited because they only reuse query execution plans when a new query string exactly matches a cached query string, including constant values. Thus, given the following to queries:
(1) SELECT id FROM Employee WHERE name=xe2x80x9cLukexe2x80x9d and
(2) SELECT id FROM Employee WHERE name=xe2x80x9cSkywalkerxe2x80x9d,
conventional methods would consider the two queries different and would spend the time to generate an entirely new execution plan for the second query.
Accordingly, what is needed is a system and method for transparently and efficiently caching and reusing database query execution plans. The present invention addresses such a need.
The present invention is a method and system for transparently caching and reusing database query execution plans. The method and system include caching a first query that contains a first constant. A first query execution plan is also generated for the query that represents the first constant with a parameter name. The method and system further include receiving a new query that contains second constant. The new query is compared with the first query and a match is determined to exist even when the second constant in the new query fails to match the first constant from the first query. Upon a match, the first query execution plan is reused by substituting the parameter name in the query execution plan with the second constant from the new query without user intervention, thereby avoiding generating a new query execution plan for the new query.
According to the present invention, a more flexible definition of a query match is provided, leading to more frequent reuse of execution plans and increase in system speed. Moreover, the reusable execution plan is obtained for a matching new query at a fraction of the cost of generating a new execution plan.