A database is generally a collection of related records of data, each record containing one or more fields of data. The data structure is combined with a set of operations to search the data, sort the data, add/modify/delete the data, and other data management and manipulation functions. The search function of a database is often implemented through a tool (e.g., a user interface or application program interface) that permits construction of a query to access the database. In relational databases, the data is usually represented as one or more tables of such records. The tables are related and/or indexed by identified fields (e.g., by “keys” or “key fields”). Tables may be joined with other tables—typically through use of related fields in records in each table. A join of two or more tables also represents a view of the joined information. Records may be selected from one or more tables or views based on values of particular fields of records of the tables (e.g., by key field values or non-key field values). Such a selection operation also represents a view of the records so selected. In other words, operations such as select, project, join, and group may be performed on one or more tables and/or views. Expressions may be formed from such operations on one or more tables and/or views.
Structured query language (“SQL”) has evolved in the industry as a widely accepted, de-facto standard for expressing queries against data in relational databases. SQL expresses a database query as a number of expressions applying operations to one or more tables and/or views of the data in the database. Some expressions may determine which records are to be selected from one or more tables and/or views. Some expressions may represent join operations to be performed among a plurality of tables and/or views of the database. Still other expressions may define how the selected records are to be grouped in a results table generated by executing the SQL query. The rich set of features in SQL is generally well known to those of ordinary skill in the art.
A query expressed in SQL is generally compiled from a first syntactic form into an internal representation of the various expressions and operations to be performed within those expressions. Each expression may represent one or more operations to be performed on one or more table and/or view in the database. The query so compiled into an internal form is often referred to as a query plan. The query plan is essentially an ordered set of steps to evaluate the various expressions of the query.
In large database applications, elapsed time to execute a query (e.g., to perform the steps of a query plan) can be critical. For example, large databases storing consumer product/account information required for customer service needs to be accessed quickly enough to satisfy customer expectations for rapid service and response. To speed execution of a query, it is common in SQL query processing systems that a query plan is analyzed by a query optimizer to look for enhancements that may speed execution of the query plan. Present query optimization techniques attempt to analyze the query to determine an optimal order for executing the query (e.g., order of executing the steps of the query plan). By analyzing the order in which expressions are evaluated and potentially re-ordering the expression evaluation execution, significant improvements in query execution elapsed time may be achieved. In particular, a query optimizer may analyze the expressions of a query to identify common expressions used multiple times within a complex query. The expressions in the complex query may then be evaluated in an order that optimizes re-use of previously evaluated expressions so as to avoid the processing needed to re-evaluate the same, common, shared expression multiple times.
Academic and industry research has studied issues of query optimization techniques that exploit recognition and re-use of common, shared expressions. However, no commercial SQL query processing systems yet perform such optimization relating to recognition and re-use of common, shared expressions to speed up query execution. The principal reason for this lack of commercial success may be that current optimization techniques that recognize and re-use common, shared expressions are impractical for large, complex queries potentially having a large number of common, shared expressions. Present optimization techniques require significant processing time to recognize common, shared expressions and to evaluate the benefits of re-use of each of a large number of common, shared expressions to determine which common, shared expressions may realize the most benefit to the overall query execution. This processing time can be so significant as to reduce or totally eliminate the potential benefit of the resulting optimization. In other words, the elapsed time to identify and effectuate re-use of common, shared expressions in a complex query may exceed any savings realized by enacting the identified optimizations.
Where multiple related queries are executed on a common database, the problems of costs to identify and evaluate potential optimization are exacerbated. There are proportionately more opportunities to identify common shared expressions among the plurality of queries. In view of the increased complexity of such multiple queries, the benefits potentially realized by optimizations to identify and re-use common, shared expressions are larger but so too is the complexity of processing to identify and determine the best choice of common, shared expressions to exploit in the query plan. As noted above, the cost of identifying and selecting among numerous alternative common, shared expressions in a complex plurality of related queries may reduce or eliminate the benefits of enacting such optimizations.
It is evident from the above discussion that it is an ongoing problem to efficiently and effectively identify possible common, shared expressions among a plurality of queries and to effectuate re-use of the identified common, shared expressions so as to reduce the total elapsed time to execute a query or a plurality of queries.