1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to the optimization of scalar subqueries in a relational database management system.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American Nationals Standard Organization (ANSI) and the International Standards Organization (ISO).
In RDBMS software, all data is externally structured into tables. The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages such as C, COBOL, etc. Operators are provided in SQL that allow the user to manipulate the data, wherein each operator operates on either one or two tables and produces a new table as a result. The power of SQL lies on its ability to link information from multiple tables or views together to perform complex sets of procedures with a single statement.
In the SQL standard, a SELECT statement is used to retrieve data and generally comprises the format: xe2x80x9cSELECT  less than clause greater than FROM  less than clause greater than WHERE  less than clause greater than GROUP BY  less than clause greater than HAVING  less than clause greater than ORDER BY  less than clause greater than .xe2x80x9d The clauses generally must follow this sequence, but only the SELECT and FROM clauses are required. The result of a SELECT statement is a subset of data retrieved by the RDBMS software from one or more existing tables or views stored in the relational database, wherein the FROM clause tells the RDBMS software the name of the table or view from which data is being selected. The subset of data is treated as a new table, termed the result set, which typically comprises a temporary table. In general, the items specified in the SELECT clause of the SELECT statement determine the columns that will be returned in the result table from the table(s) identified in the FROM clause.
The WHERE clause determines which rows should be returned in the result table. Generally, the WHERE clause contains a search condition that must be satisfied by each row returned in the result table. The rows that meet the search condition form an intermediate set, which is then processed further according to specifications in the SELECT clause. The search condition typically comprises one or more predicates, each of which specify a comparison between two values comprising columns, constants or correlated values. Multiple predicates in the WHERE clause are themselves typically connected by Boolean operators.
A join operation makes it possible to combine tables or views by combining rows from one table or view to another table or view. The rows, or portions of rows, from the different tables or views are concatenated horizontally. The join operation is implied by naming more than one table or view in the FROM clause of a SELECT statement. Although not required, join operations normally include a WHERE clause that identifies the columns through which the rows can be combined. The WHERE clause may also include a predicate comprising one or more conditional operators that are used to select the rows to be joined.
In a SELECT-FROM-WHERE query, identifying situations where at most one tuple is retrieved by the query (termed a xe2x80x9c1-tuple conditionxe2x80x9d) permits important query transformations and optimizations. Certain instances of the identification of 1-tuple conditions have been performed in the prior art, as disclosed in the following publications and patent, which are incorporated by reference herein:
1. Hamid Pirahesh, Joseph Hellerstein, and Waqar Hasan, xe2x80x9cExtensible/Rule Based Query Rewrite Optimization in STARBURST,xe2x80x9d Proceedings of ACM SIGMOD ""92 International Conference on Management of Data, San Diego, Calif., 1992;
2. Hugh Darwen, xe2x80x9cThe Role of Functional Dependence in Query Decomposition,xe2x80x9d Relational Database Writings 1989-1991, Chapter 10, pp. 133-154, 1992;
3. G. N. Paulley and Per-Ake Larson, xe2x80x9cExploiting Uniqueness in Query Optimization,xe2x80x9d Proceedings of IEEE ""94 International Conference on Data Engineering, Houston, Tex., 1994, pp. 68-79; and
4. U.S. Pat. No. 5,615,361, issued Mar. 25, 1997, to T. Y. Leung, M. H. Pirahesh, D. E. Simmen, L. G. Strain and S. Tiwari, entitled xe2x80x9cExploitation of Uniqueness Properties using a 1-Tuple Condition for the Optimization of SQL Queriesxe2x80x9d.
The 1-tuple condition has been heavily relied upon for transforming scalar subqueries into join operations. However, there is a need in the art for improved methods of optimizing scalar subqueries, and in particular, to the optimization of scalar subqueries in UPDATE statement.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and computer program carrier for optimizing SQL queries. A query is analyzed to determine whether it includes one or more scalar subqueries and one or more existential subqueries. The scalar subqueries and the existential subqueries are transformed into join operations by exploiting subsumption methods, and the join operations are merged into a single query block, which allows the generation of more efficient execution plans for retrieving data from the relational database.
It is an object of the present invention to optimize SQL queries that include both scalar subqueries and existential subqueries. More specifically, it is an object of the present invention to transform one or more scalar subqueries into one or more join operations, to transform one or more existential subqueries into one or more join operations, and to merge the resulting join operations into a single query block. This unique combination of query rewrite transformations results in the reduction of scalar and existential subqueries to join operations, as well as the elimination of redundant join operations, thereby providing superior performance.
Another object of the present invention is to prove, at query compilation, that a scalar subquery returns at most one row and that the scalar subquery does not return an empty set, so that the scalar subquery can be transformed into join operations, which can be merged or reduced into one or more query blocks. As a result, the present invention provides dramatically improved query execution time.