1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to a method and apparatus for the enumeration of projections (i.e., "SELECT DISTINCT" operations) in SQL queries containing outer and full outer joins in the presence of inner joins without introducing any regression in performance.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
In RDBMS software, all data is externally structured into tables. The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages such as C, COBOL, etc. Operators are provided in SQL that allow the user to manipulate the data, wherein each operator operates on either one or two tables and produces a new table as a result. The power of SQL lies in its ability to link information from multiple tables or views together to perform complex sets of procedures with a single statement.
The execution time of a SQL query can be reduced significantly by considering different schedules for the operations specified in the query. The current state of the art in SQL query optimization provides techniques for optimizing queries that contain binary operations such as inner join, outer join and full outer join, as reflected in the following publications:
1. Galindo-Legaria, C., and Rosenthal, A., "How to Extend a Conventional Optimizer to Handle One- and Two-Sided Outerjoin," Proceedings of Data Engineering, pp. 402-409, 1992, (hereinafter referred to as "[GALI92a]");
2. Galindo-Legaria, C. A., "Algebraic optimization of outer join queries," Ph.D. dissertation, Dept. of Applied Science, Harvard University, Cambridge, 1992, (hereinafter referred to as "[GALI92b]");
3. Rosenthal, A. and Galindo-Legaria, C., "Query graphs, implementing trees, and freely-reorderable outer joins", SIGMOD, pp. 291-299, 1990, (hereinafter referred to as "[ROSE90]"); and
4. U.S. patent application Ser. No. 08/326,461, filed Oct. 20, 1994, by G. Bhargava, P. Goel, and B. Iyer, entitled "METHOD AND APPARATUS FOR REORDERING COMPLEX SQL QUERIES CONTAINING INNER AND OUTER JOIN OPERATIONS," (hereinafter referred to as "[BHAR94]").
In addition, the publication Dayal, Umeshwar, Goodman, N. and Katz, R. H., "An extended relational algebra with control over duplicate elimination", Proc. ACM PODS, pp. 117-123, 1982, (hereinafter referred to as "[DAYA82]"), presented an extended relational algebra to handle duplicates by either keeping the count of replication with each tuple or assigning a unique tuple identifier to each tuple.
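By way of illustration only, the two duplicate-handling representations mentioned above can be sketched in Python as follows. This is a minimal sketch and not the algebra of [DAYA82] itself; the sample rows and all variable names are hypothetical and chosen solely for this example.

```python
from collections import Counter

# A relation containing duplicates (a multiset of tuples).
# The data is made up for illustration.
rows = [("a", 1), ("a", 1), ("b", 2)]

# Representation 1: keep a replication count with each distinct tuple.
with_counts = Counter(rows)

# Representation 2: tag each tuple with a unique identifier so that
# otherwise identical tuples remain distinguishable.
with_ids = [(tid, row) for tid, row in enumerate(rows)]

# Duplicate elimination (a projection with DISTINCT) under each representation.
distinct_from_counts = set(with_counts)            # discard the counts
distinct_from_ids = {row for _, row in with_ids}   # discard the identifiers
```

Under either representation the duplicate information is carried explicitly, so duplicate elimination amounts to discarding the count or the identifier.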
Moreover, the publication Pirahesh, H., Hellerstein, J. M. and Hasan, W., "Extensible/Rule Based Query Rewrite Optimization in Starburst," ACM SIGMOD, pp. 39-48, San Diego, Calif., June 1992, (hereinafter referred to as "[PIRA92]"), employed tuple identifiers in a rule-based query rewrite system that removes projections specified between binary operations such as inner joins. This publication showed that the execution time of a query can be significantly improved by first removing projections and then generating the optimal plan for binary operations. This prior art technique transforms a given query into a new query in which binary operations are adjacent to each other, and then generates the optimal plan by considering different schedules for inner joins.
Notwithstanding the above, there are numerous problems with prior art techniques. While these prior art techniques can generate different schedules for binary operations, they generally do not consider different schedules for projections (i.e., SELECT DISTINCT operations in SQL). In addition, the prior art assumes that binary operations are adjacent to each other. However, since unary operators like selection and projection can appear anywhere in queries, binary operations may not be adjacent to each other. Moreover, since the cost of binary operations depends on the cardinalities of intermediate relations, it may be beneficial to remove the duplicates from intermediate relations.
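The dependence of join cost on the cardinalities of intermediate relations can be illustrated with a small Python sketch. The relations R and S, the naive nested-loop join, and the comparison count used as a stand-in for cost are all assumptions made for this example; the sketch ignores the cost of duplicate removal itself, and early duplicate removal is valid here only because the final result is taken with DISTINCT semantics.

```python
def nested_loop_join(left, right):
    """Join on left[1] == right[0]; return (result rows, pairs compared)."""
    out = []
    comparisons = 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[1] == r_row[0]:
                out.append((l_row[0], r_row[1]))
    return out, comparisons

# Hypothetical relations containing duplicates.
R = [("x", 1)] * 4 + [("y", 2)] * 3
S = [(1, "p"), (1, "p"), (2, "q")]

# Schedule A: join first, remove duplicates only at the end.
joined_a, cmp_a = nested_loop_join(R, S)
result_a = set(joined_a)

# Schedule B: remove duplicates from the inputs, then join,
# then remove any remaining duplicates.
joined_b, cmp_b = nested_loop_join(sorted(set(R)), sorted(set(S)))
result_b = set(joined_b)

print(cmp_a, len(joined_a))  # work done and intermediate size, schedule A
print(cmp_b, len(joined_b))  # work done and intermediate size, schedule B
```

Both schedules yield the same distinct result, but schedule B compares fewer pairs and produces a smaller intermediate relation, which is why the placement of duplicate removal is worth considering when schedules are enumerated.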
Thus, there is a need in the art for techniques for removing projections from SQL queries, and for generating different schedules for SQL queries containing both projections and binary operations. Moreover, there is a need in the art for such techniques that do not introduce any regression in performance in the execution of such queries.