1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to a method, apparatus, and article of manufacture for the reordering of complex structured query language (SQL) queries involving GROUPBYs, joins, outer joins and full outer joins.
2. Description of Related Art
The current state-of-the-art in query optimization has few results for optimizing query expressions involving GROUPBYs, joins, outer joins, and full outer joins specified with predicates that either reference more than two relations or reference columns generated by aggregations. Several researchers have performed work in this area, as reflected in the following publications, all of which are incorporated by reference herein:
Bhargava, G., Goel, P. and Iyer, B., "Reordering of complex queries involving joins and outer joins," IBM Technical Report TR03.567, July 1994, (hereinafter referred to as "BHAR94!");
Bhargava, G., Goel, P. and Iyer, B., "Hypergraph based reorderings of outer join queries with complex predicates," SIGMOD, 1995, pp. 304-315, (hereinafter referred to as "BHAR95!");
Ganski, R., and Wong, H. K. T., "Optimization of nested SQL queries revisited," SIGMOD, 1987, (hereinafter referred to as "GANS87!");
Galindo-Legaria, C., and Rosenthal, A., "How to extend a conventional optimizer to handle one- and two-sided outerjoin", Proceedings of Data Engineering, pp. 402-409, 1992, (hereinafter referred to as "GALI92a!");
Harinarayan, V. and Gupta, A., "Optimization using tuple subsumption", ICDT, Prague, January 1995, (hereinafter referred to as "HARI94!");
Levy, Alon Y., Mumick, I. S. and Sagiv, Y., "Query optimization by predicate move-around", VLDB, pp. 96-107, 1994, (hereinafter referred to as "LEVY94!");
Muralikrishma, M., "Improved Unnesting Algorithms for Join Aggregate SQL Queries", Proceedings of 18th VLDB Con., pp 91-102, Vancouver, British Columbia, Canada, 1992, (hereinafter referred to as "MURA92!");
Pirahesh, H., Hellerstein, J. M. and Hasan, W., "Extensible/rule based query rewrite optimization in Starburst," SIGMOD, pp. 39-48, San Diego, Calif., June 1992, (hereinafter referred to as "PIRA92!");
Paulley, G. N. and Larson, P- A., "Exploiting uniqueness in query optimization," CASCON, pp. 804-822, Vol. II, October 1993, (hereinafter referred to as "PAUL93!");
Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A. and Price, T. G., "Access path selection in a relational database management system," SIGMOD, pp. 23-34, 1979, (hereinafter referred to as "SELI79!").
However, there are numerous problems in the above prior art techniques. More specifically, BHAR94! and BHAR95! have proposed algorithms to reorder queries containing outer join predicates that reference two or more relations, but their work can only do partial reorderings and cannot reorder join aggregate queries.
With regard to join aggregate queries, consider the following query:
______________________________________ View V.sub.1 : SELECT R.sub.1.a AS a, R.sub.2.a AS b, c = count(R.sub.1) FROM R.sub.1, R.sub.2 WHERE R.sub.1.b .theta..sub.1 R.sub.2.b GROUPBY R.sub.1.c, R.sub.2.d Query 1: SELECT R.sub.3.a, R.sub.4.b, R.sub.2.b FROM (SELECT * FROM V.sub.1 LEFTOUTERJOIN R.sub.3 ON R.sub.3.b .theta..sub.2 V.sub.1.c), R.sub.4 WHERE R.sub.4.b = V.sub.1.b ______________________________________
where .theta..sub.1, .theta..sub.2 .di-elect cons. {=, .noteq., .gtoreq., .ltoreq., &lt;, &gt;}. The outer and inner joins specified in the above query cannot be reordered by the existing methods. This follows from the fact that view V.sub.1 cannot be merged with the rest of the query because it contains a column (column c) that is generated by an aggregation operator. Hence, the join specified in view V.sub.1 cannot be reordered with other outer and inner joins specified in the query.
Join-aggregate queries are known to be much more complex. Consider the following co-related join-aggregate query:
______________________________________ SELECT R.sub.1.a FROM R.sub.1 WHERE R.sub.1.b .theta..sub.1 (SELECT COUNT(*) FROM R.sub.2 WHERE R.sub.2.c = R.sub.1.c AND R.sub.2.d .theta..sub.2 (SELECT COUNT(*) FROM R.sub.3 WHERE R.sub.2.e = R.sub.3.e AND R.sub.1.f = R.sub.3.f)), ______________________________________
where .theta..sub.1, .theta..sub.2 .di-elect cons. {=, .noteq., .gtoreq., .ltoreq., &lt;, &gt;}. Typical commercial relational data base management systems (RDBMS) execute the above co-related query using Tuple Iteration Semantics (TIS), wherein for every tuple in R.sub.1, first, tuples in R.sub.2 are selected by applying predicate R.sub.2.c=R.sub.1.c and then each selected tuple in relation R.sub.2 in turn, along with the tuple in R.sub.1, is substituted in predicate R.sub.2.e=R.sub.3.e and R.sub.1.f=R.sub.3.f in order to first select and then count the selected tuples in relation R.sub.3. This process is repeated for every tuple in R.sub.1 resulting in a very inefficient processing strategy.
Prior work by GANS87! and MURA92!, incorporated by reference herein, proposed a method to unnest the above query and to transform it into the following two queries that employ outer joins and do not require TIS:
______________________________________ Query 2: SELECT INTO TEMP R.sub.1.key, R.sub.1.a, R.sub.2.key, R.sub.2.b FROM (SELECT * FROM R.sub.1 LEFTOUTERJOIN R.sub.2 ON R.sub.1.c = R.sub.2.c), R.sub.3 LEFTOUTERJOIN R.sub.2.e = R.sub.3.e AND R.sub.1.f = R.sub.3.f GROUPBY R.sub.1.key, R.sub.1.a, R.sub.2.key, R.sub.2.b HAVING R.sub.2.d .theta..sub.2 count(R.sub.3.key) Query 3: SELECT R.sub.1.a FROM TEMP GROUPBY R.sub.1.key HAVING R.sub.1.b .theta..sub.1 count(R.sub.2.key) ______________________________________
One difficult problem in query optimization has been the inability to unnest join aggregate queries. While the transformation of such a query to an outer join is known, existing methods cannot reorder the left outer joins specified in Query 2 because of the complex predicate R.sub.2. e=R.sub.3.e and R.sub.1.f=R.sub.3.f. In addition, the prior art has not described how to perform complete enumeration of queries containing outer joins, when the outer join predicate references an aggregated value, or the predicate references more than two base relations in a query subtree. The problem is important since hierarchical schemas are often found in the schemas translated from hierarchical legacy data bases used for RDBMS-based data warehousing applications and object oriented data bases.
Therefore, for more efficient processing of complex SQL queries, there is a need for reordering complex SQL queries that contain GROUPBYs, inner and outer joins.