1. Field of the Invention
The present invention relates to techniques for optimizing database queries and, more particularly, to a method and apparatus for optimizing queries having group-by operators.
2. Description of the Related Art
Performance of databases is largely dependent on the ability of the database system to optimize query execution. Query execution is optimized in databases by preprocessing the query to place it in a form that can be executed more efficiently by the database system. The optimization process then selects, from among the candidate execution plans, the plan estimated to be most efficient.
One problem with conventional approaches is that they have failed to adequately optimize queries having group-by operators. The conventional approaches perform the group-by operation only after all the join operations have been evaluated. See, e.g., A. Klug, Access Paths in the ABE Statistical Query Facility, Proceedings of 1982 ACM-SIGMOD Conference on the Management of Data; U. Dayal, Of Nests and Trees: A Unified Approach to Processing Queries that Contain Subqueries, Aggregates and Quantifiers, Proceedings of the 13th VLDB, 1987; and P. G. Selinger et al., Access Path Selection in a Relational Database Management System, Proceedings of ACM-SIGMOD Conference on the Management of Data, June 1979, pp. 23-34. Accordingly, most conventional approaches have neither considered nor realized the benefits of transformations in which grouping precedes a join, thereby reducing the size of the relation to be joined and possibly the cost of the join.
Recently, a transformation that enables pushing a group-by operator past a join operation was discovered. See, e.g., W. Yan and P. Larson, Performing Group-By before Join, International Conference on Data Engineering, 1993. The approach is based on partitioning the relations in the given query into two groups so as to form two queries. The result of the given query is eventually obtained by joining the results of the two queries. But, as a price for pushing the group-by operation past a join operation, the space of choices for join ordering is reduced, because the ordering of relations is considered only within a partition. Moreover, for a given query, the approach admits only one placement of the group-by operator. Thus, the transformation utilized by Yan and Larson fails to capture alternative execution plans that are possible and sometimes preferred (because they result in more efficient execution).
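By way of illustration only, the following minimal sketch shows the general idea of evaluating a group-by before a join rather than after it. It is not the algorithm of Yan and Larson and forms no part of the cited references. The tables `dept` and `emp`, their columns, and the sample rows are hypothetical, and the equivalence shown relies on the assumption that `dept_id` is a key of `dept`. The sketch uses Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical schema and data, chosen only for illustration.
SCHEMA_AND_DATA = """
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
INSERT INTO dept VALUES (1, 'eng'), (2, 'ops');
INSERT INTO emp  VALUES (1, 1, 100), (2, 1, 150),
                        (3, 2, 80),  (4, 2, 70), (5, 2, 50);
"""

# Conventional form: join every emp row to dept, then group.
GROUP_AFTER_JOIN = """
SELECT d.name, SUM(e.salary)
FROM emp e JOIN dept d ON e.dept_id = d.dept_id
GROUP BY d.dept_id, d.name
ORDER BY d.name
"""

# Transformed form: group emp first (one row per dept_id), then join.
# Assumes dept_id is a key of dept, so the join does not duplicate groups.
GROUP_BEFORE_JOIN = """
SELECT d.name, t.total
FROM (SELECT dept_id, SUM(salary) AS total
      FROM emp
      GROUP BY dept_id) AS t
JOIN dept d ON t.dept_id = d.dept_id
ORDER BY d.name
"""

def run(conn, sql):
    """Execute a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)

late_rows = run(conn, GROUP_AFTER_JOIN)
early_rows = run(conn, GROUP_BEFORE_JOIN)

# Both forms yield the same answer; the second joins 2 rows instead of 5.
print(late_rows)
print(early_rows)
```

In this sketch the early grouping reduces the input to the join from five `emp` rows to two grouped rows. Whether such a rewrite actually lowers the total cost depends on the data and on the join order chosen, which is why the placement of the group-by operator is a matter for the optimizer.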
Therefore, there is a need for a robust technique to optimize queries having group-by operators.