The present invention relates generally to the field of relational database management systems and, more specifically, to improving the performance of grouping and duplicate elimination by avoiding unnecessary disk access.
Aggregation is one of the major database operations in current relational database management systems (RDBMS). In SQL (Structured Query Language), aggregation can be performed either through the GROUP BY clause (i.e., to compute aggregate functions over groups of tuples) or through the DISTINCT clause (i.e., to eliminate duplicates). With the proliferation of data warehousing and data mining applications, aggregation has become increasingly important in today's relational database management systems. Yet such aggregation operations are typically expensive and time-consuming because of the large number of disk I/O (input/output) operations they perform and the large size of many databases.
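To illustrate the two forms of aggregation named above, the following minimal sketch runs a GROUP BY query and a DISTINCT query against an in-memory SQLite database using Python's standard sqlite3 module. The table name sales, its columns, and its rows are hypothetical illustration data, not part of the described system.

```python
import sqlite3

# In-memory database with a small, hypothetical illustration table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7), ("west", 5), ("north", 3)],
)

# GROUP BY: compute an aggregate function (SUM) over each group of tuples.
group_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# DISTINCT: eliminate duplicate values of the selected column.
distinct_rows = conn.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region"
).fetchall()

print(group_rows)     # [('east', 17), ('north', 3), ('west', 10)]
print(distinct_rows)  # [('east',), ('north',), ('west',)]
```

Both queries require the engine to bring equal values together (here "east" and "west" each occur twice), which is why sorting or hashing, and the disk I/O they incur on large tables, dominates their cost.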
Prior art efforts to enhance aggregation performance include in-memory duplicate elimination, which assumes that the data can be brought entirely into memory, so that sorting and de-duplication can also be done in memory. When the relation size (e.g., the size of the RDBMS table) exceeds the memory size, external merge sort can be used to complete the aggregation in two passes. In the first pass, data are read into memory in batches, each batch is sorted in memory, and the sorted result is written to disk as a sublist. In the second pass, the sorted sublists are merged to form the final aggregate result. Such algorithms are widely used in current relational database management systems.
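The two-pass external merge sort aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the aggregate is assumed to be SUM over (key, value) pairs, memory_size is a hypothetical capacity measured in records, and Python lists stand in for the sorted sublists that a real system would write to disk. The returned records_written count is a rough proxy for the disk space and I/O needed for the sublists.

```python
import heapq
import itertools


def external_sort_aggregate(records, memory_size):
    """Two-pass GROUP BY ... SUM via external merge sort (illustrative sketch).

    records: iterable of (key, value) pairs.
    memory_size: hypothetical number of records that fit in memory at once.
    Returns (aggregate_result, records_written_to_sublists).
    """
    # Pass 1: read memory-sized batches, sort each batch in memory, and
    # "write" it out unchanged as a sorted sublist (a list stands in for disk).
    it = iter(records)
    sublists = []
    while True:
        batch = list(itertools.islice(it, memory_size))
        if not batch:
            break
        batch.sort(key=lambda kv: kv[0])
        sublists.append(batch)
    records_written = sum(len(s) for s in sublists)

    # Pass 2: k-way merge of the sorted sublists; equal keys arrive next to
    # each other, so each group is aggregated in a single sequential scan.
    merged = heapq.merge(*sublists, key=lambda kv: kv[0])
    result = [
        (key, sum(v for _, v in group))
        for key, group in itertools.groupby(merged, key=lambda kv: kv[0])
    ]
    return result, records_written


data = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("a", 6), ("c", 7)]
result, written = external_sort_aggregate(data, memory_size=3)
print(result, written)  # [('a', 12), ('b', 6), ('c', 10)] 7
```

Note that every input record is written to a sublist in the first pass (7 of 7 here), even though only three distinct groups exist; this is the cost that early aggregation targets.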
In addition, variations on the external merge sort approach have been tried to improve aggregation performance using “early-aggregation” techniques. Under such an approach, instead of delaying all of the aggregation computation to the second pass, some aggregation is computed in the first pass as the sorted sublists are generated. This reduces the total number of disk blocks required to hold the sorted sublists and hence the number of disk I/O operations. For example, one method performs early aggregation at sorting time: as each batch is sorted, the aggregation result for that batch is computed as well. Another method computes aggregation before sorting, using a hash technique, until the memory is full; at that point, the data are sorted and written back to disk.
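The two early-aggregation variants described above can be sketched as follows, again as a minimal illustration rather than the exact prior art methods. Assumptions: SUM is the aggregate; memory_size is a hypothetical capacity, counted in records for the sort-time variant and in distinct groups held in the hash table for the hash variant; Python lists stand in for on-disk sublists; and both variants reuse the same merge-and-combine second pass.

```python
import heapq
import itertools


def _merge_runs(runs):
    # Shared second pass: merge sorted runs of (key, partial_sum) and
    # combine the partial sums of equal keys into the final aggregate.
    merged = heapq.merge(*runs, key=lambda kv: kv[0])
    return [(k, sum(v for _, v in g))
            for k, g in itertools.groupby(merged, key=lambda kv: kv[0])]


def sort_time_early_aggregate(records, memory_size):
    """Early aggregation at sorting time: each in-memory batch is sorted and
    collapsed to one (key, partial_sum) per key before being written."""
    it = iter(records)
    runs = []
    while True:
        batch = sorted(itertools.islice(it, memory_size), key=lambda kv: kv[0])
        if not batch:
            break
        runs.append([(k, sum(v for _, v in g))
                     for k, g in itertools.groupby(batch, key=lambda kv: kv[0])])
    return _merge_runs(runs), sum(len(r) for r in runs)


def hash_early_aggregate(records, memory_size):
    """Early aggregation before sorting: accumulate partial sums in a hash
    table of at most memory_size groups; when a new key does not fit, sort
    the table and write it out as a run, then start a fresh table."""
    runs = []
    table = {}
    for key, value in records:
        if key not in table and len(table) >= memory_size:
            runs.append(sorted(table.items()))
            table = {}
        table[key] = table.get(key, 0) + value
    if table:
        runs.append(sorted(table.items()))
    return _merge_runs(runs), sum(len(r) for r in runs)


data = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("a", 6), ("c", 7)]
print(sort_time_early_aggregate(data, memory_size=3))  # ([('a', 12), ('b', 6), ('c', 10)], 6)
print(hash_early_aggregate(data, memory_size=3))       # ([('a', 12), ('b', 6), ('c', 10)], 3)
```

On this small input, both variants produce the same result as plain external merge sort but write fewer records to the sublists (6 and 3, respectively, instead of 7), which is the reduction in disk blocks and I/O described above. Even so, each variant still sorts and writes out everything it holds in memory whenever a sublist is generated.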
It may be observed that even with such early-aggregation techniques, existing methods incur a significant number of unnecessary disk I/O operations. In particular, when generating a sublist in the first pass, prior art methods typically sort all the blocks in memory and write them back to disk, even though in many situations not all blocks need to be sorted or written to disk.