One of the core SQL operations is a group-and-aggregation operation. As the name suggests, two operations are involved. The group operation groups together all rows in a relation that share the same keys (columns). The aggregate operation aggregates values of non-key columns of the relation within each group. Some group-and-aggregate operations specify a set of filters to be applied on the relations before the grouping operation on the relation, which can be materialized.
An example of a group and aggregate operation is:
select d.DEPTNO   ,d.NAME   ,count(e.EMPNO)as NUM_EMP   ,nvl(sum(e.MSAL),o)as SUM_MSAfrom DEPT d   ,EMP ewhered.DEPTNO = e.DEPTNO  (+)group byd.DEPTNO, d.DNAME
This SQL statement joins a department table d and an employee table e, groups the rows by department number DEPTNO and department name DNAME, and counts the number of employees NUM_EMP and the sum of their salaries e.MSAL into SUM_MSA. This query has the form of select AggFunc from R group by K, where AggFunc includes the count and sum functions, R is the relation with the department and employ tables, and K includes the two columns department name and department number.
Currently database implementations of group-and-aggregate queries, such as the one above, use the classical iterator-based technique for serial evaluation of a query. The iterator technique includes opening a row source iterator on the relation, fetching rows, and filtering the rows. If the grouping includes sorting, the rows that pass the filter tests are sorted on the group-by keys. If the grouping includes hashing, a hash is computed based on the group-by key values. The sorted or hashed rows are organized into sort or hash run structures. After all of the rows in the relation have been consumed, the row source iterator is closed and the grouped and aggregated rows are returned.
Large relations cause problems with current implementations of queries such as group-and aggregate. One problem is that the relation is so large that it does not fit in available memory, thus requiring many trips to disk to process portions that do fit in the available memory. The multiple trips to disk limit performance of the system to that of the disk system.
Another problem is that the cost of applying the filters on each of the rows in the relation may be prohibitive. If the selectivity of the filters is low, the number of rows returned by the operation is large, leading to cases in which some aggregation operations do not fit in available memory.
Yet another problem is that, if the number of groups resulting from the grouping operation is large, then constructing large hash or sort runs stresses the memory hierarchy of on-chip caches and memories.
One approach to solving the above problems is to execute portions of the group-and-aggregate query in parallel, by taking advantage of multi-threaded CPU cores, pools of server processes, or multi-node clustered configurations. Executing portions of the query in parallel also requires some technique for merging these operations into a final result.
Another approach is to off-load the processing of some of the operations involved in the group-and-aggregate operation to another system that is likely to perform the operations at a lower cost or to reduce the amount of data that the server process needs to process.
Heterogeneous Systems
For large relations, database systems can benefit from Heterogeneous Systems (HS). These systems are ones with a large number of disk-less compute nodes, each with its own main memory, and a high-speed interconnect among the nodes. As the number of nodes is very large, the amount of memory aggregated over all of the nodes is also very large. The database system using the HS has access to an in-memory representation of the relation in the HS and to persistent storage where the relation is stored.
Heterogeneous Systems are often organized in the form of a set of clusters of hierarchies, each cluster having a tree-like structure. Each leaf in the tree has a compute node and memory and is connected via switches that reside at multiple levels in the tree. Compute nodes in the hierarchy are built for both very efficient processing of a well-defined set of query primitives and low power consumption. The types of processors at each of the compute nodes can be different from processors elsewhere in the hierarchy or from processors in a database system that connects to the heterogeneous system.
In one embodiment, a hierarchy has 200 compute nodes and a total of 3 terabytes (TB) of memory distributed over the nodes. A cluster of four such hierarchies provide about 12 TB of working memory, which is sufficiently large for holding an in-memory copy of a large relation.
A heterogeneous system offers many benefits, such as a very high degree of parallelism, high throughput, and low power for operations, such as group-and-aggregate, on large relations. However, a heterogeneous system may have some functional limitations and cost-benefit tradeoffs in its use. One functional limitation is the inability to perform certain underlying functions needed by the group-and-aggregate operation. These functions include fetching the underlying row sources, supporting functions that use the key and column data types, and those that perform the particular aggregation specified. Lacking the ability to perform these underlying functions reduces the performance benefit of the heterogeneous system. Cost-benefit tradeoffs include comparison of the cost of loading portions of the relation into the heterogeneous system and collecting the results with the benefits of any improvement in the time and power consumed when the heterogeneous system assists in the group-and-aggregate operation. Additionally, because the heterogeneous system has no persistent storage for storing redo logs, the database system incurs a cost to assure transactional consistency.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.