The present invention relates generally to database groupings, and more specifically to executing database grouping set queries.
Database management is important in today's computing environments. An important aspect of database management is handling queries. Grouping set query statements in many computer languages, such as structured query language (SQL), are an important aspect of database query groupings. These groupings can also include cube grouping queries and rollup grouping queries that are derived therefrom. Conventional query groupings extend to a plurality of dimensions and are widely applied in many data warehousing systems, such as online analytical processing (OLAP) systems. Grouping set query statements have many applications; for example, they may be used to define a plurality of groups in the same query. In addition, they can be further extended by using Group By statements.
A simple grouping set query statement can be illustrated as Grouping Sets((C1, C2), (C1, C3)). In this example, (C1, C2) and (C1, C3) are two groups, and C1, C2, and C3 are all names of data columns in a database table. Grouping set queries relate to a plurality of groups in some instances, and in others can be further expanded to relate to various possible value combinations for a plurality of data columns. Executing these queries often consumes a large portion of the execution time.
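The meaning of Grouping Sets((C1, C2), (C1, C3)) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name grouping_sets_count, the sample rows, and the choice of a row count as the aggregate are all hypothetical and do not come from any particular database system.

```python
from collections import Counter


def grouping_sets_count(rows, grouping_sets):
    """Count rows per group, once for every grouping set.

    rows: list of dicts mapping a column name to a value.
    grouping_sets: list of tuples of column names.
    Returns a dict mapping each grouping set to a Counter that is
    keyed by the tuple of column values forming one group.
    """
    result = {}
    for columns in grouping_sets:
        counts = Counter()
        for row in rows:
            # The group key is the combination of values in the grouped columns.
            key = tuple(row[c] for c in columns)
            counts[key] += 1
        result[columns] = counts
    return result


# Hypothetical sample table with the columns C1, C2 and C3.
rows = [
    {"C1": "a", "C2": 1, "C3": "x"},
    {"C1": "a", "C2": 1, "C3": "y"},
    {"C1": "a", "C2": 2, "C3": "x"},
    {"C1": "b", "C2": 1, "C3": "x"},
]

result = grouping_sets_count(rows, [("C1", "C2"), ("C1", "C3")])
for columns, counts in result.items():
    print(columns, dict(counts))
```

In this sketch the two groups are evaluated one after the other over the same rows, which is why the cost of such a query tends to grow with both the number of groups and the number of rows.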
Parallelism is widely applied in the processing of SQL statements to improve performance. For example, an SQL statement may be divided into a plurality of child tasks, each of which is assigned to execute a part of the total task (the SQL statement) simultaneously. The execution results of these child tasks are then merged to generate a final result. A grouping set query statement may therefore be processed in parallel in this manner by dividing the total task into child tasks. The challenge, however, is dividing the total task into a plurality of child tasks in such a manner that the task is handled efficiently and quickly, without consuming more time and without duplication of work.
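The general divide, execute, and merge pattern described above might look as follows. This is a minimal sketch rather than the way any particular database engine works: the helper names split, child_task, and run_in_parallel are hypothetical, a simple sum and count stand in for the SQL statement, and Python threads stand in for whatever workers a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n):
    """Divide items into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def child_task(chunk):
    """Partial result of one child task: a (sum, count) pair."""
    return sum(chunk), len(chunk)


def run_in_parallel(values, n_tasks):
    """Split the total task, run the child tasks, then merge their results."""
    chunks = split(values, n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partials = list(pool.map(child_task, chunks))
    # Merge step: combine the partial results into the final result.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


total, count = run_in_parallel(list(range(1, 101)), 4)
print(total, count)
```

The merge step here is trivial because sums and counts combine by simple addition; the difficulty discussed in this section arises when the partial results are large or expensive to combine.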
One approach may involve dividing the total task into a plurality of child tasks based on data, so that grouping set query statements are executed in parallel. In the example above, the grouping set query statement Grouping Sets((C1, C2), (C1, C3)) is assumed to involve one million rows of related data. In processing this grouping set query statement, a decision is made to divide the task into four child tasks that can be processed in parallel. In this case, the one million rows of data need to be divided equally into four parts based on the total quantity of data to be processed. Each child task then needs to process 250,000 rows of data. The problem, however, is that each child task must not only process its 250,000 rows of data but must also execute the entire grouping set query statement with respect to those 250,000 rows. The process results of the child tasks must then be merged in order to acquire the final execution result for the entire task. In some cases, the merging process results in a computation time that exceeds that of the original task. This is because a large number of complex processes have to be duplicated by each child task.
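The duplication inherent in the data-based division can be made concrete with a small sketch. Everything in it is an assumption made for illustration: 1,000 synthetic rows stand in for the one million rows of the example, a row count is used as the aggregate, and the names evaluate_all_sets and merge are hypothetical. The child tasks are run one after the other here, since the point is the amount of work rather than the scheduling.

```python
from collections import Counter

GROUPING_SETS = [("C1", "C2"), ("C1", "C3")]


def evaluate_all_sets(rows):
    """One child task: evaluate every grouping set over its own rows."""
    return {
        cols: Counter(tuple(r[c] for c in cols) for r in rows)
        for cols in GROUPING_SETS
    }


def merge(partials):
    """Merge step: add up the partial counts of all child tasks."""
    merged = {cols: Counter() for cols in GROUPING_SETS}
    for partial in partials:
        for cols, counts in partial.items():
            merged[cols].update(counts)
    return merged


# Hypothetical data: 1,000 rows stand in for the one million of the example.
rows = [{"C1": i % 5, "C2": i % 3, "C3": i % 2} for i in range(1000)]

# Divide the rows equally into four parts, one per child task.
n_tasks = 4
size = len(rows) // n_tasks
chunks = [rows[i * size:(i + 1) * size] for i in range(n_tasks)]

partials = [evaluate_all_sets(chunk) for chunk in chunks]
merged = merge(partials)

# Groups produced by all child tasks versus groups in the final result.
partial_groups = sum(len(c) for p in partials for c in p.values())
final_groups = sum(len(c) for c in merged.values())
print(partial_groups, final_groups)
```

With this synthetic data every child task produces all 25 groups of the final result, so the merge step has to combine 100 partial groups to arrive at 25. The ratio depends entirely on how the data is distributed, but it illustrates why the merging overhead can offset the gain from parallel execution.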