Analytic “window functions” are very common SQL (structured query language) constructs used for data analysis. A window function is an aggregation function that is applied to a result set. The syntax of window functions looks like the following:
Window_Function( [arguments] ) OVER ([PARTITION BY pk1 [, pk2, ...]][ORDER BY ok1 [, ok2, ...] [WINDOW clause]] )
Window functions are evaluated within partitions defined by the PARTITION BY (PBY) keys pk1, pk2, etc., with data ordered within each partition on the ORDER BY (OBY) keys ok1, ok2, etc. The WINDOW clause defines the window (begin and end points) for each row. SQL aggregate functions (sum, min, count, etc.) and analytic functions (lag, lead, first_value, etc.) can be used as window functions. The PARTITION BY and ORDER BY clauses are optional, and it is possible for queries to include window functions that do not contain these clauses.
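The evaluation semantics described above can be illustrated with a small sketch. The Python function below is purely illustrative (the names are hypothetical, not from the source) and evaluates a window function of the form SUM(val) OVER (PARTITION BY pk ORDER BY ok) as a running sum within each partition:

```python
from itertools import groupby

def running_sum_over(rows):
    """rows: list of (pk, ok, val) tuples.
    Returns {(pk, ok): running_sum}, i.e. SUM(val) OVER
    (PARTITION BY pk ORDER BY ok) with the default window of
    partition start through the current row."""
    result = {}
    # Order rows on the PARTITION BY key, then the ORDER BY key.
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    for pk, partition in groupby(rows, key=lambda r: r[0]):
        total = 0  # the running sum restarts at each partition boundary
        for _, ok, val in partition:
            total += val
            result[(pk, ok)] = total
    return result

rows = [("A", 1, 10), ("A", 2, 5), ("B", 1, 7)]
print(running_sum_over(rows))
# {('A', 1): 10, ('A', 2): 15, ('B', 1): 7}
```

Note how the accumulator resets at each partition boundary, reflecting that windows never span partitions.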
For example, the “sum( )” function is a commonly used window function that computes a sum over a set of data. The following example statement:
sum(sales_amount) OVER(PARTITION BY sales_rep)
can be used to calculate the sum of the values in the “sales_amount” column of all rows in a given table having the same “sales_rep” value. The “PARTITION BY” clause provides the key that is used to specify the sets of data for aggregation. The sum( ) function is an example of a “reporting aggregation function,” because the same value is reported for all rows in the partition, i.e., for all rows having the same “sales_rep” value in this particular query.
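A reporting aggregation of this kind can be sketched as follows. This is an illustrative Python analogue (function and column names are hypothetical), not an implementation from the source: every row in a partition reports the same partition-wide sum.

```python
from collections import defaultdict

def reporting_sum(rows):
    """rows: list of (sales_rep, sales_amount) tuples.
    Returns one (sales_rep, sales_amount, partition_sum) triple per
    input row, mimicking SUM(sales_amount) OVER (PARTITION BY sales_rep)."""
    totals = defaultdict(int)
    for rep, amount in rows:
        totals[rep] += amount  # aggregate once per partition key
    # Each row reports its partition's total -- the "reporting" property.
    return [(rep, amount, totals[rep]) for rep, amount in rows]

rows = [("alice", 100), ("bob", 40), ("alice", 60)]
print(reporting_sum(rows))
# [('alice', 100, 160), ('bob', 40, 40), ('alice', 60, 160)]
```

Unlike an ordinary GROUP BY aggregation, the number of output rows equals the number of input rows; only the reported aggregate value is shared within each partition.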
Given the large volumes of data that may need to be handled to process a window function, it is often desirable to parallelize the processing of such functions. The traditional way of parallelizing window functions is to use the PARTITION BY keys to split the work among the parallelized processes or threads.
To explain, consider the example system 100 shown in FIG. 1A. This system 100 includes a pool 102 of worker processes that can be used to handle work in the system. The traditional processing approach is divided into two phases 110 and 112. The first phase 110 is the scanning phase, in which a set of worker processes (e.g., P1, P2, P3, and P4) are used to scan data for the window function in parallel from a database.
The second phase 112 is the window computation phase, in which the data scanned during the scan phase 110 is distributed among multiple worker processes to perform the actual computations required by the window function. The distribution of work among the worker processes in the window computation phase is based upon the number of different key values of the PARTITION BY clause. For example, assume that a window function is specified whose PARTITION BY clause is keyed upon the four financial quarters for a company, i.e., Q1, Q2, Q3, and Q4. Such a function may look like the following:
SUM (sales_amount) OVER(PARTITION BY sales_quarter)
In this example, it is clear that there can only be four different possible values for the PARTITION BY key. Therefore, the degree of parallelism that can be achieved for this window function in the window computation phase 112 is four, as shown in the example of FIG. 1A in which there are four different worker processes (i.e., P5, P6, P7, and P8) to handle the workload in this phase 112.
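The key-based distribution described above can be sketched as follows. This is an assumed routing scheme for illustration (not the source's exact mechanism): every row carrying the same PARTITION BY key value must be routed to the same computation worker, so the achievable degree of parallelism is capped by the number of distinct key values.

```python
def assign_to_workers(rows, key_index, num_workers):
    """Route each row to a computation worker chosen by its
    PARTITION BY key, so one key's rows all land on one worker."""
    buckets = {}
    for row in rows:
        key = row[key_index]
        # Deterministic toy hash of the key onto a worker slot.
        worker = sum(map(ord, key)) % num_workers
        buckets.setdefault(worker, []).append(row)
    return buckets

rows = [("Q1", 10), ("Q2", 20), ("Q3", 30), ("Q4", 40), ("Q1", 5)]
buckets = assign_to_workers(rows, key_index=0, num_workers=8)
# Even with 8 workers available, at most 4 receive any rows:
# one per distinct quarter.
print(len(buckets))
```

With only four distinct quarter values, at least four of the eight worker slots necessarily remain idle, which is precisely the scalability limit the traditional approach exhibits.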
One possible drawback with this approach is that it will not parallelize window functions during the computation phase if the window function does not have a PARTITION BY clause. If the window function does not include a PARTITION BY clause, then in the traditional approach, there is no way for the workers in the scanning phase 110 to distribute work to multiple workers in the window computation phase 112. As shown in FIG. 1B, the work in the window computation phase 112 ends up being assigned to only a single worker process P5, even if there are multiple other worker processes P6, P7, and P8 that are waiting in the pool 102 and are available to handle computation workloads.
Another possible drawback is that this approach will not have scalable execution for window functions with low-cardinality partition keys, particularly if the number of available worker processes far exceeds the number of PARTITION BY keys. For example, consider the following window function:
SUM (sales_amount) OVER(PARTITION BY gender)
This window function has a PARTITION BY clause where the partition keys can only have two values, either “male” or “female.” Therefore, as shown in FIG. 1C, the window computation phase 112 can only have two worker processes (i.e., P5 and P6) assigned to handle the computation workload, even though there may be multiple other worker processes (i.e., P7 and P8) that are waiting in the pool 102 and are available to handle computation workloads.
Therefore, there is a need for an improved approach that provides more scalable handling of window functions, since conventional approaches to handling such functions are not intrinsically parallelizable if there are no partition keys, and since parallel evaluation of low-cardinality window functions is limited by the number of distinct partition key values.
According to some embodiments, the invention uses a two-stage evaluation approach to parallelize the processing of window functions. In the first stage, which is highly parallel, the majority of the computation of the window function is done by the available processes. In this way, the entire computing power of the database server is utilized. In the second stage, which is serial but likely to be very short, all processes involved in the first stage synchronize and complete the window function evaluation.
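One way such a two-stage evaluation can be sketched, under the assumption that the window function is a running total with no PARTITION BY clause, is shown below. This is an illustrative Python sketch with hypothetical names, not the embodiment's actual implementation: stage one lets every worker compute a local running sum and its chunk's grand total in parallel; stage two serially combines the chunk totals into per-chunk offsets and finalizes the result.

```python
from concurrent.futures import ThreadPoolExecutor

def stage1(chunk):
    """Stage 1 (parallel): local running sum plus the chunk's total."""
    running, total = [], 0
    for v in chunk:
        total += v
        running.append(total)
    return running, total

def two_stage_running_sum(chunks):
    # Stage 1: each worker independently processes its own chunk.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(stage1, chunks))
    # Stage 2 (serial, short): fold the per-chunk totals into offsets,
    # then shift each local running sum by its chunk's offset.
    offsets, acc = [], 0
    for _, total in partials:
        offsets.append(acc)
        acc += total
    return [r + off
            for (running, _), off in zip(partials, offsets)
            for r in running]

chunks = [[1, 2, 3], [4, 5], [6]]
print(two_stage_running_sum(chunks))  # [1, 3, 6, 10, 15, 21]
```

The serial second stage touches only one small value per worker (the chunk total), which is why it is likely to be very short relative to the parallel first stage.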
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.