Many operations performed by a database server generate a series of result values (a “result item set”) based on a series of input values (an “input item set”). For example, a database may store an “employee” table that includes a “salary” column. To implement an across-the-board salary increase, the database server may execute a database command that causes the database server to multiply all values in the salary column of the employee table by 1.1. In this example, the input item set is the original set of salary values, and the result item set is the set of new salary values produced by multiplying the old salary values by 1.05.
In the example given above, each value in the result item set was produced based on a corresponding value in the input item set. Specifically, the new salary value for a given input row of the salary table is generated based on the current salary value contained in the same row.
In contrast, with certain types of operations, more information than the corresponding input value must be known in order to generate a result value for the input value. For example, in order to generate the rank value for a particular input value, it is necessary to know how many input values precede the input value in the ranking.
Rank is an example of a “window function”. Window functions are so named because they operate over a set of values from the input item set. In the context of a database system, the input item set typically corresponds to values from a particular column of rows that belong to a particular database table. The set of rows upon which the window functions operate are described by a window definition or window size. The window size describes which rows qualify for the window. The window has a starting row and an ending row. For example, a window defined for a moving average would have both the starting and end points of the window slide so that the end points maintain a constant physical or logical range. For example, the following query calculates a 3 month moving average per stock ticker.                AVG (stock_price) OVER (Partition By (stock_name) Order By (time) RANGE ‘3’ MONTH PRECEDING)        
The clause “Partition By (stock_name)” partitions the data by stock_name, and the clause “Order By (time)” orders the data time-wise within a partition. RANGE ‘3’ MONTH PRECEDING is a logical expression of window size. In the example, the “window” has the logical size of three months. Alternatively, window size may be expressed by a physical interval. That is, the interval may refer to how the data is stored within the database. For example, the following query calculates the moving average for each stock ticker over 90 preceding rows of data.                AVG (stock_price) OVER (Partition By (stock_name) Order By (time) ROWS 90 PRECEDING)        
TABLE 1 below illustrates a result set for the query containing the window function “AVG (stock_price) OVER (Partition By (stock_name) Order By (time) RANGE ‘3’ MONTH PRECEDING)”. The above window function calculates a moving average of stock price for each stock within a three month window.
TABLE 1Stock_nameTimestock_pricemoving_averageORCL1-Jan'992020ORCL1-Feb'9930(20 + 30)/2 = 25ORCL1-Mar'9958(20 + 30 + 58)/3 = 36ORCL1-Apr'9911(30 + 58 + 11)/3 = 33ORCL1-May'9951(58 + 11 + 51)/3 = 40ABCD1-Jan'992525ABCD1-Feb'9935(25 + 35)/2 = 30ABCD1-Mar'9945(25 + 35 + 45)/3 = 35ABCD1-Apr'9955(35 + 45 + 55)/3 = 45ABCD1-May'9965(45 + 55 + 65)/3 = 55
Thus, the use of window functions enhances developer productivity because window functions allow for computerized decision support that may be either interactive or batch report jobs.
An important category of window functions is the “ranking” family of window functions. Window functions in the ranking family compute the rank of a row of data with respect to other rows of data in the dataset based on the values of a set of measures. To illustrate, the following query ranks salesmen in Acme Company based on sales amount in each geographical sales region.                SELECT sales_person, sales region, sales_amount,                    RANK( ) OVER (PARTITION BY sales_region ORDER BY s_amount DESC) FROM Sales_table;                        
TABLE 2A below illustrates a result set for the preceding query. The “rank” column in Table 2A lists the sales persons in descending order based on the sales amount. The rank values are reset for each sales region.
TABLE 2Asales_personsales_regionsales_amountRankAdamsEast1001BakerEast992ConnorsEast893DavisEast754EdwardsWest741FitzhughWest662GaribaldiWest453
Examples of window functions in the ranking family include RANK, DENSE_RANK, NTILE, PERCENT_RANK, ROW_NUMBER, and CUME_DIST. Window functions that belong to the ranking family are hereafter referred to as ranking functions. Ranking functions are widely used in queries for ranking rows of data in a dataset based on some ordering criterion and subsequently filtering out all but the rows in the top-N ranks. For example, assume that the query corresponding to TABLE 2A asked for the top 2 salespersons in each sales region based on the sales amount credited to each sales person. TABLE 2B illustrates a results set where data rows corresponding to a rank that is greater than 2 are filtered out. Queries that result in the computation and selection of top-N ranks are hereafter referred to as “TOP-N” queries.
TABLE 2Bsales_personsales_regionsales_amountRankAdamsEast1001BakerEast992EdwardsWest741FitzhughWest662
One way to process database commands more quickly involves parallelizing the execution of the commands. Parallelizing the execution of a command generally involves breaking the work required by the command into multiple sub-tasks, and causing the sub-tasks to be executed in parallel by a set of slave processes.
Window functions are typically parallelized based on the specifications of the PARTITION BY clause they contain. Specifically, the work that needs to be performed to execute a window function is divided up on a partition-by-partition basis. The partition-based sub-tasks are then assigned to slave processes for parallel execution.
Window functions are parallelized in this manner because the PARTITION BY clause insures that a slave working on data items from one partition will not require information about the input items that belong to other partitions. Parallelizing window functions using the partitions created by the PARTITION BY clause works well when the number of partitions created by the PARTITION BY clause of a database command is equal to or greater than the desired degree of parallelism.
Unfortunately, this is often not the case. For example, the PARTITION BY clause of a database command may establish only two partitions, even though the system executing the database command is able to support a much higher degree of parallelism. As another example, some database commands with window functions may not be partitioned at all. When parallelization is performed by dividing work on a partition-by-partition basis, unpartitioned window functions would not generally be parallelized, though intermediate ordering and rank predicate filtering could, at times, be parallelized.
Based on the foregoing, it is clearly desirable to provide a mechanism for parallelizing window functions that does not rely on the PARTITION BY clause of the command that includes the window function. By eliminating reliance on the PARTITION BY clause, the mechanism may effectively parallelize window functions that are unpartitioned.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.