The present invention relates generally to database query operations performed within computer systems and, more specifically, to minimizing the number of sort operations required for a database query containing window functions.
Relational databases store information in tables that are organized into rows and columns. A user retrieves information from the tables by entering a request that is converted to queries by a database application, which then submits the queries to a database server. In response to the queries, the database server accesses the tables specified by the queries to determine which information within the tables satisfies the queries. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately to the user.
Online analytical processing (xe2x80x9cOLAPxe2x80x9d) applications, also known as decision support processing applications, are applications that provide analysis over large volumes of data stored in a database. Analytic calculations are critical for data warehousing applications. Examples of analytic functions are those functions used in basic business intelligence calculations such as moving averages, rankings and lead/lag comparisons of data. Analytic functions are broadly classified as window functions. Window functions are so named because they operate over a set of rows of data in the database tables. The set of rows upon which the window functions operate are described by a window definition. The window definition describes which rows qualify for the window. The window has a starting row and an ending row. For example, a window defined for a moving average would have both the starting and end points of the window slide so that the end points maintain a constant physical or logical range. For example, the following query calculates a 3 month moving average per stock ticker.
AVG (stock_price) OVER (Partition By (stock_name) Order By (time) RANGE xe2x80x983xe2x80x99 MONTH PRECEDING)
The clause xe2x80x9cPartition By (stock_name)xe2x80x9d partitions the data by stock_name, and the clause xe2x80x9cOrder By (time)xe2x80x9d orders the data within a partition by time. RANGE xe2x80x983xe2x80x99 MONTH PRECEDING is a logical expression of window size. In this example, the xe2x80x9cwindowxe2x80x9d has the logical size of three months. Alternatively, window size may be expressed by a physical interval. That is, the interval includes a certain number of rows before and after the current row in the ordered set of rows (ordered based on ORDER By columns in window function.) For example, the following query calculates the moving average for each stock ticker over 90 preceding rows of data.
AVG (stock_price) OVER (Partition By (stock_name) Order By (time) ROWS 90 PRECEDING)
TABLE 1 below illustrates a result set for the-query containing the window function xe2x80x9cAVG (stock_price) OVER (Partition By (stock_name) Order By (time) RANGE xe2x80x983xe2x80x99 MONTH PRECEDING)xe2x80x9d. The above window function calculates a moving average of stock price for each stock within a three month window.
Thus, the use of window functions enhances developer productivity because window functions allow for a succinct representation of otherwise, complicated queries. However, a separate sort is typically required for each window function. In addition, when the query block that contains window functions also contains xe2x80x9cGroup Byxe2x80x9d and/or xe2x80x9cOrder Byxe2x80x9d clauses, additional sorts are required. Thus, the number of sorts is greater than or equal to the number of window functions in the query. A typical query contains multiple window functions. The computation time required for the total number of sorts may be massive.
Based on the foregoing, there is clear need for a mechanism for minimizing the number of sort operations that are required to satisfy a query that contains window functions.
Techniques are provided for minimizing the number of sort operations that are required for satisfying a query containing a set of window functions. According to one embodiment, the window functions are grouped into Ordering Groups. An Ordering Group is a subset of the window functions, which are capable of being satisfied by a particular sort operation.
According to one embodiment, the Ordering Groups are constructed around the window functions that require the most restrictive sort operations. From the set of Ordering Groups, a minimal set of Ordering Groups is selected. The number of sort operations corresponding to the minimal set of Ordering Groups is the minimal number of sort operations needed to satisfy the sorting requirements of the set of window functions.
Since xe2x80x9cGROUP BYxe2x80x9d and xe2x80x9cORDER BYxe2x80x9d clauses of the query block are mapped to window functions, the minimal set of ordering groups represents the minimal number of sort operations required for the whole query block excluding join operations.