The present invention relates generally to query operations performed within computer systems and, more specifically, to optimizing the computation of OLAP ranking functions.
Relational databases store information in indexed tables that are organized into rows and columns. A user retrieves information from the tables by entering a request that is converted to queries by a database application, which then submits the queries to a database server. In response to the queries, the database server accesses the tables specified by the queries to determine which information within the tables satisfies the queries. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately to the user.
Online analytical processing (xe2x80x9cOLAPxe2x80x9d) applications, also known as decision support processing applications, are applications that provide analysis of data stored in a database. OLAP applications involve the use of analytic functions. Examples of analytic functions are those functions used in basic business intelligence calculations such as moving averages, rankings and lead/lag comparisons of data. Analytic functions are broadly classified as window functions. Window functions are so named because they operate over a set of rows of data in the database tables. The set of rows upon which the window functions operate are described by a window definition or window size. The window size describes which rows qualify for the window. The window has a starting row and an ending row. For example, a window defined for a moving average would have both the starting and end points of the window slide so that the end points maintain a constant physical or logical range. For example, the following query calculates a 3 month moving average per stock ticker.
AVG (stock_price) OVER (Partition By (stock_name) Order By (time) RANGE xe2x80x983xe2x80x99 MONTH PRECEDING)
The clause xe2x80x9cPartition By (stock_name)xe2x80x9d partitions the data by stock_name, and the clause xe2x80x9cOrder By (time)xe2x80x9d orders the data time wise within a partition. RANGE xe2x80x983xe2x80x99 MONTH PRECEDING is a logical expression of window size. In the example, the xe2x80x9cwindowxe2x80x9d has the logical size of three months. Alternatively, window size may be expressed by a physical interval. That is, the interval may refer to how the data is stored within the database. For example, the following query calculates the moving average for each stock ticker over 90 preceding rows of data.
AVG (stock_price) OVER (Partition By (stock_name) Order By (time) ROWS 90 PRECEDING)
TABLE 1 below illustrates a result set for the query containing the window function xe2x80x9cAVG (stock_price) OVER (Partition By (stock_name) Order By (time) RANGE xe2x80x983xe2x80x99 MONTH PRECEDING)xe2x80x9d. The above window function calculates a moving average of stock price for each stock within a three month window.
Thus, the use of window functions enhances developer productivity because window functions allow for computerized decision support that may be either interactive or batch report jobs.
An important category of window functions is the xe2x80x9crankingxe2x80x9d family of window functions. Window functions in the ranking family compute the rank of a row of data with respect to other rows of data in the dataset based on the values of a set of measures. To illustrate, the following query ranks salesmen in Acme Company based on sales amount in each geographical sales region.
SELECT sales_person, sales region, sales_amount,
RANK ( ) OVER (PARTITION BY sales_region ORDER BY s_amount DESC)
FROM Sales_table;
TABLE 2A below illustrates a result set for the preceding query. The xe2x80x9crankxe2x80x9d column in Table 2A lists the sales persons in descending order based on the sales amount. The rank values are reset for each sales region.
Examples of window functions in the ranking family include RANK, DENSE_RANK, NTILE, PERCENT_RANK, ROW_NUMBER, and CUME_DIST. Window functions that belong to the ranking family are hereafter referred to as ranking functions. Ranking functions are widely used in queries for ranking rows of data in a dataset based on some ordering criterion and subsequently filtering out all but the rows in the top-N ranks. For example, assume that the query corresponding to TABLE 2A asked for the top 2 salespersons in each sales region based on the sales amount credited to each sales person. TABLE 2B illustrates a results set where data rows corresponding to a rank that is greater than 2 are filtered out. Queries that result in the computation and selection of top-N ranks are hereafter referred to as xe2x80x9cTOP-Nxe2x80x9d queries.
TOP-N queries are often computationally expensive when massive amounts of data need to be sorted and ranked. Because the use of TOP-N queries is frequent and widespread in the industry, any improvement in computation efficiency of TOP-N queries may amount to significant savings.
Based on the foregoing, there is clear need for a mechanism for optimizing the computation of OLAP Ranking functions.
Techniques are provided for optimizing the computation of OLAP ranking functions. According to one embodiment, a push-down technique is used whereby the filtering predicate associated with a ranking function is pushed down into the window sort, which filters rows while sorting the data. The set of conditions ensures that the push down technique does not result in filtering out data that is needed in other window sorts in the query.