In query processing systems, such as the relational database management system (RDBMS) DB2™, data values are extracted from stored images of the data for further processing by the query evaluation system. Typically, the data is structured as rows comprised of column values, said rows being grouped into contiguous storage blocks known as pages. A part of the task of query evaluation comprises the process of isolating successive rows and extracting a (possibly proper) subset of the columns of the row for subsequent query evaluation steps such as filtering, sorting, grouping, or joining.
Extracting column values from pages involves steps of identifying and locating in main memory the page containing the next needed row, locating the next needed row within the page, locating the needed column values within the needed row, and copying the needed column values to new locations in memory where they are made available for subsequent query evaluation steps. Typically, locating a page in memory requires determining whether the page is in main memory and, if so, determining where in memory the page is located. If the page is not in main memory, the page must be brought to main memory from secondary storage (typically from disk).
Additionally, in query evaluation systems supporting concurrent query executions, steps must be taken to stabilize the page to ensure that it remains at the same location in memory and to avoid concurrent reads and updates to the page, thereby preserving the logical integrity of the page contents. Subsequent to copying needed column data values to new locations, the page stabilization conditions must be released.
The steps of accessing data by locating a page, stabilizing the page, locating a row in the page, and releasing stabilization for each row to be processed by the query evaluation system can constitute a significant portion of the overall execution cost of a query.
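The per-row access steps described above can be illustrated with a minimal sketch. It is not taken from any particular RDBMS: the `BufferPool` class, its `fix` and `unfix` methods, the counters, and the sample data are all hypothetical names chosen for illustration, and stabilization is modeled simply as a pin count.

```python
from dataclasses import dataclass, field


@dataclass
class BufferPool:
    """Hypothetical buffer pool mapping page ids to in-memory pages."""
    disk: dict                                   # page_id -> list of rows (secondary storage)
    frames: dict = field(default_factory=dict)   # pages currently in main memory
    pins: dict = field(default_factory=dict)     # page_id -> pin count
    stabilizations: int = 0
    disk_reads: int = 0

    def fix(self, page_id):
        """Locate the page (loading it from disk if absent) and stabilize it."""
        if page_id not in self.frames:
            self.frames[page_id] = self.disk[page_id]
            self.disk_reads += 1
        self.pins[page_id] = self.pins.get(page_id, 0) + 1
        self.stabilizations += 1
        return self.frames[page_id]

    def unfix(self, page_id):
        """Release the stabilization taken by fix()."""
        self.pins[page_id] -= 1


def fetch_rows_one_at_a_time(pool, row_ids, columns):
    """Per-row access: every row pays for locate, stabilize, copy, release."""
    out = []
    for page_id, slot in row_ids:
        page = pool.fix(page_id)
        try:
            row = page[slot]                              # locate the row in the page
            out.append(tuple(row[c] for c in columns))    # copy the needed columns
        finally:
            pool.unfix(page_id)
    return out


# Assumed sample data: two pages holding three rows in total.
disk = {
    0: [{"id": 1, "name": "a", "qty": 5}, {"id": 2, "name": "b", "qty": 7}],
    1: [{"id": 3, "name": "c", "qty": 9}],
}
pool = BufferPool(disk)
result = fetch_rows_one_at_a_time(pool, [(0, 0), (0, 1), (1, 0)], ("id", "qty"))
```

In this sketch the number of stabilizations equals the number of rows fetched, even when consecutive rows lie on the same page, which is the cost the approaches discussed next attempt to reduce.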
Prior art query evaluation systems, such as RDBMSs, use different approaches to avoid repeatedly accessing rows in a page by following the potentially costly steps set out above. For example, where there are predicates in queries that are to be satisfied, it is possible to evaluate the predicates for located rows before retrieving the sets of column values of interest for the queries. Where a row does not meet the predicate condition, the next row (potentially on the same page in the data) may be accessed without requiring a renewed stabilization of the page. The existing location in the page is also known, which may reduce the cost of locating the next row.
This application of predicates to column values of a current row while the column values still lie with their row in the currently identified page is sometimes called search argument (or SARG) processing. Whenever the SARG predicate(s) are not satisfied, this processing approach allows the system to continue to the next row on the same page without releasing page stabilization, re-identifying the location of the page in memory, and re-stabilizing the page. Additionally, the programmatic bookkeeping associated with transferring control between the page processing and query evaluation components of the query processing system is avoided for rows that would otherwise be copied out only to be discarded once the predicate was evaluated against the copied column values.
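A minimal sketch of SARG processing follows, under stated assumptions: `CountingPool`, `scan_with_sarg`, and the sample pages are hypothetical, and stabilization is reduced to a counter. The predicate is evaluated against rows while they still lie in the page, and the page is released only when a qualifying row is handed back to the caller or the page is exhausted.

```python
class CountingPool:
    """Hypothetical stand-in for a buffer pool that counts stabilizations."""

    def __init__(self, pages):
        self.pages = pages          # page_id -> list of rows
        self.stabilizations = 0
        self.pinned = 0

    def fix(self, page_id):
        self.stabilizations += 1
        self.pinned += 1
        return self.pages[page_id]

    def unfix(self, page_id):
        self.pinned -= 1


def scan_with_sarg(pool, page_ids, predicate, columns):
    """Yield copied column values only for rows that satisfy the predicate."""
    for page_id in page_ids:
        slot, exhausted = 0, False
        while not exhausted:
            page = pool.fix(page_id)           # one stabilization per resumption
            hit = None
            while slot < len(page) and hit is None:
                row = page[slot]
                slot += 1
                if predicate(row):             # evaluated in place on the page
                    hit = tuple(row[c] for c in columns)
            exhausted = slot >= len(page)
            pool.unfix(page_id)                # release before returning control
            if hit is not None:
                yield hit


# Assumed sample data: six rows on two pages, two of which qualify.
pages = {
    0: [{"id": 1, "qty": 1}, {"id": 2, "qty": 8}, {"id": 3, "qty": 2}, {"id": 4, "qty": 3}],
    1: [{"id": 5, "qty": 9}, {"id": 6, "qty": 4}],
}
pool = CountingPool(pages)
matches = list(scan_with_sarg(pool, [0, 1], lambda r: r["qty"] > 5, ("id", "qty")))
```

With this sample data the scan visits six rows but stabilizes a page only four times, because runs of non-qualifying rows are skipped under a single stabilization.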
Another prior art approach to reducing the need to restabilize the data page involves processing the needed columns of the current row directly from its page in the data and continuing directly to the next row on the page. Typical processing operations which can “consume” column values directly from the page include sorting (entering column values into the sorting data structure) and aggregation (including column values in the running results for SUM, AVG, MAX, etc.). This type of processing is sometimes referred to as “consuming pushdown”, because there is a ‘pushdown’ of a consuming operation into data access processing.
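Consuming pushdown for an aggregation can be sketched as below. This is an illustration under assumptions rather than an actual implementation: `CountingPool`, `aggregate_on_page`, and the sample pages are hypothetical. The running SUM, AVG, and MAX results are updated directly from the rows in each page, so a page is stabilized once regardless of how many rows it holds.

```python
class CountingPool:
    """Hypothetical stand-in for a buffer pool that counts stabilizations."""

    def __init__(self, pages):
        self.pages = pages          # page_id -> list of rows
        self.stabilizations = 0
        self.pinned = 0

    def fix(self, page_id):
        self.stabilizations += 1
        self.pinned += 1
        return self.pages[page_id]

    def unfix(self, page_id):
        self.pinned -= 1


def aggregate_on_page(pool, page_ids, column):
    """Consume column values in place, keeping running aggregate results."""
    total, count, maximum = 0, 0, None
    for page_id in page_ids:
        page = pool.fix(page_id)               # stabilize once per page
        try:
            for row in page:
                value = row[column]            # read directly from the page
                total += value
                count += 1
                maximum = value if maximum is None else max(maximum, value)
        finally:
            pool.unfix(page_id)
    average = total / count if count else None
    return {"sum": total, "avg": average, "max": maximum}


# Assumed sample data: six rows on two pages.
pages = {
    0: [{"qty": 1}, {"qty": 8}, {"qty": 2}, {"qty": 3}],
    1: [{"qty": 9}, {"qty": 4}],
}
pool = CountingPool(pages)
summary = aggregate_on_page(pool, [0, 1], "qty")
```

Because no column values are copied out for a separate evaluation step, the stabilization count here tracks the number of pages (two) rather than the number of rows (six).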
The above approaches, however, apply only where there is a predicate to be evaluated, or where there is a consuming operation carried out as part of the query execution. In query processing systems, such as RDBMSs, there are other types of queries that are potentially costly to execute but that are not susceptible to the above approaches. An example of such a query is a query having operations that are neither predicates nor consuming operations but that nonetheless filter data values.
It is therefore desirable to have a query processor which is able to execute a query including filtering in a manner that reduces the number of page stabilizations required to execute the query.