Row identifier (RID) list processing, also referred to as record identifier list processing, is a technique generally used for execution of a query to avoid excessive get-page calls and random inputs/outputs (I/Os) caused by scanning of a table with a poorly clustered index. There are usually three steps involved in RID list processing. During the first step, a list of row identifiers satisfying a query is compiled by scanning one or more indexes. Next, the list of row identifiers is sorted in order of page number. Finally, the needed data pages are prefetched in page sequence of the sorted RID list.
The row identifiers for qualifying rows/records from the one or more indexes are typically sorted in a RID pool so that data pages can be processed in sequence. Only data pages with qualifying rows will be read. Multiple qualifying rows found in one data page will only require a single I/O to the page, which may reduce the overall number of I/Os compared to direct index access without RID list processing. Data pages may then be prefetched asynchronously in sequence as they appear on a direct access storage device (DASD) to reduce seek time and reduce I/O wait time.
Selection of RID list processing over performance of a tablespace scan is sometimes based on insufficient or inaccurate statistics at bind time (i.e., when an access path/plan is chosen for a query). In addition, because execution time may occur hours, days, or even months after bind time, the statistics used to bind the query may be stale by the time the query is executed. As a result, some database management systems (DBMS) include mechanisms to terminate RID list processing under specific circumstances.
One such mechanism terminates RID list processing when the number of row identifiers retrieved during execution exceeds a threshold percentage of the total number of row identifiers in a table. Under another mechanism, when the amount of memory allocated to a RID list process is fully utilized prior to completion of the RID list process, the RID list process is terminated.
With the above mechanisms, RID list processing yields poor performance because the RID list processing is terminated, the row identifiers collected are discarded, and the entire execution process is restarted with an alternative plan, such as a tablespace scan. In addition, if only a few more row identifiers needed to be retrieved to complete the RID list processing, it would be more efficient to continue with the RID list processing than to terminate it and start over with another access plan. Further, terminating a RID list process can affect other concurrent processes that may be sharing memory space with the terminated RID list process.
Accordingly, there is a need for a technique that better manages RID list processing to more efficiently use shared memory and avoid waste.