This invention relates generally to data management, and particularly to management of data within an index tree.
Binary radix trees are known structures for arranging data within a computer system. Binary radix trees of data are indexed, and an index tree comprises a plurality of elements, including decision nodes which are used to direct a search through the index tree to find the desired data. The desired data within a binary radix tree may be accessed with one or more key values or keys which define the path to data within the index. For example, decision nodes in an index tree use the keys to define the way that a search through an index will progress. Ranges of keys designated by key endpoints are often utilized in searching a binary radix tree for database management applications. Such ranges of keys contain a specific number of keys, and database management applications rely on the accuracy of that number for efficient operation.
The blocks of data of a binary radix tree, and specifically the index elements of the tree, are grouped in units referred to as "logical pages". Therefore, to search an index according to key values, various pages must be searched. The index element used to link the various pages of an index together is referred to as a page pointer. Specifically, data and index elements within an index tree are segmented into logical pages, which will generally determine the size of physical I/Os. When a set of index elements is too large to fit within a single logical page, some of the data is split off to a new page, and it is replaced with a page pointer element which points to the new page. Those pages which do not include any page pointers to additional pages are referred to as leaf pages. Those pages which do include page pointers to additional pages are referred to as non-leaf pages or limb pages within an index tree.
Database management applications utilize a number of operations, such as Query and Join operations. When such operations are performed on a database, pages of data are accessed, and decisions have to be made as to the most efficient way to perform the operations with regard to the particular index of a database and the range of key values to be utilized in the operation. If all of the pages to be searched for an operation are resident in fast access memory, such as the local RAM of a computer, searching an index key range is very efficient. However, not all of the pages of the index will typically be in the local RAM. Local memory is generally constrained such that it is not large enough to contain all of the pages of data in binary radix trees; therefore, the vast majority of the data is maintained within an external storage medium, such as on an external disk. The pages for the database operation therefore must be retrieved from the external storage and placed in the fast access RAM for database operations. Since significant time is required to retrieve the pages from external storage, it is desirable with respect to a particular database operation to determine how much data within a binary radix tree is needed and how many keys (and associated pages) are within a defined range for the particular operation, in order to optimize the operation.
Within a binary radix tree, the majority of data pages lie at the lowest level of the tree and are leaf pages. The other pages are non-leaf pages, such as limb pages and, ultimately, a trunk. Most index keys terminate on leaf pages. Within a database operation, a range of keys within the index tree might be designated for the operation as noted. Therefore, determining the number of keys within a key range for a particular database operation has required the step of retrieving large quantities of data pages, and particularly leaf pages, from external storage. Thus, such an operation has been significantly time consuming.
Attempts have been made to reduce the time for database management operations by estimating the number of keys within a key range indicated for an index and then performing the database management operation based on such an index key range estimate. For example, U.S. Pat. No. 4,774,657, which is commonly owned with the present application, provides one way of making an index key range estimation which eliminates having to retrieve and search all the pages containing keys within a specified key range. U.S. Pat. No. 4,774,657 is hereby incorporated herein by reference in its entirety. That patent makes an estimate of the number of keys in a binary radix tree key range, such as to be processed for a particular Join or Query operation, as a function of the number of pages referenced but not retrieved during a level limited search within the index. In the patent, a key range of an index is defined by designating endpoint keys. The index is then searched down to the lowest level indicated by one of the endpoint keys. Knowing the lowest level of a range endpoint, a level limit is then calculated, and the estimation is based upon searching down to that level limit to determine the number of pages that were referenced. The number of pages referenced is then multiplied by the average key density per page for the whole index to yield an estimate of the number of keys within the range.
While U.S. Pat. No. 4,774,657 has various valuable aspects with respect to database management optimization, the estimation routine assumes a balanced index tree wherein the depth of the leaf pages within the index tree does not vary significantly. However, the depth of leaf pages within the index may vary significantly, especially within binary radix trees. Therefore, utilizing the method of U.S. Pat. No. 4,774,657 can sometimes lead to low and inaccurate key range estimates when part of the tree is deeper than expected. When the tree is shallower than expected, the estimator may fault in leaf pages unnecessarily. As a result, Query performance and other database operations can be degraded.
Accordingly, there is a need for further improvements within database management of a binary radix tree index for improved database management operations.
There is further a need for an improvement which takes into account the possible imbalance of the depth of the tree for more accurate key range estimates and thus more optimal performance of the desired database management operations.
The present invention addresses those needs in the prior art and other needs, and provides an improved key range estimator for binary radix trees, as discussed and disclosed further herein below.
The present invention addresses the above needs by scanning an index until a page pointer is found and incrementing a counter for each page pointer, but not going to the corresponding leaf page if the pointer is a leaf page pointer. Specifically, an apparatus, program product and method for estimating the number of keys within an index key range comprises the step of defining a left end point and a right end point of the key range and finding the point of the index at which the paths to the left and right end points diverge. Starting at the divergent point, a scan is made of the index until a page pointer is found, and a counter is incremented for each page pointer that is found. A determination is then made as to whether the page pointer points to a leaf page or to a non-leaf page within the index. If the page pointer points to a leaf page within the index, the scan is continued to find the next page pointer without going to the corresponding leaf page. Otherwise, if the page pointer points to a non-leaf page, the scan is continued to find the next page pointer, such as by going to the non-leaf page. The incrementing of the counter continues for each page pointer that is found, and the scan continues without going to any leaf pages. The index is scanned for the entire key range. After the scan is complete, a determination is made of an estimated number of keys utilizing the counter value.
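By way of illustration only, the counting scan described above might be sketched as follows. The sketch is a simplification under stated assumptions: each non-leaf page is modeled as a plain list of page pointers rather than as binary radix tree decision nodes, the scan is assumed to start at the divergent point and to cover the whole subtree below it (end point handling is omitted), and the names PagePointer, count_page_pointers and the page identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PagePointer:
    target: str    # hypothetical page identifier
    is_leaf: bool  # True when the target page is a leaf page


def count_page_pointers(pages, start_id, fetched):
    """Count every page pointer found below start_id without going to leaf pages.

    pages maps a non-leaf page identifier to its list of PagePointer objects.
    fetched records which pages the scan actually went to.
    """
    counter = 0
    to_visit = [start_id]
    while to_visit:
        page_id = to_visit.pop()
        fetched.append(page_id)
        for pointer in pages[page_id]:
            counter += 1  # every page pointer found is counted
            if not pointer.is_leaf:
                to_visit.append(pointer.target)  # only non-leaf pages are visited
    return counter


# Hypothetical index: leaf pages are deliberately absent from the mapping,
# so any attempt to go to one would raise a KeyError.
EXAMPLE_PAGES = {
    "trunk": [PagePointer("limb1", False), PagePointer("leafA", True),
              PagePointer("limb2", False)],
    "limb1": [PagePointer("leafB", True), PagePointer("leafC", True)],
    "limb2": [PagePointer("leafD", True), PagePointer("limb3", False)],
    "limb3": [PagePointer("leafE", True)],
}

visited = []
pointer_count = count_page_pointers(EXAMPLE_PAGES, "trunk", visited)
```

In this hypothetical index the scan counts eight page pointers while going only to the four non-leaf pages; none of the five leaf pages is retrieved.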
In accordance with one embodiment of the invention, the estimated number of keys is determined by multiplying the average number of keys per page for the index by the value of the counter.
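As a worked illustration of that multiplication, the following minimal sketch uses made-up figures; the function name estimate_keys and the values 40.0 and 8 are assumptions chosen for the example and do not come from any particular index.

```python
def estimate_keys(average_keys_per_page: float, counter: int) -> float:
    """Estimate the keys in a range as (average keys per page) x (page pointers counted)."""
    return average_keys_per_page * counter


# Hypothetical figures: an index averaging 40 keys per page,
# and a scan that counted 8 page pointers within the key range.
estimated_keys = estimate_keys(40.0, 8)
```

With these assumed figures the estimate is 320 keys for the range.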
In another embodiment of the invention, the determination of whether the page pointer points to a leaf page or a non-leaf page is made by maintaining an indicator within the page pointer and then checking the indicator to determine its status. Specifically, a leaf indicator bit might be utilized within the page pointer, and the status of the bit is checked to determine whether the page pointer points to a non-leaf page or a leaf page.
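One possible encoding of such a leaf indicator bit is sketched below. The layout is an assumption made purely for illustration: the page pointer is treated as an integer whose lowest bit is the leaf indicator and whose remaining bits hold the page number. The names LEAF_BIT, make_page_pointer, points_to_leaf and page_number are hypothetical.

```python
LEAF_BIT = 0x1  # assumed position of the leaf indicator within the page pointer


def make_page_pointer(page_number: int, is_leaf: bool) -> int:
    """Pack a page number and a leaf indicator bit into one integer."""
    return (page_number << 1) | (LEAF_BIT if is_leaf else 0)


def points_to_leaf(pointer: int) -> bool:
    """Check the indicator bit without touching the page that is pointed to."""
    return bool(pointer & LEAF_BIT)


def page_number(pointer: int) -> int:
    """Recover the page number from the packed page pointer."""
    return pointer >> 1


leaf_pointer = make_page_pointer(1234, is_leaf=True)
limb_pointer = make_page_pointer(77, is_leaf=False)
```

Because the indicator travels with the page pointer itself, the status can be checked on the page already in local memory, and no access to the target page is needed to classify it.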
When a page pointer for a non-leaf page is encountered, the scan will generally continue by eventually going to the non-leaf page which corresponds to that page pointer. In accordance with one aspect of the present invention, in order to more efficiently scan the index, a scan ahead is made in the index to find a second non-leaf page pointer. If a second non-leaf page pointer is found, the non-leaf page corresponding to the second non-leaf page pointer is asynchronously brought into local memory from external storage while the scan proceeds to the non-leaf page corresponding to the first non-leaf page pointer. In that way, the program will often already have subsequent non-leaf pages in local memory for rapid access as the scan continues.
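The scan ahead described above might be sketched as follows, again as an illustration under assumptions rather than a definitive implementation. A worker thread from Python's standard concurrent.futures module stands in for the asynchronous retrieval from external storage, each non-leaf page is modeled as a list of (target, is_leaf) pairs, and the names scan_with_prefetch, load_page and STORE are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_with_prefetch(load_page, start_id):
    """Count page pointers below start_id, prefetching the next non-leaf page.

    load_page(page_id) is a hypothetical storage read returning a list of
    (target_id, is_leaf) pairs for a non-leaf page.
    """
    counter = 0
    pending = {}  # page_id -> Future for a page being brought in asynchronously

    with ThreadPoolExecutor(max_workers=1) as pool:

        def get_page(page_id):
            future = pending.pop(page_id, None)
            # Use the prefetched copy when one was requested, else read directly.
            return future.result() if future is not None else load_page(page_id)

        def visit(page_id):
            nonlocal counter
            pointers = get_page(page_id)
            counter += len(pointers)  # every page pointer is counted
            non_leaf = [target for target, is_leaf in pointers if not is_leaf]
            for i, target in enumerate(non_leaf):
                if i + 1 < len(non_leaf):
                    second = non_leaf[i + 1]  # scan ahead to the next non-leaf pointer
                    if second not in pending:
                        pending[second] = pool.submit(load_page, second)
                visit(target)  # meanwhile proceed to the first non-leaf page

        visit(start_id)
    return counter


# Hypothetical index; leaf pages are absent because the scan never goes to them.
STORE = {
    "trunk": [("limb1", False), ("leafA", True), ("limb2", False)],
    "limb1": [("leafB", True), ("leafC", True)],
    "limb2": [("leafD", True), ("limb3", False)],
    "limb3": [("leafE", True)],
}
loaded = []


def load_page(page_id):
    loaded.append(page_id)
    return STORE[page_id]


total_pointers = scan_with_prefetch(load_page, "trunk")
```

In this example the read of "limb2" is submitted to the worker thread while the scan is still working through "limb1", so each non-leaf page is read exactly once and the result is typically already available when the scan reaches it.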
The various features of the invention and other features will become readily apparent from the detailed description of the invention herein below.