1. Field of the Invention
This invention relates to the field of database searching and is particularly applicable to enhanced searches in B Tree and B+ Tree database structures having at least two primary key columns, and wherein at least one of those key column's search values is specified by range.
2. Background Information
Searching through large databases for records that fall within a range of one key column value while requiring a match on another key is a problem that has had various solutions in the past, all of which have had some trade-off requirements related to efficiency. These searches in B+ Trees and B Trees (fairly common forms of database record structures) are very compute and I/O intensive in large databases. This is because many pages need to be brought into memory and searched in order to produce a complete answer to the query. While in some situations a partial answer may suffice, getting to that partial answer may be problematic also.
In the B+ Tree structure, at least one level of index pages exists which contains only keys pointing to records, and there is also at least one root index page containing keys. At the leaf level, pages of memory exist having the rest of each of the records stored therein and these are called leaf pages. Each leaf, which usually contains a number of records, generally also has a way to identify a next-sequential leaf page. Thus the last record in a leaf page may “point” (or the leaf page may have a pointer that points) to the first record of the next-sequential leaf page. There are many ways the next sequential page can be identified. Most commonly, a next page number, from which a next page address can be computed, is included in the current page in a location that can be read by a program accessing the leaf page. The index pages will contain locators, which point or otherwise provide an accessibility route to the leaf pages through the key values that they maintain. Because of these locators, one can search through the indexes to find a leaf that contains a record having the key(s) for which one is searching. The indexes' and the leaves' records are ordered, preferably in ascending order, although one could apply the principles described herein to descending order database organizations, or database organizations having a combination of ascending and descending ordered key and/or record values as well. This patent only discusses the databases having all data organized in ascending order, however, to reduce the complexity of the explanation of the invention, but one could apply the teachings here to any of the above-mentioned organization forms of database records and keys if desired.
It should also be noted that the invention can be applied to B Tree and is B+ Tree database structures, but the preferred embodiment was developed for B+ Tree structures.
Where a database contains many thousands of records and thousands of leaf pages, the Input/Output time required for each search through the records of a page slows down the performance time for the search. Where multiple index pages are used and these too must be fetched to be swapped in and out of main memory, additional time costs are imposed for each B+ Tree search requiring a reference to the index pages in order to determine the next page to fetch.
Accordingly, any procedure which optimizes the number of searched pages needed to accomplish a database query would be valuable.
The inventors have found a method, which provides superior performance for searching databases where the search has a range given for a first key column and at least a second key column value of interest to the search query.
While a user could specify parts of the range to skip, or provide other hints to the search engine if the user has direct control over it, in current networked systems, particularly using ODBC, JDBC, or other Java set-ups to communicate a query to a database program, the user is not allowed to program the specific hints about using just part of a key column. Therefore, automaticity in operating the search in an intelligent manner would be very helpful. (A “hint” in this context is a way of using a proprietary syntax to suggest a directing input from within the query. The proprietary database controller would use the hint to specify a directed limitation to the search query form, based on the hints in the appropriate syntax, given by the user.)
What existed in the past for doing B+ Tree searches of the specified complexity includes at least the List Search Key method provided by Unisys Corporation, the assignee of the instant patent, but which is not well documented in relevant aspects. However, this List Search Key method appears to be quite similar to a portion of the MDAM™ system provided more recently by Compaq Computer Corporation; and described on their website as of the time of the filing of this document and which is also, apparently, the subject of U.S. Pat. No. 5,778,354 issued to Leslie et al.
The MDAM search system, if operating on two keys, say a K1 And a K2 key, would exercise two methods for getting a next consecutive K1 value where a range of K1 values is specified in the search query, these methods determining whether the data qualifies as sparse or dense, and then doing the search for a next page to fetch in accord with the indicated data quality. This can be costly overhead and thus compromise search performance since there is no automatic way to turn this testing exercise off. Another deficiency in the MDAM search system is that if a range is specified, the prior art (including the “list search” by Unisys) would perform complete searches of all the key values for the key column range. This often will mean bringing extra leaf pages into memory and searching them even though the extra searching is not necessary.
There is a provision in the MDAM search system to allow for user specified de-selection of particular key column values, but there is no automatic way to de-select key column values. Also, because of limitations in available command communications for many ODBC and JDBC installations, even these user-directed de-selection of parts of a key range will not be available to optimize many searches. Further, if the user is not appropriately skilled in his knowledge of the data within the database, user-directed range modifications may actually hurt performance or a poorly crafted user directed query can even cause a miss of some records needed for an intended query. At a lesser level than a directed query, a user can provide hints within proprietary syntax to assist an MDAM query to function more efficiently, but such hints are not accepted in the SQL. Even if the proprietary “hint” syntax is available, such hints if in artfully contrived may cause a search to run slower than optimal, or even slower than the system would run it without the hints, depending on the skill of the user's formulation of hints.
Also, the MDAM search does not allow for moving to search for a next key column value when all the records for the second key have been located within a page for a first key value. Instead, MDAM automatically may force a new fetch and reloading of a current leaf page, since it discontinues searching on that page when it finds the last record for the second key at a given primary key column value.
There have been other attempts to improve performance of B Tree database searches, including, for example, a partial indexing scheme using an optimizer described by Sartori and Scalar in Partial Indexing for Nonuniform Data Distributions in Relational DBMS's, published in IEEE Transactions on Knowledge and Data Engineering, Vol 6. No. 3, June 1994. The overhead associated with such a method however, that is, establishing new indexes and maintaining them in partial form, may not be an acceptable part of a viable solution for many users.
Accordingly there is still a need for more effective and/or alternative search methods where a range will be specified for a key value in a multi-key search query, and one which provides automatic performance enhancement by operating multiple search strategies at appropriate times, given a variable or unknown data distribution.