1. Field of the Invention
The present invention relates to a method, system, and program for searching a list of entries when search criteria is provided for less than all of the fields of the entries. In particular, the list of entries comprise a database index having a separate column for each field in the index entries.
2. Description of the Related Art
Data records in a relational database management system (RDBMS) are maintained in tables, which are a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. One or more indexes may be associated with each table. An index is an ordered set of pointers to data records in the table based on the data in one or more columns of the table. In some cases, all the information needed by a query may be found in the index, making it unnecessary to search the actual table. An index is comprised of rows or index entries which include an index key and a pointer to a database record in the table having the key column values of the index entry key. An index key is comprised of key columns that provide an ordering to records in a table. The index key columns are comprised of the columns of the table, and may include any of the values that are possible for that particular column. Columns that are used frequently to access a table may be used as key columns.
Index entries are often stored in a B-tree data structure, or balanced tree. A B-tree consists of a single root page and a number of leaf and non-leaf pages. The leaf pages store the index entries. Each index entry consists of key values and the physical address or identifier of the row or record in the data base table which has that value in its key columns. A non-leaf page stores key values and pointers to other index pages. An index may be searched for matching key values by an index scan. A scan of the index traverses from the root page down to the leaf page looking for the index entries that have the matching keys. The pages which store the index entries are maintained in a storage device, such as a hard disk drive or other non-volatile memory accessible to the database program.
A query can be made against an index to retrieve index entries using a search key that includes values for each of the key columns. The search locates the first index entry that has a key value that matches the search key, i.e., the values in the key columns in the search key match the values for the key columns in the index entry. A query often scans the rows of a table or index applying a search predicate for a comparison operation. A query of an index can qualify rows of a table for retrieval or access. The cost of an index scan is equal to the input/output (I/O) processing time to search the index plus the I/O time needed to access the index entries that qualify or match the search predicate. The time to transfer qualified matching index entries from physical storage to memory is usually many times greater than the time to scan the index entries in memory. Because transferring data from physical storage to memory takes substantially longer than processing time, the total time required to manipulate data can be substantially reduced if the number of data transfers from storage to memory can be reduced or otherwise minimized. Methods for optimizing data retrieval using index scanning are described in the commonly assigned U.S. Patent entitled xe2x80x9cMethod and Apparatus for Optimizing Data Retrieval Using Index Scanning,xe2x80x9d U.S. Pat. No. 5,758,146, which patent is incorporated herein by reference in its entirety.
An index may have multiple-columns for the key, e.g., C1, C2, C3. A B-tree would include values for each of the records in the multi-column index. An index defined on multiple key columns can be searched or accessed on any subset of key columns.
To search a multi-column index with a search key providing a search predicate for each key column, e.g., C1, C2, and C3, the search engine will scan the index and find qualifying index entries that have key column values that match those of the search key. Qualifying index entries includes accessing those index entries that match he search key by transferring the page including the qualified index entry from storage to computer memory.
Queries may be executed in which there are no search criteria or predicates for one or more of the key columns in a multi-column index, referred to as missing key columns. In such case, the search will request a range of entries that include every possible value of the missing key column or combination of the missing key column with key columns of lower order than the missing key. The leading or highest order key column in a key is the leftmost key column, or key column that determines the highest order of an entry. The key columns following the leading key column have the next lower order with respect to the previous key column. For example, a search on a multi-column index having keys columns C1, C2, and C3 may involve search predicates on C1 and C3, but not C2. In such case, the query requests all index entries in a range defined by the search criteria of C1 and all possible values for C2 and C3, which are the missing key column and lower order key column. For instance, if the search criteria requested all index entries having C1=1 and C3=3, the search engine would qualify all index entries matching the keys of (1, 0, 0) to (1, max, max), where the lowest possible value for C2 and C3 is zero, and max is the highest possible value. The accessed or qualified index entries will be further processed to determine those that satisfy all search criteria. The problem with such search techniques when criteria are not provided for all key columns is that numerous index entries may qualify, thereby requiring I/O retrievals from secondary storage of entries that do not match all the search criteria. Moreover, if the leading key column is the missing key column, then all index entries in the multi-column index will qualify, i.e., the entire contents of the B-tree including the multi-column index will be returned.
For example, the three column index may index records concerning product sales by date (C1), geographical location of sale (city) (C2), and product sold (C3), e.g., C1 represents 100 possible dates, C2 represents 1,000 possible cities, and C3 represents 10,000 possible products. If a query is executed for sales in January of 1998 for all hats, regardless of city, i.e., no search predicate for C2, then the search engine would qualify all index entries that had a date of January, 1998, regardless of the city and product values. This could require the possible access, i.e., I/O transfer, of 10 million index entries (1,000 cities times 10,000 products), even though only a small subset of such index entries would match all the search criteria. The search engine will then process the qualified index entries to locate those that match the two search criteria for C1 and C3.
Thus, in a multi-column search operation where there are no search predicates for one or more of the columns, current search techniques would qualify a substantial number of records that do not meet all the search criteria. In large database systems where there are thousands of possible values for any column, having to qualify non-matching index entries for all possible values on certain key columns could involve a substantial number of data transfer operations. As discussed, one of the primary time constraints for search operations is the time to access the index entries, i.e., transfer the index entries from non-volatile storage to computer memory for further processing.
Accordingly, there is a need in the art for an improved system, method, and program for qualifying index entries in a multi-column index when there is no search predicate or criteria for one of the columns in the index.
To overcome the limitations in the prior art described above, preferred embodiments disclose a system, method, and program for searching an ordered list of entries. Each entry is comprised of multiple fields. Each field may include one of a range of possible values for that field. If search criteria is provided for less than all of the fields, then multiple search keys will be generated such that each generated search key includes a search value for each field. The search value for each field for which there are no provided search criteria comprises one of the possible values for that field. A search operation is executed for each of the generated search keys to determine entries in the list that include field values that match the search values in the search key.
In further embodiments, the list of entries may comprise a database index having rows of index entries and a column for each field. In such case, an index entry includes values for each column in the index and information addressing at least one record in a database table.
In still further embodiments, each search operation is executed after generating the search key for the search operation. Further a feedback entry in the list is received in response to the search operation. The feedback entry is processed to determine a next search key to use in the next search operation.
In yet further embodiments, the processing of the feedback entry to determine the next search key comprises determining a next possible entry in the list that would immediately follow the feedback entry in the list according to the ordering of the entries in the list. This determined next possible entry is used as the next search key.
Preferred embodiments provide improved methods for generating search operations for a search on a multi-column index that does not search on or qualify index entries that do not satisfy all search criteria. Preferred embodiments provide methods for selecting a search value for the missing column. The search keys used to execute the search include the search predicates and a selected value for the missing column(s). This is an improvement over prior art techniques which qualify all index entries in a range of entries including all possible values for the missing key column and any lower order key columns. Such prior art search techniques often qualify a substantial number of index entries that do not satisfy all the search criteria. Qualifying an index entry requires accessing data from storage. In large databases where an index has millions of rows, qualifying non-matching search entries could consume substantial processing resources and significantly impact search performance. Certain preferred embodiments utilize the column values for index entries returned in response to a search request to determine the search values to use for the next generated search key. Using the results of previous search operations as feedback to generate a subsequent search key minimizes qualifying non-matching search entries between the last retrieved entry and the feedback entry.