The invention relates to retrieving records from a computer database.
A computer database typically contains tables having a number of records filled with data. See, for example, an example database table 10 shown in block diagram form in FIG. 1a. Each record 12 in the table 10 can be thought of as a row of the table and contains a number of fields of data. For example, each record in a client database likely contains a LAST.sub.-- NAME field 14, a FIRST.sub.-- NAME field 16, a STREET.sub.-- ADDRESS field 18, a CITY field 20, a STATE field 22, and a ZIP.sub.-- CODE field 24. In addition, each record also has a database key, i.e., an address at which it is stored in the database file.
In order to facilitate retrieval of records from a table, some of the fields are "indexed" by their record's key. That is, a separate index file for each indexed field contains a sorted list of keys for all of the records in the database. See, for example, a series of index files 26 shown in block diagram form in FIG. 1b. The keys 28 in each index file are sorted according to the contents of the indexed field, either in ascending or descending order. For example, an index file for the indexed field LAST.sub.-- NAME contains the keys for all of the records in the client database sorted by the contents of the LAST.sub.-- NAME field. Likewise, an index file for the indexed field ZIP.sub.-- CODE contains the keys for all of the records in the client database sorted by the contents of the ZIP.sub.-- CODE field. In this way, the records in the database can be retrieved in sorted order. For example, by reading through the index file for the indexed field LAST.sub.-- NAME and retrieving the record to which each successive key points, one can retrieve the database records sorted by name. Likewise, by reading through the index file for the indexed field ZIP.sub.-- CODE and retrieving the record to which each successive key points, one can retrieve the database records sorted by zip code.
It is further possible to index several fields at once as a group of fields, i.e., an "index only" field. Thus, retrieval of an index only field retrieves several fields at once, which is sometimes convenient when retrieving two closely related fields such as LAST.sub.-- NAME and FIRST.sub.-- NAME or YEAR and DATE.
Once the records are entered in the database and the index files are generated, a user may submit queries to the database in order to retrieve a single record or numerous records. For example, the user might wish to retrieve the record for a number of clients, all of whom are named "Smith". Therefore, he enters a query to retrieve a record in which the contents of the LAST.sub.-- NAME field are "Smith". Since LAST.sub.-- NAME is an indexed field, the database system can go directly to the name "Smith" in the index file, and retrieve each record having "Smith" in its LAST.sub.-- NAME field. On the other hand, the user might, for example, wish to send a metered mailing to all of the clients announcing a change of address. In such a case, the user would enter a query to retrieve all records by ZIP.sub.-- CODE.
As might be expected, performance of a database, e.g., the time it takes to retrieve records, is directly related to the number of records in the database and the complexity of the query. In present database systems, efforts are made to optimize an execution strategy and to minimize retrieval time. A single, optimized execution strategy is generated by a high level language compiler and executed by the database system in the same way for every database access. If the strategy is inefficient or in error, then the actual performance of the database can suffer.
For example, a database query is compiled in the same way, according to the same execution plan, every time it is compiled, regardless of the actual number of records it retrieves. A high level language compiler which compiles each database query before executing it includes three components: (1) a parser which translates ASCII tokens, e.g., Pascal code, into internal definitions which are easily "shuffled"; (2) an optimizer which shuffles the definitions to perform logical optimization, does an extensive combinatorial search of all or a substantial portion of all possible combinations between the operators, and creates an execution plan which is typically a tree of operations; and (3) a code generator which finalizes the plan and creates machine executable code.
Once the query is compiled, an executor executes the plan by walking the tree and executing the operations found therein. The tree is walked strictly according to the execution plan. For example, queries to perform operations on various database records almost always require that the database program retrieve the record. Record retrieval, which occurs at each leaf node of the tree, is done either by an indexed retrieval method or by a sequential retrieval method, a choice that was made when the plan was created. But, depending on the peculiarities of a query and circumstances such as data distribution or the number of retrievals to be done, one access method is more efficient than the other. And it is difficult, if not impossible, to predict how many records are going to be retrieved at each leaf. For example, there could be zero, ten, a thousand, or more records at any one leaf. Thus, strictly following the execution plan can prove to be very inefficient. It also follows that the larger the tree, the more inefficient the operation.