1. Technical Field
This invention generally relates to a database management system performed by computers, and more specifically relates to the optimization of structured query language (SQL) queries using an encoded vector index (EVI) to generate statistics.
2. Background Art
An index in a book facilitates locating information on a specific topic quickly and without blindly paging through the book. Database indexes provide similar benefits by providing a method to quickly locate data of interest. Without an index, a database performs a full table scan, blindly searching through every row in a database table until the target data is located. Thus, depending upon where data resides in a database table, such a scan can be a lengthy and inefficient process.
Indexed scans of database tables are more efficient than full table scans since the length of database index entries are in most cases shorter than the database table entries. Shorter entries mean that more index entries can be stored in a single computer page. Indexed scans can therefore result in a considerable reduction in the total number of computer pages that must be processed in order to locate the requested data.
While indexed scans of database tables can improve performance, the complexity of the data being scanned and of the nature of the database query still determine how effectively a query can be implemented. Different queries place differing levels of processing demands on the database in unique ways. As a result, different index types are needed to cope with a user""s ever-changing workloads. One type of index is the encoded vector index (EVI), disclosed U.S. Pat. No. 5,706,495, Chadha et al., Jan. 6, 1998, Encoded-Vector Indices For Decision Support and Warehousing (hereinafter xe2x80x9cChadhaxe2x80x9d), which is incorporated by reference.
An encoded vector index (EVI) is a variation of the bitmap index concept. A bitmap index indicates whether a specific value exists for each row in a particular column. One bit represents each row. Thus, in the bitmap index for the value xe2x80x9cMNxe2x80x9d in the column xe2x80x9cLOCATION,xe2x80x9d the nth bit equals 1 if the nth row of the data table contains xe2x80x9cLOCATIONxe2x80x9d=xe2x80x9cMN,xe2x80x9d or 0 if that row holds a value other than xe2x80x9cMN.xe2x80x9d An EVI serves a similar purpose, but only one index is necessary to account for all the values occurring in the column (whether they be xe2x80x9cNY,xe2x80x9d xe2x80x9cMN,xe2x80x9d or any other). So in an EVI on the xe2x80x9cLOCATIONxe2x80x9d column, the nth position of the EVI contains a bit code that identifies the value of xe2x80x9cLOCATIONxe2x80x9d in the nth row of the table. Thus, whereas a separate bitmap index is required to map each particular key value in a database field, only one EVI is required to represent the same information. Thus, an EVI saves computer memory by including all possible key values for a given field in one database index.
Chadha discloses a method to efficiently scan relational database information by performing bit-vector operations on EVI""s, instead of performing analogous operations on the relational database table itself. However, the usage of and dependence on relational database tables has increased dramatically during the past decade, and continues to increase. Thus, new ways to use database tools, such as EVI""s, are needed in order to continue to provide significant improvements in query performance; otherwise, database users will be hampered in their ability to maximize intelligent information retrieval.
According to the present invention, an apparatus and method to significantly improve performance of SQL function processing through the use of an encoded vector index (EVI) is disclosed. An EVI provides the data necessary to generate statistics that can be used in determining a best approach for performing SQL functions. Specifically, by sequentially scanning through the EVI symbol table, one can readily count the number of values in one or more database fields that match a specific criterion. The resulting count can then be used to determine, from among two or more candidate approaches, an approach to use in processing an SQL command. By processing the EVI symbol table in lieu of more traditional database indexes, and/or in lieu of the database table itself, statistics for SQL functions are generated significantly faster.
In a specific disclosed embodiment, the performance of a SELECT SQL command that identifies first and second criteria for record selection applicable to first and second database fields, is enhanced by applying the criterion for each database field to each symbol in the EVI symbol table for that database field, to arrive at a count of the number of records having values in each field that match the criterion for that field. Then, based on the results of the count, the SELECT SQL command is performed by either (a.) selecting a set of records in which the first field does or does not match the first criterion, and then selecting from or adding to this set of records, records in which the second field does or does not match the second criterion, or (b.) selecting a set of records in which the second field does or does not match the second criterion, and then selecting from or adding to this set of records, records in which the first field does or does not match the first criterion. The approach taken may be chosen based on the relative count of records matching the first criterion compared to the count of records matching the second criterion, the logical operators applied to the criteria and the logical relationship between the criteria in the SELECT SQL command.
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of embodiments of the invention, as illustrated in the accompanying drawings.