The increased focus on complex queries for data warehousing and OLAP (OnLine Analytical Processing) has resulted in a revival of interest in bitmap indexes. The basic idea behind a bitmap is to use a single bit (instead of multiple bytes of data) to indicate that a specific value of an attribute is associated with an entity. A bit-mapped index is simply a very long string of bits, commonly called a bit vector or bitmap. Each bit in the bitmap corresponds to one row in a table, and the bit is set to 1 if the row is contained in the list that the bitmap represents; otherwise, the bit is set to 0. The relative position of the bit within the bitmap can be mapped to the record ID of the corresponding row in the table.
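The bitmap idea described above can be sketched in a few lines of Python. This is a minimal illustration only: the column contents, the function names `build_bitmap_index` and `rows_of`, and the use of a Python integer as the bit vector are assumptions made for the example and are not taken from the figures.

```python
def build_bitmap_index(column):
    """Return {value: bitmap}; bit i of a bitmap is 1 when column[i] == value."""
    bitmaps = {}
    for row, value in enumerate(column):
        # A Python int serves as an arbitrarily long bit vector (assumption).
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def rows_of(bitmap):
    """Map the positions of the 1 bits back to row numbers (record IDs)."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column values, chosen only for illustration.
column = ["A", "K", "E", "A", "W", "I", "K"]
index = build_bitmap_index(column)

print(rows_of(index["A"]))               # [0, 3]
print(rows_of(index["A"] | index["K"]))  # [0, 1, 3, 6]
```

Combining two predicates with OR reduces to a single bitwise operation on the two bitmaps, which is one reason the technique suits columns with few distinct values.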
This technique is particularly attractive when the set of possible values for the index key is small. Input/Output (I/O) is significantly reduced when a large fraction of a large table is represented using bitmap lists. However, when the index key has a large number of distinct values, a large number of bitmaps is required, and those bitmaps are likely to be rather sparse (i.e., very few bits will be 1 in each bitmap), resulting in heavy storage requirements for storing a lot of zeros.
Therefore, the bit-mapped approach is not practical for large dimensions and fact tables. This impracticality has led to a better bitmap scheme called the Encoded Vector Index (EVI), which retains many of the processing advantages of bit-mapped indexing and can also support very large tables with larger cardinalities. An EVI consists of a Symbol Table and an Encoded Vector. The Symbol Table contains a sorted list of all the distinct values of a column in a table, a unique code assigned to each distinct value, and an occurrence count for each distinct value that indicates the number of rows in the table with that distinct value. The Encoded Vector is an array with a dimension equal to the number of rows in the table. Each entry in the Encoded Vector contains the code from the Symbol Table that corresponds to the value contained in the corresponding row of the table.
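As a rough sketch of the structure just described, the following Python builds a Symbol Table (sorted distinct values, a code per value, an occurrence count) and an Encoded Vector from a single column. The function name `build_evi`, the tuple layout, and the sample column are assumptions for illustration; assigning codes in sorted order is one simple choice and is not stated to be the only one.

```python
from collections import Counter


def build_evi(column):
    """Return (symbol_table, encoded_vector) for one column.

    symbol_table: list of (value, code, count) tuples, sorted by value.
    encoded_vector: one code per row, in row order.
    """
    counts = Counter(column)
    symbols = sorted(counts)                      # sorted distinct values
    code_of = {s: c for c, s in enumerate(symbols)}
    symbol_table = [(s, code_of[s], counts[s]) for s in symbols]
    encoded_vector = [code_of[v] for v in column]  # one entry per row
    return symbol_table, encoded_vector


# Hypothetical column values, chosen only for illustration.
column = ["A", "K", "E", "A", "W", "I", "K"]
symbol_table, encoded_vector = build_evi(column)

print(symbol_table)
# [('A', 0, 2), ('E', 1, 1), ('I', 2, 1), ('K', 3, 2), ('W', 4, 1)]
print(encoded_vector)
# [0, 3, 1, 0, 4, 2, 3]
```

Unlike a set of per-value bitmaps, the storage here grows with the number of rows plus the number of distinct values, rather than with their product.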
By way of example, FIG. 1 illustrates a data table 10, Table A, a symbol table 12, and an encoded vector 14. The data table 10 includes data identified in an ID column by appropriate symbols. The symbols include the alphabetic characters `A`, `E`, `I`, `K` and `W`, as shown in the symbol column of symbol table 12. The associated encoded value for each symbol is also included in the symbol table 12. These encoded values, `0`, `1`, `2`, `3`, and `4`, are utilized to represent the data in the data table 10 in the encoded vector 14, as shown.
FIG. 2 illustrates a prior art approach to data searching in an environment that utilizes an encoded vector index. When performing a search of the encoded vector index, the process initiates with the receipt of a search query (step 20). The items to be searched, either range or point, are then used in conjunction with the symbol table. Thus, a sequential look-up of the symbol table is performed with each search key to develop a candidate code list (step 22). The candidates from the candidate code list are then compared with each entry in the encoded vector (step 24). When any one of the candidates in the candidate code list matches the entry data of the encoded vector, a bit is set in a temporary bitmap (step 26). The temporary bitmap provides the results of the search query over the entire encoded vector.
To illustrate the prior art approach, the following search query is presented and performed using the example data table 10, symbol table 12, and encoded vector 14 from FIG. 1:
Select * from TableA
where `A`<=TableA.Key<=`F` OR
TableA.Key=`J` OR
TableA.Key=`K`
The resultant candidate code list 28 is shown in FIG. 3 and includes the range of values [0,1] and the value `3` in accordance with the encoded values associated with the symbols that match the search query. The range `A` to `F` covers the symbols `A` and `E` (codes `0` and `1`), the symbol `K` maps to code `3`, and `J` does not appear in the symbol table 12 and therefore contributes no code. FIG. 3 further illustrates the use of the candidate code list in conjunction with the encoded vector 14 that results in the temporary bitmap of search results 30. As described above, the entire candidate code list 28 is compared against each entry in the encoded vector 14 to determine whether the entry meets the search criteria. For each entry that does meet the search criteria, a bit is set to a `1` value in the temporary bitmap 30.
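The prior art procedure of FIG. 2 can be approximated by the following Python sketch. The symbol table matches the symbols and codes given for FIG. 1, but the encoded vector contents are hypothetical, since the figure's row values are not reproduced in the text. The function names, the representation of each search key as a `(low, high)` pair, and the flattening of the candidate range [0,1] into the individual codes 0 and 1 are likewise assumptions made for illustration.

```python
def candidate_codes(symbol_table, search_keys):
    """Step 22: scan the symbol table sequentially for each search key.

    search_keys: list of (low, high) pairs; a point key k is written (k, k).
    """
    candidates = []
    for low, high in search_keys:
        for symbol, code in symbol_table:        # full sequential look-up
            if low <= symbol <= high and code not in candidates:
                candidates.append(code)
    return candidates


def scan_encoded_vector(encoded_vector, candidates):
    """Steps 24-26: compare the candidates against every vector entry."""
    bitmap = [0] * len(encoded_vector)
    for row, entry in enumerate(encoded_vector):
        for code in candidates:                  # sequential comparison
            if code == entry:
                bitmap[row] = 1                  # set bit in temporary bitmap
                break
    return bitmap


# Symbols and codes as listed for symbol table 12.
symbol_table = [("A", 0), ("E", 1), ("I", 2), ("K", 3), ("W", 4)]
# Hypothetical encoded vector contents, for illustration only.
encoded_vector = [0, 3, 1, 0, 4, 2, 3]

# `A` <= Key <= `F`  OR  Key = `J`  OR  Key = `K`
search_keys = [("A", "F"), ("J", "J"), ("K", "K")]

candidates = candidate_codes(symbol_table, search_keys)
print(candidates)                                       # [0, 1, 3]
print(scan_encoded_vector(encoded_vector, candidates))  # [1, 1, 1, 1, 0, 0, 1]
```

In this sketch the first function performs on the order of (number of search keys) x (symbol table length) comparisons, and the second performs up to (number of rows) x (candidate list length) comparisons, so both loops grow with the lengths of the lists they scan.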
While the temporary bitmap does provide sufficient search results, the process of producing the temporary bitmap may be quite time-consuming due to two possible problems. First, when the search query produces a long list of search keys and the symbol table is long, the sequential process that produces the candidate code list takes a significant amount of time. Second, when the candidate code list is long, the sequential process of comparing each candidate to each encoded vector entry also takes a significant amount of time.
Accordingly, a need exists for more efficient vector index searching. The present invention addresses such a need.