When compared to some of the many indexing schemes, such as binary tree (B-Tree) indexing and hash cluster indexing, bitmapped indexing provides favorable performance and storage characteristics, particularly for multiple predicate (condition) queries on low cardinality (few possible values) data. Logical operations performed using bitmapped indexes can be extremely efficient because they are amenable to machine level operations. On the other hand, relative to some other indexing schemes, bitmapped indexes require very little space. For example, under some circumstances, a single bitmapped index entry can contain the equivalent of thousands of B-Tree entries.
Consider the data table 100 illustrated in FIG. 1. Data table 100 consists of intersecting columns and rows which form a matrix of cells. Each of the three columns contains a type of customer data consisting of either CUSTOMER#, LOCATION, or TYPE, while each row contains all of the data for a particular customer. CUSTOMER# is a unique number and therefore (for a large number of customers) is a high cardinality data type. On the other hand, the LOCATION data type has only four possible values (NORTH, SOUTH, EAST, WEST) and is therefore a low cardinality data type. Similarly, the TYPE data type has only two possible values (BUSINESS, INDIVIDUAL), also making this data type a low cardinality data type.
FIG. 2A illustrates the index entries of a bitmapped index 200 for data type LOCATION of data table 100. Each index entry in bitmapped index 200 corresponds to one of the four possible values (key value) for the data type LOCATION and includes both the key value and a bitmap. Each bitmap contains a string of bits, with each bit (left to right) corresponding to a specific row (top to bottom) in the LOCATION column of the data table 100. A bit value of "1" indicates that the cell at the intersection of the LOCATION column and the row to which that bit corresponds in data table 100 contains the key value for that bitmapped index entry, while a "0" indicates that the cell does not contain the key value for that bitmapped index entry.
For example, the second bit 202 and the sixth bit 204 in the bitmap of the first entry 206 of bitmapped index 200 are both "1"s, indicating that the rows 208 in data table 100 (the second and sixth rows) contain the value NORTH under the LOCATION column. Since all remaining bits in the bitmap of the first entry 206 are "0"s, none of the remaining rows under the LOCATION column in data table 100 contain the value NORTH.
FIG. 2B illustrates the index entries of a bitmapped index 250 for TYPE based upon data table 100. Since the data type TYPE only has two possible values, (BUSINESS, INDIVIDUAL), bitmapped index 250 only contains two index entries.
A bitmapped index could also be easily generated for CUSTOMER#. However, such a bitmapped index would require six index entries, one for each possible value of CUSTOMER# and consequently would not be very efficient. Nevertheless, based upon the size of bitmapped indexes 200, 250 for LOCATION and TYPE respectively, bitmapped indexes for low cardinality data require little storage space.
Turning now to FIG. 3 which illustrates a query table 300 for business customers (TYPE=`BUSINESS`) located in either the EAST or SOUTH (LOCATION=`EAST` OR `SOUTH`). Each entry (row) in query table 300 is obtained directly from bitmapped indexes 200, 250 for LOCATION and TYPE, respectively, as indicated on the left hand side of query table 300. The result 302 is easily and quickly determined using low-level (AND/OR), highly efficient logic. Other queries may be similarly processed so long as a complete bitmapped index is available for, and accurately reflects, each data type in the underlying data table.
Despite the favorable performance and storage characteristics of bitmapped indexes, they are not without their disadvantages. Bitmapped indexes are only useful if they accurately reflect the data upon which they are based. Consequently, whenever a data table is changed, all bitmapped indexes based upon that data table must be updated to reflect the change to the data table.
Updating a bitmapped index can require substantial system resources since the appropriate bitmaps must be loaded into volatile memory, updated and then written back out to disk. Since each bitmap contains one bit for every row in the corresponding data table, updating bitmaps for large data tables can require significant amounts of memory and processing power. Moreover, a large amount of UNDO (undo update) and REDO (recovery) information must be maintained. For example, referring to data table 100 (FIG. 1), a change of the location of CUSTOMER# 101 from EAST to WEST affects the index entries in bitmapped index 200 (FIG. 2A) for the key values EAST and WEST. Consequently, to update bitmapped index 200, the bitmaps in both the EAST and WEST index entries must be loaded into memory, the left most bit of each bitmap flipped, and then the updated bitmaps are written to disk. Although each bitmap in bitmapped index 200 only contains six bits, corresponding to each of the six rows in data table 100, other data tables may have thousands or hundreds of thousands of rows, requiring each corresponding bitmap to also have thousands or hundreds of thousands of bits. Updating bitmaps containing thousands or hundreds of thousands of bits can require substantial system resources.
For ready-only, read-mostly or warehousing environments, the underlying data is changed infrequently enough to minimize the frequency of bitmapped index rebuilds. However, for high OLTP (online transaction processing) environments or other environments involving frequent insert, update and delete operations, the high frequency of bitmapped index rebuilds can significantly reduce system performance.
In addition to adversely affecting system performance, rebuilding a bitmapped index can adversely affect data concurrency in multi-user systems. Since each entry in a bitmapped index covers many data blocks in the data table upon which the bitmapped index is based, many rows in the data table must be locked (low granularity locking) while the bitmapped index entry is being updated so that additional changes cannot be made to any of the rows indexed by the bitmapped index entry during the update. As a result, while a bitmapped index is being updated, no other users can make changes to the data table, resulting in low data concurrency.
In view of the burdens placed on system resources and the adverse effects on data concurrency discussed above, a method and apparatus for implementing bitmapped indexing with minimal effects on system resources while providing high data concurrency through high granularity locking is highly desirable.