Queries are used to retrieve sets of data that match certain criteria. For example, a query could be used to retrieve from a database a set of data for every person born in California AND living in California.
A bitmap index provides an efficient and fast means of retrieving data from a database. A bitmap index is an index that includes a set of bitmaps that can be used to access data. In the context of bitmap indexes, a bitmap is a series of bits that indicate which of the records stored in the body of data satisfy a particular criterion. Each record in the body of data has a corresponding bit in the bitmap. Each bit in the bitmap serves as a flag to indicate whether the record that corresponds to the bit satisfies the criterion associated with the bitmap.
Typically, the criterion associated with a bitmap is whether the corresponding records contain a particular key value. In the bitmap for a given key value, all records that contain the key value will have their corresponding bits set to 1 while all other bits are set to 0. A collection of bitmaps for the key values that occur in the data records can be used to index the data records. In order to retrieve the data records with a given key value, the bitmap for that key value is retrieved from the index and, for each bit set to 1 in the bitmap, the corresponding data record is retrieved. The records that correspond to the bits are located based on a mapping function between bit positions and data records.
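The indexing and retrieval steps described above can be sketched as follows. This is only an illustrative sketch: the function names, the sample records, and the use of Python lists as bitmaps are assumptions made for the example, and the mapping function is assumed to be the identity (bit position i corresponds to record i).

```python
def build_bitmap_index(key_values):
    """Return {key value: bitmap}; each bitmap is a list of 0/1 flags, one per record."""
    n = len(key_values)
    index = {}
    for position, key in enumerate(key_values):
        # Create an all-zero bitmap the first time a key value is seen.
        bitmap = index.setdefault(key, [0] * n)
        bitmap[position] = 1
    return index


def fetch(records, bitmap):
    """Retrieve the record for every bit set to 1 (assumed mapping: position i -> records[i])."""
    return [records[i] for i, bit in enumerate(bitmap) if bit == 1]


# Hypothetical sample data: (name, age) rows.
records = [("Ann", 25), ("Bob", 31), ("Cy", 25), ("Di", 40)]
age_index = build_bitmap_index([age for _, age in records])
matches = fetch(records, age_index[25])
```

Here `age_index[25]` has a 1 only at the positions of the two records whose age is 25, so `fetch` returns exactly those two records.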
To retrieve records matching criteria that can be represented by multiple key values, bitmaps can be combined using logical operations into a resulting bitmap. The resulting bitmap is used to retrieve the data. For example, FIG. 2A illustrates a table 200 that contains 8,000,000 rows. Every row contains a name and an age. Retrieving a set of data for the condition age greater than or equal to 25 can be performed by generating a resulting bitmap that represents the combination of the bitmaps whose key values match the condition.
FIG. 2B illustrates the combining of bitmaps that match the condition age greater than or equal to 25. Bitmaps 260 are some of the bitmaps matching this condition in a bitmap index on the age. The zeros and ones shown in bitmaps 260 are the bits of those bitmaps. According to one embodiment of the invention, each bitmap in bitmaps 260 is associated with a key value representative of an age. The positions of the bits in bitmaps 260 correspond to the row ids of table 200. These four bitmaps are combined using an OR operation to yield resulting bitmap 290. All bits shown in bitmap 290 are set to 1 because an OR operation is being performed and, among all bitmaps 260, there is a bit set to 1 for each shown row.
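A minimal sketch of combining bitmaps with an OR operation is shown below. The ages and bit patterns are hypothetical and are not the values shown in FIG. 2B; the function name `combine_or` and the list representation are likewise assumptions made for illustration.

```python
def combine_or(bitmaps):
    """OR together equal-length bitmaps given as lists of 0/1 flags."""
    return [int(any(bits)) for bits in zip(*bitmaps)]


# Hypothetical bitmap index on age for a six-row table: {key value: bitmap}.
age_bitmaps = {
    22: [0, 0, 0, 0, 1, 0],
    25: [1, 0, 0, 0, 0, 0],
    30: [0, 1, 0, 1, 0, 0],
    41: [0, 0, 1, 0, 0, 0],
}

# Select every bitmap whose key value satisfies the condition age >= 25.
selected = [bitmap for age, bitmap in age_bitmaps.items() if age >= 25]
result = combine_or(selected)
```

In this sketch the resulting bitmap has a 1 for each of the first four rows, because at least one selected bitmap has that bit set, and a 0 for the remaining two rows.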
While combining bitmaps to retrieve a set of data can be more efficient than other retrieval approaches, combining bitmaps can use undesirably large amounts of memory. For example, each of the bitmaps shown in FIG. 2B spans the rows of table 200. Because each bitmap contains a bit for each row, any single bitmap could contain up to 8 million bits, or 1 million bytes (assuming 8-bit bytes).
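The arithmetic above can be checked with a short calculation. The helper `uncompressed_bytes` is hypothetical, and the assumption that all of the combined bitmaps plus the resulting bitmap are held in memory at once, uncompressed, is made only for illustration.

```python
ROWS = 8_000_000      # rows in table 200
BITS_PER_BYTE = 8     # assuming 8-bit bytes

# One bit per row, so one uncompressed bitmap needs ROWS / 8 bytes.
bytes_per_bitmap = ROWS // BITS_PER_BYTE


def uncompressed_bytes(num_bitmaps):
    """Memory needed to hold num_bitmaps uncompressed bitmaps at the same time."""
    return num_bitmaps * bytes_per_bitmap


# Assumed scenario: four input bitmaps plus one resulting bitmap in memory together.
total = uncompressed_bytes(4 + 1)
```

Under these assumptions a single bitmap occupies 1,000,000 bytes, and the memory needed grows linearly with the number of bitmaps being combined.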
Like any resource in a computer system, memory is limited. In a multi-process environment, numerous processes concurrently compete for memory. To accommodate the competing demand for memory, memory limits are imposed upon processes. A process may self-impose memory limits, the operating system may impose memory limits upon the process, or some other mechanism may impose memory limits.
The compression approach is one approach used to avoid exceeding memory limits when combining bitmaps. One way to compress bitmaps is to represent a sequence of bits set to 0 with a smaller sequence of bits containing a number. A sequence of bits set to 0 is referred to herein as a gap. The number contained in the smaller sequence of bits represents the number of bits set to zero in a gap. Compression can lessen the amount of memory needed to store a bitmap. The effectiveness of compression increases as the size of the gaps found in the bitmap increases.
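One way to picture gap-based compression is the sketch below, which stores the length of the gap preceding each bit set to 1 instead of the zero bits themselves. The encoding layout (a list of gap lengths plus a trailing gap count) and the function names are assumptions made for illustration, not a description of any particular compressed bitmap format.

```python
def encode_gaps(bitmap):
    """Return (gaps, trailing): gaps[i] is the number of 0 bits before the i-th 1 bit."""
    gaps = []
    run = 0
    for bit in bitmap:
        if bit:
            gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps, run  # run now holds the zeros after the last 1 bit


def decode_gaps(gaps, trailing):
    """Rebuild the original bitmap from its gap lengths."""
    bitmap = []
    for gap in gaps:
        bitmap.extend([0] * gap)
        bitmap.append(1)
    bitmap.extend([0] * trailing)
    return bitmap


# Sparse bitmap: 3,502 bits with two large gaps collapse to three numbers.
sparse = [0] * 1000 + [1] + [0] * 2000 + [1] + [0] * 500
sparse_gaps, sparse_trailing = encode_gaps(sparse)

# Dense bitmap: 3,500 bits with tiny gaps still need 1,750 gap entries.
dense = [1, 0] * 1750
dense_gaps, dense_trailing = encode_gaps(dense)
```

The sparse bitmap reduces to the gap lengths 1000 and 2000 plus a trailing gap of 500, while the dense bitmap of nearly the same length still requires one entry per bit set to 1, which illustrates why larger gaps compress better.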
A problem with the compression approach is that compression by itself can be ineffective for lessening the amount of memory needed for a bitmap like bitmap 290. Bitmap 290 has a bit set to 1 for every row meeting the condition age greater than or equal to 25. Assuming over half the rows contain an age of 25 or greater, over half the bits in bitmap 290 would be set to 1. Bitmaps with this many bits set to 1 inherently contain only small gaps. Very little compression can be achieved, and thus very little memory is saved.
Another problem with the compression approach involves large bitmaps. Some bitmaps may be so large that even when compression is effective, the memory required nevertheless exceeds the memory limit.
Based on the foregoing, it is clearly desirable to provide a method that combines bitmaps of indefinite size and number within a memory limit.