Not applicable.
1. Field of the Invention
The present invention pertains generally to a word aligned compression method for arbitrary length and content bitmaps, and more particularly to methods of logical operations on such word aligned compressed bitmaps.
2. Description of the Relevant Art
U.S. Pat. No. 5,363,098, incorporated herein by reference, discloses a method for compressing, decompressing, and logically manipulating arbitrary bitmaps. The method uses what has been found to be a relatively computationally expensive byte boundary compression (BBC) method that produces very good compression ratios. However, due to the byte boundaries used, the BBC method uses significantly more computational effort than a method based upon the compression method using word alignment strategies, as described below.
Many commercial and scientific applications need to search and otherwise query large amounts of data. When the data is modified infrequently, as is the case in general databases, image databases, statistical databases, data warehouses, and large datasets produced by scientific experiments and simulations (generally referred to as write oncexe2x80x94read many datasets), one of the faster indexing strategies is the bitmap index. Bitmap indexing methods are frequently the most efficient scheme for ad hoc searches on large datasets, a type of operation common in on-line analytical processing (OLAP) and business intelligence systems.
To reduce the large sizes of bitmap indices, the bitmaps are compressed. However, general-purpose compression schemes are not efficient for database query operations because such operations on the compressed bitmaps are too slow. A number of specialized schemes have been devised to address the lack of speed issue; such as the Byte-aligned Bitmap Code (BBC mentioned above). BBC has generally been considered to be most efficient of all known compression methods.
The new bitmap compression technique described herein uses a bitmap compression data structure and associated computer programmed methods tailored to use the compressed data structure. The bitmap compression method described herein, called Word-Aligned Hybrid (WAH) bitmap compression, increases the execution speed of operations by utilizing the compressed bitmaps. To achieve this increased speed, a compression method has been designed to store data in a way that requires less central processor unit (CPU) effort to compress and decompress bitmaps. Inventor tests have shown that on specific real and synthetic data sets, on the average, the WAH method may be as much as ten times faster than the BBC compression scheme when performing logical operations between compressed bitmaps. The specific data sets were selected to represent a wide spectrum of scientific and industrial applications.
Bitmap indices have previously been generally regarded as efficient for low cardinality attributes, i.e., those attributes with only a small number of different distinct values. Analyses and tests conducted by the inventors have demonstrated that with WAH compression the bitmap indices may also be efficient for high cardinality attributes. Thus the compressed bitmap index may now be used effectively on many different attributes.
This invention provides for a bitmap compression data structure located in a computer readable medium comprising: a) a compressed bitmap comprised of words that have a length of Word Length (WL) bits, wherein said compressed bitmap represents sequentially corresponding bits in an uncompressed bitmap; b) an activeWord; c) an activeWord bit length, nbits, comprising: i) a series of remainder bits, numbering less than WL minus two, stored in the activeWord, (1) the series of remainder bits forming a remainder after the uncompressed bitmap has been parsed into contiguous groupings of WL minus one bits, (2) the nbits in the activeWord represents the last contiguous grouping of nbits bits in the uncompressed bitmap; and d) each of the words of the compressed bitmap having a literal/fill bit, wherein: i) in each of the words having the literal/fill bit that represents a set state, the compressed bitmap word further comprises: (1) a fill bit value having a binary state, and (2) a fill length, wherein the compressed bitmap word represents a contiguous grouping of (fill length*(WL minus one)) bits having the fill bit value state in the corresponding uncompressed bitmap; ii) in each of the words having the literal/fill bit that represents a not set state, a literal word represents a contiguous grouping of WL minus one bits of the corresponding uncompressed bitmap.
In an alternative embodiment, the bitmap compression data structure described above may have said compressed bitmap represent either the literal or a binary complement of the literal uncompressed bitmap.
In another embodiment, the bitmap compression data structure described in the above paragraph may have said literal/fill bit, where: a) is comprised of one or more bits; b) whereby said literal/fill bit alternatively represents either the set state, or the not set state.
In another embodiment, the bitmap compression data structure described in the above paragraph may have: a) said literal word representing the contiguous grouping of WL minus one bits by either the literal, a binary complement of the literal uncompressed bitmap, or any isomorphic transformation.
In another embodiment, the bitmap compression data structure described in the above paragraph may have: a means for calculating, on a computer, the bitmap compression data structure of the corresponding uncompressed bitmap.
In another embodiment, this invention provides for a method of bitmap compression comprising: a) initializing a Word Length (WL), a compressed bitmap of WL bits, an activeWord, and an activeWord bit length; i) providing a literal/fill bit in each Word Length word of the compressed bitmap by: (1) if the literal/fill bit is set to indicate a literal value, then (a) setting in the remaining WL minus one bits a literal representation of an uncompressed bitmap; (2) if the literal/fill bit is set to indicate a fill value, then (a) setting, in a fill bit, a representation of the fill value, and (b) setting a plurality of the remaining WL minus two bits to contain a representation, k, of a number of fill values repeated k*(WL minus one) times in the uncompressed bitmap; and b) continuing compression until done.
In another embodiment, the method of Word Aligned Hybrid bitmap compression of the above paragraph may have said initializing step further comprising: a) setting the compressed bitmap to an empty state; b) setting the activeWord to an empty state; and c) setting the activeWord bit length to zero length.
In another embodiment, the method of bitmap compression of described in the above paragraph may have said continuing compression step further comprising: a) starting at the beginning of the uncompressed bitmap and the beginning of the compressed bitmap that results from compression; and b) looping until the uncompressed bitmap is entirely compressed, the looping step comprising: i) sequentially inputting the WL minus one bits of the uncompressed bitmap into the activeWord; (1) if the end of the uncompressed bitmap is reached at any bit, then (a) setting the activeWord bit length to the number of the uncompressed bitmap bits input; and (2) exiting; ii) if the first WL minus one bits of the activeWord are identical and a current word in the compressed bitmap has the literal/fill bit set to the fill value representing the bits of the activeWord, then; (1) incrementing the number of fills, k, in the current word by one; (2) setting the activeWord to an empty state; and (3) setting the activeWord bit length to zero length; iii) else; (1) appending a new word to the compressed bitmap, which becomes the current word; (2) setting the literal/fill bit of the current word to a representation of the literal state in the compressed bitmap; and (3) setting the literal value to a representation of the activeWord; (4) setting the activeWord to an empty state; and (5) setting the activeWord bit length to zero length.
In another embodiment, this invention provides for a method of logically operating on a plurality of Word Aligned Hybrid compressed bitmaps comprising: a) inputting a plurality of Word Aligned Hybrid compressed bitmaps having a same uncompressed bit length; b) means for logically operating on the plurality of Word Aligned Hybrid compressed bitmaps to create a resultant Word Aligned Hybrid compressed bitmap that is a function of a logical operator.
In another embodiment, this invention provides for a Word Aligned Hybrid compressed bitmaps logical operation apparatus comprising: a) a computer that performs the steps of the above paragraph.
In another embodiment, this invention provides for the method of logically operating on a plurality of Word Aligned Hybrid compressed bitmaps of the above paragraph, wherein said inputting step may further comprise: a) reading one or more of the plurality of Word Aligned Hybrid compressed bitmaps from a computer readable medium.
In another embodiment, this invention provides for the method of logically operating on a plurality of Word Aligned Hybrid compressed bitmaps of the above paragraph, further comprising: a) outputting the resultant Word Aligned Hybrid compressed bitmap to a computer readable medium.
In another embodiment, this invention provides the method of logically operating on one or more Word Aligned Hybrid compressed bitmaps of the above paragraph, wherein said logical operator is selected from the group of logical operations consisting of the unary operator NOT, and the binary operators AND, NAND, OR, XOR, AND_NOT, and OR_NOT.
In another embodiment, this invention provides for a computational method of operating on a Word Aligned Hybrid compressed bitmap to determine a number of instances of a bit pattern in the compressed bitmap, the method comprising: a) inputting a Word Aligned Hybrid compressed bitmap that corresponds to an uncompressed bitmap; b) inputting a bit pattern comprising one or more bits; c) means for examining the Word Aligned Hybrid compressed bitmap to create a count that is a number of instances the bit pattern occurs in the corresponding uncompressed bitmap.
In another embodiment, this invention provides for a computational method of operating on a Word Aligned Hybrid compressed bitmap to determine a set of positions of instances of a bit pattern in the compressed bitmap, the method comprising: a) inputting a Word Aligned Hybrid compressed bitmap that corresponds to an uncompressed bitmap; b) inputting a bit pattern comprising one or more bits; c) means for traversing the Word Aligned Hybrid compressed bitmap to create a set of positions that are where the bit pattern occurs in the corresponding uncompressed bitmap.