Bit vectors (i.e., encoded bit strings) can be used in a relational database management system ("RDBMS") to represent instances of data items occurring within records stored in the database. A bit vector stores an encoded form of a bit string, i.e., a one-dimensional array of 1's and 0's. The conventional approach to data storage in a relational database management system is to use collections of tables organized into rows and columns of data. Each column contains a particular type of information, while each row is an individual record made up of one item of information from each of the columns. Rows are contiguously stored as identically structured records. This record-oriented approach imposes large input and output (I/O) requirements, requires complex indexing schemes and demands periodic retuning and denormalization of the data to achieve acceptable levels of performance.
A method for representing data in an RDBMS that substantially reduces these problems is described in International Application No. PCT/US88/03528, filed Oct. 7, 1988, the entire disclosure of which is incorporated herein by reference. The database is structured using a column-oriented approach that separates data values from their use in the database tables by using bit vectors. Rather than storing information as contiguous records, the data is stored in a columnar organization. The data values that make up each table are separated using bit vectors, each bit vector representing one unique value occurring in one of the columns. Within each bit vector, a binary bit value indicates an incidence of that columnar value within a given record (or row). The bit vectors are used here to represent values in an RDBMS, but can also be used for other applications.
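The columnar organization described above can be illustrated with a minimal sketch. It assumes uncompressed bit strings held as Python lists, and the function name `column_to_bit_vectors` and the sample column are hypothetical; the referenced application's actual storage format is not reproduced here.

```python
def column_to_bit_vectors(column):
    """Build one bit string per distinct value in a column.

    Bit i of the string for a value is 1 when row i holds that value.
    Plain lists of 0/1 are used for clarity; no encoding is applied.
    """
    row_count = len(column)
    vectors = {}
    for row, value in enumerate(column):
        # Create an all-zero bit string the first time a value is seen.
        bits = vectors.setdefault(value, [0] * row_count)
        bits[row] = 1
    return vectors


# Hypothetical five-row "color" column.
colors = ["red", "blue", "red", "green", "blue"]
vectors = column_to_bit_vectors(colors)
for value, bits in vectors.items():
    print(value, bits)
```

Each row sets exactly one bit across the bit strings of a column, so the strings for a column with many distinct values are mostly zeros, which is why compression of the kind discussed below becomes attractive.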
As in any RDBMS, individual data records can be located by interrogating the database using queries. A common form of a query process involves a boolean operation operating on a pair of bit strings to form a resultant bit string representing those database records that satisfy the conditions of the query.
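As a rough illustration of such a query, the sketch below ANDs two uncompressed bit strings bit by bit and lists the matching records. The column values and the helper names `bit_and` and `matching_rows` are assumptions made for the example only.

```python
def bit_and(a, b):
    """Bitwise AND of two equal-length bit strings held as lists of 0/1."""
    if len(a) != len(b):
        raise ValueError("bit strings must cover the same number of records")
    return [x & y for x, y in zip(a, b)]


def matching_rows(bits):
    """Return the row numbers whose bit is set in the resultant bit string."""
    return [row for row, bit in enumerate(bits) if bit]


# Hypothetical bit strings for the conditions color = 'red' and size = 'large'.
color_red = [1, 0, 1, 0, 0]
size_large = [1, 1, 0, 0, 1]

result = bit_and(color_red, size_large)
print(result)                 # resultant bit string
print(matching_rows(result))  # records satisfying both conditions
```

This bit-at-a-time form is the baseline that the compressed techniques discussed next are meant to improve on.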
To save space in the database and reduce processing costs, the bit strings can be compressed, encoded and processed according to boolean operations as disclosed in U.S. Pat. No. 5,036,457 to Glaser et al., the entire disclosure of which is incorporated herein by reference. An encoded binary bit string is decoded, or a raw bit string is converted, into a series of bit units, each bit unit being a compressed binary form describing either a run or an impulse. A run refers to a bit string of one or more contiguous bits of the same binary value. An impulse refers to a bit string of one or more contiguous bits of the same binary value followed by an ending bit having the opposite binary value. A boolean operation is performed on pairs of bit units, i.e., runs or impulses in compressed form, using an iterative looping construct to form a resultant bit unit. This technique is significantly faster than operating on each bit, one at a time, as is typically done in the art.
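The run and impulse definitions can be made concrete with a small sketch. The greedy splitting rule used here (emit an impulse whenever a stretch of equal bits is followed by an opposite bit, and a run only for a trailing stretch) and the tuple layout `(kind, value, length)` are assumptions chosen for illustration; they are not the encoding of the cited patent.

```python
def to_bit_units(bits):
    """Split a raw bit string into (kind, value, length) bit units.

    'impulse': length-1 bits equal to value, then one ending bit of the
               opposite value (length counts the ending bit).
    'run':     length bits all equal to value.
    """
    units = []
    i, n = 0, len(bits)
    while i < n:
        value = bits[i]
        j = i
        while j < n and bits[j] == value:
            j += 1
        if j < n:
            # An opposite bit ends the stretch, so this unit is an impulse.
            units.append(("impulse", value, j - i + 1))
            i = j + 1
        else:
            # The stretch reaches the end of the string, so it is a run.
            units.append(("run", value, j - i))
            i = j
    return units


def from_bit_units(units):
    """Expand bit units back into the raw bit string."""
    bits = []
    for kind, value, length in units:
        if kind == "run":
            bits.extend([value] * length)
        else:
            bits.extend([value] * (length - 1) + [1 - value])
    return bits


raw = [0, 0, 0, 1, 0, 0, 1, 1, 1]
units = to_bit_units(raw)
print(units)
```

Nine raw bits become three bit units in this example, and `from_bit_units(units)` recovers the original string.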
A pair of compressed impulses is obtained from a pair of encoded bit strings (i.e., bit vectors) or a pair of converted raw bit strings, and the impulse with the shorter (called "minimal") length is selected. The boolean operation is performed for the number of bits in the minimal length impulse and a resultant bit unit of this minimal length is formed. This cycle is repeated for each of the remaining minimal length impulses. The total number of cycles required to perform the boolean operation approximately equals the sum of the number of impulses in the two input bit vectors, which is significantly less than the number of bits in the input bit vectors.
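The cycle structure of the minimal length approach can be sketched as follows. To keep the example short it is simplified to plain runs of the form `(value, length)` rather than impulses, so it shows only the idea of advancing by the shorter remaining unit on each cycle; the helper names `rle` and `combine_minimal` are hypothetical.

```python
import operator


def rle(bits):
    """Run-length encode a raw bit string into (value, length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [tuple(run) for run in runs]


def combine_minimal(a_runs, b_runs, op):
    """Apply op to two run lists, consuming the shorter remaining unit per cycle.

    Assumes both inputs describe the same total number of bits.
    Returns the resultant run list and the number of cycles used.
    """
    result, cycles = [], 0
    i = j = 0
    left_a = a_runs[0][1] if a_runs else 0
    left_b = b_runs[0][1] if b_runs else 0
    while i < len(a_runs) and j < len(b_runs):
        step = min(left_a, left_b)            # the "minimal" length
        value = op(a_runs[i][0], b_runs[j][0])
        if result and result[-1][0] == value:
            result[-1][1] += step             # merge with the previous output run
        else:
            result.append([value, step])
        cycles += 1
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            left_a = a_runs[i][1] if i < len(a_runs) else 0
        if left_b == 0:
            j += 1
            left_b = b_runs[j][1] if j < len(b_runs) else 0
    return [tuple(run) for run in result], cycles


a = rle([0] * 6 + [1] * 2 + [0] * 4)
b = rle([0] * 3 + [1] * 5 + [0] * 4)
anded, cycles = combine_minimal(a, b, operator.and_)
print(anded, cycles)
```

Here twelve bits per input are processed in four cycles, which is bounded by the six units in the two inputs combined rather than by the bit count.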
The computational overhead to process bit vectors with many short impulses becomes excessive using the minimal length method, although it is not generally a problem for bit vectors with many long impulses. Therefore, a method of performing boolean operations on a pair of encoded bit strings of indeterminate length was developed that avoids the computational overhead imposed by performing an iteration for each minimal length impulse. This method is described in U.S. patent application Ser. No. 08/566,005, filed Dec. 1, 1995, the entire disclosure of which is incorporated herein by reference. This method, called the "maxima" method, improved query processing by performing a boolean operation on bit vectors using a maximal length impulse instead of a minimal length impulse.
Despite the advances discussed above, further improvement in processing speed for queries in a large RDBMS is desirable. Prior methods operate on only a pair of bit strings at a time. When a query requires combining a larger number of bit strings, for example through pairwise "ORing," multiple iterations of the minima or maxima methods are needed to process the bit strings. For example, if there are four input bit strings A, B, C and D, first A and B are processed as a pair, outputting a result bit string E. Then E and C are processed to produce a result bit string F. Finally, F and D are processed to produce the resultant bit string. Production of this resultant bit string requires substantial processing overhead in the form of multiple procedure calls, context switching, and storage and recall of data structures such as the bit strings generated to represent the intermediate (and therefore temporary) processing values. To continue to improve query processing, a system and method is needed that supports any number "N" of parallel input bit strings without the processing overhead of the current pair-wise scheme.
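The overhead of the pair-wise scheme can be seen in a small sketch that ORs four uncompressed bit strings both ways. It works on raw lists of bits and is meant only to contrast the number of calls and temporaries; the function names are hypothetical, and the single-pass version is a naive illustration of accepting N inputs at once, not the method sought above.

```python
def or_pair(a, b):
    """OR two equal-length bit strings, as a pair-at-a-time routine would."""
    return [x | y for x, y in zip(a, b)]


def or_pairwise(strings):
    """Chain pair-wise ORs: (A|B) -> E, (E|C) -> F, (F|D) -> result."""
    calls = 0
    outputs = []
    acc = strings[0]
    for nxt in strings[1:]:
        acc = or_pair(acc, nxt)
        calls += 1
        outputs.append(acc)
    # Every output except the last is an intermediate (temporary) bit string.
    return acc, calls, outputs[:-1]


def or_n_way(strings):
    """OR any number of bit strings in one pass with no temporaries."""
    return [int(any(column)) for column in zip(*strings)]


A = [1, 0, 0, 0]
B = [0, 1, 0, 0]
C = [0, 0, 1, 0]
D = [0, 0, 0, 0]

final, calls, temporaries = or_pairwise([A, B, C, D])
print(final, calls, temporaries)
print(or_n_way([A, B, C, D]))
```

For four inputs the chained version makes three calls and materializes the two temporaries E and F, while the single pass produces the same resultant bit string directly.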