Bit strings can be used in a relational database management system ("RDBMS") to represent instances of data items occurring within records stored in the database. The conventional approach to data storage utilizes collections of tables organized into rows and columns of data. Each column contains a particular type of information, while each row constitutes an individual record made up of disparate information from each of the columns. Rows are contiguously stored as identically structured records. This record-oriented approach imposes large input and output (I/O) requirements, requires complex indexing schemes and demands periodic retuning and denormalization of the data.
A method for representing data in an RDBMS that substantially reduces these problems is described in International Application No. PCT/US88/03528, filed Oct. 7, 1988, the entire disclosure of which is incorporated herein by reference. The database is structured using a column-oriented approach that separates data values from their use in the database tables using bit strings. Rather than storing information as contiguous records, the data is stored in a columnar organization. The data values that make up each table are separated using bit strings that each represent a unique value in a row from each of the columns. Within each bit string, a binary bit value indicates an incidence of a columnar value within a given record (or row).
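As an illustration only, the columnar idea can be sketched in a few lines of Python. The function name `column_to_bit_strings` and the sample column are hypothetical and do not come from the referenced application; the sketch simply builds one uncompressed bit string per distinct value, with one bit per row.

```python
from collections import defaultdict


def column_to_bit_strings(column):
    """Return a dict mapping each distinct value to a list of bits, one per row.

    Bit r of the list for value v is 1 when row r holds v in this column.
    """
    row_count = len(column)
    bit_strings = defaultdict(lambda: [0] * row_count)
    for row, value in enumerate(column):
        bit_strings[value][row] = 1
    return dict(bit_strings)


# Hypothetical five-row "color" column.
colors = ["red", "blue", "red", "green", "blue"]
color_bits = column_to_bit_strings(colors)
```

Here `color_bits["red"]` marks rows 0 and 2, the only rows whose color is "red". Plain lists are used for readability; a practical system would pack the bits.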
As in any RDBMS, individual data records can be located by interrogating the database using queries. A common form of query processing applies a boolean operation to a pair of bit strings to form a resultant bit string that represents those database records that satisfy the conditions of the query.
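A minimal sketch of such a query, assuming uncompressed bit strings held as Python lists, might look as follows. The names `bit_and` and `matching_rows` and the two sample bit strings are hypothetical, chosen only to show how a boolean AND yields the set of qualifying rows.

```python
def bit_and(a, b):
    """Bitwise AND of two equal-length bit lists."""
    return [x & y for x, y in zip(a, b)]


def matching_rows(bits):
    """Row numbers whose bit is set in the resultant bit string."""
    return [row for row, bit in enumerate(bits) if bit]


# Hypothetical bit strings for the conditions color = "red" and size = "large".
color_is_red = [1, 0, 1, 0, 0]
size_is_large = [1, 1, 0, 0, 0]

result = bit_and(color_is_red, size_is_large)
rows = matching_rows(result)
```

Only row 0 satisfies both conditions, so only bit 0 of `result` is set.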
To save space, the bit strings can be compressed, encoded and processed according to a boolean operation as disclosed in U.S. Pat. No. 5,036,457 to Glaser et al., the entire disclosure of which is incorporated herein by reference. An uncompressed binary bit string is converted into a compressed binary bit form consisting of either a run or an impulse. A boolean operation is performed on pairs of impulses in compressed form using an iterative looping construct to form a resultant bit string. This technique is significantly faster than operating on each bit, one at a time, as is typically done in the art.
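The exact run and impulse encoding of the cited patent is not reproduced in this passage. As a stand-in, the following sketch uses generic run-length encoding, which is an assumption made for illustration: each compressed unit is a hypothetical `(bit, length)` pair describing a stretch of identical bits.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse a bit list into (bit, length) pairs, one per stretch of equal bits."""
    return [(bit, sum(1 for _ in group)) for bit, group in groupby(bits)]


def run_length_decode(segments):
    """Expand (bit, length) pairs back into a flat bit list."""
    return [bit for bit, length in segments for _ in range(length)]


raw = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
encoded = run_length_encode(raw)
```

Ten bits collapse into three pairs here. The saving grows with the length of the stretches and disappears when bits alternate frequently.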
A pair of compressed impulses is obtained from a pair of encoded bit strings, and the impulse with the shorter (called "minimal") length is selected. The boolean operation is performed for the number of bits in the minimal length impulse and a resultant bit string of this minimal length is formed. This cycle is repeated for each of the remaining minimal length impulses. The total number of cycles required to perform the boolean operation approximately equals the sum of the number of impulses in the two input bit strings.
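The cycle described above can be sketched as follows. This is a simplified model and not the patented encoding: each compressed unit is assumed to be a `(bit, length)` pair with a positive length, both inputs are assumed to cover the same total number of bits, and the name `combine` is hypothetical. The function also counts cycles so that the cost can be compared with the number of input units.

```python
import operator


def combine(a, b, op):
    """Apply a boolean op to two lists of (bit, length) pairs.

    Each cycle consumes the shorter ("minimal") remaining length.
    Returns the resultant (bit, length) list and the number of cycles used.
    """
    out = []
    i = j = cycles = 0
    rem_a = a[0][1] if a else 0  # bits left in the current unit of a
    rem_b = b[0][1] if b else 0  # bits left in the current unit of b
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)  # the minimal length for this cycle
        bit = op(a[i][0], b[j][0])
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous result unit
        else:
            out.append((bit, step))
        rem_a -= step
        rem_b -= step
        cycles += 1
        if rem_a == 0:  # unit of a exhausted: advance to the next one
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:  # unit of b exhausted: advance to the next one
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return out, cycles


first = [(1, 3), (0, 5), (1, 2)]   # 1110000011
second = [(0, 4), (1, 6)]          # 0000111111
anded, cycle_count = combine(first, second, operator.and_)
```

In this example the inputs hold three and two units, and the loop finishes in four cycles, which is in line with the approximate bound stated above.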
The computational overhead to process bit strings with many short impulses becomes excessive, although it is not generally a problem for bit strings with many long impulses. Therefore, there is a need for a method of performing boolean operations on a pair of compressed bit strings of indeterminate length that avoids the computational overhead imposed by performing an iteration for each minimal length impulse.