In computer systems, it is well known that the amount of physical space required to store data can be reduced by compressing the data to a more compact format. As an additional advantage, compressed data can generally be processed in less time than uncompressed data. For example, fewer bits are processed when compressed data are communicated from one computer system to another. Data compression is frequently used for large databases, graphic images, and full-text inverted files.
One type of compression that is sometimes used for integer vectors is "bit-map" encoding. With bit-map encoding, each integer of the vector is represented in a bit-map by a single bit whose position corresponds to the value of the integer. A logical "1" in a bit position of the bit-map signifies the presence of the corresponding integer, and a logical "0" denotes its absence. Space is substantially reduced, and time is also saved during processing, since the representative bits of the bit-map can be directly accessed and manipulated.
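As an illustration of bit-map encoding, the following minimal Python sketch stores a vector of small non-negative integers in a single Python integer used as the bit-map, and then combines two bit-maps directly with logical operators. The function names `to_bitmap` and `from_bitmap` are hypothetical and chosen only for this example.

```python
def to_bitmap(values):
    """Encode non-negative integers as a bit-map held in one Python int."""
    bitmap = 0
    for v in values:
        bitmap |= 1 << v  # set the bit whose position equals the integer
    return bitmap


def from_bitmap(bitmap):
    """Recover the sorted integer vector represented by a bit-map."""
    values = []
    position = 0
    while bitmap:
        if bitmap & 1:  # a logical "1" means the integer is present
            values.append(position)
        bitmap >>= 1
        position += 1
    return values


a = to_bitmap([1, 3, 4, 7])
b = to_bitmap([3, 4, 5])

# Logical operations act on the encoded form; no decoding is needed first.
in_both = from_bitmap(a & b)    # integers present in both vectors
in_either = from_bitmap(a | b)  # integers present in at least one vector
```

Here `a & b` and `a | b` are computed on the bit-maps themselves, which is the direct manipulation that makes this representation fast for dense vectors.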
Bit-maps are comparatively efficient in space and time utilization for compressing dense vectors. Dense vectors are vectors that are populated with a relatively large number of integers. However, bit-maps suffer space and time losses for sparse vectors, or vectors with skewed densities. In bit-maps representing sparse vectors, a large proportion of the bit-map space is wasted on bit sequences having nothing but logical zeroes.
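The space trade-off between dense and sparse vectors can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that an uncompressed integer occupies 32 bits and that a bit-map must cover every position from 0 up to the largest integer in the vector; the helper names are hypothetical.

```python
def bitmap_bits(values):
    """Bits needed by a bit-map covering positions 0..max(values)."""
    return max(values) + 1


def plain_bits(values, width=32):
    """Bits needed to store each integer uncompressed (assumed 32 bits wide)."""
    return len(values) * width


dense = list(range(1000))   # 1000 integers packed into positions 0..999
sparse = [5, 1_000_000]     # 2 integers spread over a million positions

dense_ratio = bitmap_bits(dense) / plain_bits(dense)     # bit-map is far smaller
sparse_ratio = bitmap_bits(sparse) / plain_bits(sparse)  # bit-map is far larger
```

Under these assumptions the dense vector needs 1,000 bits as a bit-map against 32,000 bits uncompressed, while the sparse vector needs 1,000,001 bits as a bit-map against 64 bits uncompressed, almost all of them logical zeroes.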
For vectors which lack any type of systematic bit distribution, "bit-wise" compression is sometimes used. Bit-wise compression derives space reduction from the fact that the differences between consecutive integers in a vector are typically small for very large vectors. Thus, the differences between consecutive integers, which have few significant bits, can be encoded more compactly than the integers themselves. Each difference is encoded as a "prefix" bit string, followed by a "suffix" bit string. The prefix bit string encodes the number of bits in the suffix, and the suffix bit string encodes all significant bits of the difference. Bit-wise compression which encodes successive differences is sometimes known as Delta-compression.
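The prefix and suffix scheme described above can be sketched as follows. The text does not specify how the prefix itself is encoded, so this example assumes a fixed-width 5-bit prefix (limiting a suffix to 31 bits), takes the first difference relative to zero, and represents the output as a string of "0" and "1" characters for readability. The names `PREFIX_BITS`, `delta_encode`, and `delta_decode` are hypothetical.

```python
PREFIX_BITS = 5  # assumption: fixed-width prefix, so a suffix holds at most 31 bits


def delta_encode(values):
    """Encode a non-decreasing vector of non-negative integers as a bit string."""
    out = []
    previous = 0
    for v in values:
        diff = v - previous
        n = diff.bit_length()  # number of significant bits in the difference
        if diff < 0 or n >= (1 << PREFIX_BITS):
            raise ValueError("difference out of range for this sketch")
        prefix = format(n, "0{}b".format(PREFIX_BITS))  # length of the suffix
        suffix = format(diff, "b") if n else ""          # the significant bits
        out.append(prefix + suffix)
        previous = v
    return "".join(out)


def delta_decode(bits):
    """Invert delta_encode by walking the prefix/suffix pairs in order."""
    values = []
    previous = 0
    i = 0
    while i < len(bits):
        n = int(bits[i:i + PREFIX_BITS], 2)  # read the prefix
        i += PREFIX_BITS
        diff = int(bits[i:i + n], 2) if n else 0  # read an n-bit suffix
        i += n
        previous += diff
        values.append(previous)
    return values


vector = [3, 4, 9, 1000]
encoded = delta_encode(vector)
decoded = delta_decode(encoded)
```

For this vector the differences are 3, 1, 5, and 991, and the encoded string is 36 bits long. Note that the decoder must walk the string from the start, because the position of each prefix depends on the lengths of all earlier suffixes.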
Bit-wise compression compresses close to the theoretical limit for any kind of distribution of the compressed data. However, the compressed representation of the data generally cannot be manipulated directly by logical operators, such as AND, OR, XOR (exclusive OR), and the like. Therefore, bit-wise compression generally requires time-consuming encoding and decoding in order to perform logical operations, making bit-wise compression less suitable for data which are logically manipulated.
Furthermore, bit-wise compression utilizes bit strings of various sizes that are not always compatible with the logic circuits and data paths used to manipulate them. For example, digital computers are generally designed to operate on bits organized in fixed-sized bytes. Thus, bit-wise compression must either waste space to keep the prefix and suffix strings aligned along easily manipulated byte boundaries, or waste time parsing the variable bit lengths of the prefix and suffix into manipulable bytes.
Taking the foregoing into consideration, it is apparent that there is a need for a compression technique which compresses data regardless of the data content. Furthermore, it is desirable that logical operations on the compressed data be possible without requiring the data to be fully decompressed.