1) Field of the Invention
This invention relates to a computer apparatus and/or method for converting an uncompressed one-dimensional array of binary bits (bit string) into a compressed bit string. This invention also relates to a computer apparatus and/or method for efficiently processing a Boolean operation on a first and a second compressed bit string.
2) Prior Art
Technological developments, from the invention of the printing press through the automatic acquisition of data from space exploration, have foisted the information explosion on us. The ever-growing number of warehouses of data, such as hard copy records, magnetic tapes, etc., attests to the need for somehow condensing the representation of data while preserving its information content. Furthermore, there is a need for performing Boolean operations on this data quickly and economically without decompressing the data to its original form.
Two basic data compression techniques used in computers are generally described in the prior art: first, data compression techniques, which compress data stored in a database, and second, bit string compression techniques, which compress strings of bits representative of data.
A typical computer method of data compression monitors the statistics of data in a database and modifies the data encoding accordingly. For example, Huffman encoding uses a variable length code to achieve data compaction in a large database. See Date, An Introduction to Database Systems, 4th Edition, 79-80 (1986). In accordance with the Huffman encoding scheme, characters or other basic items of information which are to be processed are encoded into bit strings of varying length, with the shortest strings being assigned to the most frequently occurring items of data. In this way, the bit strings representing these items have an average length which is much less than that of bit strings representing such items in a conventional fixed length code format.
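The assignment of shorter bit strings to more frequent items can be illustrated with a minimal Python sketch. It is only an illustration of the general Huffman construction, not the implementation described in the cited reference; the function name huffman_code_lengths and the choice to return code lengths rather than the codes themselves are assumptions made for brevity.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text.

    Hypothetical helper for illustration only.
    """
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in counts}
    # Heap entries are (weight, tie_breaker, {symbol: depth in the code tree}).
    # The unique integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. its code grows by one bit.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


lengths = huffman_code_lengths("aaaaaaaabbbc")
# The most frequent symbol "a" receives the shortest code.
print(lengths)
```

In this example the twelve characters would need 24 bits in a fixed two-bit code, whereas the variable length code needs 8 x 1 + 3 x 2 + 1 x 2 = 16 bits.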
Another computer method of data compression is a holotropic system as disclosed in U.S. Pat. No. 4,068,298 to Dechant et al. A holotropic system compresses data by automatically taking advantage of any redundancy in the data. For example, once a character, a word, a sentence, a paragraph, etc. has been encountered, no subsequent occurrence of that same element need be stored in its original form. Instead, the holotropic system records the recurrence of a previously encountered element in a manner which permits reconstruction of any or every one of the multiple data elements in its original context. Another feature of a holotropic system is that after each data element is added to the database, it can be automatically correlated with any other element already stored. This correlation can reveal a relationship between a particular data element and a number of already stored data elements which permits all of the related elements to be treated as a single entity and stored together. Thus, a number of elements which were stored separately can be collapsed into one entry in the database.
Data compression methods, as discussed above, require a sizable amount of hardware and software for their implementation, such as probability distribution analyzers, prediction function generators, etc. Data compression may also result in actual expansion of the information because the information source statistics may not be measurable to the required precision. Additionally, the resulting bit strings representative of the compressed data are not easily operated on by Boolean operators. Usually, data must be converted back to its original fixed length formats before Boolean operations can be performed.
The second category of techniques disclosed by the prior art involves methods for bit string compression by using run-length encoding schemes to represent strings of identical bit values ("1's" or "0's"). One aspect of the present invention is an improvement related to these compression techniques. These methods are generally data independent, unlike the data compression techniques mentioned above. A typical application of run-length encoding is to reduce the number of bits in redundant binary data such as digitized images of engineering drawings or transmissions of data from outer space. In Bradley, "Optimizing a Scheme for Run Length Encoding", Proceedings of the IEEE (Jan. 1969), a scheme is presented for depicting an image which is represented by a two-dimensional array of black and white picture elements. Because runs of black and white pixels always alternate in an image, in the encoding process it is only necessary to encode the length of a run of white pixels and not the value of its picture elements. A variable length encoding scheme is introduced for encoding the image; specifically, two encoding formats are discussed. The first encoding format encodes a run of zeros followed by a terminating one to mark the end of the run. The second encoding format simply encodes a string of consecutive zeros. A run of any arbitrary length can be represented by several code entries of the second type followed by a single entry of the first type.
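The two entry formats attributed to Bradley above can be illustrated with a minimal Python sketch. The entry labels "T" (zeros followed by a terminating one) and "Z" (zeros only), the tuple representation, and the max_run limit on the zeros counted by one entry are all assumptions made for illustration; the actual code word layout of the Bradley scheme is not reproduced here.

```python
MAX_RUN = 7  # hypothetical limit on the zeros one entry can count


def encode_runs(bits, max_run=MAX_RUN):
    """Encode a string of '0'/'1' characters as a list of (kind, zeros) entries.

    ("T", n): n zeros followed by a terminating one (first format).
    ("Z", n): n consecutive zeros with no terminator (second format).
    """
    entries = []
    zeros = 0
    for b in bits:
        if b == "0":
            zeros += 1
            if zeros == max_run:
                # The run is too long for one entry: emit a zeros-only entry.
                entries.append(("Z", max_run))
                zeros = 0
        elif b == "1":
            # A one terminates the current run of zeros.
            entries.append(("T", zeros))
            zeros = 0
        else:
            raise ValueError("bit string may contain only '0' and '1'")
    if zeros:
        # Trailing zeros with no terminating one.
        entries.append(("Z", zeros))
    return entries


def decode_runs(entries):
    """Rebuild the original bit string from the entries."""
    return "".join("0" * n + ("1" if kind == "T" else "") for kind, n in entries)


bits = "0" * 10 + "1" + "001"
entries = encode_runs(bits)
print(entries)
assert decode_runs(entries) == bits
```

Here the run of ten zeros exceeds the assumed limit of seven, so it is split into one entry of the second type followed by one entry of the first type, as the text describes.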
Although the run-length encoding schemes, as discussed in the prior art (e.g. Bradley), are efficient for compressing images and large transmissions of data where there are long runs of "0's", they are not amenable to traditional forms of computer processing, particularly in the setting where the run lengths are considerably shorter and runs of "1's" and "0's" alternate more frequently. Additionally, the prior art run-length encoding schemes are not structured for efficiently performing operations (e.g. Boolean operations, etc.) on data. Fixed length compressed formats, on the other hand, are amenable to computer operations. Specifically, fixed length formats can be operated on in an efficient fashion, and they can be used for efficiently encoding uncompressed bit strings.
Lastly, to date there has been little development in the techniques for efficiently performing Boolean operations on compressed bit strings. Traditional techniques require that Boolean operations be performed on each bit of data. These techniques have been slightly enhanced via the use of faster hardware; however, each bit of data must still be evaluated. Therefore, performing Boolean operations on bit strings remains a relatively inefficient process.
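The cost of the traditional approach can be seen in a minimal Python sketch of a bit-by-bit Boolean operation on two uncompressed bit strings. The function name boolean_op_bitwise and the use of character strings to hold the bits are assumptions made for illustration; the point is only that the work grows with the number of bits, however redundant the strings are.

```python
import operator


def boolean_op_bitwise(a, b, op):
    """Apply a Boolean operator to two equal-length '0'/'1' strings, bit by bit.

    Hypothetical helper: every bit position is visited exactly once.
    """
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return "".join(str(op(int(x), int(y))) for x, y in zip(a, b))


first = "1100"
second = "1010"
print(boolean_op_bitwise(first, second, operator.and_))
print(boolean_op_bitwise(first, second, operator.or_))
print(boolean_op_bitwise(first, second, operator.xor))
```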