Generally, data compression is used in systems to reduce the amount of storage space required to store the data or to reduce the amount of bandwidth required to transmit the data. Various data compression techniques are known in the prior art. For example, hash-based predictive compression is used to compress character strings of natural language text. Hash-based predictive compression utilizes the fact that the knowledge of a short substring of characters constitutes a good basis for predicting the next character in the character string. This method for predicting successive characters based on a preceding substring is feasible because the order of characters in a natural language is not random.
In general, the hash-based predictive compression technique gathers information about short substrings of characters, and predicts the next character in the sequence based on the preceding substring. For a correct prediction, the compression technique does not store the correctly predicted character. Instead, an indication is stored to reflect that the character was predicted correctly. However, by eliminating the storage of some characters and adding indications as to whether each character is stored, the binary representation of the encoded characters no longer exhibits the original binary order. For example, the letter "B" has a binary representation greater than the binary representation of the letter "A." However, when encoded, the letter "B" may have a binary representation less than the binary representation of the letter "A."
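The behavior described above can be illustrated with a minimal sketch. It is not taken from any particular prior art implementation. The function names, the two-character context, the table size, the multiplier 31, and the one-bit flag layout are all assumptions chosen for illustration. A correctly predicted character is stored as the single bit 1, and a mispredicted character is stored as the bit 0 followed by its eight-bit code.

```python
def _context_hash(prev: bytes, table_size: int) -> int:
    """Hash the preceding substring to a slot in the prediction table (illustrative hash)."""
    h = 0
    for b in prev:
        h = (h * 31 + b) % table_size
    return h


def compress(data: bytes, order: int = 2, table_size: int = 4096) -> str:
    """Return a string of '0'/'1' characters standing in for the compressed bit stream."""
    table = [0] * table_size  # slot -> character last seen after that context
    bits = []
    for i, ch in enumerate(data):
        slot = _context_hash(data[max(0, i - order):i], table_size)
        if table[slot] == ch:
            bits.append("1")  # predicted correctly: store only the indication
        else:
            bits.append("0" + format(ch, "08b"))  # mispredicted: flag plus literal
            table[slot] = ch  # learn the new prediction for this context
    return "".join(bits)


def decompress(bits: str, order: int = 2, table_size: int = 4096) -> bytes:
    """Rebuild the original bytes by replaying the same predictions."""
    table = [0] * table_size
    out = bytearray()
    pos = 0
    while pos < len(bits):
        slot = _context_hash(bytes(out[max(0, len(out) - order):]), table_size)
        if bits[pos] == "1":
            out.append(table[slot])  # take the predicted character
            pos += 1
        else:
            ch = int(bits[pos + 1:pos + 9], 2)  # read the eight-bit literal
            out.append(ch)
            table[slot] = ch
            pos += 9
    return bytes(out)


# The round trip is lossless, but the ordering of two strings can flip.
low, high = b"aaaa", b"aaab"
assert decompress(compress(low)) == low
assert low < high                      # original binary order
assert compress(low) > compress(high)  # order is inverted after encoding
```

In this sketch the final "a" of the first string is predicted and stored as a 1 bit, while the final "b" of the second string is mispredicted and stored starting with a 0 bit, so the string that sorts lower in its original form sorts higher once encoded.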
An example data processing system that utilizes data compression may compress the data, store or transmit the data, and decompress the data prior to processing. In such a system, data are not processed in the compressed form because characteristics of the data required for processing are concealed completely when the data are compressed. The requirement that data be decompressed prior to processing results in slower and less efficient processing than would occur if the data could be processed in the compressed form. Clearly, for many data processing systems, performance may be increased by reducing the amount of data required for processing. For example, a merge-sort data processing application utilizes a mass storage device to temporarily hold portions of the data during processing. For such an application, multiple accesses to the mass storage device may be required, particularly when processing large amounts of data. The I/O accesses to the mass storage device are often the bottleneck in the performance of the data processing application. Consequently, it is desirable to reduce the amount of data required for processing, and thereby the number of I/O accesses, in order to increase the performance of a data processing application.
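The role of temporary storage in such a merge-sort application can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any particular system: the function name external_merge_sort, the run_size parameter, and the use of temporary text files with one newline-free record per line are all hypothetical. The sketch counts the bytes written to temporary storage to show that the I/O volume grows with the size of the records being sorted.

```python
import heapq
import tempfile


def external_merge_sort(records, run_size=3):
    """Sort newline-free strings by spilling sorted runs to temporary files.

    Returns the sorted records and the number of bytes written to temporary storage.
    """
    run_files = []
    bytes_written = 0

    def spill(run):
        # Write one sorted run to temporary storage and rewind it for reading.
        nonlocal bytes_written
        f = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        for rec in sorted(run):
            line = rec + "\n"
            f.write(line)
            bytes_written += len(line.encode("utf-8"))
        f.seek(0)
        run_files.append(f)

    run = []
    for rec in records:
        run.append(rec)
        if len(run) == run_size:  # the in-memory buffer is full
            spill(run)
            run = []
    if run:
        spill(run)

    # Merge phase: stream every run back from storage and merge in sorted order.
    streams = [(line.rstrip("\n") for line in f) for f in run_files]
    merged = list(heapq.merge(*streams))
    for f in run_files:
        f.close()
    return merged, bytes_written


result, spilled = external_merge_sort(["pear", "apple", "fig", "kiwi", "date"], run_size=2)
assert result == ["apple", "date", "fig", "kiwi", "pear"]
```

Every record in this sketch is written to temporary storage once and read back once, so shorter records translate directly into fewer bytes transferred.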
As is described below, the present invention includes a compression technique that preserves the binary order of character strings when they are compressed. Because the order of the compressed character strings is preserved, data processing may occur directly on the compressed data to provide an improvement in data processing performance.