The term "data compression" refers generally to the process of transforming a set of data into a smaller representational form. A data decompression process complements a data compression process: it decodes the representational form back into the original set of data or an approximation thereof. Typically, effective data decompression requires a priori knowledge of the encoding method used in the data compression process.
A "data compression system" is a general term applied to an apparatus that implements a data compression and/or data decompression process. Data compression systems are useful in many applications. One application is the processing of textual data such as information in books, programming language source or object code, database information, numerical representations, etc. For example, a data compression system could take the full text of a literary work and compress the work for storage on a computer in which memory is limited. Another application is data transmission. For example, if a large set of data must be transmitted over communications lines, a data compression system can compress the set of data into a smaller representational form. This representational form is then transmitted over the communications lines and the original data is recreated by the receiver using a complementary data decompression method. All else being equal, transmitting a smaller representational form of a set of data will be faster than transmitting the original set of data directly.
FIG. 1 illustrates the basic application of a data compression system. An input data file 11 containing an original set of data is input into a data compressor 13. Data compressor 13 is comprised generally of a CPU, memory and input/output devices, and implements a data compression method. Data compressor 13 compresses input data file 11 and outputs a compressed data file 14 containing the representational form of the original set of data. Compressed data file 14 can then be sent, for example, to a memory storage device 15 or, alternatively, to a transmission station 17 for transmittal to a receiving station 19.
Complementing data compressor 13 is a data decompressor 21, which implements a data decompression method. Like the data compressor 13, the data decompressor 21 is comprised generally of a CPU, a memory, and input/output devices. Data decompressor 21 reconstructs input data file 11 by first retrieving compressed data file 14 from memory storage device 15 or, alternatively, from receiving station 19. Data decompressor 21 then decompresses the representational form and produces an output data file 23 that is identical to or a close approximation of input data file 11. One major goal of data compression systems is to generate an output data file that approximates the content of the input file in an acceptable manner.
One widely used data compressor is based on the Lempel and Ziv method. This method is described in detail in J. Ziv and A. Lempel, "Compression of Individual Sequences Via Variable Rate Coding," IEEE Transactions on Information Theory, IT-24(5):530-536, September 1978. A practical reduction of the Lempel and Ziv method was developed by Welch. This method, referred to as the Lempel-Ziv-Welch (LZW) method, is described in T. A. Welch, "A Technique for High-Performance Data Compression," Computer, 8-19, June 1984, and is the subject of U.S. Pat. No. 4,558,302.
The LZW method uses a string table implemented in memory that associates input strings with output codes. The string table has the property that, for every input string in the table, every prefix of that input string is also stored in the table. For example, if the input string "rare" appears in the string table, the strings {r, ra, rar, rare} will all appear in the string table.
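The prefix property described above can be illustrated with a minimal sketch. The function name `is_prefix_closed` and the use of a Python dictionary as the string table are assumptions made for illustration only; they are not taken from the cited references.

```python
def is_prefix_closed(table):
    """Return True if every non-empty prefix of every string in the
    table is itself a key of the table (hypothetical helper)."""
    return all(
        s[:i] in table
        for s in table
        for i in range(1, len(s) + 1)
    )


# A table containing "rare" together with all of its prefixes.
closed = {"r": 0, "ra": 1, "rar": 2, "rare": 3}
# A table containing "rare" but missing "ra" and "rar".
broken = {"r": 0, "rare": 1}

print(is_prefix_closed(closed))  # True
print(is_prefix_closed(broken))  # False
```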
In the LZW method, the string table is initialized to the one-letter strings over an alphabet, and each of these strings is associated with an output code. The input data sequence is analyzed character-serially, and the longest input string from the input data sequence that matches an input string currently in the string table is parsed off from the input data sequence. The output code associated with the matched input string in the string table is output. A new input string, comprised of the parsed input string concatenated with the next character in the input data sequence, is added to the string table. The new input string is also assigned a unique output code. The method repeats with the remaining portion of the input data sequence until the entire input sequence is compressed. The output codes generated by the compressor represent the compressed data.
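The parsing loop described above might be sketched as follows. This is an illustrative sketch rather than the implementation of the cited references: the function name `lzw_compress`, the dictionary-based string table, and the unbounded table size are all assumptions, and every character of the input is assumed to belong to the given alphabet.

```python
def lzw_compress(data, alphabet):
    """Compress `data` into a list of integer output codes.

    Assumptions for illustration: the string table is a dict, codes
    are assigned in increasing order, and the table may grow without
    limit. Every character of `data` must appear in `alphabet`.
    """
    # Initialize the table with the one-letter strings.
    table = {ch: code for code, ch in enumerate(alphabet)}
    codes = []
    current = ""  # longest match found so far
    for ch in data:
        candidate = current + ch
        if candidate in table:
            # Keep extending the match.
            current = candidate
        else:
            # Emit the code of the longest match, then add the match
            # plus the next character as a new table entry.
            codes.append(table[current])
            table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return codes, table


codes, table = lzw_compress("abababab", "ab")
print(codes)  # [0, 1, 2, 4, 1]
print(table)  # {'a': 0, 'b': 1, 'ab': 2, 'ba': 3, 'aba': 4, 'abab': 5}
```

In this example the eight input characters are parsed as a, b, ab, aba, b, so five codes are output, and the resulting table retains the prefix property.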
One clear difficulty is that the number of input strings that can be stored in the string table is finite. In one example of the LZW method, the output codes have a fixed length. A common length is 12 bits. Thus, only 4096 (two to the twelfth power) different input strings can be assigned a unique output code. The length of the output code constrains the size of the string table and, therefore, the number of different input strings that may be stored in the table. Moreover, the input strings that occupy the string table are representative of the early portion of the input sequence. If the characteristics of the input data sequence change, the early input strings will remain in the string table, and compression of the later portion of the sequence will be less than optimal. Thus, the LZW method does not adapt to changing characteristics in the input data sequence.
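The effect of a fixed code length can be sketched as follows. The function name `lzw_compress_fixed`, the byte-oriented alphabet of 256 symbols, and the policy of simply ceasing to add entries once the table is full are assumptions chosen to illustrate the limitation; actual implementations may handle a full table differently.

```python
def lzw_compress_fixed(data: bytes, code_bits: int = 12):
    """LZW-style compression with fixed-length output codes.

    Assumptions for illustration: a 256-symbol byte alphabet, and a
    table that is frozen (no new entries) once 2**code_bits strings
    have been stored.
    """
    max_entries = 1 << code_bits  # 4096 when code_bits is 12
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            # Only add a new string while unused codes remain.
            if len(table) < max_entries:
                table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes, len(table)


# With 9-bit codes the table fills after 256 new strings and then
# stops growing, no matter how much more input follows.
codes, size = lzw_compress_fixed(bytes(range(256)) * 4, code_bits=9)
print(size)  # 512
```

Once the table is frozen, it continues to hold only strings seen early in the input, which is the non-adaptive behavior noted above.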
It is desirable to design a method of data compression that utilizes a string table that adapts to the input data sequence. An adaptable string table provides better compression than a string table that reflects only an initial portion of the input sequence. As noted above, another important aspect of data compression systems is the accuracy of the compression/decompression process. Generally, systems exhibit varying ranges of accuracy. The accuracy that is acceptable for a particular application dictates the compression system that may be used. The accuracy of a particular process may be checked each time the process is implemented by incorporating additional hardware or software to perform error checking. One widely used error detection scheme is the Cyclic Redundancy Check (CRC). See J. E. McNamara, Technical Aspects of Data Communication, at 148-158 (1977). Such a scheme requires additional components to be included in the communications system. Some systems operate without error checking capabilities; in such systems, it is difficult to determine whether the decompressed data is accurate.
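As an illustration of the kind of separate error check mentioned above, the following sketch appends a CRC to a block of data and verifies it on receipt. It uses the CRC-32 function `zlib.crc32` from the Python standard library; the choice of CRC-32, the 4-byte big-endian trailer, and the function names `attach_crc` and `check_crc` are assumptions for illustration and are not drawn from the cited reference.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (hypothetical framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the received trailer."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


frame = attach_crc(b"compressed data")
# Flip one bit of the first byte to simulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print(check_crc(frame))      # True
print(check_crc(corrupted))  # False
```

The extra checksum computation and trailer are examples of the additional components that such a scheme adds to the system.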
The present invention discloses a method of data compression and data decompression that constructs a look-up table (LUT) that continuously adapts to the input data sequence, thus enhancing compression performance. Moreover, the present invention utilizes a fixed-length string-parsing method that produces an automatic error checking mechanism.