The invention relates generally to data processing and, more particularly, to methods and apparatus for representing input data in a form requiring fewer bits than the original data, with no loss of information.
The storage and transmission of information are becoming increasingly important in modern economic systems. Such information often consists of material, such as text, which contains a large amount of redundancy. This redundancy can be reduced by the process of data compression, which increases data density while retaining all of the original information. Compressed data thus has the same information content as the original non-compressed data, but can be stored in a smaller amount of memory and transmitted over a communication channel in less time. This provides significant reductions in the cost of data storage and transmission. The compressed data is later decompressed to obtain a replica of the original non-compressed data.
Existing techniques provide useful data compression for various applications. For example, the prior art technique of Huffman coding converts data of uniform increments into code values of variable length. More sophisticated techniques have been developed to overcome various problems with Huffman-type coding systems, such as those described in a paper entitled "A Universal Algorithm For Sequential Data Compression" by Lempel and Ziv, IEEE Transactions on Information Theory, Volume IT-23, No. 3, May 1977. The system described in this paper, hereinafter referred to as the "LZ" system, provides fixed length code values which are chosen to represent varying numbers of input data symbols, such as text characters. The objective of the LZ system is to represent as many input data symbols as possible by a single code value.
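As an illustration of the variable-length coding mentioned above, the following is a minimal Python sketch of textbook Huffman coding. It is not taken from any cited reference; the function names `huffman_code`, `encode`, and `decode`, and the use of character strings of "0" and "1" in place of packed bits, are assumptions made for readability.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol to a variable-length bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # lightest subtree
        w2, _, c2 = heapq.heappop(heap)  # second lightest subtree
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the code of every symbol in the text."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Invert the prefix-free code one bit at a time."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "abracadabra"
    table = huffman_code(sample)
    packed = encode(sample, table)
    print(table, len(packed), decode(packed, table) == sample)
```

In this sketch the frequent symbol receives a short code and the rare symbols receive longer ones, so each fixed-size input symbol maps to a code of varying length. The LZ approach described above works the other way around.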
The LZ system uses a translation table to assign code values for input data symbol strings and to derive decompressed data symbol strings from an input of code values. The table is generated simultaneously with the compression and decompression procedures from the input data symbols and input code values, respectively, and is continuously modified along with the stream of input symbols. The LZ system is thus said to be "adaptive", and provides increasingly good compression characteristics as the translation table is built and modified.
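The idea of a table built in lockstep by the compressor and the decompressor can be sketched as follows. This is a simplified Python illustration loosely in the style of LZ78-type dictionary coders, not necessarily the exact scheme of the cited paper; the names `lz_compress` and `lz_decompress` and the (table index, next symbol) output format are assumptions chosen for clarity.

```python
def lz_compress(data):
    """Emit (table_index, next_symbol) pairs while growing the table."""
    table = {"": 0}          # index 0 stands for the empty string
    pairs = []
    prefix = ""
    for ch in data:
        if prefix + ch in table:
            # Keep extending the match while the string is already known.
            prefix += ch
        else:
            pairs.append((table[prefix], ch))
            table[prefix + ch] = len(table)  # learn the new string
            prefix = ""
    if prefix:
        # Flush a trailing match that ended exactly on a known string.
        pairs.append((table[prefix], ""))
    return pairs


def lz_decompress(pairs):
    """Rebuild the same table from the pairs alone and recover the data."""
    table = [""]
    out = []
    for index, ch in pairs:
        entry = table[index] + ch
        out.append(entry)
        table.append(entry)  # mirrors the compressor's table update
    return "".join(out)


if __name__ == "__main__":
    text = "abababababab"
    codes = lz_compress(text)
    print(codes)
    print(lz_decompress(codes) == text)
```

No table is transmitted in this sketch: the decompressor derives each entry from the codes it has already received, and later codes stand for progressively longer strings as the table grows.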
The LZ procedure can produce efficient data compression, but is difficult to implement in some computer hardware configurations. A variation of the LZ system is described in U.S. Pat. No. 4,558,302 to Welch and in the article "A Technique For High Performance Data Compression" by Terry A. Welch, Computer, June 1984. This system, which will be referred to as the LZW system, is more suitable for implementation in computer hardware. However, the LZW system builds a fixed translation table and is thus not as adaptive as the LZ system. Under certain conditions, the LZW system can result in less efficient data compression.
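One way to picture the reduced adaptivity of a fixed table is the following Python sketch of a textbook LZW coder with an optional limit on table size. The `max_codes` parameter, and the reading of "fixed" as "the table stops growing once it is full", are assumptions made for illustration and are not drawn from the Welch references. Input symbols are assumed to have code points below 256.

```python
def lzw_compress(data, max_codes=None):
    """Textbook LZW; if max_codes is set, the table stops growing when full."""
    table = {chr(i): i for i in range(256)}  # all single symbols are preloaded
    codes = []
    prefix = ""
    for ch in data:
        if prefix + ch in table:
            prefix += ch
        else:
            codes.append(table[prefix])
            if max_codes is None or len(table) < max_codes:
                table[prefix + ch] = len(table)
            prefix = ch
    if prefix:
        codes.append(table[prefix])
    return codes


def lzw_decompress(codes, max_codes=None):
    """Inverse of lzw_compress; must be given the same max_codes."""
    if not codes:
        return ""
    table = [chr(i) for i in range(256)]
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The code refers to the entry the compressor had only just added.
            entry = prev + prev[0]
        out.append(entry)
        if max_codes is None or len(table) < max_codes:
            table.append(prev + entry[0])
        prev = entry
    return "".join(out)


if __name__ == "__main__":
    # The statistics of the input change halfway through.
    text = "ab" * 50 + "xyz" * 50
    growing = lzw_compress(text)
    frozen = lzw_compress(text, max_codes=258)
    print(len(growing), len(frozen))
    print(lzw_decompress(frozen, max_codes=258) == text)
```

In this sketch, a table that fills up on the first half of the input has no room to learn the strings of the second half, so the capped run emits more codes than the uncapped one.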
It is therefore desirable to provide a method and apparatus for data compression which is readily implemented in computer hardware, and which exhibits highly adaptive characteristics to achieve greater data compression.