Many types of data compression systems exist. One commonly used technique is the Lempel-Ziv algorithm, which is described in "Compression of Individual Sequences via Variable-Rate Coding" by Ziv and Lempel in IEEE Transactions on Information Theory, September 1978, pages 530-536. FIGS. 1A-1C illustrate a typical implementation of the Lempel-Ziv algorithm. In FIG. 1A, a shift register 10 that is N+1 bytes long is used to temporarily store previously processed data. If new data to be processed includes a string of data bytes that has been processed before, then a token including the length and relative address of the previously processed data string in the shift register is generated. Such a token can in general be expressed using fewer bits of information than the data string itself, so the data string is effectively compressed. If the data to be processed does not form part of a previous data string existing in the shift register, then a token or tokens are generated containing this data explicitly. In general, such tokens have to be expressed using slightly more bits of information than the data itself, so there is an effective expansion. The gain from the compressed data strings usually exceeds the loss from the non-compressed data strings, so net data compression results. If there are no repeating strings of data in a data stream, then the data stream cannot be compressed by this technique.
FIG. 1B illustrates the generation of a token referencing previously processed data. In the example given, the values A, B, C and D were previously processed and are currently stored in the shift register at addresses 37, 36, 35 and 34. The new values to be processed are A, B, C and E. The new data includes the string ABC, which has a length of 3 and matches the previously stored string ABC beginning at relative address 37. The address is relative because once a token is generated describing the string, the new values A, B and C will be loaded into the shift register and the previously stored values A, B, C and D will be shifted down the shift register to new addresses. The address of data in the shift register therefore depends on the number of data values subsequently processed.
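The relative addressing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register size of 64, the filler value ".", and the helper names `shift_in` and `relative_addresses` are hypothetical and do not come from the figures. Address 0 is taken to be the most recently loaded value.

```python
from collections import deque

def shift_in(register, values):
    """Load values one at a time; each load moves older values one address higher."""
    for value in values:
        register.appendleft(value)  # index 0 is the most recent value

def relative_addresses(register, string):
    """Return every relative address at which `string` starts, lowest first."""
    stored = list(register)
    hits = []
    for addr in range(len(string) - 1, len(stored)):
        # The first value of the string sits at the highest address of the match.
        if all(stored[addr - k] == string[k] for k in range(len(string))):
            hits.append(addr)
    return hits

# Assumed register size and filler contents, chosen so ABCD land at 37..34.
register = deque(maxlen=64)
shift_in(register, "ABCD")
shift_in(register, "." * 34)
print(relative_addresses(register, "ABC"))  # [37]

# Loading the new string A, B, C shifts the old copy three addresses down.
shift_in(register, "ABC")
print(relative_addresses(register, "ABC"))  # [2, 40]
```

The same stored string is therefore reported at a different address each time more values are processed, which is why the token address is only meaningful at the moment the token is generated.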
FIG. 1C illustrates the generation of a second token referencing previously stored data. In the example given, the values A, B, C and Z are to be processed. The new data includes the string ABC, which has a length of 3 and matches the previously stored string ABC at relative addresses 3 and 41. The token generated in this example usually uses the lower relative address of 3. Tokens include the count and relative address of the previously processed string and are expressed as (count, relative address). As a result of the compression of the values A, B, C, E, A, B, C and Z as shown in FIGS. 1B and 1C, the generated processed output will include: (3, 37), E, (3, 3), Z.
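The token sequence of FIGS. 1B and 1C can be reproduced with a brute-force sketch in Python. It is not the implementation shown in the figures: the window size of 64, the minimum match length of 2, the filler value ".", and the function names `compress` and `decompress` are assumptions made for illustration, and matches are limited to values already in the register.

```python
def compress(history, data, window=64, min_len=2):
    """Encode `data` as literals and (count, relative address) tokens."""
    buf = list(history)[-window:]   # buf[-1] is relative address 0
    out = []
    i = 0
    while i < len(data):
        best_len, best_addr = 0, None
        n = len(buf)
        for addr in range(n):       # scan from the lowest relative address
            start = n - 1 - addr
            length = 0
            while (start + length < n and i + length < len(data)
                   and buf[start + length] == data[i + length]):
                length += 1
            if length > best_len:   # strict '>' keeps the lowest address on ties
                best_len, best_addr = length, addr
        if best_len >= min_len:
            out.append((best_len, best_addr))
            consumed = data[i:i + best_len]
        else:
            out.append(data[i])     # no usable match: emit the value explicitly
            consumed = data[i:i + 1]
        buf.extend(consumed)        # shift the processed values into the register
        buf = buf[-window:]
        i += len(consumed)
    return out

def decompress(history, tokens, window=64):
    """Rebuild the data by replaying tokens against the same register contents."""
    buf = list(history)[-window:]
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            count, addr = tok
            start = len(buf) - 1 - addr
            values = buf[start:start + count]
        else:
            values = [tok]
        out.extend(values)
        buf.extend(values)
        buf = buf[-window:]
    return "".join(out)

# Assumed starting contents: A, B, C, D at relative addresses 37, 36, 35, 34.
history = "....." + "ABCD" + "." * 34
tokens = compress(history, "ABCEABCZ")
print(tokens)  # [(3, 37), 'E', (3, 3), 'Z']
print(decompress(history, tokens))  # ABCEABCZ
```

The exhaustive scan over every address for every input position is what makes the search operation expensive; it is written this way only to keep the matching rule visible.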
One of the primary problems with implementations of the Lempel-Ziv compression technique is the difficulty of performing the search for previous matching strings at an effective processing speed. Many of the techniques discussed below are modifications of the Lempel-Ziv technique that attempt to improve either the speed of the technique, by speeding up the search operation, or the amount of compression achieved, by using more efficient token encoding.
U.S. Pat. No. 4,021,782 teaches a compaction device to be used on both ends of a transmission line to compact, transmit, and decompact data. Each possible incoming character is categorized according to an expected frequency of use in a preset coding table. The category of the character affects the encoding of that character (more frequently used characters have shorter generated codes).
U.S. Pat. No. 4,464,650 discloses parsing an input data stream into segments by utilizing a search tree having nodes and branches.
U.S. Pat. No. 4,558,302 teaches what is commonly called a Lempel-Ziv-Welch data compression technique. This patent discloses utilizing a dictionary for storing commonly used data strings and searching that dictionary using hashing techniques.
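For orientation, the general Lempel-Ziv-Welch idea can be sketched as follows. This is the textbook form of the algorithm, not the method claimed in the patent: the function name `lzw_compress` is hypothetical, the dictionary is unbounded, and Python's built-in `dict` (a hash table) stands in for whatever hashing scheme an implementation would use.

```python
def lzw_compress(data):
    """Return a list of integer codes for the bytes in `data`."""
    # Start with one dictionary entry per possible single byte.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:          # hashed lookup of the extended string
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_code  # remember the new string for later reuse
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

print(lzw_compress(b"ABABABA"))  # [65, 66, 256, 258]
```

Unlike the shift-register scheme of FIGS. 1A-1C, the output here refers to dictionary entries by code number rather than to a count and relative address.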
U.S. Pat. No. 4,612,532 discloses interchanging positions of candidates or ordering the table of candidates in approximate order of frequency.
U.S. Pat. No. 4,622,585 describes compression of image data. This patent discloses a first series of data bits in a current picture line and a second series of data bits of a preceding picture line being shifted together in a compression translator.
U.S. Pat. No. 4,814,746 describes utilizing a dictionary for storing commonly used data strings and deleting from that dictionary the data strings not commonly used.
U.S. Pat. No. 4,853,696 describes a plurality of logic circuit elements or nodes connected together in reverse binary treelike fashion to form a plurality of logic paths corresponding to separate characters.
U.S. Pat. No. 4,876,541 is directed to improvements to the Lempel-Ziv-Welch data compression technique described above by using a novel matching algorithm.
U.S. Pat. No. 4,891,784 is directed to an auto blocking feature in a tape environment which utilizes a packet assembly/disassembly means for auto blocking.
U.S. Pat. No. 4,899,147 is directed to a data compression technique with a throttle control to prevent data underruns and an optimizing startup control. This patent discloses a capability to decompress data read in the direction opposite to that in which it was written. This patent also discloses a string table for storing frequently used strings with a counter of the number of times each string has been used.
U.S. Pat. No. 4,906,991 is directed to copy codeword encoding which utilizes a tree data structure.
U.S. Pat. No. 4,988,998 is directed to preprocessing strings of repeated characters by replacing a sequentially repeated character with a single character and repeat count.
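The kind of preprocessing described here, replacing a run of a repeated character with the character and a repeat count, can be illustrated with a minimal Python sketch. The function names and the (character, count) pair representation are assumptions chosen for clarity; the patent's actual encoding of the count is not reproduced here.

```python
from itertools import groupby

def run_length_encode(text):
    """Collapse each run of a repeated character into (character, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def run_length_decode(pairs):
    """Expand (character, count) pairs back into the original text."""
    return "".join(ch * count for ch, count in pairs)

pairs = run_length_encode("AAAABCC")
print(pairs)                     # [('A', 4), ('B', 1), ('C', 2)]
print(run_length_decode(pairs))  # AAAABCC
```

In this simple form a character that is not repeated still costs a pair, so a practical scheme would typically apply the substitution only to runs above some length.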