Many types of data compression systems exist. One commonly used technique is the Lempel-Ziv algorithm, which is described in "Compression of Individual Sequences via Variable-Rate Coding" by Ziv and Lempel in IEEE Transactions on Information Theory, September 1978, pages 530-536. FIGS. 1A-1C illustrate a typical implementation of the Lempel-Ziv algorithm. In FIG. 1A, a shift register 10 that is N+1 bytes long is used to temporarily store previously processed data. If new data to be processed includes a string of data bytes that has been processed before, then a token including the length and relative address of the previously processed data string in the shift register is generated. Such a token can in general be expressed using fewer bits of information than the data string itself, so the data string is effectively compressed. If the data to be processed does not form part of a previous data string existing in the shift register, then a token or tokens are generated containing this data explicitly. In general, such tokens have to be expressed using slightly more bits of information than the data itself, so there is an effective expansion. The gain from the compressed data strings usually exceeds the losses from the non-compressed data strings, so data compression results overall. If there are no repeating strings of data in a data stream, then the data stream cannot be compressed by this technique.
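The trade-off between compressed and explicit tokens can be illustrated with a small bit count. The following Python sketch is illustrative only: the token layout (one flag bit, then either a count field and an address field or an 8-bit data value), the window size of 2048 and the maximum count of 16 are assumptions made for this example and are not taken from the text above.

```python
def token_bits(tokens, window=2048, max_count=16, literal_bits=8):
    """Total size in bits of a token list under a hypothetical layout.

    Assumed layout (not from the source): every token starts with a
    1-bit flag; a reference token then carries a count field and a
    relative-address field, and an explicit token carries the raw value.
    """
    addr_bits = (window - 1).bit_length()      # 11 bits for a 2048-entry window
    count_bits = (max_count - 1).bit_length()  # 4 bits for counts up to 16
    total = 0
    for tok in tokens:
        if isinstance(tok, tuple):             # (count, relative address)
            total += 1 + count_bits + addr_bits
        else:                                  # explicit data value
            total += 1 + literal_bits
    return total
```

Under these assumptions, a three-byte string that matches earlier data costs 16 bits instead of 24, while three unmatched bytes cost 27 bits instead of 24, which is the slight expansion described above.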
FIG. 1B illustrates the generation of a token referencing previously processed data. In the example given, the values A, B, C and D were previously processed and are currently stored in the shift register at addresses 37, 36, 35 and 34. New values to be processed are A, B, C and E. The new data includes the string ABC that has a length of 3 and matches previously stored string ABC at relative address 37. The address is relative because once a token is generated describing the string, the values A, B, and C will be loaded into the shift register and the values A, B, C and D will be shifted down the shift register to a new address. The address of data in the shift register is relative to the number of data values subsequently processed.
FIG. 1C illustrates the generation of a second token referencing previously stored data. In the example given, the values A, B, C and Z are to be processed. The new data includes the string ABC that has a length of 3 and matches the previously stored string ABC at relative addresses 3 and 41. The token generated in this example usually uses the lower relative address, 3. Tokens include the count and relative address of the previously processed string and are expressed as (count, relative address). As a result of the compression of the values A, B, C, E, A, B, C and Z as shown in FIGS. 1B and 1C, the generated processed output will include: (3, 37), E, (3, 3), Z.
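The example of FIGS. 1B and 1C can be reproduced with a short Python sketch. It models the shift register as a list whose newest value sits at relative address 0. The function names, the window size of 64, the minimum match length of 2, the filler values and the rule that a match may not extend into data not yet stored are all assumptions made for illustration, and the exhaustive search shown here is the slow operation that practical implementations try to avoid.

```python
def lz_tokens(data, history, window=64, min_match=2):
    """Encode data as explicit values and (count, relative address) tokens."""
    history = list(history)[-window:]          # oldest value first, newest last
    out, pos = [], 0
    while pos < len(data):
        best_len, best_addr = 0, None
        for start in range(len(history)):
            length = 0
            while (start + length < len(history)
                   and pos + length < len(data)
                   and history[start + length] == data[pos + length]):
                length += 1
            # Later starts are more recent, so >= lets the lower
            # relative address win when two matches have equal length.
            if length >= min_match and length >= best_len:
                best_len = length
                best_addr = len(history) - 1 - start
        if best_len:
            out.append((best_len, best_addr))
            consumed = data[pos:pos + best_len]
        else:
            out.append(data[pos])
            consumed = data[pos:pos + 1]
        history.extend(consumed)               # new values enter at address 0
        history = history[-window:]            # oldest values fall off the end
        pos += len(consumed)
    return out


def lz_decode(tokens, history, window=64):
    """Rebuild the data from tokens, starting from the same history."""
    history = list(history)[-window:]
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            count, addr = tok
            start = len(history) - 1 - addr
            values = history[start:start + count]
        else:
            values = [tok]
        out.extend(values)
        history.extend(values)
        history = history[-window:]
    return "".join(out)


# A, B, C, D at relative addresses 37..34, followed by 34 filler values.
initial = "ABCD" + "." * 34
tokens = lz_tokens("ABCEABCZ", initial)
```

With this starting history, `tokens` is `[(3, 37), 'E', (3, 3), 'Z']`, and `lz_decode(tokens, initial)` returns the original eight values.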
One of the primary problems with implementations of the Lempel-Ziv compression technique is the difficulty of performing the search for previous matching strings at an effective processing speed. Many of the techniques discussed below are modifications of the Lempel-Ziv technique that attempt either to speed up the search operation or to increase the amount of compression achieved through more efficient token encoding.
U.S. Pat. No. 4,558,302 teaches what is commonly called a Lempel-Ziv-Welch data compression technique. This patent discloses utilizing a dictionary for storing commonly used data strings and searching that dictionary using hashing techniques.
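For orientation, the dictionary approach can be sketched in Python as follows. This is the commonly taught form of Lempel-Ziv-Welch compression and is not taken from the patent itself; the function name and the use of a Python dict (itself a hash table) as the string dictionary are assumptions made for illustration.

```python
def lzw_compress(data):
    """Return a list of integer codes for a bytes input (textbook LZW)."""
    # The dictionary starts with every single-byte string, codes 0..255.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:                 # hashed dictionary lookup
            current = candidate                # keep extending the string
        else:
            codes.append(table[current])       # emit the longest known string
            table[candidate] = next_code       # learn the new, longer string
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes
```

For example, `lzw_compress(b"ABABABA")` yields `[65, 66, 256, 258]`: two single-byte codes followed by codes for the learned strings AB and ABA.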
U.S. Pat. No. 4,876,541 is directed to improvements to the Lempel-Ziv-Welch data compression technique described above by using a matching algorithm.
U.S. patent application Ser. No. 07/807,007, filed Dec. 13, 1991, entitled "METHOD AND APPARATUS FOR COMPRESSING DATA", assigned to International Business Machines Corporation, teaches a modification to the Lempel-Ziv compression technique where the history buffer data is stored in a fixed location rather than a shift register. As a result, the tokens used to refer to previously compressed data refer to data in a fixed location rather than to data moving along a shift register.
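The difference between a shift register and a fixed-location history buffer can be sketched in Python as follows. This is only an illustration of the general idea, assuming a circular buffer with a moving write pointer; the class name, method names and buffer size are hypothetical and do not describe the implementation in the cited application.

```python
class FixedHistory:
    """History buffer in which stored values never move (illustrative)."""

    def __init__(self, size):
        self.buf = [None] * size
        self.write = 0                         # next address to be overwritten

    def append(self, values):
        """Store values and return the fixed address of the first one."""
        first = self.write
        for v in values:
            self.buf[self.write] = v
            self.write = (self.write + 1) % len(self.buf)
        return first

    def read(self, addr, count):
        """Read count values starting at a fixed address, wrapping around."""
        return [self.buf[(addr + k) % len(self.buf)] for k in range(count)]


history = FixedHistory(64)
first_abc = history.append("ABCD")             # stored at addresses 0..3
second_abc = history.append("ABCE")            # stored at addresses 4..7
```

In this sketch the first string ABC is still found at address 0 after further data has been stored, whereas in a shift register its relative address would have grown from 37 to 41 as in FIGS. 1B and 1C. A token can therefore name an address that stays valid until that location is overwritten.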