The Lempel-Ziv algorithms are well known in the field of data compression. In particular, the “history buffer” version, known as LZ1, has become popular in hardware implementations wherever lossless compression of coded data is required, since its relatively modest buffer requirements and predictable performance make it a good fit for most technologies.
The LZ1 algorithm works by examining the input string of characters and keeping a record of the characters it has seen. Then, when a string appears that has occurred before in recent history, it is replaced in the output string by a “token”: a code indicating where in the past the string has occurred and for how long. Both the compressor and decompressor must use a “history buffer” of a defined length, but otherwise no more information need be passed between them.
Characters that have not been seen before in a worthwhile string are coded as a “literal”. This amounts to an expansion of the number of bits required, but in most types of data the opportunities for token substitution (and hence compression) outweigh the incompressible data, so overall compression is achieved. Typical compression ratios range from 2:1 to around 10:1.
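The literal and token coding described above can be illustrated with a minimal software sketch. This is not the coding format of any particular LZ1 implementation: the history length, the minimum worthwhile match length, the tuple representation of literals and tokens, and the greedy longest-match search (which lets a match overlap the bytes being coded) are all assumptions made for illustration.

```python
HISTORY_SIZE = 512   # assumed history buffer length (illustrative value)
MIN_MATCH = 2        # assumed shortest string worth replacing with a token


def compress(data: bytes):
    """Greedy LZ1-style coding into literals and (distance, length) tokens."""
    out = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        start = max(0, pos - HISTORY_SIZE)
        # Brute-force search of the history for the longest match.
        for cand in range(start, pos):
            length = 0
            while (pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            # Token: how far back the string occurred, and for how long.
            out.append(("token", best_dist, best_len))
            pos += best_len
        else:
            # No worthwhile string: code the byte as a literal.
            out.append(("literal", data[pos]))
            pos += 1
    return out


def decompress(items) -> bytes:
    """Rebuild the data using only the items and the shared history."""
    buf = bytearray()
    for item in items:
        if item[0] == "literal":
            buf.append(item[1])
        else:
            _, dist, length = item
            for _ in range(length):
                buf.append(buf[-dist])  # copy one byte from the history
    return bytes(buf)
```

In this sketch the decompressor needs nothing beyond the item stream, since it rebuilds the same history as it goes.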
Some variations of the basic LZ1 algorithm have emerged over the years, but improvements have been incremental.
As the LZ1 algorithm works on units of a byte, traditional hardware implementations consider just one byte at a time when compressing the input stream. As each byte is input, the “history buffer” is scanned (using, for example, a content-addressable memory, or CAM) for all occurrences of the byte. As a single byte is not considered an efficient candidate for string replacement, any matches found must be supplemented by consecutive matches before token substitution takes place.
Each subsequent byte that is input is also sought in the history buffer, but the only matches reported are those that follow existing matches. Finally, the string match may terminate (when no more matches are found that adjoin known matches) and the surviving “string match” is coded for substitution. The longer the match, the greater the saving in bits.
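The byte-at-a-time matching just described can be modelled in software as follows. This is only a sketch under stated assumptions: a set comprehension stands in for the parallel CAM lookup, and the function name, the `history_size` and `min_match` parameters, the output tuples, and the choice of the most recent occurrence when several matches survive are all hypothetical.

```python
def byte_serial_compress(data: bytes, history_size: int = 512, min_match: int = 2):
    """Model one-byte-per-step matching against a history buffer."""
    out = []
    live = set()   # history indices where the current match's last byte sits
    length = 0     # bytes matched so far in the current string match

    def flush(end_pos):
        # Code the surviving match that ended just before end_pos.
        if length >= min_match:
            j = max(live)  # assumed tie-break: most recent occurrence
            out.append(("token", (end_pos - 1) - j, length))
        else:
            # Too short to be worth a token: emit the bytes as literals.
            out.extend(("literal", b) for b in data[end_pos - length:end_pos])

    for pos, byte in enumerate(data):
        start = max(0, pos - history_size)
        # Stand-in for the CAM: every history position holding this byte.
        hits = {i for i in range(start, pos) if data[i] == byte}
        # Report only the hits that adjoin an existing match.
        extended = {i for i in hits if i - 1 in live}
        if extended:
            live, length = extended, length + 1
            continue
        # No adjoining match: the current string match terminates.
        flush(pos)
        live, length = hits, (1 if hits else 0)
        if not hits:
            out.append(("literal", byte))
    flush(len(data))
    return out
```

For the input `b"abcabcabc"` this model emits three literals followed by a single token of distance 3 and length 6, since the match keeps extending through the bytes it has just added to the history.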
A simple implementation of the LZ1 algorithm which processes one byte per clock cycle is limited to some 100–200 MB/s at typical clock rates for current ASIC (application specific integrated circuit) technology. However, this may be insufficient for some applications (such as memory compression, optical networks and RAID disk arrays) which require high bandwidth for a single data stream. To increase performance, that is, the number of bytes that may be compressed per second, either the “cycle time” (the time taken to input the byte and find all matches) must be reduced, or the CAM must be modified to search for more than one byte at a time. Because of the difficulty of designing multiple-input CAMs, performance improvements have usually concentrated on shortening the access time of the CAM, and hence the cycle time. But of course, the two improvements are not mutually exclusive; a multi-byte CAM will gain performance over and above any reduction in cycle time.
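The relationship between cycle time, bytes per cycle and throughput can be made concrete with a short calculation. The clock frequencies used here are inferred from the 100–200 MB/s figure at one byte per cycle and are assumptions, as is the function name; the four-byte case is purely hypothetical.

```python
def throughput_mb_per_s(clock_hz: float, bytes_per_cycle: int) -> float:
    """Throughput in MB/s (1 MB taken as 10**6 bytes) for a given clock rate."""
    return clock_hz * bytes_per_cycle / 1e6


# One byte per cycle at an assumed 150 MHz clock.
single = throughput_mb_per_s(150e6, 1)

# The same assumed clock with a hypothetical four-byte-per-cycle CAM.
multi = throughput_mb_per_s(150e6, 4)
```

The two factors multiply, which is why a shorter cycle time and a multi-byte CAM are complementary rather than competing improvements.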
A previous attempt at more than one byte per cycle compression may be found in U.S. Pat. Nos. 5,771,011 and 5,929,791 (the latter being a divisional of the former), which match two bytes per cycle. Although these patents purport to indicate the steps necessary to extend their technique to more than two bytes, their teaching is incomplete and unrealisable as to how this is to be done: although they give a formula for the number of equations required in their technique for N bytes per cycle compression, they do not indicate how the equations themselves are to be derived. Nor do they reveal how a ‘circular buffer’ that is used can be adapted to handle larger numbers of input bytes per cycle.
An earlier U.S. Pat. No. 5,179,378 uses a different technique for comparing an input byte with a history buffer. Rather than inputting one byte at a time and employing a full parallel comparison (comparing that byte with the entire contents of the history buffer) each clock cycle, as in the above prior art patents, U.S. Pat. No. 5,179,378 breaks the comparison down into stages using ‘systolic array’ pipelining. However, the technique of this patent does not achieve a processing rate of more than one input byte per cycle, and may not even complete processing of one byte per cycle.
A need therefore exists for a method and arrangement for data compression wherein the above-mentioned disadvantage(s) may be alleviated.