1. Technical Field
The present invention relates to data compression in general, and, in particular, to an apparatus for performing data compression. Still more particularly, the present invention relates to an apparatus for performing data compression according to Lempel-Ziv algorithms.
2. Description of Related Art
Lempel-Ziv algorithms are well-known in the field of data compression. In particular, the “history buffer” version, commonly known as an LZ1 algorithm, has become popular in hardware implementations wherever lossless compression of coded data is required. This is because an LZ1 algorithm has a relatively modest buffer requirement and predictable performance, which make it a good fit for most technologies.
Generally speaking, an LZ1 algorithm works by examining a string of characters and keeping a record of the characters. Then, when an input string appears that has occurred before in the recent history, the input string is replaced in the output string by a token, that is, a code indicating where in the past the input string occurred and for how long. Both a compressor and a decompressor must use a “history buffer” of a defined length, but otherwise no further information needs to be passed between the compressor and the decompressor.
Characters that have not been seen before in a worthwhile string are coded as literals. This amounts to an expansion of the number of bits required, but in most types of data the opportunities for token substitution (and hence compression) outweigh the incompressible data, so an overall compression can be achieved. The compression ratios for LZ1 algorithms typically range from 2:1 to approximately 10:1.
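The token and literal coding described above can be illustrated with a minimal software sketch. It is not the hardware apparatus of the present disclosure. The names `HISTORY_SIZE`, `MIN_MATCH`, `compress` and `decompress`, the 512-byte buffer length, and the representation of a token as a (distance, length) pair are all assumptions made for illustration only; a real LZ1 coder would also pack literals and tokens into a bit stream.

```python
from typing import List, Tuple, Union

# Hypothetical parameters, chosen for illustration only.
HISTORY_SIZE = 512   # length of the history buffer, in bytes
MIN_MATCH = 2        # shortest string considered worth a token

Token = Tuple[int, int]        # (distance back into the history, length)
Symbol = Union[int, Token]     # a literal byte value or a token


def compress(data: bytes) -> List[Symbol]:
    """Replace repeated strings with tokens; code everything else as literals."""
    out: List[Symbol] = []
    pos = 0
    while pos < len(data):
        start = max(0, pos - HISTORY_SIZE)
        best_len, best_dist = 0, 0
        # Exhaustively search the history for the longest earlier occurrence.
        for cand in range(start, pos):
            length = 0
            while (pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            out.append((best_dist, best_len))   # token substitution
            pos += best_len
        else:
            out.append(data[pos])               # literal
            pos += 1
    return out


def decompress(symbols: List[Symbol]) -> bytes:
    """Rebuild the data using only the symbols and the shared buffer length."""
    buf = bytearray()
    for sym in symbols:
        if isinstance(sym, tuple):
            dist, length = sym
            for _ in range(length):
                buf.append(buf[-dist])  # byte-wise copy allows overlapping matches
        else:
            buf.append(sym)
    return bytes(buf)
```

In this sketch, `compress(b"abcabcabc")` yields three literals followed by one token, and `decompress` recovers the input without any side information other than the agreed history length.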
Some variations of the basic LZ1 algorithm have emerged over the years, but improvements have been incremental.
Because an LZ1 algorithm works on units of one byte, traditional hardware implementations consider only one byte at a time when compressing an input data stream. As each byte is input, the history buffer is scanned, for example by using a content-addressable memory (CAM), for all occurrences of the byte. Because a single byte is not considered an efficient candidate for string replacement, any match found must be supplemented by consecutive matches before a token substitution takes place.
Each subsequent input byte is also sought in the history buffer, but the only matches reported are those following existing matches. Finally, the string match terminates (when no further match is found that adjoins the known matches), and the surviving “string match” is coded for token substitution. The longer the match, the greater the saving in bits.
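The byte-at-a-time matching described above can be modelled in software as follows. This is only a sketch under simplifying assumptions: the function name `track_matches` and the `min_match` parameter are hypothetical, the CAM search is emulated by a linear scan, and the history buffer is held fixed rather than being updated with each input byte as a real compressor would do.

```python
from typing import List, Set, Tuple


def track_matches(history: bytes, incoming: bytes,
                  min_match: int = 2) -> List[Tuple[int, int]]:
    """Return (start position in history, length) for each string match found."""
    results: List[Tuple[int, int]] = []
    active: Set[int] = set()   # history positions where the current match ends
    length = 0                 # length of the current string match
    for byte in incoming:
        # Emulated CAM search: every history position holding this byte.
        hits = {i for i, h in enumerate(history) if h == byte}
        # Report only matches that directly follow an existing match.
        survivors = {i for i in hits if i - 1 in active}
        if survivors:
            active, length = survivors, length + 1
            continue
        # No match adjoins the known matches: the string match terminates.
        if length >= min_match:
            results.append((min(active) - length + 1, length))
        # The current byte may begin a new string match.
        active = hits
        length = 1 if hits else 0
    if length >= min_match:    # flush a match still open at end of input
        results.append((min(active) - length + 1, length))
    return results
```

For example, with the history `b"xabcdyabcz"` and the input `b"abcdq"`, two candidate matches survive the bytes `a`, `b` and `c`, only one survives `d`, and the match terminates at `q` with a length of four bytes starting at history position 1.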
A simple implementation of an LZ1 algorithm that processes one byte per clock cycle is limited to some 100–200 Mbytes/second at typical clock rates for current application specific integrated circuit (ASIC) technology. However, this may not be fast enough for applications such as memory compression, optical networks and RAID disk arrays, which require high bandwidth for a single data stream. To increase performance, i.e., the number of bytes that may be compressed per second, either the cycle time (the time taken to input a byte and find all matches) must be reduced or the CAM must be modified to search for more than one byte at a time. Because of the difficulties in designing multiple-input CAMs, performance improvements have usually been focused on reducing the access time (in other words, the cycle time) of a CAM. The two improvements are not, however, mutually exclusive; a multiple-input CAM can gain performance over and above any reduction in cycle time.
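The relationship between cycle time, bytes searched per cycle and throughput can be made concrete with simple arithmetic. The function name `throughput_mbytes_per_s` and the figure of four bytes per cycle are assumptions for illustration; the 100 to 200 MHz clock range is inferred from the throughput range quoted above for one byte per clock cycle.

```python
def throughput_mbytes_per_s(clock_mhz: int, bytes_per_cycle: int) -> int:
    """Throughput in Mbytes/second: one search of bytes_per_cycle bytes per clock."""
    return clock_mhz * bytes_per_cycle


# One byte per cycle at 100 and 200 MHz (clock rates inferred from the text).
single_low = throughput_mbytes_per_s(100, 1)
single_high = throughput_mbytes_per_s(200, 1)

# Hypothetical multiple-input CAM searching four bytes per cycle at 200 MHz.
multi_high = throughput_mbytes_per_s(200, 4)
```

Under these assumptions, a faster clock and a wider search multiply together, which is why the two improvements can be combined.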
The present disclosure provides an improved apparatus for performing data compression.