1. Technical Field
The present invention relates to a method and apparatus for compressing data in general, and in particular to a method and apparatus for performing adaptive data compression. Still more particularly, the present invention relates to a method and apparatus for providing improved data compression efficiency for an adaptive data compressor.
2. Description of the Prior Art
The type of data presented to a compression algorithm can vary enormously. Therefore, most compression algorithms are designed to be adaptive in order to attain better compression performance over a wide range of data types. Both the classical Lempel-Ziv 1 (LZ_1) and Lempel-Ziv 2 (LZ_2) compression algorithms embody this concept to a certain degree.
In the LZ_1 case, every byte processed is moved to a history-buffer that is initially empty. This history-buffer can be thought of as a byte-wide shift register: once the history-buffer is completely filled, each new incoming data byte displaces the oldest data byte from the history-buffer. The current content of the history-buffer is compared with the incoming data to identify any strings or sequences of incoming data bytes that have occurred earlier and still remain in the history-buffer. Such an incoming data sequence is then encoded in a more compact form, giving the starting point of the matching string within the history-buffer and the length of the matching string. This forms the basis of the LZ_1 compression algorithm. An LZ_1 decompressor maintains a history-buffer with a data history identical to that of the history-buffer within the LZ_1 compressor, and simply copies the referenced string to its output when decoding a reference.
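The history-buffer scheme described above can be illustrated with a minimal sketch. It is not the implementation of any particular compressor: the function names, the token layout, the default window size, and the choice to encode the starting point as a backward offset from the current position are all assumptions made for illustration.

```python
def lz1_compress(data: bytes, window: int = 16, min_match: int = 2):
    """Return tokens: ('lit', byte_value) or ('ref', offset_back, length).

    Hypothetical token format; the history-buffer is modeled as the last
    `window` bytes already processed.
    """
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window)          # oldest byte still in the history
        best_len, best_off = 0, 0
        for j in range(start, i):
            k = 0
            # Extend the match, but only over bytes already in the history.
            while i + k < len(data) and j + k < i and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            tokens.append(('ref', best_off, best_len))
            i += best_len
        else:
            tokens.append(('lit', data[i]))
            i += 1
    return tokens


def lz1_decompress(tokens) -> bytes:
    """Rebuild the data by copying referenced strings from the output so far."""
    out = bytearray()
    for token in tokens:
        if token[0] == 'lit':
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            out.extend(out[start:start + length])
    return bytes(out)
```

For the input `b"abcabcabcxyz"`, the fourth token produced by this sketch is `('ref', 3, 3)`: the second "abc" is replaced by a reference to the first. The decompressor needs no side information because its output so far is exactly the history the compressor searched.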
In the LZ_2 case, a dictionary of data sequences is maintained, and references to these dictionary entries constitute the basis of the LZ_2 compression algorithm. It is not necessary to encode a length when the incoming data matches one of the dictionary entries because the length is also held in the dictionary. Hence, compressed output data from an LZ_2 compression algorithm usually consists of only a sequence of numbers, each representing a dictionary entry. Adaptive LZ_2 implementations continually add new dictionary entries based on the incoming data. As in the LZ_1 case, both the LZ_2 compressor and the LZ_2 decompressor start with and maintain an identical structure, although in the case of LZ_2, different management strategies are utilized when the dictionary becomes full.
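A dictionary-based scheme of this kind can be sketched as follows. The sketch uses the well-known LZW variant as a concrete stand-in for the LZ_2 family; the function names, the 256-entry initial dictionary, and the "stop adding entries when full" management strategy are assumptions for illustration, and other strategies are equally possible.

```python
def lz2_compress(data: bytes, max_entries: int = 4096):
    """Return a list of dictionary indices (LZW-style, hypothetical sketch)."""
    dictionary = {bytes([b]): b for b in range(256)}  # all single bytes
    codes = []
    current = b""
    for b in data:
        candidate = current + bytes([b])
        if candidate in dictionary:
            current = candidate                 # keep extending the match
        else:
            codes.append(dictionary[current])   # emit longest known sequence
            if len(dictionary) < max_entries:   # assumed policy: freeze when full
                dictionary[candidate] = len(dictionary)
            current = bytes([b])
    if current:
        codes.append(dictionary[current])
    return codes


def lz2_decompress(codes, max_entries: int = 4096) -> bytes:
    """Rebuild the data while growing the same dictionary as the compressor."""
    if not codes:
        return b""
    dictionary = {b: bytes([b]) for b in range(256)}
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry the compressor had just created.
            entry = previous + previous[:1]
        out.extend(entry)
        if len(dictionary) < max_entries:
            dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return bytes(out)
```

With these assumptions, `lz2_compress(b"abababab")` yields `[97, 98, 256, 258, 98]`: the output is only a sequence of numbers, and the length of each match is implied by the dictionary entry it names.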
A data stream to be compressed is frequently found to contain sequences of identical characters, commonly known as "runs." For example, executable code often contains significant runs of "00" characters. Also, code compilers often generate null data to initialize arrays or variables to a known state. Further, database software often allocates data fields filled with either all blank or all zero characters. In addition, binary bitmap image data often contains a great deal of "whitespace," typically "00" characters, each representing eight pixels that are all blank. Likewise, grey scale or color image data, especially when encoded utilizing one byte per pixel, may also contain long runs of identical data bytes.
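As a small illustration of what a run is, the following sketch lists the runs in a byte string. The function name `runs` and the `min_length` threshold are hypothetical and chosen only for this example.

```python
from itertools import groupby


def runs(data: bytes, min_length: int = 4):
    """Yield (byte_value, start_index, length) for each run of identical bytes
    that is at least min_length long."""
    position = 0
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        if length >= min_length:
            yield value, position, length
        position += length
```

For example, `list(runs(b"ab" + bytes(6) + b"cd" + b" " * 5))` gives `[(0, 2, 6), (32, 10, 5)]`: a run of six zero bytes starting at index 2 and a run of five blanks starting at index 10.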
These kinds of runs may lead to unnecessary and unproductive adaptations within a data compressor. Furthermore, because a history-buffer in LZ_1 or a dictionary in LZ_2 may easily be filled with identical data bytes from a run, displacing more useful content, the data compressor takes some time to return to its optimal compression ratio after the run. Consequently, it would be desirable to provide a method and apparatus that give a data compressor better data compression efficiency, such that the data compressor can resume its optimal compression efficiency more rapidly after an occurrence of a run.
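The effect of a run on an LZ_1 history-buffer can be made concrete with a short sketch. It models the history-buffer as a fixed-size shift register using `collections.deque`; the function name `history_after` and the buffer size are assumptions for illustration only.

```python
from collections import deque


def history_after(data: bytes, size: int) -> bytes:
    """Return the history-buffer content after every byte of `data` has been
    shifted in, modeling the buffer as a byte-wide shift register."""
    history = deque(maxlen=size)   # the oldest byte drops out when full
    for byte in data:
        history.append(byte)
    return bytes(history)
```

With a 32-byte buffer, `history_after(b"the quick brown fox " + bytes(64), 32)` returns 32 zero bytes. The text that preceded the run has been displaced entirely, so if similar text follows the run, the compressor finds nothing in the history to match and must rebuild its history before it can compress that text well again.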