1. Technical Field
The present invention relates to a method and apparatus for compressing data in general, and in particular to a method and apparatus for performing adaptive data compression. Still more particularly, the present invention relates to a method and apparatus for improving data compression efficiency of Lempel-Ziv 1 variants.
2. Description of the Prior Art
The well-known Lempel-Ziv coding scheme was first suggested by J. Ziv and A. Lempel in "A Universal Algorithm for Sequential Data Compression," IEEE Trans. Inform. Theory, vol. IT-23, no. 3, pp. 337-343, 1977. The classical implementation, which utilized a variant of the original Ziv-Lempel coding scheme, was first suggested by J. Storer and T. Szymanski in "Data Compression via Textual Substitution," J. ACM, vol. 29, no. 4, pp. 928-951, 1982, and was subsequently implemented by T. Bell (see "Better OPM/L Text Compression," IEEE Trans. Comm., vol. COM-34, no. 12, 1986), who called his implementation of this coding scheme LZSS.
The LZSS scheme permits the output of a Lempel-Ziv compressor to be an arbitrary mixture of code symbols called LITERAL_POINTERs and COPY_POINTERs. The LITERAL_POINTER contains one single embedded data symbol, and the COPY_POINTER contains two elements, namely, a SYMBOL_COUNT and a DISPLACEMENT (or OFFSET).
Each LITERAL_POINTER is processed by a decompressor by simply extracting the embedded data symbol. The extracted data symbol is then output from the decompressor, and is also copied back to the current location of a history-buffer for update. COPY_POINTERs are processed by extracting and decoding the SYMBOL_COUNT and the DISPLACEMENT value. Then, the number of symbols specified by the SYMBOL_COUNT value is copied from the history-buffer of the decompressor. These symbols are copied sequentially, one symbol at a time, starting from the history-buffer location specified by the DISPLACEMENT value. As each symbol is copied to the output of the decompressor, it is also copied back to the history-buffer for update, in the same way as for a LITERAL_POINTER.
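The decompressor behavior described above can be sketched in Python as follows. This is a minimal illustration, not the LZSS or ALDC bit-level format: the tuple-based token representation, the function name `decode`, the 512-symbol history-buffer size, and the treatment of DISPLACEMENT as an absolute index into a circular history-buffer are all assumptions made for the example.

```python
from typing import Iterable, Tuple, Union

# Hypothetical token shapes, chosen only for this illustration:
#   ("literal", symbol)             stands in for a LITERAL_POINTER
#   ("copy", count, displacement)   stands in for a COPY_POINTER
Token = Union[Tuple[str, int], Tuple[str, int, int]]


def decode(tokens: Iterable[Token], history_size: int = 512) -> bytes:
    """Decode a token stream using a circular history-buffer (assumed size)."""
    history = bytearray(history_size)  # history-buffer, initially all zeros
    pos = 0                            # current write location in the buffer
    out = bytearray()

    def emit(symbol: int) -> None:
        # Output the symbol and copy it back to the history-buffer for update.
        nonlocal pos
        out.append(symbol)
        history[pos] = symbol
        pos = (pos + 1) % history_size

    for token in tokens:
        if token[0] == "literal":
            # LITERAL_POINTER: extract the single embedded data symbol.
            emit(token[1])
        else:
            # COPY_POINTER: copy `count` symbols, one at a time, starting
            # at the history-buffer location given by `displacement`
            # (assumed here to be an absolute buffer index).
            _, count, displacement = token
            src = displacement % history_size
            for _ in range(count):
                emit(history[src])
                src = (src + 1) % history_size
    return bytes(out)
```

Because symbols are copied one at a time and written back immediately, a copy may overlap the region being written: `decode([("literal", 97), ("literal", 98), ("copy", 4, 0)])` yields `b"ababab"` under the assumptions above.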
As in most implementations of this kind of Lempel-Ziv 1 variant (to distinguish it from a dictionary-based scheme known as Lempel-Ziv 2, also proposed by Lempel and Ziv), LZSS utilizes a variable-length SYMBOL_COUNT coding scheme because shorter count values tend to be much more frequent than longer ones. Also, LZSS utilizes a single flag bit to distinguish between encoded LITERAL_POINTERs and COPY_POINTERs.
In recent years, an adaptive compression algorithm known as Adaptive Lossless Data Compression (ALDC) has been widely utilized for general-purpose data compression within computers and associated peripheral devices. ALDC, also a Lempel-Ziv 1 variant, is described in full detail in "QIC Development Standard QIC-154," Rev. A, 10 Mar 94, Quarter-Inch Cartridge Drive Standards, Inc. This document, also available via the Internet at "http://www.qic.org," is incorporated herein by reference.
If ALDC encounters incompressible data (an input data stream of random bytes, for example), the output will consist largely of LITERAL_POINTERs, because incoming random data bytes are unlikely to contain any extensive repeats of the data sequences already stored in the history-buffer. As in the LZSS coding scheme, ALDC utilizes a single flag bit at the start of each LITERAL_POINTER and each COPY_POINTER to distinguish one from the other. This results in an expansion of about 12.5% for ALDC because each incoming random data byte (of 8 bits) must be encoded as a LITERAL_POINTER, which for ALDC is a single "0" bit followed by 8 literal data bits. Thus, ALDC expands each 8 incoming data bits to 9 output bits for any data having no matching string in the history-buffer.
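The 12.5% figure can be checked with a short Python sketch. It models only the worst case described above, in which every input byte is emitted as a LITERAL_POINTER (a "0" flag bit followed by 8 literal data bits); it is not an ALDC encoder, and the function names `literal_only_bits` and `expansion` are hypothetical names introduced for the example.

```python
def literal_only_bits(data: bytes) -> str:
    """Encode every byte as a flag bit "0" followed by its 8 data bits.

    This models only the case where no match is found in the
    history-buffer, so every symbol becomes a LITERAL_POINTER.
    """
    return "".join("0" + format(byte, "08b") for byte in data)


def expansion(data: bytes) -> float:
    """Return the fractional growth of the output relative to the input."""
    in_bits = 8 * len(data)
    out_bits = len(literal_only_bits(data))
    return (out_bits - in_bits) / in_bits


# 256 distinct byte values: 2048 input bits become 2304 output bits.
print(expansion(bytes(range(256))))  # 0.125, i.e. 9/8 - 1
```

The result does not depend on the byte values themselves: each 8-bit input symbol always costs 9 output bits in this case, so the expansion is 1/8 for any non-empty input.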
The present disclosure describes an improved method and apparatus for performing adaptive data compression without utilizing any flag bits, such that the compression ratio may be better than that of schemes that require a flag bit for distinguishing between a LITERAL_POINTER and a COPY_POINTER, such as LZSS and ALDC, as described above.