Data compression may be described as a process that converts data in a given format into an alternative format, with the data in the alternative format requiring less storage space, or transmission time, than the original data.
LZ77 and LZ78 are two lossless data compression algorithms, published in 1977 and 1978 respectively and named after their authors, A. Lempel and J. Ziv. Since then, the LZ77 and LZ78 algorithms have formed the basis for many other compression algorithms, including LZW, LZSS and LZRW. These compression algorithms are known as dictionary coders.
The LZ77 dictionary coder operates by keeping a history window of the most recently processed input data. The current input data, that is, the data currently being encoded, is then compared with the data in the history window. The output stream of the coder, that is, the compressed stream, is made up of references to the position in the history window of any matches, together with the lengths of those matches. If no match can be found, the character itself is simply encoded into the compressed stream.
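The matching process described above can be illustrated with a minimal Python sketch. It assumes a greedy parse and a brute-force search of the whole history window; the window size (4096), minimum match length (3) and maximum match length (258) are illustrative choices, and representing a literal as an `int` and a match as a `(distance, length)` tuple is a hypothetical token format, not the bit-level layout of any particular LZ77 implementation.

```python
from typing import List, Tuple, Union

# Hypothetical token format: a literal byte value, or a (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]


def lz77_encode(data: bytes, window: int = 4096,
                min_len: int = 3, max_len: int = 258) -> List[Token]:
    """Greedy LZ77 encoder that compares the input against every window position."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search: try every candidate start in the history window.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlap); the decoder copies byte by byte.
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # reference into the history window
            i += best_len
        else:
            tokens.append(data[i])  # no usable match: emit the character itself
            i += 1
    return tokens


print(lz77_encode(b"abcabcabcx"))  # [97, 98, 99, (3, 6), 120]
```

The back-reference `(3, 6)` is longer than its distance, so it overlaps the data being encoded; LZ77 permits this. The nested loops over every window position show where the cost of encoding comes from.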
The LZRW algorithm has proven to be very fast, but has a poor compression ratio. The LZ78 algorithm, and algorithms based on it such as LZW, require significant storage in the decoder. The LZ77 dictionary coder has the advantages that its compression ratio is very good for many types of data and that its decoding is very simple and fast. Accordingly, the LZ77 algorithm has formed the basis of the gzip compression utility written by Jean-loup Gailly and adopted by the GNU Project, available at http://www.gzip.org/. The LZ77 algorithm is also used in the lossless Portable Network Graphics (PNG) image compression format. However, encoding with the LZ77 algorithm is often time-consuming because of the large number of comparisons that must be performed, which is a particular problem for highly redundant data such as halftoned images.
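The simplicity of LZ77 decoding can be seen in a short decoder sketch, assuming a hypothetical token format in which a literal is an `int` and a match is a `(distance, length)` tuple. Unlike the encoder, it performs no search at all: each back-reference is resolved by copying bytes that are already in the output buffer.

```python
from typing import List, Tuple, Union

# Hypothetical token format: a literal byte value, or a (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]


def lz77_decode(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            if not 1 <= distance <= len(out):
                raise ValueError("back-reference points before the start of the output")
            # Copy one byte at a time so overlapping matches (length > distance) work.
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(tok)  # literal: append the character itself
    return bytes(out)


print(lz77_decode([97, 98, 99, (3, 6), 120]))  # b'abcabcabcx'
```

Decoding is a single linear pass whose cost is proportional to the output size, which is why it is fast compared with the encoder's window search.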
The previously mentioned gzip compression utility also uses a fixed Huffman scheme. As a result, two passes over the input data are required: one to collect symbol probabilities and one to encode. This exposes known issues with fixed Huffman coding, namely the impossibility of optimally coding symbols whose probabilities are not negative powers of 2 (such as 1/2, 1/4 or 1/8), and the necessity of sending a coding tree along with the compressed data. Frequent restarting of the encoder is also required to maintain adaptiveness to different patterns in the data. On the other hand, adaptive Huffman coders, which perform the coding process in one pass, and arithmetic coders, which are optimal in terms of statistical probability, are very slow.
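Both the two-pass structure and the power-of-2 limitation can be made concrete with a small static Huffman sketch. This is a generic illustration, not gzip's actual implementation: pass 1 counts symbol frequencies and builds a Huffman code from them, and pass 2 (reduced here to a bit count) codes each symbol. The helper names `huffman_code_lengths` and `two_pass_report` are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Dict, Tuple


def huffman_code_lengths(freqs: Dict[int, int]) -> Dict[int, int]:
    """Return the Huffman code length in bits for each symbol."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def two_pass_report(data: bytes) -> Tuple[float, float]:
    """Return (average Huffman code length, entropy), both in bits per symbol."""
    if not data:
        raise ValueError("empty input")
    freqs = Counter(data)                  # pass 1: collect statistics
    lengths = huffman_code_lengths(freqs)  # stands in for the tree sent with the data
    total_bits = sum(lengths[b] for b in data)  # pass 2: cost of coding each symbol
    n = len(data)
    entropy = -sum((c / n) * math.log2(c / n) for c in freqs.values())
    return total_bits / n, entropy


print(two_pass_report(b"a" * 9 + b"b"))  # roughly (1.0, 0.469)
```

With probabilities 0.9 and 0.1, every symbol still costs a whole bit against an entropy of about 0.469 bits per symbol; with dyadic probabilities such as 1/2, 1/4, 1/4 (input `b"aabc"`), the average code length equals the entropy of 1.5 bits.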
Even though the original LZ77 and LZ78 algorithms were developed for 1-dimensional data streams, some 2-dimensional LZ schemes have been proposed recently, such as that of Storer et al., “Lossless Image Compression by Block Matching”, The Computer Journal, vol. 40, 1997, pp. 137-145. Although the algorithm proposed by Storer outperforms the LZ77 algorithm with regard to compression ratio when applied to images, its memory requirements and time performance do not match those of the simpler 1-dimensional LZ77 algorithm.
Encoding formats have also been developed for facsimile transmission, such as the Fax Group 3 (G3) and Fax Group 4 (G4) encoding formats, which are also used for Tagged Image File Format (TIFF) files. The G3 format is the more commonly used of the two and supports one-dimensional compression of black-and-white images. Because the G3 and G4 encoding formats were developed for bi-level images containing large flat regions, such as scanned text, they do not perform well on halftoned images.
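One-dimensional G3 coding represents each scanline as alternating white and black run lengths, each of which is then mapped to a codeword from fixed Huffman tables. The sketch below covers only the run-length step (the code tables are omitted), which is enough to show why halftoned rows fare poorly: a row of flat regions yields a few long runs, whereas a dithered row of the same width yields many runs of length 1. The function name `run_lengths` and the 0 = white, 1 = black convention are assumptions for illustration.

```python
from typing import List


def run_lengths(row: List[int]) -> List[int]:
    """Split a bi-level scanline (0 = white, 1 = black) into alternating run lengths.

    As in fax coding, the first run is white; a row that starts with black
    therefore begins with a zero-length white run.
    """
    runs: List[int] = []
    color, count = 0, 0
    for pixel in row:
        if pixel == color:
            count += 1
        else:
            runs.append(count)       # close the current run
            color, count = pixel, 1  # start a run of the other color
    runs.append(count)
    return runs


print(run_lengths([0] * 40 + [1] * 8 + [0] * 16))  # [40, 8, 16]: text-like row
print(len(run_lengths([0, 1] * 32)))              # 64: dithered row, one run per pixel
```

Since each run costs at least one codeword, the dithered row needs roughly 20 times as many codewords as the text-like row of the same width.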
State-of-the-art lossless bi-level image compression schemes, such as that of the Joint Bi-level Image Experts Group (JBIG), achieve only moderate compression ratios on halftoned images because of the periodic dot structure that halftoning produces. As a result, there have been some efforts to develop “hand-crafted” fast image compression schemes based on knowledge of the dither matrices used in halftoning. Some of these schemes have very good time performance but provide only moderate compression ratios. Moreover, because they rely on knowledge of the exact pattern of the dither matrix, which cannot always be known, these schemes often fail.
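The periodic dot structure mentioned above can be reproduced with ordered dithering. The text does not say which dither matrix is used; this sketch assumes the common 4x4 Bayer matrix and an illustrative threshold scaling, and halftones a flat mid-gray region. The names `BAYER_4` and `ordered_dither` are hypothetical.

```python
from typing import List

# Assumed dither matrix: the common 4x4 Bayer index matrix (values 0..15).
BAYER_4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither(gray: List[List[int]]) -> List[List[int]]:
    """Halftone 8-bit gray levels (0 = black, 255 = white) to bi-level pixels (1 = black)."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            # The matrix is tiled over the image, so thresholds repeat every 4 pixels.
            threshold = (BAYER_4[y % 4][x % 4] + 0.5) * 16  # maps 0..15 to 8..248
            out_row.append(1 if value < threshold else 0)
        out.append(out_row)
    return out


for row in ordered_dither([[128] * 8 for _ in range(4)]):
    print(row)  # alternating rows [0, 1, 0, 1, ...] and [1, 0, 1, 0, ...]
```

Because the output repeats every 4 pixels in both directions, a scheme that knows `BAYER_4` could predict each pixel from the local gray level, but that prediction breaks down if the image was halftoned with a different matrix.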