1. Field of the Invention
The invention relates to dictionary based data compression and decompression, particularly with respect to the manner in which the compression and decompression dictionaries are updated.
2. Description of the Prior Art
The Lempel-Ziv (LZ) algorithm known as LZ2 provides the theoretical basis for numerous dictionary based data compression and decompression systems in widespread usage. LZ2 is described in a paper entitled "Compression Of Individual Sequences Via Variable-Rate Coding" by Jacob Ziv and Abraham Lempel, published in the IEEE Transactions on Information Theory, Vol. IT-24, No. 5, September 1978, pages 530-536. A ubiquitously used data compression and decompression system known as LZW, adopted as the standard for V.42 bis modem compression and decompression, is described in U.S. Pat. No. 4,558,302 by Welch, issued Dec. 10, 1985. LZW has also been adopted as the compression and decompression method used in the GIF and TIFF image communication standards. A variant of LZ2 is described in U.S. Pat. No. 4,876,541 by Storer, issued Oct. 24, 1989. Further examples of LZ dictionary based compression and decompression systems are described in U.S. Pat. No. 4,464,650 by Eastman et al., issued Aug. 7, 1984; U.S. Pat. No. 4,814,746 by Miller et al., issued Mar. 21, 1989; U.S. Pat. No. 5,153,591 by Clark, issued Oct. 6, 1992; and European Patent Application Publication Number 0 573 208 A1 by Lempel et al., published Dec. 8, 1993.
In the above-cited systems, the input data character stream is compared character-by-character with character strings stored in a dictionary to effect a match therewith. Typically, the character-by-character comparison is continued until the longest match is determined. Based on the match, a compressed code is output and the dictionary is updated with one or more additional character strings. In the Storer patent ('541), the dictionary is updated by concatenating all of the non-empty prefixes of the current longest matched string with the previous longest matched string. Thus, if there are N characters in the current longest match, N strings are added to the dictionary after the current longest match is determined. In the Storer patent this is denoted as the All Prefixes (AP) update technique.
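The matching-and-update cycle described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the dictionary is assumed to be a mapping from strings to integer codes, pre-seeded with all single characters, and the greedy longest-match search and AP update are written in the simplest possible form.

```python
def compress_ap(data, dictionary):
    """Greedy longest-match compression with an All Prefixes (AP)
    style dictionary update.  Illustrative sketch; `dictionary`
    maps strings to integer codes and is assumed to be pre-seeded
    with every single character appearing in `data`."""
    codes = []
    prev_match = ""
    i = 0
    while i < len(data):
        # Extend the match character-by-character until the
        # longest dictionary string starting at position i is found.
        j = i + 1
        while j < len(data) and data[i:j + 1] in dictionary:
            j += 1
        match = data[i:j]
        codes.append(dictionary[match])
        # AP update: concatenate every non-empty prefix of the
        # current match onto the previous match, so a match of
        # N characters adds up to N new strings.
        if prev_match:
            next_code = max(dictionary.values()) + 1
            for k in range(1, len(match) + 1):
                s = prev_match + match[:k]
                if s not in dictionary:
                    dictionary[s] = next_code
                    next_code += 1
        prev_match = match
        i = j
    return codes
```

For example, compressing "ababab" with a dictionary seeded with "a" and "b" emits a code for "a", a code for "b", adds "ab" to the dictionary via the AP update, and thereafter matches "ab" as a unit.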
Another type of data compression and decompression method is denoted as Run-Length Encoding (RLE). The RLE algorithm compresses a repeating character or character group run by providing a compressed code indicating the character or character group and the length of the run. RLE is thus effective in encoding long runs of the same character or group of characters. For example, RLE is effective in compressing a long sequence of blanks that may be included at the beginning of a data file. RLE is also effective in image compression where an image contains a long run of consecutive pixels having the same value, such as in the sky portion of a land-sky image.
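A minimal run-length codec, sketched under the assumption that a run is represented as a (character, length) pair; real RLE codecs pack these pairs into a byte stream, but the pair form suffices to show the idea:

```python
def rle_encode(data):
    """Run-Length Encoding sketch: collapse each run of a
    repeating character into a (character, run_length) pair."""
    pairs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        pairs.append((data[i], j - i))
        i = j
    return pairs


def rle_decode(pairs):
    """Inverse of rle_encode: expand each pair back into a run."""
    return "".join(ch * n for ch, n in pairs)
```

A long run of blanks at the start of a file thus compresses to a single pair regardless of its length, which is exactly the case the dictionary based methods handle poorly.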
The LZ dictionary based compression and decompression algorithms discussed above are not especially effective in compressing long runs of a repeating character or character group. Even utilizing the AP update technique, a large number of compressed code outputs are required to compress a long run.
This deficiency of the dictionary based systems is traditionally overcome by first applying the data to a run length encoder and then applying the run length encoded data to the LZ dictionary based system. In such an architecture, a run length encoder is utilized at the front end of the dictionary based compressor and a run length decoder is utilized at the output end of the dictionary based decompressor. Such a system suffers from the disadvantages of increased equipment, expense, control overhead and processing time.
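The conventional two-stage architecture criticized above can be sketched as two coders in series. The helper functions and the (character, count) symbol representation here are illustrative assumptions; the back end is written as an LZW-style greedy matcher rather than any particular patented variant:

```python
def rle_stage(data):
    # Front-end run-length encoder: collapse each run into a
    # (character, count) pair.
    pairs, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        pairs.append((data[i], j - i))
        i = j
    return pairs


def lzw_stage(symbols, dictionary):
    # Back-end dictionary compressor (LZW-style greedy longest
    # match).  `dictionary` maps tuples of symbols to codes and is
    # assumed pre-seeded with every single symbol that can occur.
    codes, i = [], 0
    while i < len(symbols):
        j = i + 1
        while j < len(symbols) and tuple(symbols[i:j + 1]) in dictionary:
            j += 1
        codes.append(dictionary[tuple(symbols[i:j])])
        # Standard LZW update: matched string plus the next symbol.
        if j < len(symbols):
            dictionary[tuple(symbols[i:j + 1])] = len(dictionary)
        i = j
    return codes


def cascade_compress(data, dictionary):
    # Two separate coders run in series; the second stage and its
    # control logic are the extra cost the passage describes.
    return lzw_stage(rle_stage(data), dictionary)
```

Note that the decompressor must mirror this structure, running a dictionary decoder followed by a run length decoder, which is the source of the duplicated equipment and control overhead noted above.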