1. Field of the Invention
The invention relates to LZW data compression, particularly with respect to minimizing dictionary accesses when processing character runs.
2. Description of the Prior Art
Professors Abraham Lempel and Jacob Ziv provided the theoretical basis for LZ data compression and decompression systems that are in widespread use today. Two of their seminal papers appear in the IEEE Transactions on Information Theory, IT-23-3, May 1977, pp. 337-343 and in the IEEE Transactions on Information Theory, IT-24-5, September 1978, pp. 530-536. A widely used data compression and decompression system known as LZW, adopted as the standard for V.42 bis modem compression and decompression, is described in U.S. Pat. No. 4,558,302 by Welch, issued Dec. 10, 1985. LZW has also been adopted as the compression and decompression standard used in the GIF and TIFF image communication protocols.
Further examples of LZ dictionary based compression and decompression systems are described in the following U.S. patents: U.S. Pat. No. 4,464,650 by Eastman et al., issued Aug. 7, 1984; U.S. Pat. No. 4,814,746 by Miller et al., issued Mar. 21, 1989; U.S. Pat. No. 4,876,541 by Storer, issued Oct. 24, 1989; U.S. Pat. No. 5,153,591 by Clark, issued Oct. 6, 1992; and U.S. Pat. No. 5,373,290 by Lempel et al., issued Dec. 13, 1994.
Another type of data compression and decompression, denoted as run-length encoding (RLE), compresses a repeating character run by providing a compressed code indicating the character and the length of the run. RLE is thus effective in encoding long strings of the same character. For example, RLE is effective in compressing a long sequence of blanks that may be included at the beginning of a data file. RLE is also effective in image compression where an image contains a long run of consecutive pixels having the same value, such as in the sky portion of a land-sky image.
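The run-length principle described above may be illustrated by the following minimal sketch, which is not drawn from any of the cited references. The function names rle_encode and rle_decode are hypothetical, and the representation of a run as a (character, length) pair is an assumption made for illustration only.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into (character, length)."""
    return [(ch, len(list(group))) for ch, group in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, length) pairs back into the original text."""
    return "".join(ch * length for ch, length in pairs)


if __name__ == "__main__":
    # Eight leading blanks collapse into a single pair.
    print(rle_encode(" " * 8 + "ab"))
```

A long run thus costs one pair regardless of its length, whereas non-repeating data gains nothing from this representation.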
When the above dictionary based LZ compression systems encounter a character run, numerous dictionary accesses are utilized to generate the compressed codes corresponding to the run. It is desirable in such systems to minimize the number of dictionary accesses so as to enhance system performance.
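The dictionary activity occasioned by a run may be observed in the following sketch of a textbook LZW compressor. It is not the compressor of any cited patent: the Python dict used as the dictionary, the 256-entry single-byte initialization, the absence of a code-size limit, and the lookups counter are all assumptions made for illustration.

```python
def lzw_compress(data: bytes) -> tuple[list[int], int]:
    """Textbook LZW; returns (output codes, number of dictionary searches)."""
    dictionary = {bytes([i]): i for i in range(256)}  # single-byte strings
    next_code = 256
    codes: list[int] = []
    lookups = 0  # counts one search per input character
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        lookups += 1
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            codes.append(dictionary[w])  # emit code of longest match
            dictionary[wc] = next_code   # store the extended string
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes, lookups


if __name__ == "__main__":
    print(lzw_compress(b"a" * 10))
```

Under these assumptions, a run of ten identical characters is parsed into segments of lengths 1, 2, 3 and 4, yielding four output codes, yet the dictionary is searched once for every input character. The number of searches therefore grows with the length of the run even though the parse is entirely predictable.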
In the prior art, run-length encoding has been combined with LZ systems as exemplified in the following U.S. patents: U.S. Pat. No. 4,929,946 by O'Brien et al., issued May 29, 1990; U.S. Pat. No. 4,971,407 by Hoffman, issued Nov. 20, 1990; U.S. Pat. No. 4,988,998 by O'Brien, issued Jan. 29, 1991; U.S. Pat. No. 5,247,638 by O'Brien et al., issued Sep. 21, 1993; U.S. Pat. No. 5,389,922 by Seroussi et al., issued Feb. 14, 1995; and U.S. Pat. No. 5,861,827 by Welch et al., issued Jan. 19, 1999.
In some prior art systems, run-length encoding has been combined with an LZ system by applying the data to a run-length encoder and then applying the run-length encoded data to the LZ based system. In such an architecture, a run-length encoder is utilized at the front end of the compressor and a run-length decoder is utilized at the output end of the decompressor. Such a system suffers from the disadvantages of increased equipment, expense, control overhead and processing time. U.S. Pat. Nos. 4,971,407 and 4,988,998 exemplify such a system.
In the LZW based system of U.S. Pat. No. 5,389,922, certain output codes from the compressor are suppressed in the presence of a run of repeating input data characters but numerous dictionary accesses are nevertheless utilized. A special run enhancement engine is required at the input to the decompressor to regenerate the missing codes.
In the compressor of the system of U.S. Pat. No. 5,861,827, when a partial string W and a character C are found, a new string is stored with C as an extension character on the string PW where P was the string conveyed in the last transmitted output compressed code. With this compression algorithm, a run of characters is encoded in two compressed codes regardless of its length but, nevertheless, numerous dictionary accesses are utilized. The decompressor of this system uses a special unrecognized code process to maintain synchronism with the compressor.
In the system of U.S. Pat. No. 4,929,946, a run is indicated by transmitting a predetermined reserved reference value followed by a repeat count for the run. The requirement of the use of the reserved reference value in the compressed stream for every run that is detected tends to reduce the compression. U.S. Pat. No. 5,247,638 provides descriptions similar to those of U.S. Pat. No. 4,929,946.
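The overhead of a reserved value may be appreciated from the following sketch, which is only loosely modeled on the arrangement described above and does not reproduce the format of the cited patents. The value RESERVED = 256, the min_run threshold, and the convention of emitting the character once followed by the reserved value and the number of additional repeats are all hypothetical.

```python
RESERVED = 256  # hypothetical reserved value outside the byte range 0-255


def encode_with_reserved(data: bytes, min_run: int = 4) -> list[int]:
    """Emit literals, replacing each long run by: char, RESERVED, repeats."""
    out: list[int] = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1  # advance to the end of the current run
        run = j - i
        if run >= min_run:
            # One literal, the reserved marker, then the extra repeat count.
            out += [data[i], RESERVED, run - 1]
        else:
            out += [data[i]] * run  # short runs are sent as plain literals
        i = j
    return out


if __name__ == "__main__":
    print(encode_with_reserved(b"aaaaaabc"))
```

In this sketch every detected run costs two symbols beyond the character itself, which illustrates why frequent short runs can erode the compression gained.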
Another data compression system involving the encoding of data character runs is disclosed in said patent application Ser. No. 09/264,269. In the compressor of this patent application, runs are processed by successively looking ahead into the input to determine if contiguous numerically increasing segments exist in the run.
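The notion of contiguous numerically increasing segments may be illustrated by the following sketch, which is an assumption-laden illustration and not the procedure of the referenced application. The function name lookahead_segments and its return convention are hypothetical. The sketch examines the input ahead of the current position and accepts segments of lengths 1, 2, 3 and so on for as long as each complete segment consists of the run character.

```python
def lookahead_segments(data: str, start: int) -> tuple[list[int], int]:
    """Return (segment lengths found, position just past the last segment)."""
    run_char = data[start]
    pos = start
    lengths: list[int] = []
    want = 1  # length of the next segment to look for
    while pos + want <= len(data) and all(
        c == run_char for c in data[pos:pos + want]
    ):
        lengths.append(want)
        pos += want
        want += 1  # each following segment is one character longer
    return lengths, pos


if __name__ == "__main__":
    print(lookahead_segments("aaaaaaaaaab", 0))
```

For a run of ten characters followed by a different character, the sketch finds segments of lengths 1, 2, 3 and 4 without consulting any dictionary.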
Another data compression system involving the encoding of data character runs is disclosed in said patent application Ser. No. 09/300,810. In the compressor of this patent application, runs are processed by mathematically determining, from the length of the run, the respective output codes corresponding to the contiguous numerically increasing segments that exist in the run.
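One way in which the number of segments might be derived from the run length alone is sketched below. This is an assumption for illustration and is not asserted to be the computation of the referenced application. If a run of length R is parsed into segments of lengths 1, 2, ..., n followed by a shorter remainder, then n is the largest integer satisfying n(n+1)/2 <= R, which gives n = floor((sqrt(8R+1) - 1)/2). The function name segments_from_run_length is hypothetical.

```python
import math


def segments_from_run_length(run_length: int) -> tuple[int, int]:
    """Return (n, remainder) where 1 + 2 + ... + n + remainder == run_length."""
    # Largest n with n*(n+1)/2 <= run_length, computed in exact integers.
    n = (math.isqrt(8 * run_length + 1) - 1) // 2
    remainder = run_length - n * (n + 1) // 2
    return n, remainder


if __name__ == "__main__":
    print(segments_from_run_length(10))  # four full segments, no remainder
    print(segments_from_run_length(12))  # four full segments, remainder of 2
```

The computation requires no search of the input or of the dictionary; only the run length is consulted.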
Another data compression and decompression system that involves the processing of data character runs is disclosed in said patent application Ser. No. 09/336,219. In the system of this patent application, run-length encoding/decoding is embedded in the LZW data compression/decompression system where the compressor and decompressor code counters are utilized in signalling and detecting that a character run has been encountered.
It is an object of the present invention to detect the presence of a character run in an LZW data compression system and to variously utilize run processing procedures described in said patent applications Ser. No. 09/264,269 and Ser. No. 09/300,810 to process the run. Since these run processing procedures do not require numerous dictionary accesses, a performance improvement is effected.