1. Field of the Invention
The present invention relates generally to lossless data compression, and more particularly to optimizing backward reference selection to improve compression ratio in digital image systems.
2. Background
Existing lossless byte stream compression techniques, such as the Deflate data compression algorithm, exploit commonly occurring patterns in the source input data to increase the amount of compression. These techniques build a list of commonly occurring patterns and encode each occurrence of a pattern by transmitting an index associated with that pattern in the list. Deflate is widely used in gzip compressed files and Portable Network Graphics (PNG) image files. Improvements that are compatible with the current Deflate method can thus lead to significant savings in both data transfer bandwidth costs and persistent data storage.
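As a minimal illustration of how Deflate benefits from repeated patterns, the sketch below uses Python's standard `zlib` module, which wraps a Deflate stream in the zlib container rather than the gzip or PNG container. The helper name `deflate_ratio` and the two sample inputs are hypothetical and chosen only to contrast pattern-rich data with patternless data.

```python
import os
import zlib


def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size for a zlib-wrapped Deflate stream."""
    compressed = zlib.compress(data, level)
    # Lossless: decompression must reproduce the input exactly.
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip failed")
    return len(compressed) / len(data)


# Hypothetical inputs: one highly repetitive, one with no exploitable patterns.
repetitive = b"RGB" * 1000
random_like = os.urandom(3000)

print(deflate_ratio(repetitive))   # small: repeats become short backward references
print(deflate_ratio(random_like))  # about 1.0 or slightly above: nothing to exploit
```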
A typical Deflate compression method consists of two phases: backward reference selection followed by Huffman coding. The backward reference selection phase uses a backward reference selection algorithm, such as the LZ77 algorithm, to build a list of commonly occurring patterns in the source input data. The Huffman coding phase uses two Huffman tables to entropy encode the result: one for literal bytes of the source input data together with backward reference lengths, and one for backward reference distances. In the LZ77 algorithm, the list of commonly occurring patterns is simply a portion of the previously encoded sequence of source input data. To encode an input data stream, an LZ77 encoder moves a search pointer backward through the previously encoded sequence of input data, searching for a match beginning with the first not-yet-encoded data element of the input data stream, and continues moving the search pointer backward to find the longest match. Once the longest match is found, the encoder encodes it as a tuple (d, l), where d is the backward reference distance, i.e., the number of positions from the current position in the input stream back to the start of the match, and l is the length of the match. Compression is achieved because each such match is replaced by its shorter (d, l) tuple.
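The LZ77 search described above can be sketched as a greedy, brute-force encoder. This is a simplified illustration under stated assumptions, not zlib's implementation: zlib uses hash chains and lazy matching, whereas this sketch scans every earlier position in the window. The constants follow Deflate's limits (minimum match 3, maximum match 258, window 32768); the function names are hypothetical.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a backward reference (d, l).
Token = Union[int, Tuple[int, int]]

MIN_MATCH = 3      # Deflate's minimum match length
MAX_MATCH = 258    # Deflate's maximum match length
WINDOW = 32768     # Deflate's maximum backward distance


def longest_match(data: bytes, pos: int) -> Tuple[int, int]:
    """Move a search pointer backward from pos and return (d, l) of the longest match.

    The scan starts at the nearest position and only replaces the best match on a
    strictly longer length, so equal-length matches resolve to the smaller distance.
    """
    best_d, best_l = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for start in range(pos - 1, max(-1, pos - WINDOW - 1), -1):
        length = 0
        # Overlapping matches (l > d) are allowed, as in LZ77.
        while length < limit and data[start + length] == data[pos + length]:
            length += 1
        if length > best_l:
            best_d, best_l = pos - start, length
            if best_l == limit:
                break
    return best_d, best_l


def lz77_encode(data: bytes) -> List[Token]:
    """Greedy LZ77: emit (d, l) for matches of at least MIN_MATCH, else a literal."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        d, l = longest_match(data, pos)
        if l >= MIN_MATCH:
            tokens.append((d, l))
            pos += l
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    """Rebuild the input; byte-by-byte copying handles overlapping references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            d, l = tok
            for _ in range(l):
                out.append(out[-d])
        else:
            out.append(tok)
    return bytes(out)


tokens = lz77_encode(b"abcabcabcx")
print(tokens)  # [97, 98, 99, (3, 6), 120]
```

Here the six bytes "abcabc" following the first "abc" are encoded as a single overlapping reference (3, 6), which the decoder expands by copying one byte at a time.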
The conventional backward reference selection algorithm in Deflate implementations such as zlib favors smaller backward distances when identical sequences exist. This approach is satisfactory for compressing generic data files, since proximity in an input data stream generally implies similarity. Unfortunately, simply favoring smaller backward distances may increase the entropy of the backward reference distance codes. When compressing multi-component signals, such as a truecolor (red-green-blue) image, zlib and other general-purpose Deflate implementations match data across signal components, for example a green component with a blue one. This adds to the entropy of the backward reference distance histogram by including backward references whose distances are not aligned to the component size of the input signal (for example, distances that are not multiples of 3 bytes in a 24-bit truecolor image).
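The entropy argument above can be made concrete by computing the empirical Shannon entropy of a distance histogram. The sketch below compares two hypothetical lists of backward distances for a 24-bit image (illustrative values, not measured data): one containing only pixel-aligned distances, and one mixing in cross-component distances. Deflate actually maps distances onto 30 distance codes plus extra bits, so raw-distance entropy is only a proxy for the coded cost; the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def shannon_entropy(symbols: Iterable[int]) -> float:
    """Empirical Shannon entropy, in bits per symbol, of a sequence of codes."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def misaligned(distances: List[int], component_size: int = 3) -> List[int]:
    """Distances that are not whole multiples of the pixel size (3 bytes for RGB)."""
    return [d for d in distances if d % component_size != 0]


# Hypothetical distance lists for illustration only.
aligned = [3, 3, 3, 6, 3, 3, 6, 3]   # every reference points at a whole pixel
mixed = [3, 1, 3, 6, 2, 3, 4, 3]     # some references cross component boundaries

print(misaligned(mixed))          # [1, 2, 4]
print(shannon_entropy(aligned))   # about 0.811 bits per distance
print(shannon_entropy(mixed))     # 2.0 bits per distance
```

Although both lists have the same number of references, the cross-component distances spread the histogram over more symbols, raising the average code length.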
Conventional backward reference selection algorithms in Deflate implementations also favor backward references with longer matches. For example, for a 24-bit truecolor image, when zlib finds one matching substring of 4 bytes and another of 3 bytes, zlib selects the 4-byte substring as the backward reference. Because a 4-byte length does not correspond to a whole number of 3-byte pixels, such a match can also increase the entropy of the backward reference length codes. As a result, backward reference selection that does not take the individual data components of the signal into consideration is not optimized; it tends to increase the entropy code length and adds unnecessary computational cost in finding combinations between the individual components of the signal.
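The length-greedy behavior in the example above can be reproduced on a small hypothetical RGB row. The sketch enumerates every backward reference of at least 3 bytes at one pixel boundary and applies a simplified "longest match wins, ties go to the smaller distance" rule. This rule is a stand-in for zlib's preference, not zlib's actual code, and the row contents and function names are assumptions made for illustration.

```python
from typing import List, Tuple


def match_length(data: bytes, start: int, pos: int) -> int:
    """Number of bytes at `start` that match the bytes at `pos` (overlap allowed)."""
    length = 0
    while pos + length < len(data) and data[start + length] == data[pos + length]:
        length += 1
    return length


def candidate_matches(data: bytes, pos: int, min_len: int = 3) -> List[Tuple[int, int]]:
    """All (d, l) backward references at `pos` that are at least min_len bytes long."""
    found = []
    for start in range(pos - 1, -1, -1):
        length = match_length(data, start, pos)
        if length >= min_len:
            found.append((pos - start, length))
    return found


def longest_first(candidates: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Length-greedy choice: longest match wins, ties go to the smaller distance."""
    return max(candidates, key=lambda dl: (dl[1], -dl[0]))


# Hypothetical 24-bit RGB row: five 3-byte pixels.
row = bytes([1, 2, 3,  4, 9, 9,  1, 2, 3,  1, 2, 3,  4, 0, 0])
pos = 9  # start of the fourth pixel

cands = candidate_matches(row, pos)
print(cands)                 # [(3, 3), (9, 4)]
print(longest_first(cands))  # (9, 4): one pixel plus the red byte of the next
```

The selected 4-byte reference ends one byte into the following pixel, so its length (4) is not a multiple of the 3-byte pixel size, while the pixel-aligned 3-byte candidate (3, 3) is discarded.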