LZ77 is a lossless data compression algorithm. One algorithm that uses LZ77 is DEFLATE, which is used in Portable Network Graphics (PNG) files. The DEFLATE data format consists of a series of blocks that correspond to successive blocks of input data. Each block is compressed using a combination of the LZ77 algorithm and Huffman coding. The LZ77 algorithm finds repeated substrings and replaces them with backward references (relative distance offsets). A reference may point to a duplicated string occurring in the same block or in a previous block. For example, the LZ77 algorithm may use a sliding window to find duplicated strings in previous data. In some implementations, the window may be up to 32 kilobytes (KB) in length. The compressed data consists of a series of elements of two types: literal bytes and pointers to replicated strings, where a pointer is represented as a pair <length, backward distance>. Huffman coding uses variable-length codes to represent symbols such that frequent symbols get short codes and infrequent symbols get long codes, thereby representing the set of symbols compactly.
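The literal and <length, backward distance> elements described above can be illustrated with a minimal sketch. It assumes a naive greedy search rather than the hash-based matching a production encoder would typically use. The names `lz77_tokens` and `lz77_expand` are hypothetical, and the match-length bounds of 3 and 258 bytes follow the DEFLATE format.

```python
WINDOW_SIZE = 32 * 1024  # maximum backward distance, as described above
MIN_MATCH = 3            # DEFLATE's shortest encodable match
MAX_MATCH = 258          # DEFLATE's longest encodable match


def lz77_tokens(data: bytes):
    """Greedy parse into literal byte values and (length, distance) pairs."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # Naive search: try every candidate start inside the window.
        for cand in range(max(0, pos - WINDOW_SIZE), pos):
            length = 0
            while (length < MAX_MATCH
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_len, best_dist))  # pointer to earlier data
            pos += best_len
        else:
            tokens.append(data[pos])              # literal byte
            pos += 1
    return tokens


def lz77_expand(tokens) -> bytes:
    """Inverse of lz77_tokens: copy literals and resolve back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, dist = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy allows overlapping matches
        else:
            out.append(tok)
    return bytes(out)
```

For the input `b"abcabcabc"`, this sketch emits three literals followed by one pointer with length 6 and distance 3. The match overlaps the bytes it produces, which is why the expansion copies one byte at a time.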
In DEFLATE, compression using dynamic Huffman codes makes two passes through the data. The first pass analyzes the message block for symbol frequencies and constructs an optimal Huffman code for the data. Once the Huffman code has been generated, a second pass substitutes the variable-length prefix codes for the symbols, which compresses the data. Making two passes yields the optimal Huffman code for the data, but is costly for embedded hardware solutions that may have very limited on-chip memory and very high processing throughput requirements. In addition, two passes through the data require more time than a single pass.
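The two passes can be sketched as follows. This is a simplified illustration and not the DEFLATE encoder itself: it omits DEFLATE's 15-bit limit on code lengths, its canonical code assignment, and the transmission of the code in the block header. The function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(freqs):
    """Build a prefix code {symbol: bit string} from {symbol: count}."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    # Heap entries are (weight, unique tiebreak, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def two_pass_encode(symbols):
    """Pass 1 gathers frequencies; pass 2 substitutes the prefix codes."""
    freqs = Counter(symbols)                   # first pass over the block
    code = huffman_code(freqs)
    bits = "".join(code[s] for s in symbols)   # second pass over the block
    return code, bits
```

In this sketch the whole block has to remain available between the two passes, which is the buffering cost that makes the approach expensive for hardware with limited on-chip memory.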
LZ77-based compression may also make a single pass through the data using a single, predefined, static Huffman code. The static Huffman code, however, is independent of the data being compressed. Accordingly, the compression ratio is on average around 15% worse than that of LZ77-based compression that uses dynamic Huffman codes built from the data being compressed. Thus, a compression hardware engine that optimizes for latency and throughput by producing all blocks using static Huffman codes will sacrifice a significant amount of compression ratio.
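The size difference can be observed on a given input with Python's `zlib` module, which exposes a `Z_FIXED` strategy that restricts the encoder to static Huffman codes. The sketch below is only a way to measure the gap; the helper name `raw_deflate` is hypothetical, and the result depends entirely on the input, so it should not be expected to reproduce the average figure cited above.

```python
import zlib


def raw_deflate(data: bytes, strategy: int) -> bytes:
    """Compress to a raw DEFLATE stream (wbits=-15: no zlib header or trailer)."""
    comp = zlib.compressobj(level=9, wbits=-15, strategy=strategy)
    return comp.compress(data) + comp.flush()


def compare_strategies(data: bytes):
    """Return (size with static codes only, size with the default strategy)."""
    fixed = raw_deflate(data, zlib.Z_FIXED)               # static Huffman codes only
    default = raw_deflate(data, zlib.Z_DEFAULT_STRATEGY)  # encoder may use dynamic codes
    return len(fixed), len(default)
```

Both streams decode with `zlib.decompress(stream, -15)`, since a standard decompressor handles static and dynamic blocks alike.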
One enhancement to using a single static Huffman code is to use four static Huffman codes, each of which works well for a class of files. When a file is compressed, the header contains a code ID that identifies which of the four static Huffman codes is used. This enhancement has some limitations. Because the extension is not standard, it works only when both the compressor and the decompressor are aware of the header extension and know a priori which static codes are used. Once the static Huffman codes are defined, a legacy is created, and the codes have to be supported and maintained. In addition, the four Huffman codes may not be optimal for other file types or applications. Adding Huffman codes for different applications may also be difficult.
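The code-ID mechanism can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the four tables cover only a four-symbol alphabet, the one-byte header layout is hypothetical, and the selection rule (pick the table with the lowest encoded size) is one plausible choice rather than a documented one.

```python
# Hypothetical static tables: code ID -> {symbol: code length in bits}.
# Each table favors a different symbol and satisfies the Kraft equality.
STATIC_CODE_LENGTHS = {
    0: {"a": 1, "b": 2, "c": 3, "d": 3},
    1: {"b": 1, "c": 2, "d": 3, "a": 3},
    2: {"c": 1, "d": 2, "a": 3, "b": 3},
    3: {"d": 1, "a": 2, "b": 3, "c": 3},
}


def encoded_bits(message: str, code_id: int) -> int:
    """Size in bits of the message under the given static table."""
    lengths = STATIC_CODE_LENGTHS[code_id]
    return sum(lengths[sym] for sym in message)


def choose_code_id(message: str) -> int:
    """Pick the static table that encodes this message most compactly."""
    return min(STATIC_CODE_LENGTHS, key=lambda cid: encoded_bits(message, cid))


def make_header(message: str) -> bytes:
    """Hypothetical one-byte header carrying the chosen code ID."""
    return bytes([choose_code_id(message)])
```

A decompressor can interpret such a header only if it ships the same four tables, which is the interoperability limitation described above.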