This invention relates in general to data compression and more specifically to a single-pass deflate block encoding approach to data compression.
In data communications it is desirable to have faster transfer rates at lower costs. Data compression addresses these demands by reducing the amount of data that must be transferred over a medium of fixed bandwidth, thereby reducing connection times. Likewise, data compression reduces the media bandwidth required to transfer a fixed amount of data with fixed quality of service, thereby reducing the tariff on the service.
Deflate algorithms are standard in data compression applications and can be implemented using software-based or hardware-based approaches. Deflate algorithms typically use a multi-pass data compression approach that combines an LZ77 string matching algorithm with a Huffman encoding algorithm to provide multiple levels of data compression. In the first pass, the LZ77 algorithm searches a string for duplicate multi-byte substrings and replaces each duplicate substring with a length and distance pair that points to the original substring. A sliding window-based searching algorithm uses a dictionary to keep track of substrings. Codes identifying the distance to the original substring and the length of the substring are output in lieu of the actual substrings. In a subsequent pass, the Huffman encoding algorithm takes each code of the resultant LZ77 output and maps the code to a variable-bit-length pattern so that codes with higher frequencies receive shorter bit-length encodings. De facto standard data compression applications such as GZIP, ZLIB, ZIP, PKZIP, etc. use some variation of this multi-pass approach, usually by implementing a combination of LZ77 passes, Huffman accounting passes, and Huffman encoding passes, which may include raw encodings. A problem with the multi-pass data compression approach described above is that it uses significant CPU resources, server memory and disk storage due to the latency of the multiple passes and the buffering requirements for file optimization.
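The two passes described above can be illustrated with a minimal Python sketch. It is a teaching aid under stated assumptions, not the encoder of any particular application: the function names (lz77_tokens, lz77_decode, huffman_code_lengths) are hypothetical, the match search is a naive scan of the window rather than a dictionary lookup, and each token is counted as one Huffman symbol, whereas actual deflate streams code literals/lengths and distances with separate alphabets.

```python
import heapq
from collections import Counter


def lz77_tokens(data, window=32768, min_len=3, max_len=258):
    """First pass: emit literals and (length, distance) pairs.

    Naive O(n * window) search; real encoders use a hash-based dictionary.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append(("match", best_len, best_dist))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens):
    """Inverse of lz77_tokens: copy literals, expand back-references."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, length, dist = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy allows overlapping matches
    return bytes(out)


def huffman_code_lengths(freqs):
    """Second pass: map each symbol to a Huffman code length in bits."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # The integer in position 1 is a unique tie-breaker so symbols are never compared.
    heap = [(count, n, [sym]) for n, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


data = b"abcabcabcabc"
tokens = lz77_tokens(data)                # pass 1: string matching
lengths = huffman_code_lengths(Counter(tokens))  # pass 2: frequency accounting
```

The sketch makes the latency problem visible: the Huffman code lengths cannot be computed until the whole token list from the first pass has been buffered and counted.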
Alternatively, single-pass variants of the multi-pass data compression approach described above use severe simplifications of the deflate algorithm, such as small window sizes and static encoding rules. These variants reduce latency and increase throughput; however, these improvements come at the expense of compression ratio and compression feature configurability. Typically, hardware-based implementations have opted for this type of simplified deflate algorithm approach because of its ease of implementation. More recently, multi-pass hardware implementations have become more prevalent, but with limitations on efficiency due to duplication overhead. Thus, it is desirable to provide a compression approach that minimizes protocol overhead while incorporating Huffman coding flexibility.
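The trade-off between a simplified variant and a full deflate encoder can be approximated with Python's standard zlib module. The sketch below is an illustration only and does not model any specific hardware implementation: the helper raw_deflate is hypothetical, the 512-byte window (wbits of -9) and the fixed Huffman strategy are chosen here as stand-ins for "small window sizes and static encoding rules", and the Z_FIXED constant falls back to its zlib.h value of 4 in case the Python build does not export it.

```python
import random
import zlib

# Assumption: fall back to the zlib.h value if this Python build lacks the constant.
Z_FIXED = getattr(zlib, "Z_FIXED", 4)


def raw_deflate(data, wbits, strategy):
    """Compress data as a raw deflate stream with the given window and strategy."""
    comp = zlib.compressobj(
        level=6,
        method=zlib.DEFLATED,
        wbits=wbits,          # negative value: raw deflate, window of 2**abs(wbits) bytes
        memLevel=8,
        strategy=strategy,
    )
    return comp.compress(data) + comp.flush()


# Test input: a 4096-byte pseudo-random block repeated once, so the only
# useful match lies 4096 bytes back, outside a 512-byte window.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(4096))
data = block * 2

simplified = raw_deflate(data, -9, Z_FIXED)                  # small window, static codes
full = raw_deflate(data, -15, zlib.Z_DEFAULT_STRATEGY)       # 32 KB window, dynamic codes

print(len(data), len(simplified), len(full))
```

On this input the small-window encoder cannot reach the earlier copy of the block, so its output stays close to the input size, while the 32 KB window replaces the second copy with back-references. Both streams decode with the same inflater, which reflects that the simplification affects compression ratio rather than format compatibility.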