Huffman coding is an encoding algorithm for lossless data compression. Huffman coding uses a variable-length code table to encode each source symbol, such as a character in a file. In general, the variable-length code table is derived from the number of occurrences of each source symbol in the file, so that frequently occurring symbols are assigned shorter codes than rarely occurring ones.
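By way of illustration only, deriving a variable-length code table from symbol counts can be sketched as follows. This is a minimal textbook-style construction, not the canonical code assignment that RFC 1951 prescribes; the names `build_code_table` and `encode` are hypothetical and are introduced here for the example.

```python
import heapq
from collections import Counter


def build_code_table(data):
    """Return {symbol: bit string} derived from symbol counts in `data`.

    Hypothetical helper for illustration; codes are plain '0'/'1' strings.
    """
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate input: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}

    # Heap entries are (count, tie_breaker, {symbol: partial_code}).
    # The unique integer tie_breaker keeps the dicts from being compared.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, t0 = heapq.heappop(heap)
        n1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1

    return heap[0][2]


def encode(data, table):
    """Concatenate the code of every symbol in `data`."""
    return "".join(table[s] for s in data)
```

In this sketch, the most frequent symbol of an input such as "aaaabbc" ends up with the shortest code, and no code is a prefix of another, which is what allows the encoded bit stream to be decoded without separators.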
Conventional Huffman coding is used as a part of the GNU zip (gzip) DEFLATE and INFLATE processes, as specified in RFC 1951. FIG. 1 illustrates a conventional compression application 10 which uses the DEFLATE and INFLATE processes to transform between a file 12 and a compressed file 14. In particular, the DEFLATE process converts the file 12 into the compressed file 14. The INFLATE process is the inverse process, used to decompress the compressed file 14 to recreate the original file 12. In the DEFLATE process, the file 12 is first compressed using LZ77, and the resulting LZ77 code is then Huffman coded to further improve compression. The INFLATE process implements Huffman decoding to recover the LZ77 code, and then decompresses the LZ77 code to recreate the file 12.
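As a hedged illustration of the round trip between the file 12 and the compressed file 14, the following sketch uses the `zlib` module from the Python standard library. Passing a window-bits value of -15 requests a raw DEFLATE stream without the zlib or gzip wrapper; the function names `deflate` and `inflate` are hypothetical wrappers introduced for this example and are not part of any described implementation.

```python
import zlib


def deflate(data: bytes) -> bytes:
    """Compress `data` into a raw DEFLATE stream (LZ77 plus Huffman coding)."""
    # wbits = -15: raw DEFLATE, 32 KiB window, no header or checksum.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate(blob: bytes) -> bytes:
    """Invert `deflate`: Huffman-decode, then expand the LZ77 references."""
    decompressor = zlib.decompressobj(-15)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    original = b"abracadabra " * 100
    compressed = deflate(original)
    # The round trip is lossless and, for repetitive input, smaller.
    print(len(original), len(compressed), inflate(compressed) == original)
```

The library hides the two stages described above, but the order is the same: LZ77 matching followed by Huffman coding when compressing, and the reverse order when decompressing.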
In conventional implementations of the INFLATE process, a series of lookups is performed using the variable-length Huffman code values to find the LZ77 code values used in a subsequent decoding operation. These longest-prefix lookup operations are typically implemented in software using an associative array. Conventional hardware implementations instead use a ternary content-addressable memory (CAM) structure. However, associative arrays and ternary CAMs have certain disadvantages. For example, ternary CAMs are relatively large, so they consume a significant amount of circuit area.
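The software approach described above can be sketched, under simplifying assumptions, with an ordinary dictionary serving as the associative array. The function name `decode` and the representation of codes as strings of '0' and '1' characters are assumptions made for readability; a practical decoder would operate on packed bits.

```python
def decode(bits, code_to_symbol):
    """Decode a '0'/'1' string using a {code: symbol} associative array.

    Hypothetical helper for illustration. At each position, candidate
    prefixes of increasing length are looked up until one matches.
    Because Huffman codes are prefix-free, at most one length can match.
    """
    max_len = max(len(code) for code in code_to_symbol)
    symbols = []
    pos = 0
    while pos < len(bits):
        for length in range(1, max_len + 1):
            candidate = bits[pos:pos + length]
            if candidate in code_to_symbol:
                symbols.append(code_to_symbol[candidate])
                pos += length
                break
        else:
            # No code of any valid length matched at this position.
            raise ValueError(f"invalid code at bit offset {pos}")
    return symbols


if __name__ == "__main__":
    table = {"1": "a", "01": "b", "00": "c"}
    print("".join(decode("1010011", table)))
```

Each decoded symbol may cost several dictionary probes, one per candidate length, which illustrates why a series of lookups is needed for a single variable-length code.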