The present disclosure is related to Huffman coding.
As is well known, Huffman codes for a set of symbols are generated based, at least in part, on the probability of occurrence of the source symbols. A binary tree, commonly referred to as a "Huffman tree," is generated to extract the binary code and the code length. See, for example, D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, Volume 40, No. 9, pages 1098 to 1101, 1952. D. A. Huffman, in the aforementioned paper, describes the process this way:
List all possible symbols with their probabilities;
Find the two symbols with the smallest probabilities;
Replace these by a single set containing both symbols, whose probability is the sum of the individual probabilities;
Repeat until the list contains only one member.
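The four steps listed above may be sketched as follows. This is a minimal illustration only, not the implementation of any particular encoder. The function name build_huffman_tree, the use of integer weights in place of probabilities, and the use of a heap with an insertion counter to break ties are all assumptions made for the example.

```python
import heapq
from itertools import count


def build_huffman_tree(weights):
    """Merge the two least probable entries until a single entry remains.

    `weights` maps each symbol (assumed to be a str) to its probability
    or frequency. The result is a nested 2-tuple: each merged set holds
    exactly two members, and each member is a symbol or another set.
    """
    # The counter breaks ties between equal weights by insertion order,
    # so the heap never has to compare a symbol against a tuple.
    tiebreak = count()
    # Step 1: list all symbols with their weights.
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    # Step 4: repeat until the list contains only one member.
    while len(heap) > 1:
        # Step 2: find the two entries with the smallest weights.
        w1, _, first = heapq.heappop(heap)
        w2, _, second = heapq.heappop(heap)
        # Step 3: replace them by a single set whose weight is the sum.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (first, second)))
    return heap[0][2]


tree = build_huffman_tree({"a": 4, "b": 3, "c": 2, "d": 1})
print(tree)
```

A different tie-breaking rule, or a different ordering of the two members within a merged set, would yield a different but equally valid result.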
This procedure produces a recursively structured set of sets, each of which contains exactly two members. It may, therefore, be represented as a binary tree ("Huffman tree") with the symbols as the "leaves." Then, to form the code ("Huffman code") for any particular symbol, traverse the binary tree from the root to that symbol, recording "0" for a left branch and "1" for a right branch. One issue with this procedure, however, is that the resultant Huffman tree is not unique; for example, ties between equal probabilities may be broken in more than one way.
One example of an application of such codes is text compression, such as GZIP. GZIP is a text compression utility developed under the GNU (GNU's Not Unix) project, a project with the goal of developing a "free," or freely available, UNIX-like operating system, as a replacement for the "compress" text compression utility on a UNIX operating system. See, for example, Gailly, J. L. and Adler, M., GZIP documentation and sources, available as gzip-1.2.4.tar at the website "http://www.gzip.org/".
In GZIP, Huffman tree information is passed from the encoder to the decoder as a set of code lengths along with the compressed text. Both the encoder and the decoder, therefore, generate a unique Huffman code based upon this code-length information. However, generating length information for the Huffman codes by constructing the corresponding Huffman tree is inefficient. In particular, the Huffman codes resulting from the Huffman tree are typically abandoned, because the encoder and the decoder will generate the same Huffman codes from the code-length information. It would, therefore, be desirable if another approach for generating the code-length information were available.
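The point that the tree-derived codes are discarded once the code lengths are known may be illustrated with a short sketch. It assumes a tree expressed as nested 2-tuples with str symbols at the leaves, and it assigns codes from lengths in the canonical manner (shorter codes first, equal lengths ordered by symbol), which is the convention the DEFLATE format used by GZIP also follows. The function names codes_from_tree and canonical_codes are hypothetical, and every code length is assumed to be greater than zero.

```python
def codes_from_tree(tree, prefix=""):
    """Traverse from the root, recording "0" for left and "1" for right."""
    if isinstance(tree, str):
        # A leaf: the accumulated branch labels form the symbol's code.
        # A tree consisting of a single symbol is given the code "0".
        return {tree: prefix or "0"}
    left, right = tree
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes


def canonical_codes(lengths):
    """Assign codes using only the code lengths (all assumed > 0)."""
    code = 0
    previous_length = 0
    result = {}
    # Shorter codes come first; equal lengths are ordered by symbol.
    ordered = sorted(lengths.items(), key=lambda item: (item[1], item[0]))
    for symbol, length in ordered:
        # Pad the running code with zeros when the length increases.
        code <<= length - previous_length
        result[symbol] = format(code, "0{}b".format(length))
        code += 1
        previous_length = length
    return result


# Hypothetical example tree for four symbols.
tree = ("a", ("b", ("d", "c")))
tree_codes = codes_from_tree(tree)
lengths = {symbol: len(code) for symbol, code in tree_codes.items()}

print(tree_codes)                # codes read directly off the tree
print(canonical_codes(lengths))  # codes rebuilt from the lengths alone
```

In this example the two code tables differ for the symbols "c" and "d" while every code length is preserved, so an encoder and a decoder that both apply canonical_codes to the same lengths agree on the codes without ever exchanging the tree.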