In digital processing, if a message comprises a sequence of symbols, each distinct symbol can be represented as a distinct binary codeword. Huffman's algorithm uses a table of the frequencies of occurrence of each symbol in a message and optimizes the variable length codewords such that the most frequent symbol has the shortest codeword. This results in data compression. Huffman coding is commonly used in audio and video compression coding, for example, MPEG.
“A method for the construction of minimum-redundancy codes”, by David A. Huffman, Proceedings of the IRE 40 (1952) 1098-1101, introduces Huffman coding.
If there are nine symbols S0, S1 . . . S8 with the following respective frequencies of occurrence 5, 5, 6, 1, 2, 3, 16, 9, 9, then they can be encoded using the Huffman algorithm into the binary tree illustrated in FIG. 1.
The tree 10 comprises leaf nodes Si and interior nodes Fi arranged in H levels. Each leaf node depends from a single interior node on the next lowest level and represents a symbol. Each interior node, other than the root, depends from a single interior node on the next lowest level. The level L of a node is defined by setting the root to level 0, and each other node has a level that is one higher than the level of the node from which it depends. The highest level is the height H of the Huffman tree. The symbols (i.e. the leaves of the tree 10) are labelled from left to right as S0, S1, S2 . . . S8.
The Huffman tree illustrated in FIG. 1 results in the following coding of the symbols:
TABLE 1
Symbol    Codeword
S0        000
S1        001
S2        010
S3        01100
S4        01101
S5        0111
S6        10
S7        110
S8        111
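As an illustration only, the following minimal Python sketch applies the standard Huffman merging procedure to the nine frequencies given above and computes the resulting codeword lengths, which can be compared with the codeword lengths in Table 1. The function name `huffman_code_lengths` and the variable names are assumptions made for this sketch and do not come from the cited paper. The sketch computes lengths only; the particular assignment of 0 and 1 to branches, and hence the left-to-right order of the leaves, is taken from Table 1 rather than derived.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return the Huffman codeword length of each symbol (hypothetical helper)."""
    tie = count()  # unique tie-breaker so the heap never compares the symbol lists
    heap = [(f, next(tie), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one.
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every symbol under the merge moves one level deeper
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths


# Frequencies of S0..S8 and the codewords of Table 1.
FREQS = [5, 5, 6, 1, 2, 3, 16, 9, 9]
TABLE_1 = ["000", "001", "010", "01100", "01101", "0111", "10", "110", "111"]

lengths = huffman_code_lengths(FREQS)
total_bits = sum(f * n for f, n in zip(FREQS, lengths))
```

For these frequencies no tie affects the outcome, so the computed lengths do not depend on the tie-breaking rule used in the sketch.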
In its simplest representation, a Huffman binary tree of height H may be represented using a word for each node of the tree. The size of such a representation makes it difficult to search during decoding.
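One plausible reading of such a simple representation is sketched below in Python: every node of the tree, leaf or interior, occupies one table entry, and decoding walks the table one bit at a time from the root. The entry layout (two child indices and an optional symbol) and the names `build_node_table` and `decode` are assumptions for this sketch, not a layout taken from any cited document.

```python
TABLE_1 = {
    "S0": "000", "S1": "001", "S2": "010", "S3": "01100", "S4": "01101",
    "S5": "0111", "S6": "10", "S7": "110", "S8": "111",
}


def build_node_table(codewords):
    """Build one entry [child_for_0, child_for_1, symbol] per tree node."""
    nodes = [[-1, -1, None]]  # entry 0 is the root; -1 marks a missing child
    for symbol, code in codewords.items():
        cur = 0
        for bit in code:
            b = int(bit)
            if nodes[cur][b] == -1:
                nodes.append([-1, -1, None])
                nodes[cur][b] = len(nodes) - 1
            cur = nodes[cur][b]
        nodes[cur][2] = symbol  # the node reached by the full codeword is a leaf
    return nodes


def decode(bits, nodes):
    """Decode a string of '0'/'1' characters by walking the node table."""
    out, cur = [], 0
    for bit in bits:
        cur = nodes[cur][int(bit)]
        if cur == -1:
            raise ValueError("bit string does not follow the tree")
        if nodes[cur][2] is not None:
            out.append(nodes[cur][2])
            cur = 0  # restart at the root for the next codeword
    if cur != 0:
        raise ValueError("bit string ends in the middle of a codeword")
    return out


nodes = build_node_table(TABLE_1)
decoded = decode("10" + "01100" + "000", nodes)
```

For the tree of FIG. 1 this table holds 17 entries (9 leaves and 8 interior nodes), and each decoded symbol costs one table access per codeword bit.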
“Memory efficient and high speed search Huffman coding”, by Hashemian, IEEE Trans on Comms, Vol. 43, No 10, 2576-, October 1995, reduces the storage space required to represent a Huffman tree and increases the speed of decoding using the tree. A sparse single-side growing Huffman tree is created and partitioned into smaller and less sparse clusters (sub-trees), each L levels apart. A super-tree is constructed in which each cluster is represented by a node. A super-table specifies the super-tree. It specifies, for each node, the length of the cluster associated with that node and the address of the look-up table for that cluster. A negative entry in the look-up table is a reference back to the super-table. A positive entry indicates that a symbol has been found, and the magnitude of the entry provides the location in memory of the symbol, the codeword and the codeword length.
In “A memory-efficient and fast Huffman decoding algorithm”, by Chen, Inform Process Lett. 69 (1999) 119-122, each leaf node is given a weight equal to the number of leaves in a complete tree under that node. The weight therefore depends upon the level of the node within the tree. Every leaf node is assigned a number equal to the cumulative weight of all the leaves appearing before it plus its own weight. A codeword is given an equivalent cumulative weight as if it were a node on the tree. The actual cumulative weights are searched to determine if one matches the equivalent cumulative weight. If there is a match and the weights of the matching nodes are the same, then the codeword is a leaf node of the tree (i.e. a symbol).
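The cumulative-weight idea summarized above can be illustrated with the codewords of Table 1. The Python sketch below is only an interpretation of that summary and is not taken from the cited paper: it assumes that a node at level l of a tree of height H has weight 2^(H-l), and that the equivalent cumulative weight of a candidate codeword of length l and binary value v is (v + 1) * 2^(H-l). The names `lookup` and `decode` are hypothetical.

```python
from bisect import bisect_left

# Codewords of S0..S8 from Table 1, in left-to-right leaf order.
TABLE_1 = ["000", "001", "010", "01100", "01101", "0111", "10", "110", "111"]
H = max(len(c) for c in TABLE_1)  # height of the tree, here 5

# Assumed weight: number of leaves of a complete tree of height H under the node.
weights = [2 ** (H - len(c)) for c in TABLE_1]

# Cumulative weight: weights of all leaves to the left plus the leaf's own weight.
cumulative = []
running = 0
for w in weights:
    running += w
    cumulative.append(running)


def lookup(prefix):
    """Return the symbol index if the bit string `prefix` is a leaf, else None."""
    w = 2 ** (H - len(prefix))         # weight of a node at this level
    target = (int(prefix, 2) + 1) * w  # equivalent cumulative weight of the prefix
    i = bisect_left(cumulative, target)
    if i < len(cumulative) and cumulative[i] == target and weights[i] == w:
        return i
    return None


def decode(bits):
    """Decode a string of '0'/'1' characters into symbol indices."""
    out, prefix = [], ""
    for bit in bits:
        prefix += bit
        i = lookup(prefix)
        if i is not None:
            out.append(i)
            prefix = ""
    return out


decoded = decode("10" + "01100" + "000")
```

In this sketch the prefix 011 has equivalent cumulative weight 16, which matches the cumulative weight of S5, but the weights differ (4 against 2), so 011 is correctly treated as an interior node rather than a symbol.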
“An efficient decoding technique for Huffman codes”, Chowdhury et al, Inform Process Lett, 81 (2002), 305-308, truncates a Huffman tree by removing all the leaves to improve memory use and search speed.
It would be desirable to provide an alternative representation of a Huffman binary tree and an improved Huffman decoding mechanism for decoding a received string of binary digits.