Huffman codes are widely used in data compression and telecommunications. Applications include JPEG image compression and MPEG video and audio compression. Huffman codes are variable-length codes: each symbol used to compose a message is represented (encoded) by its own bit sequence, and the lengths of these sequences may differ from symbol to symbol. This characteristic of the codewords helps to decrease the amount of redundancy in message data, i.e., it makes data compression possible.
The use of Huffman codes affords compression because different symbols generally occur with different probabilities. This property is used to advantage by tailoring the codeword length of each symbol to its probability of occurrence. Symbols with higher probabilities of occurrence are coded with shorter codewords, while symbols with lower probabilities are coded with longer codewords. Longer codewords still occur, but they are less frequent, so the total length of a typical encoded bit string tends to be smaller than it would be if every symbol had a codeword of the same length.
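The effect of matching codeword length to probability can be illustrated with a small sketch. The four-symbol alphabet and its probabilities below are hypothetical, chosen only for illustration, and the helper name huffman_code_lengths is an assumption of this example rather than part of any technique described here.

```python
import heapq
from itertools import count

def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a dict of symbol probabilities."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper, i.e. gains one bit.
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# Hypothetical symbol probabilities (assumption for illustration).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
```

For this alphabet the most probable symbol receives a 1-bit codeword and the two least probable symbols receive 3-bit codewords, giving an average of 1.75 bits per symbol, compared with 2 bits per symbol for a fixed-length code over four symbols.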
A basic difficulty in decoding Huffman codes is that the decoder cannot know in advance the length of an incoming codeword, because, as explained above, Huffman codes are variable-length codes. Huffman codes can be decoded very quickly by dedicating a large amount of memory to a lookup table. For a set of Huffman codewords with a maximum word length of N bits, 2^N memory locations are needed, because N incoming bits are used as an address into the lookup table to find the corresponding codeword.
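A minimal sketch of the lookup-table approach follows. The prefix code in CODES is hypothetical, and the sketch assumes the bit stream is a string of "0" and "1" characters that contains only whole codewords; a real decoder would operate on packed bits.

```python
# Hypothetical prefix code, chosen only for illustration.
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}
N = max(len(code) for code in CODES.values())  # maximum codeword length

def build_table(codes, n):
    """Build a 2**n entry table mapping each n-bit address to (symbol, length)."""
    table = [None] * (1 << n)
    for sym, code in codes.items():
        pad = n - len(code)
        base = int(code, 2) << pad
        # Every n-bit address that begins with this codeword decodes to sym.
        for suffix in range(1 << pad):
            table[base + suffix] = (sym, len(code))
    return table

def decode_with_table(bits, table, n):
    out, pos = [], 0
    while pos < len(bits):
        # Use the next n bits as the address (zero-padded at end of stream).
        window = bits[pos:pos + n].ljust(n, "0")
        sym, length = table[int(window, 2)]
        out.append(sym)
        pos += length  # consume only the bits of the matched codeword
    return "".join(out)

TABLE = build_table(CODES, N)
decoded = decode_with_table("0101100111", TABLE, N)
```

Each symbol is decoded with a single table access, but the table holds 2^N entries (8 here, and 65,536 for a maximum codeword length of 16 bits), which is the memory cost noted above.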
A technique requiring less memory is bit-by-bit decoding, which proceeds as follows. One bit is taken and compared to all the possible codewords with a word length of one. If a match is not found, another bit is shifted in and the resulting bit pair is compared to all the codewords with a word length of two. This is continued until a match is found. Although this approach is very memory-efficient, it is very slow, especially if the codeword being decoded is long.
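The bit-by-bit procedure can be sketched as follows, again under the assumption of a hypothetical prefix code and a bit stream given as a string of "0" and "1" characters.

```python
# Hypothetical prefix code, chosen only for illustration.
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}

def decode_bit_by_bit(bits, codes):
    # Group the codewords by length so that a candidate of k bits is
    # compared only against the codewords of word length k.
    by_length = {}
    for sym, code in codes.items():
        by_length.setdefault(len(code), []).append((code, sym))
    out, candidate = [], ""
    for bit in bits:
        candidate += bit  # shift in one more bit
        for code, sym in by_length.get(len(candidate), []):
            if code == candidate:  # match found: emit symbol, start over
                out.append(sym)
                candidate = ""
                break
    if candidate:
        raise ValueError("bit stream ended inside a codeword")
    return "".join(out)

decoded = decode_bit_by_bit("0101100111", CODES)
```

The only storage needed is the codeword list itself, but a codeword of length k costs k rounds of comparisons, which is why long codewords decode slowly with this method.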
Another technique is the binary tree search method. In this technique, the Huffman tables must be converted into binary trees. A binary tree is a finite set of elements that is either empty or partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees, referred to as the left and right subtrees of the original tree. Each element of a binary tree is called a node of the tree. A branch connects two nodes. Nodes with no child branches are called leaves. Huffman decoding of a symbol begins at the root of the binary tree and ends at one of the leaves; one bit is extracted from the bit stream at each node visited while traversing the tree. Compared to the two methods above, this method is a compromise between memory requirement and the number of Huffman code searches. However, the decoding speed of this technique is reduced by a factor related to the maximum length of the Huffman codewords.
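A minimal sketch of tree-based decoding is given below. Representing internal nodes as nested dictionaries and leaves as symbol strings is an assumption made for brevity, as is the hypothetical code in CODES; practical implementations typically store the tree in flat arrays.

```python
# Hypothetical prefix code, chosen only for illustration.
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}

def build_tree(codes):
    """Internal nodes are dicts keyed by "0"/"1"; a leaf is the symbol itself."""
    root = {}
    for sym, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes on demand
        node[code[-1]] = sym  # the last bit of the codeword leads to the leaf
    return root

def decode_with_tree(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]  # follow one branch per input bit
        if isinstance(node, str):  # reached a leaf: emit and restart at root
            out.append(node)
            node = root
    if node is not root:
        raise ValueError("bit stream ended inside a codeword")
    return "".join(out)

TREE = build_tree(CODES)
decoded = decode_with_tree("0101100111", TREE)
```

The tree needs one node per codeword prefix rather than 2^N table entries, and each input bit costs one branch step, so decoding a symbol takes as many steps as its codeword has bits.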
Another technique currently used to decode Huffman codes is to use canonical Huffman codes. Canonical Huffman codes are of special interest because they make decoding easier. They are generally used in multimedia and telecommunications, and they reduce memory requirements and decoding complexity. However, most of these techniques rely on a special tree structure in the Huffman codeword tables used for encoding; hence they are suitable only for a special class of Huffman codes and are generally not suitable for decoding a generic class of Huffman codes.
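As an illustration, the sketch below uses one common formulation of canonical codes, in which codewords of the same length are consecutive binary numbers and shorter codewords are numerically ordered before longer ones. This formulation, the function names, and the codeword lengths in LENGTHS are assumptions of the example and are not necessarily the variant used by any particular technique referred to above.

```python
def canonical_codes(lengths):
    """Assign canonical codewords given only {symbol: codeword length}."""
    code, prev_len, codes = 0, 0, {}
    # Symbols are ordered by length, then by symbol, and numbered in turn.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # append zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes

def decode_canonical(bits, lengths):
    """Decode using only per-length counts and the sorted symbol list."""
    symbols = sorted(lengths, key=lambda s: (lengths[s], s))
    counts = [0] * (max(lengths.values()) + 1)
    for s in symbols:
        counts[lengths[s]] += 1
    out = []
    code = first = index = length = 0
    for bit in bits:
        code = (code << 1) | int(bit)
        length += 1
        n = counts[length]  # number of codewords of this length
        if code - first < n:  # code falls in this length's consecutive range
            out.append(symbols[index + code - first])
            code = first = index = length = 0
        else:
            index += n  # skip the symbols of this length
            first = (first + n) << 1  # first codeword of the next length
    return "".join(out)

# Hypothetical codeword lengths (assumption for illustration).
LENGTHS = {"a": 1, "b": 2, "c": 3, "d": 3}
codes = canonical_codes(LENGTHS)
decoded = decode_canonical("0101100111", LENGTHS)
```

Because the codewords follow from the lengths alone, the decoder stores only a count per length and a sorted symbol list instead of a full tree or a 2^N-entry table. The sketch assumes a valid bit stream and does not guard against bit patterns longer than the maximum codeword length.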
As indicated in the above examples, a problem with using variable codeword lengths is the difficulty of achieving a balance between decoding speed and reasonable memory usage.