1. Technical Field
The present invention relates to data compression and, more specifically, to compressing and decompressing data utilizing a Huffman-based coding scheme employing variable and limited length symbol codes.
2. Discussion of the Related Art
Huffman codes are used to compress a stream of data symbols by replacing each data symbol with a corresponding Huffman code. Frequently occurring data symbols within the data stream are assigned shorter length Huffman codes, while less-frequently occurring data symbols within the data stream are assigned longer length Huffman codes. A canonical heap-based algorithm is employed for choosing Huffman codes based on a histogram of data symbol frequencies within the data.
Huffman codes are typically represented by a tree structure (or Huffman tree). The tree structure is generally a binary tree of nodes, and may be generated by initially providing a leaf node for each data symbol to be encoded, where each leaf node includes the data symbol and a weight (or frequency of occurrence of that data symbol within the data). The two least frequently occurring data symbols are combined to form an equivalent symbol (or parent node) whose frequency of occurrence is the sum of the frequencies of occurrence of the two child data symbols (or child nodes). This process of combining the two least frequently occurring symbols, including any equivalent symbols formed in earlier steps, is repeated until a single equivalent symbol (or root node) remains. Bits are assigned to the branches of the tree, where typically a ‘0’ bit is assigned to the branch between a parent node and its left child node and a ‘1’ bit is assigned to the branch between a parent node and its right child node. The Huffman code for a desired data symbol is determined by starting at the root node and traversing the tree to the leaf node associated with that data symbol. The bits assigned to the traversed branches (from the root node to the node associated with the data symbol) are concatenated to form the Huffman code.
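The tree construction and code assignment described above can be illustrated with a minimal sketch in Python. This is only one possible heap-based realization and is not taken from the source; the function name build_huffman_codes, the tuple layout of the nodes, the tie-breaking counter, and the choice to give a lone symbol the code '0' are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(data):
    """Return a dict mapping each symbol in `data` to a Huffman code string.

    Hypothetical helper for illustration only.
    """
    histogram = Counter(data)  # symbol -> frequency of occurrence
    if not histogram:
        return {}

    # Tie-breaker so the heap never has to compare node tuples directly
    # (an assumption of this sketch, not part of the described scheme).
    tie = count()

    # A node is (symbol, left, right); leaf nodes have no children.
    heap = [(weight, next(tie), (symbol, None, None))
            for symbol, weight in histogram.items()]
    heapq.heapify(heap)

    # Repeatedly combine the two least frequent nodes into a parent node
    # whose weight is the sum of the two child weights.
    while len(heap) > 1:
        w_left, _, left = heapq.heappop(heap)
        w_right, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w_left + w_right, next(tie), (None, left, right)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        symbol, left, right = node
        if left is None:  # leaf node: record the concatenated branch bits
            codes[symbol] = prefix or "0"  # lone-symbol case (assumption)
            return
        walk(left, prefix + "0")   # '0' on the branch to the left child
        walk(right, prefix + "1")  # '1' on the branch to the right child

    walk(root, "")
    return codes


if __name__ == "__main__":
    table = build_huffman_codes("aaaabbc")
    print(table)
    print("".join(table[s] for s in "aaaabbc"))
```

In this sketch the most frequent symbol ends up nearest the root and therefore receives the shortest code, while the rarest symbols are combined first and end up deepest in the tree. Different tie-breaking choices can yield different, equally valid code assignments with the same total encoded length.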