1. Field of the Invention
This invention relates to data compression, and specifically to decoding Huffman-encoded code words.
2. Description of the Related Art
Huffman codes are very widely used in data compression and telecommunications. Applications include JPEG picture compression and MPEG video and audio compression. Huffman codes are of variable word length, which means that each of the individual symbols used to compose a message is represented (encoded) by a distinct bit sequence whose length can differ from symbol to symbol. This characteristic of the code words helps to decrease the amount of redundancy in message data, i.e., it makes data compression possible. For example, symbols A, B, C and D may be represented with the following code words:

Table 1
Symbol   Code word
A        0
B        10
C        110
D        111
All code words are uniquely decodable; for example, the sequence of bits "01101110100" decodes to "ACDABA". The set of code words is called a symbol list or alphabet. Uniqueness follows from the "prefix property" of Huffman codes: no code word is the leading (leftmost) substring of any other code word. Consequently, as soon as a leading substring of the incoming bits matches a code word in a Huffman decoding table, there is no need to check any bits beyond that substring. For example, the symbol "B" is assigned the code word "10"; thus, no other code word begins with "10".
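The prefix property can be checked mechanically. The following Python sketch assumes the code words A = 0, B = 10, C = 110 and D = 111, which are the assignments the decoding example above implies; the helper names (is_prefix_free, encode) are illustrative only.

```python
from itertools import combinations

# Code words implied by the "01101110100" -> "ACDABA" example.
TABLE_1 = {"A": "0", "B": "10", "C": "110", "D": "111"}


def is_prefix_free(codes):
    """True if no code word is a leading substring of another code word."""
    words = list(codes.values())
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(words, 2))


def encode(message, codes):
    """Concatenate the code word of each symbol in the message."""
    return "".join(codes[s] for s in message)
```

For instance, encode("ACDABA", TABLE_1) reproduces the bit sequence "01101110100", while a set such as {"A": "0", "B": "01"} fails the check because "0" is a leading substring of "01".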
The use of Huffman codes affords compression because distinct symbols occur with different probabilities. This property is exploited by tailoring the length of each symbol's code word to its probability of occurrence: symbols with higher probabilities are coded with shorter code words, while symbols with lower probabilities are coded with longer code words. Longer code words still occur, but because they occur less often, the total length of a typical encoded bit string tends to be smaller than it would be with fixed-length code words.
The algorithm for building a Huffman code is based on a "coding tree". The commonly known algorithm steps are:
1. Line up the symbols in order of decreasing probability.
2. Link the two symbols with the lowest probabilities into one new symbol whose probability is the sum of the probabilities of the two symbols.
3. Repeat step 2 until only one symbol, with a probability of 1, is left.
4. Trace the coding tree from the root (the generated symbol with probability 1.0) to the original symbols, assigning 1 to each lower branch and 0 to each upper branch, or vice versa.
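The four steps above can be sketched in Python as follows. The sketch is illustrative rather than the procedure of any particular encoder: a priority queue (the standard heapq module) replaces re-sorting the list after each merge, a counter breaks ties between equal probabilities, and the probabilities in the usage note are hypothetical, since Table 2 is not reproduced here.

```python
import heapq
from itertools import count


def huffman_codes(probs):
    """Build a Huffman code from {symbol: probability} (symbols must not be tuples)."""
    tie = count()  # tie-breaker so heapq never has to compare the payloads
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)                    # step 1: order by probability
    while len(heap) > 1:                   # steps 2 and 3: merge the two least probable
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):                # step 4: trace from the root to the leaves
        if isinstance(node, tuple):        # internal node: (branch 0, branch 1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: an original symbol
            codes[node] = prefix or "0"    # a one-symbol alphabet still needs one bit

    walk(heap[0][2], "")
    return codes
```

With the hypothetical probabilities A = 0.5, B = 0.25, C = 0.125 and D = 0.125, the sketch yields A = 0, B = 10, C = 110 and D = 111. Breaking ties differently, or swapping the 0 and 1 branches, would produce a different but equally valid table.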
For example, probabilities for some letters are listed in Table 2, and one of the possible Huffman trees built by applying the above algorithm to these probabilities is shown in FIG. 1.
Each "0" bit in a code word corresponds to traversing a "0" branch in the tree, which, in FIG. 1, is done by going up; going down traverses a "1" branch. The code word "11000" is represented on the tree by starting on the right, at the root, and traversing one branch for each bit of the code word in turn. The first two bits, "11", correspond to the two "1" branches, or two down steps. The next bit, "0", corresponds to a movement up, i.e., along a "0" branch, as shown by the arrow. Traversing two more "0" branches for the remaining bits, "00", leads to the output symbol for the complete code word "11000", which here is the letter "P", located on the left side of FIG. 1.
FIG. 1 thus shows, for example, that the code for the letter "P" is "11000". Because the assignment of 0 and 1 to the upper and lower branches in step 4 is arbitrary, several Huffman tables are possible for any given probability distribution.
A basic difficulty in decoding Huffman codes is that the decoder cannot know a priori the length of an incoming code word.
Huffman codes can be decoded extremely fast by dedicating enormous amounts of memory. For a set of Huffman code words whose maximum word length is N bits, $2^N$ memory locations are needed, because N incoming bits are used as an address into the lookup table to find the corresponding code word. For example, decoding the symbols of Table 1 would require $2^3 = 8$ memory locations. All addresses that begin with "0" are used to store the symbol "A", all addresses beginning with "10" store the symbol "B", and so forth. When a bit slice is applied to the lookup table, the code word at its start is decoded instantly. The incoming bit stream is then shifted by the bit length of the code word just decoded, to bring the following code word into decoding position. For codes with a maximum length of, for example, 19 bits, the memory requirement grows very large ($2^{19} = 524{,}288$ locations).
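The direct-lookup approach can be sketched in Python as follows, again assuming the code words A = 0, B = 10, C = 110 and D = 111. Each of the $2^N$ entries stores the symbol together with its code word length, since the decoder needs the length to shift the bit stream; this entry layout and the function names are assumptions of the sketch.

```python
TABLE_1 = {"A": "0", "B": "10", "C": "110", "D": "111"}


def build_lookup(codes):
    """Direct-lookup table with 2**N entries, N = maximum code word length."""
    n = max(len(c) for c in codes.values())
    table = [None] * (1 << n)
    for sym, code in codes.items():
        pad = n - len(code)
        start = int(code, 2) << pad
        for j in range(1 << pad):        # every address that begins with `code`
            table[start + j] = (sym, len(code))
    return table, n


def decode_lookup(bits, table, n):
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + n].ljust(n, "0")   # zero-pad the final slice
        sym, length = table[int(window, 2)]        # one lookup per code word
        out.append(sym)
        pos += length                              # shift by the decoded length
    return "".join(out)
```

Here build_lookup returns an 8-entry table in which addresses 000 through 011 hold A, 100 and 101 hold B, 110 holds C and 111 holds D.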
A technique requiring less memory is bit-by-bit decoding, which proceeds as follows. One bit is taken and compared to all the possible codes with a word length of one. If a match is not found, another bit is shifted in to try to find the bit pair from among all the code words with a word length of two. This is continued until a match is found. Although this approach is very memory-efficient, it is also very slow, especially if the code word being decoded is long.
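Bit-by-bit decoding can be sketched in Python under the same assumed code words. The sketch makes the cost visible: up to N probes, one per trial length, may be needed before a single code word is recognized. The function name is illustrative.

```python
TABLE_1 = {"A": "0", "B": "10", "C": "110", "D": "111"}


def decode_bit_by_bit(bits, codes):
    # Group the code words by length, as the bit-by-bit search consults them.
    by_length = {}
    for sym, code in codes.items():
        by_length.setdefault(len(code), {})[code] = sym
    max_len = max(by_length)
    out, pos = [], 0
    while pos < len(bits):
        for length in range(1, max_len + 1):    # shift in one more bit per pass
            candidate = bits[pos:pos + length]
            sym = by_length.get(length, {}).get(candidate)
            if sym is not None:
                out.append(sym)
                pos += length
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
    return "".join(out)
```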
Another solution uses content-addressable memories (CAMs). A bit slice (i.e., a bit string long enough to accommodate any code word and therefore equal in length to the maximum code word length) is applied to the input of a CAM containing all code words as "addresses" and memory pointers as "contents". The pointers reference symbols and associated code word lengths in a RAM table. Once a code word is decoded, the incoming bit stream is shifted by the length of the decoded code word, and decoding resumes. An efficiently implemented CAM scheme is fast, but still requires extra memory for pointers. Moreover, CAMs are not readily available in all technologies. The CAM-based approach is described in U.S. Pat. No. 5,208,593, which is further discussed below.
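The CAM lookup can be modelled in software as follows. This is a sketch only: a hardware CAM compares the slice against every stored code word simultaneously, treating bit positions beyond a short code word as "don't care", and the startswith comparison in the loop below merely reproduces that result. The RAM and CAM contents are hypothetical and assume the code words A = 0, B = 10, C = 110 and D = 111.

```python
# Hypothetical RAM table: (symbol, code word length) per entry.
RAM = [("A", 1), ("B", 2), ("C", 3), ("D", 3)]
# Hypothetical CAM contents: each stored code word paired with a RAM pointer.
CAM = [("0", 0), ("10", 1), ("110", 2), ("111", 3)]
MAX_LEN = 3  # bit slice width = maximum code word length


def cam_search(bit_slice):
    """Return the pointer of the single CAM entry whose code word begins the slice."""
    hits = [ptr for code, ptr in CAM if bit_slice.startswith(code)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one match for {bit_slice!r}, got {hits}")
    return hits[0]


def decode_cam(bits):
    out, pos = [], 0
    while pos < len(bits):
        bit_slice = bits[pos:pos + MAX_LEN].ljust(MAX_LEN, "0")  # zero-pad at stream end
        sym, length = RAM[cam_search(bit_slice)]  # pointer -> symbol and length
        out.append(sym)
        pos += length                             # shift by the decoded length
    return "".join(out)
```

The prefix property guarantees that exactly one CAM entry matches each full-width slice, which is why a single pointer can be returned.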
As the above examples indicate, the central problem in decoding variable-length code words is achieving a balance between speed and reasonable memory usage.
Canonical Huffman codes are of special interest since they make decoding easier. PKZip (a file compression/decompression utility), MPEG-1 Layer III (Mp3) and the JPEG default baseline encoder all use canonical Huffman tables. Applications can also be found in other areas of multimedia and telecommunications.
A characteristic of canonical Huffman codes is that the most significant $n-1$ bits of the smallest Huffman code of length $n$ are greater in value than the largest Huffman code of length $n-1$, provided that the table is of the type in which almost all codes have a leading one bit. For a Huffman table composed predominantly of codes whose leading bit is zero, that is, a table derived, for example, by inverting all code word bits, a converse rule applies: the most significant $n-1$ bits of the largest Huffman code of length $n$ are smaller in value than the smallest Huffman code of length $n-1$. Transforming Huffman tables to canonical format does not decrease coding efficiency because, as can be seen from the example in Table 3, the transformation does not change the number of bits per code word.
In accordance with the above converse rule for canonical codes, codes of length 3 (for example, 010 and 011) are always larger than the three leading bits of codes of length 4 (for example, 0000, 0001, 0010 and 0011). Code lengths are otherwise left unchanged.
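The canonical assignment and both rules can be illustrated with a short Python sketch. The assignment procedure shown (sort by length, give consecutive integers to codes of equal length, and shift left when the length grows) is the common construction used, for example, by DEFLATE; it is an illustration, not necessarily how Table 3 was derived. The rule checks compare each pair of adjacent code lengths present in the table, which reduces to the $n$ versus $n-1$ comparison of the text when no length is skipped.

```python
def canonical_codes(lengths):
    """Assign canonical code words from {symbol: code length} (leading-ones type)."""
    code, prev_len, codes = 0, 0, {}
    for sym, n in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= n - prev_len            # longer length: append zeros
        codes[sym] = format(code, f"0{n}b")
        code += 1                        # equal length: next consecutive integer
        prev_len = n
    return codes


def invert(codes):
    """Complement every bit, turning a leading-ones table into a leading-zeros one."""
    flip = str.maketrans("01", "10")
    return {s: c.translate(flip) for s, c in codes.items()}


def _groups(codes):
    by_len = {}
    for c in codes.values():
        by_len.setdefault(len(c), []).append(int(c, 2))
    lens = sorted(by_len)
    return by_len, list(zip(lens, lens[1:]))


def leading_ones_rule(codes):
    # Top m bits of the smallest length-n code exceed the largest length-m code.
    by_len, pairs = _groups(codes)
    return all(min(by_len[n]) >> (n - m) > max(by_len[m]) for m, n in pairs)


def leading_zeros_rule(codes):
    # Converse: top m bits of the largest length-n code are below the smallest length-m code.
    by_len, pairs = _groups(codes)
    return all(max(by_len[n]) >> (n - m) < min(by_len[m]) for m, n in pairs)
```

Inverting the canonical code for the lengths 1, 3, 3, 4, 4, 4, 4 yields exactly the code words 1, 010, 011, 0000, 0001, 0010 and 0011, which satisfy the converse rule and match the examples given above.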
Also noteworthy is that, owing to the above characteristic, canonical codes often start with a string of ones (or zeros). The property of starting with strings of ones has been used in U.S. Pat. No. 5,208,593 ("Tong") in the context of JPEG decoding, since JPEG Huffman tables contain several codes that start with strings of ones. This reference applies "leading ones detection" to Huffman codes used in JPEG. The next code word to be decoded is checked for the length of the consecutive run of "1"s that starts at its most significant bit (MSB). (Hereinafter, "the leading bit" means the most significant, or leftmost, bit.) Once this length, or count, is known, the maximum number of remaining bits in the code word is also known from the given maximum code word length. The consecutive run of ones, and the zero that follows it (whose value is always known), are masked away. The remaining bits, together with the number of consecutive leading ones, are used to form an address into a RAM table that contains the symbols.
Tong is effective only on Huffman code words that begin with a string of ones. The Mp3 audio standard, for example, specifies Huffman tables with code words that begin with strings of zeros. Moreover, Tong operates only on canonical Huffman tables and uses a lot of memory. If Tong's methodology were applied to the Huffman table shown below in Table 4 (Hashemian, R., "Memory Efficient and High-Speed Search Huffman Coding," IEEE Transactions on Communications, Vol. 43, No. 10 (1995)), Tong would do particularly well, because that table is a single-side growing table, i.e., a table constructed to keep "subtrees" small. Even so, Tong uses 13 words for addresses into a second table that contains 36 entries, requiring 13 + 36 = 49 words in total. In addition, Tong would be memory-inefficient if applied to the JPEG standard AC tables, which have maximum code word lengths of 8 bits after the elimination of leading ones, because Tong would use $2^8 = 256$ memory locations in a lookup table for those remaining 8 bits.
U.S. Pat. No. 6,219,457 to Potu discloses Huffman decoding pre-processing that counts either the number of consecutive leading zeros or the number of consecutive leading ones of a code word, depending on whether the incoming code stream has been encoded under the MPEG standard, which codes with leading zeros, or under the JPEG or Digital Video standard, which code with leading ones. The count is used to index into a first lookup table to determine the base address of a variable length code (VLC) decoding table. A predetermined number of bits following the counted bits in the code word are combined with the base address to select the proper VLC decoding table, from which the output symbol is retrieved.
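The count-then-index idea can be sketched in Python as follows. The layout is an assumption made for illustration and is not taken from either patent: a first table holds a base address and a field width for each run count, and all per-count blocks share one flat memory. The parameter lead_bit selects whether zeros (MPEG-style) or ones (JPEG-style) are counted, only one leading run per code word is counted, and the sketch assumes a complete prefix code.

```python
TABLE_1 = {"A": "0", "B": "10", "C": "110", "D": "111"}


def build_count_tables(codes, lead_bit):
    """Split a complete prefix code into per-count blocks of one flat memory."""
    other = "1" if lead_bit == "0" else "0"
    # Exactly one code word consists only of lead_bit; its length caps the count.
    (run_sym,) = [s for s, c in codes.items() if set(c) == {lead_bit}]
    cap = len(codes[run_sym])
    groups = {}                                   # count -> [(tail, symbol, length)]
    for sym, code in codes.items():
        if sym != run_sym:
            c = code.index(other)                 # length of the leading run
            groups.setdefault(c, []).append((code[c + 1:], sym, len(code)))
    base, width, memory = [], [], []
    for c in range(cap):                          # first table: count -> base, width
        entries = groups.get(c, [])
        w = max((len(t) for t, _, _ in entries), default=0)
        base.append(len(memory))
        width.append(w)
        block = [None] * (1 << w)
        for tail, sym, length in entries:
            pad = w - len(tail)
            start = (int(tail, 2) << pad) if tail else 0
            for j in range(1 << pad):
                block[start + j] = (sym, length)
        memory.extend(block)
    return cap, run_sym, base, width, memory


def decode_count(bits, tables, lead_bit):
    cap, run_sym, base, width, memory = tables
    out, pos = [], 0
    while pos < len(bits):
        c = 0                                     # count the leading run (capped)
        while c < cap and pos + c < len(bits) and bits[pos + c] == lead_bit:
            c += 1
        if c == cap:                              # the all-lead_bit code word
            out.append(run_sym)
            pos += cap
            continue
        w = width[c]                              # skip run and terminating bit, read w bits
        nxt = bits[pos + c + 1:pos + c + 1 + w].ljust(w, "0")
        sym, length = memory[base[c] + (int(nxt, 2) if w else 0)]
        out.append(sym)
        pos += length
    return "".join(out)
```

With lead_bit "0" and the assumed code words, count 0 selects a 4-entry block holding B, B, C and D, and the 2 bits following the terminating "1" index into it.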
Potu, however, operates only on a leading run of ones or only on a leading run of zeros, depending on the application to which it is applied; moreover, Potu is not effective on successive bit runs within the same code word. As with Tong, Potu can handle Huffman codes only if they are canonical, and Potu's inability to decode successive bit runs in the same code word leads to larger decoding tables.
Hashemian's decoding scheme is based on "clustering" the incoming bits as follows. The first L bits are "clustered" for use as a pointer into a table. If the code word is L or fewer bits long, the current table contains the symbol, and the code word is decoded instantly. If it is longer, the table holds a pointer to another table, which contains the code words that start with those particular L bits. The new table is in turn addressed by the next L-bit cluster, and so forth, until the symbol is finally found. Decreasing L improves memory efficiency, but increases the number of decoding steps.
For example, for L = 4, a 13-bit code word requires four steps ($\lceil 13/4 \rceil = \lceil 3.25 \rceil = 4$) to locate its symbol. The first four of the 13 bits identify, in the first lookup table, the pointer to a second lookup table, whose codes all start with those four bits. Those four bits are thus no longer needed, leaving 9 bits for the second lookup; after the second lookup, 5 bits are left for the third lookup; and after the third lookup, 1 bit is left, which requires a fourth step. That is, the three table lookups constitute the first three decoding steps, and the processing of the remaining bit constitutes the fourth. JPEG uses a maximum code word length of 13 bits, while the longest code words in Mp3 are 19 bits long.
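The clustering scheme can be sketched in Python as a tree of $2^L$-entry tables. The entry format ("symbol" or "table", payload, bits consumed) and the function names are assumptions of this sketch rather than Hashemian's exact data layout, and the example again assumes the code words A = 0, B = 10, C = 110 and D = 111.

```python
TABLE_1 = {"A": "0", "B": "10", "C": "110", "D": "111"}


def build_cluster_tables(codes, L):
    """Return a 2**L-entry table for {symbol: remaining code bits}."""
    table = [None] * (1 << L)
    subcodes = {}                          # first L bits -> {symbol: rest of code}
    for sym, code in codes.items():
        if len(code) <= L:                 # decoded within this cluster
            pad = L - len(code)
            start = int(code, 2) << pad
            for j in range(1 << pad):
                table[start + j] = ("symbol", sym, len(code))
        else:                              # needs at least one more cluster
            subcodes.setdefault(code[:L], {})[sym] = code[L:]
    for prefix, rest in subcodes.items():
        table[int(prefix, 2)] = ("table", build_cluster_tables(rest, L), L)
    return table


def decode_clusters(bits, root, L):
    out, pos = [], 0
    while pos < len(bits):
        table = root
        while True:
            cluster = bits[pos:pos + L].ljust(L, "0")   # zero-pad at stream end
            kind, value, used = table[int(cluster, 2)]
            pos += used
            if kind == "symbol":
                out.append(value)
                break
            table = value                               # follow pointer to the next table
    return "".join(out)


def lookups_needed(length, L):
    """Table lookups for a code word of the given length: ceil(length / L)."""
    return -(-length // L)
```

The helper lookups_needed(13, 4) gives the 4 steps of the example above, and lookups_needed(19, 4) gives 5 steps for the longest Mp3 code words.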
There are several drawbacks to Hashemian's scheme. It relies on bit masking and comparison steps. Also, since it does not exploit the properties of canonical codes, the algorithm cannot simply jump over consecutive ones or zeros, but processes the code at most L bits at a time; therefore, long codes take a very long time to decode. Moreover, Hashemian's solution using the above single-side growing table and a cluster length of 4 takes up 122 words of memory.
What is needed is a fast and memory-efficient decoding method flexible enough to handle Huffman codes whether or not they are canonical, yet sufficiently robust to take advantage of efficiencies realizable from decoding codes in canonical form.
Further aggravating the problem is the fact that general-purpose CPUs are not well equipped to handle code words of variable length, but operate on native lengths such as 16 or 32 bits. Shifting and masking bit fields with arbitrary masks, and searching based on the results, is slow. Also, Huffman decoding algorithms are structured to require frequent comparisons and branches based on their results, which is very inefficient for CPUs with deep pipelines. Some digital signal processors (DSPs) are very capable at bit field manipulation, but unfortunately also have long pipelines. Large if/then or switch/case structures should therefore be avoided.
Pure software decoding is slow. Finding the first "1" in a stream, for example, requires several comparison operations against powers of two or, alternatively, other complex tasks. In hardware, finding the leading one is a simple task that requires only combinational logic, whereas with general CPU instructions, several shift/mask/comparison operations are needed.
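A software routine of the kind described can be sketched as a binary search that compares a 32-bit word against the limits $2^{16}-1$, $2^{24}-1$, $2^{28}-1$, $2^{30}-1$ and $2^{31}-1$. The sketch is written in Python, whose integers are unbounded, so it masks to 32 bits to mimic a native register; the function names are illustrative. Even this compact form needs five data-dependent branches per word, which is the cost the text attributes to general CPU instructions.

```python
def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit word by comparing against powers of two."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    n = 0
    # If x is at most 2**(32 - shift) - 1, its top `shift` bits are zero:
    # count them and shift them out, then continue with a smaller shift.
    for shift, limit in ((16, 0x0000FFFF), (8, 0x00FFFFFF), (4, 0x0FFFFFFF),
                         (2, 0x3FFFFFFF), (1, 0x7FFFFFFF)):
        if x <= limit:
            n += shift
            x <<= shift
    return n


def leading_ones32(x: int) -> int:
    """Leading ones of a 32-bit word are the leading zeros of its complement."""
    return clz32(~x)
```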
One approach to Huffman decoding uses specialized, independent hardware components such as shifters and adders. This approach is feasible in application-specific devices, such as high definition television (HDTV) decoders, but wastes resources on a system with a high-performance processor, since these components already exist in the host.
An accelerator can be implemented as a completely independent decoder (loose coupling) that has its own access to memory and outputs data so that the host CPU can perform its own tasks. Although several resources must be duplicated (adders, memory interface units, shifters, etc.), performance is high. Unfortunately, Huffman decoding requires rather large tables which, if stored in the decoder's internal memory, would require that memory to be correspondingly large and costly. If the tables are in common memory, the decoder might block the memory buses, since decoding is a memory-intensive application.
In one aspect, the present invention is directed to a method, apparatus and program for decoding a current code word in a series of Huffman-encoded code words. The value of a bit in the code words is detected, and a current count is calculated of that bit and the subsequent, consecutive bits of the same value. Based on the current count, an entry is retrieved from a decoding table. The detecting and calculating are iteratively repeated, each time for the bits following those already counted, until the last retrieved entry indicates that no more iterations are to be performed.
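The iteration can be illustrated with a minimal Python sketch. The table layout here is hypothetical and is not the table format of the invention: entries are keyed by (tree node, bit value, run count) and either deliver a symbol with the number of bits it consumes or name the node at which the next run is to be counted. The sketch assumes a complete prefix code (every internal node of the code tree has two children, as in a Huffman tree) and omits the tail-bit lookup described in the further aspect below.

```python
TABLE_1 = {"A": "0", "B": "10", "C": "110", "D": "111"}


def build_trie(codes):
    """Binary trie: internal nodes are dicts {'0': child, '1': child}; leaves are symbols."""
    root = {}
    for sym, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = sym
    return root


def build_run_table(codes):
    """Hypothetical decoding table keyed by (node number, bit value, run count)."""
    root = build_trie(codes)
    max_len = max(len(c) for c in codes.values())
    nodes, number, table = [root], {id(root): 0}, {}
    i = 0
    while i < len(nodes):                        # visit each internal node once
        for v in "01":
            for count in range(1, max_len + 1):
                cur, entry = nodes[i], None
                for step in range(count):        # follow `count` branches of value v
                    child = cur[v]
                    if not isinstance(child, dict):
                        entry = ("symbol", child, step + 1)  # code word ends inside the run
                        break
                    cur = child
                if entry is None:                # run exhausted at an internal node
                    if id(cur) not in number:
                        number[id(cur)] = len(nodes)
                        nodes.append(cur)
                    entry = ("continue", number[id(cur)], count)
                table[(i, v, count)] = entry
        i += 1
    return table, max_len


def decode_runs(bits, table, max_len):
    out, pos = [], 0
    while pos < len(bits):
        node = 0                                 # each code word starts at the root
        while True:
            if pos >= len(bits):
                raise ValueError("bit stream ends inside a code word")
            v, count = bits[pos], 1              # detect the bit value ...
            while (count < max_len and pos + count < len(bits)
                   and bits[pos + count] == v):
                count += 1                       # ... and count its run
            kind, value, used = table[(node, v, count)]
            pos += used
            if kind == "symbol":                 # entry says: no more iterations
                out.append(value)
                break
            node = value
    return "".join(out)
```

Because whole runs are consumed per lookup, the same sketch handles tables dominated by leading ones or by leading zeros, and successive runs inside one code word, without requiring the code to be canonical.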
In a further aspect of the invention, if the last retrieved entry does not contain an output symbol that constitutes a decoding of the current code word, at least one bit subsequent to those counted is used to retrieve an entry that contains an output symbol that constitutes a decoding of the current code word.
In another aspect, the present invention is directed to determining the value of the leading bit of a string, and a count of a run that includes the bit. A value detector detects the value, and a first inverter inverts the bits of the string if the detected value is equal to a pre-selected bit value. A digit extender converts to the pre-selected bit value every bit of the string that differs from the pre-selected bit value and is of lower significance than the most significant bit having the pre-selected bit value. A second inverter inverts the bits output from the digit extender. A reversor reverses the order of the bits inverted by the second inverter to create a reversed string. A thermometer code evaluator calculates a run count of the bits in the reversed string that have the pre-selected value.
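The data flow through these components can be modelled in a few lines of Python. This is a behavioural sketch only, with bits held in a list, most significant bit first; it is not a description of the circuit, and the function name is illustrative. After the first inverter the leading run consists of bits unequal to the pre-selected value, the digit extender turns everything after that run into the pre-selected value, and the second inverter and reversor then leave a thermometer code whose count of pre-selected bits equals the run length.

```python
def leading_run(bits, preselected=1):
    """Return (leading bit value, length of the leading run) for a list of 0/1 bits."""
    value = bits[0]                                     # value detector
    s = [1 - b for b in bits] if value == preselected else list(bits)  # first inverter
    # Digit extender: every bit less significant than the first bit equal
    # to `preselected` is forced to `preselected`.
    k = s.index(preselected) if preselected in s else len(s)
    s = s[:k] + [preselected] * (len(s) - k)
    s = [1 - b for b in s]                              # second inverter
    s.reverse()                                         # reversor
    # Thermometer code evaluator: the k bits equal to `preselected` now sit
    # together at one end, so their count is the run length.
    return value, s.count(preselected)
```

For example, with the default pre-selected value 1, the string 1, 1, 1, 0, 1, 0 passes through 0, 0, 0, 1, 0, 1 (first inverter), 0, 0, 0, 1, 1, 1 (digit extender), 1, 1, 1, 0, 0, 0 (second inverter) and 0, 0, 0, 1, 1, 1 (reversor), giving a run count of 3.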
In an alternative aspect, this invention is directed to a computer usable medium having computer-readable program code means for decoding Huffman codes. The means includes a Huffman decoding table having, as an entry, an offset for identifying, from serially arranged Huffman-encoded code words, remainder bits that represent a tail offset into the table. The number of remainder bits representing the tail offset is predetermined based on a plurality of counts of respective, consecutive, same-valued bits in the serial arrangement. The same-valued bits are of higher significance than the remainder bits, and the counted runs generally do not all share the same bit value.
Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims.