The present invention relates generally to a method and apparatus for decoding data, and more particularly to a data decoding method and apparatus for a bitstream encoded by an entropy-based scheme called Huffman coding.
Demand for bandwidth among the telecommunications and computer industries has largely outpaced the gains afforded by optical fiber, cable modems and digital subscriber lines. Consequently, data compression and source coding have become ubiquitous. Such applications require faster processors and increased memory to implement the data compression codec. A technique known as "full table lookup" is one of two mutually exclusive methodologies commonly relied on to decode data. Full table lookup involves storing, directly associating and recalling a symbol. Exemplary symbols include an ASCII character, a byte, or a numeral operable to initiate a given processing function. The symbol is stored in a memory device, or storage table, such as a ROM (Read Only Memory) or RAM (Random Access Memory). Within the table, the symbol is logically linked to a unique sequence of bits that constitutes a codeword. When such a sequence is conveyed in a bitstream to the storage table, the table 'looks up' and outputs the symbol value associated with the codeword.
The full table lookup method requires minimal processing time and power to associate a complete codeword with its symbol. However, the memory requirements of the hardware needed to execute the lookup method limit its utility. For example, if the longest codeword in a bitstream is 16 bits, then pure full table lookup decoding requires a table containing up to 2^16, or 64 K, elements. A table of such magnitude can severely burden the memory allocations of a computer system, diminishing its ability to retain other data and programs.
FIG. 1 shows a portion of a storage table suitable for use in a full table lookup application. Turning to the figure, when the 10-bit codeword "0110101010" is presented in a bitstream, it must be extended to 12 bits by reading two extra bits from the bitstream. Thus the indices "011010101000" through "011010101011" all correspond to the same symbol. The table 100 associates the bit sequence with indexed locations 1704 through 1707 of the left-hand column and outputs "4." Similarly, the codeword "111111110111" is matched at table indexed location 4,088 to the symbol "+." Of note, an "x" in the input sample column represents a logical value of either "1" or "0", i.e., a bit whose value does not matter. Significantly, the table must provide all 4,096 locations, any of which may need to be recalled, in order to match either codeword. Despite relative decreases in the cost of computer memory, disk real estate, and interprocessor bandwidth, the requirements of many such applications make full table lookup techniques impractical and cost-ineffective.
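The replication of short codewords across a full-width table can be sketched as follows. This is a minimal illustration, not the apparatus of FIG. 1: it assumes each table entry stores a (symbol, code length) pair so the decoder knows how many bits to consume, and the helper names `build_full_table` and `lookup_decode` are hypothetical. With the two codewords from FIG. 1, the 10-bit codeword fills the four locations 1704 through 1707 of a 4,096-entry table.

```python
def build_full_table(codes):
    """Build a full lookup table from a prefix-free code {bitstring: symbol}.

    The table is indexed by L bits, where L is the longest codeword length.
    A codeword of length l is replicated across 2**(L - l) entries, one per
    pattern of the trailing "don't care" bits.
    """
    L = max(len(bits) for bits in codes)
    table = [None] * (1 << L)
    for bits, sym in codes.items():
        pad = L - len(bits)
        base = int(bits, 2) << pad
        for j in range(1 << pad):
            table[base + j] = (sym, len(bits))  # assumed (symbol, length) entry
    return table, L


def lookup_decode(bits, table, L):
    """Decode a '0'/'1' string with one table access per codeword."""
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + L].ljust(L, "0")  # pad past the end of the stream
        entry = table[int(window, 2)]
        if entry is None:
            raise ValueError(f"invalid codeword at bit {pos}")
        sym, length = entry
        out.append(sym)
        pos += length  # skip only the bits the codeword actually used
    return out


table, L = build_full_table({"0110101010": "4", "111111110111": "+"})
print(L, len(table))                                       # 12 4096
print(lookup_decode("0110101010111111110111", table, L))   # ['4', '+']
```

The speed comes from a single access per codeword; the cost is that the table size depends only on the longest codeword, here 4,096 entries for two symbols.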
Huffman coding is applicable to various types of data, including but not limited to vector data, images, numerals and ASCII characters. The method relies on statistical coding, which translates each symbol into a sequence of bits according to the symbol's probability of occurrence. Namely, the Huffman algorithm takes a string of bits and translates it reversibly into another string that is, on average, shorter. Generally, the goal of Huffman coding is to use shorter bit patterns for more commonly occurring symbols.
Before describing Huffman coding in detail, a code tree suitable for generating Huffman codes is explained. FIG. 2 illustrates one example of a partial code tree 200 that corresponds to the full table lookup application treated in FIG. 1. Nodes of the tree 200 are points designated by either a circle or a square. A line segment connecting two nodes is called a "branch." The node located in the highest position is called the "root" 201. Further, a lower node 202 connected via a branch 203 to a certain node 204 is termed a "child" of the node 204. Conversely, the upper-layer node 204 is referred to as the "parent" of the child node 202. A node having no child is called a "leaf," and a unique symbol corresponds to each leaf. The nodes other than the leaves are referred to as "internal nodes," and the number of branches from the root down to a node defines that node's level, or layer. In the figure, all internal nodes are shown as circles and leaf nodes are shown as squares.
When encoding by use of the code tree 200, the path extending from the root 201 down to a target leaf is output as the code. More specifically, "0" is output when branching to the left from a node, while "1" is output when branching to the right. For instance, in the code tree illustrated in FIG. 2, the code "11010" leads to a symbol value "3" that corresponds to a leaf node 205. For exemplary purposes, assume that traversing each layer requires N cycles of the central processing unit (CPU). Thus, 5N processing cycles are required to produce a complete five-bit codeword. Likewise, the codeword "0110101010" of the above full table example requires 10N processing cycles to produce before it is associated with the symbol "4" at leaf node 206.
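Tree-tracing decoding, one layer per bit, can be sketched as below. This is an illustrative model only: internal nodes are represented as Python dicts keyed by "0" (left branch) and "1" (right branch), leaves are the symbols themselves, and the helper names `build_tree` and `trace` are hypothetical. The returned layer count corresponds to the number of N-cycle steps in the example above.

```python
def build_tree(codes):
    """Build a binary code tree from a prefix-free code {bitstring: symbol}.

    Internal nodes are dicts keyed by '0' (left) and '1' (right); a leaf is
    stored directly as its symbol (symbols are assumed not to be dicts).
    """
    root = {}
    for bits, sym in codes.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})  # create internal nodes as needed
        node[bits[-1]] = sym               # final branch ends at the leaf
    return root


def trace(bits, root):
    """Decode by walking the tree bit by bit.

    Returns (symbols, layers_traversed); with N CPU cycles per layer the
    cost is layers_traversed * N cycles.
    """
    out, layers, node = [], 0, root
    for b in bits:
        node = node[b]  # KeyError: the bit pattern is not in the code
        layers += 1
        if not isinstance(node, dict):  # reached a leaf
            out.append(node)
            node = root                 # the next codeword starts at the root
    if node is not root:
        raise ValueError("bitstream ends mid-codeword")
    return out, layers


root = build_tree({"11010": "3", "0110101010": "4"})
print(trace("11010", root))            # (['3'], 5)
print(trace("011010101011010", root))  # (['4', '3'], 15)
```

Memory grows only with the number of tree nodes, but every bit of every codeword costs one layer traversal, which is the processing burden noted above.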
According to Huffman coding, the above-described code tree is generated by the following procedures, which constitute the Huffman algorithm. First, leaf nodes are prepared and the frequency of occurrence of each associated symbol is calculated. Second, an internal node is created for the two leaf nodes having the minimum occurrence frequencies, and this internal node is connected via branches to those two leaf nodes. Further, the sum of the occurrence frequencies of the two connected nodes is recorded as the occurrence frequency of the newly created node. Third, the process set forth in the second step is repeated for the remaining nodes, i.e., the nodes not having parents, until only one node remains. In the code tree generated by such procedures, each symbol is allocated a code whose length decreases as the occurrence frequency of the symbol increases. Therefore, when coding is performed by use of the code tree, the data are compressed and less memory space is required.
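The three steps above can be sketched with a priority queue that repeatedly merges the two lowest-frequency parentless nodes. This is a generic Huffman construction, not code from the invention: the tie-breaking counter, the tuple representation of internal nodes (which assumes symbols are not tuples), and the function name `huffman_codes` are illustrative choices, and the frequencies in the usage example are a common textbook set rather than data from this document.

```python
import heapq
import itertools


def huffman_codes(freqs):
    """Return a prefix-free code {symbol: bitstring} for {symbol: frequency}.

    Internal nodes are (left, right) tuples, so symbols are assumed not to be
    tuples. '0' labels the left branch and '1' the right branch.
    """
    tie = itertools.count()  # unique tie-breaker so the heap never compares nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]  # step 1: leaf nodes
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # degenerate single-symbol alphabet
    while len(heap) > 1:  # step 3: repeat until one parentless node remains
        f1, _, left = heapq.heappop(heap)   # step 2: two minimum-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))  # summed frequency

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):  # internal node: descend both branches
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix     # leaf: the root-to-leaf path is the code

    assign(heap[0][2], "")
    return codes


freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freqs)
print({s: len(c) for s, c in sorted(codes.items())})
# {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 224
```

For these frequencies the weighted total is 224 bits per 100 symbols, or 2.24 bits per symbol, compared with 3 bits per symbol for a fixed-length code over six symbols.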
Although Huffman codewords can be decoded with minimal memory allocation, the processing time required by the tree-tracing procedure described above limits its utility. These limitations are especially acute in systems with limited processing resources. Some Huffman circuits have been modified to decrease the required processing time; for example, one such technique decodes two symbols at once. However, no known technique mitigates the substantial processing power requirements of a pure Huffman application.
Currently, program designers must choose between the two mutually exclusive approaches described above. Full table lookup, while providing quick processing, requires tremendous memory capacity. In contrast, tree-tracing systems demand little storage capacity but are burdened with multiple CPU cycles per codeword. Consequently, what is needed is a decoding technique that neither depletes the memory capacity of the computer nor results in excessive processing times, yet still decodes data in an efficient, cost-effective manner.
The present invention addresses these and other problems associated with the prior art by providing a unique method and apparatus for decoding a codeword that is embedded in a bitstream. The present invention obtains a first set of bits and uses it as an index into the first of a plurality of cascading subtables. From the location identified by the first bit set, the first subtable yields either a symbol or the address of a second subtable. In the latter case, a second set of bits is obtained and used as an index into the designated second subtable, which yields either a symbol and its associated code length or, alternatively, the address of a third subtable. The code length is used to determine the lead bit of the next codeword.
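A cascading-subtable decoder in the spirit of this summary might look like the sketch below. It is an assumption-laden illustration rather than the claimed apparatus: every subtable here is indexed by the same fixed number of bits, `width` (a hypothetical parameter; the invention may size its bit samples differently), entries are assumed to be tuples ("sym", symbol, length) or ("sub", index), and `build_subtables` and `decode` are hypothetical names.

```python
def build_subtables(codes, width):
    """Build cascading subtables from a prefix-free code {bitstring: symbol}.

    Each subtable has 2**width entries. An entry is ("sym", symbol, length),
    where length counts only the bits consumed at this level, or
    ("sub", index) pointing to the next subtable. tables[0] is the first one.
    """
    tables = []

    def build(entries):
        index = len(tables)
        table = [None] * (1 << width)
        tables.append(table)
        groups = {}
        for bits, sym in entries.items():
            if len(bits) <= width:
                # Codeword ends here: replicate it over every don't-care suffix.
                pad = width - len(bits)
                base = int(bits, 2) << pad
                for j in range(1 << pad):
                    table[base + j] = ("sym", sym, len(bits))
            else:
                # Longer codeword: defer its remaining bits to a child subtable.
                groups.setdefault(bits[:width], {})[bits[width:]] = sym
        for prefix, rest in groups.items():
            table[int(prefix, 2)] = ("sub", build(rest))
        return index

    build(codes)
    return tables


def decode(bits, tables, width):
    """Decode a '0'/'1' string; returns (symbols, number_of_table_lookups)."""
    out, pos, lookups = [], 0, 0
    while pos < len(bits):
        t = 0  # every codeword starts at the first subtable
        while True:
            sample = bits[pos:pos + width].ljust(width, "0")  # pad at end of stream
            entry = tables[t][int(sample, 2)]
            lookups += 1
            if entry is None:
                raise ValueError(f"invalid codeword at bit {pos}")
            if entry[0] == "sym":
                out.append(entry[1])
                pos += entry[2]  # code length locates the lead bit of the next codeword
                if pos > len(bits):
                    raise ValueError("bitstream ends mid-codeword")
                break
            t = entry[1]  # cascade into the addressed subtable
            pos += width


    return out, lookups


codes = {"0": "a", "10": "b", "110": "c", "1110": "d", "11110": "e", "11111": "f"}
tables = build_subtables(codes, width=2)
print(len(tables), sum(len(t) for t in tables))  # 3 12
print(decode("01011011111", tables, 2))          # (['a', 'b', 'c', 'f'], 7)
```

With `width=2`, this example code needs three subtables totalling 12 entries, compared with 2^5 = 32 entries for a full table over its 5-bit maximum length. The 5-bit codeword "11111" costs three lookups, versus one lookup with a full table or five layers of bit-by-bit tree tracing; raising `width` shifts the balance toward fewer lookups and more memory.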
One embodiment of the present invention combines Huffman tree-tracing decoding techniques with a table lookup method to maximize the capabilities and efficiencies of available equipment. Generally, the embodiment enables flexible decoding of data by accounting for the processing power and memory limitations of the available compression hardware when presetting the size of the bit sample. By adjusting the sample size, the invention requires smaller memory allocations than methods that recall entire codewords in a single lookup. Further, the invention traverses fewer layers of a Huffman tree than bit-by-bit tree tracing, which translates into smaller processing requirements.
The above and other objects and advantages of the present invention shall be made apparent from the accompanying drawings and the description thereof.