The present invention relates generally to variable length decoders (VLDs) used in data transmission systems, and more particularly, to a variable length decoder with adaptive acceleration in processing of Huffman encoded bit streams (such as compressed digital video bit streams). Its basic architecture is essentially the same as the one disclosed in U.S. Pat. No. 5,650,905, which issued on Jul. 22, 1997 to the present inventor (Michael Bakhmutsky), is assigned to the present assignee, and is incorporated herein by reference. The present decoder, however, achieves improved performance and further cost reduction by sub-grouping and then cross-grouping certain DCT (Discrete Cosine Transform) coefficients based on their bit length.
In digital video data transmission systems, video data is encoded prior to being transmitted to a receiver, which decodes the encoded digital video data. The decoded digital video data is then output to a subsequent signal processing stage. To increase the data throughput and memory efficiency of such systems, statistical compression algorithms are used to compress and encode the digital video data. One such compression algorithm is the Huffman coding algorithm. Compressing the data typically results in data streams consisting of variable length code words rather than fixed length code words. Variable length decoders decode the variable length code words comprising the compressed data stream.
There are several presently available methods for decoding a sequence of variable length code words. The most prevalent methods are the tree searching algorithm and the table look-up technique.
The tree searching algorithm uses a bit-by-bit search through a code tree to find the end and value of each code word in the input bit stream. The coding tree includes leaves of known code words. The decoding process begins at the root of the coding tree and continues bit-by-bit to different branches of the coding tree, depending upon the decoded value of each successive bit in the bit stream. Eventually a leaf is reached and the end of the code word is detected. The code word is then segmented from the rest of the bit stream and the value of the detected code word is looked up and output from the variable length decoder. Decoding a bit stream using the tree searching algorithm is too slow for many high speed applications, since the decoding operation is performed at the bit rate rather than at the symbol rate. In this connection, decoding a bit stream at the bit rate does not satisfy the peak symbol rate requirements of an HDTV decoder.
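The bit-by-bit search described above can be illustrated with a minimal sketch. The four-symbol code book, the function names, and the nested-dictionary tree are assumptions made purely for illustration; they are not taken from any decoder referenced herein. The sketch counts one step per input bit, which shows why the decoding time grows with the bit rate rather than the symbol rate.

```python
# Hypothetical prefix code used only for illustration.
CODEBOOK = {"a": "0", "b": "10", "c": "110", "d": "111"}


def build_tree(codebook):
    """Build a binary code tree as nested dicts; leaves hold the symbols."""
    root = {}
    for symbol, code in codebook.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = symbol
    return root


def tree_decode(bits, root):
    """Walk the tree one bit at a time; return (symbols, steps taken)."""
    symbols = []
    steps = 0
    node = root
    for bit in bits:
        steps += 1                      # one step per input bit
        node = node[bit]
        if not isinstance(node, dict):  # reached a leaf: end of a code word
            symbols.append(node)
            node = root                 # restart at the root for the next word
    return symbols, steps


symbols, steps = tree_decode("0101100111", build_tree(CODEBOOK))
print(symbols, steps)
```

In this sketch, five code words occupying ten bits take ten steps to decode, one per bit.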
To increase the data throughput of a variable length decoder, a table look-up decoder was developed, such as the one disclosed in U.S. Pat. No. 5,173,695, issued to Sun et al., the disclosure of which is herein incorporated by reference. The input of the table look-up decoder disclosed in the above-referenced patent is connected to the output of a rate buffer which receives a variable-word-length encoded bit stream at its input and outputs to the VLD bit segments normally equal in length to the maximum length code word in the bit stream. These bit segments are written into two cascaded latches. The cascaded bit segments in both latches are input to a barrel shifter which provides, from its multi-bit input, a sliding decoding window to a table-lookup decoder. A control signal directly shifts the position of the decoding window of the barrel shifter as each code word is detected.
To detect each code word, the initial bits in the decoding window are compared with code word entries in the table-lookup decoder. When a code word is detected, the corresponding code word length is added to the value of an accumulator with previously accumulated code word lengths to produce the control signal which directly shifts the decoding window by the number of bits in the just decoded word. When all of the bits in the first latch have been decoded, the next bit sequence in the buffer is input to the second latch while the previous bit sequence in the second latch is transferred to the first latch. The decoding window is then shifted to the beginning of the next code word in the undecoded sequence. The shifting of the decoding window and the decoding of the code word can be done in one clock cycle. As a result, the table look-up decoder is capable of decoding one code word per clock cycle regardless of its bit length, thereby dramatically increasing the data throughput of the decoder relative to the previously available tree searching algorithm decoder.
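The sliding decoding window and length accumulator described above can be approximated in software as follows. This is a simplified sketch under stated assumptions: the code book and all function names are hypothetical, the two latches and the barrel shifter are collapsed into a single string slice, and each loop iteration stands in for one clock cycle. It is not a model of the circuit of U.S. Pat. No. 5,173,695.

```python
# Hypothetical prefix code used only for illustration.
CODEBOOK = {"a": "0", "b": "10", "c": "110", "d": "111"}
MAX_LEN = max(len(code) for code in CODEBOOK.values())


def build_lookup(codebook, max_len):
    """Map every possible max_len-bit window to (symbol, code word length)."""
    table = {}
    for symbol, code in codebook.items():
        pad = max_len - len(code)
        for tail in range(2 ** pad):
            suffix = format(tail, "b").zfill(pad) if pad else ""
            table[code + suffix] = (symbol, len(code))
    return table


def table_decode(bits, table, max_len):
    """Decode one code word per iteration; return (symbols, cycles)."""
    padded = bits + "0" * max_len       # keep the last window full width
    pos = 0                             # accumulated code word lengths
    cycles = 0
    symbols = []
    while pos < len(bits):
        window = padded[pos:pos + max_len]
        symbol, length = table[window]
        symbols.append(symbol)
        pos += length                   # shift the window past the decoded word
        cycles += 1
    return symbols, cycles


symbols, cycles = table_decode("0101100111", build_lookup(CODEBOOK, MAX_LEN), MAX_LEN)
print(symbols, cycles)
```

Here the same ten-bit input that holds five code words is decoded in five iterations, one per code word regardless of its bit length.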
In consumer HDTV applications, for example, where the peak symbol rate is about 100 million code words per second, decoding the whole picture at the symbol rate with a single VLD becomes impractical. In HDTV systems, the VLD must be able to extract an entire picture from a rate buffer within the picture display time. The VLD must decode words in the data stream at the peak symbol rate (PSR), which depends upon the display resolution and the display time. For HDTV systems which use the MPEG ("Moving Picture Experts Group") protocol, a VLD throughput of 100 million or more code words per second is required.
In addition to the technical problems associated with implementing the VLD itself with such throughput, the high-speed VLD interface with the large capacity rate buffer is quite expensive with the currently available memory technology. The problem becomes more severe if price is an issue, since faster and more expensive memory devices such as static random access memories (SRAMs) and synchronous dynamic random access memories (SDRAMs) must be used, rather than slower and cheaper memory devices such as asynchronous DRAMs. Of course, the price of the memory is a particularly important consideration for a consumer product, such as an HDTV set.
In current implementations, HDTV systems are normally partitioned into multiple processing paths, using multiple VLDs to decode different portions of the picture in parallel. In such implementations, the VLD is one of the major bottlenecks. Because each partition of the picture may contain almost all of the picture information, multiple dedicated ping-pong buffers may be required between all of the VLDs and the rate buffer, thereby dramatically increasing the amount of bit stream memory required for the system. For example, a partitioned decoding system having eight parallel VLDs may require eight ping-pong buffers, each one of the ping-pong buffers being twice the size of the rate buffer, thereby increasing the amount of required buffer memory by a factor of sixteen over a system having a single VLD.
In HDTV systems, the input bit stream is normally an MPEG digital video data stream which includes payload data and setup data. The payload data, which constitutes the overwhelming majority of the data (about 95% of the data), is represented by contiguous code words such as DCT (discrete cosine transform) coefficients and motion vectors, which are decoded using their respective look-up tables. The setup data, which constitutes the remaining portion of the data (about 5% of the data), is represented by singular code words which are decoded using different look-up tables. Statistically speaking, most of the HDTV material can be decoded without quality degradation using a basic VLD configuration, such as the one disclosed in U.S. Pat. No. 5,173,695. However, if such a VLD is not capable of operating at the peak symbol rate (PSR) (which is very difficult to achieve in the current VLSI technology), the pictures that carry too much data to be processed in the limited picture display time will cause the decoder to crash because the VLD will fail to extract all of the picture from the rate buffer. This may have devastating consequences for the picture quality, especially if the failed picture happens to be an anchor, which is most likely the case since those types of pictures usually carry most of the information. If the failed picture is an anchor, the error will propagate into several other pictures, thus aggravating the degradation problem. As a consequence, a VLD configuration such as the one described in U.S. Pat. No. 5,173,695 has to operate at the PSR in order to decode HDTV pictures without failure.
Based on the above and foregoing, it can be appreciated that there presently exists a need in the art for a variable length decoder which overcomes the above-discussed drawbacks and shortcomings of the presently available technology, e.g., which can be used to implement a single-VLD HDTV decoder, rather than a partitioned HDTV decoder. More particularly, there presently exists a need in the art for a variable length decoder having a data throughput which is adequate for processing digital video data, but at a lower clock rate, thereby enabling the use of cheaper (slower) memory and making more practical the implementation of the variable length decoder.
To fulfill this need in the art, and increase the data throughput, a variable length decoder capable of pre-tagging the input bit stream and of parallel processing contiguous code words of identical type was developed and disclosed in U.S. Pat. No. 5,668,548 entitled "HIGH PERFORMANCE VARIABLE LENGTH DECODER WITH ENHANCED THROUGHPUT DUE TO TAGGING OF THE INPUT BIT STREAM AND PARALLEL PROCESSING OF CONTIGUOUS CODE WORDS," by M. Bakhmutsky (the present inventor), the disclosure of which is herein incorporated by reference. Although the variable length decoder disclosed in the above-referenced patent constitutes an excellent solution for both higher throughput and lower clock rate, the overhead of both the tree-searching pre-tagging circuit and the additional buffer memory of the same size as the rate buffer results in a product cost which may be prohibitive for at least some consumer HDTV decoders. Thus, the cost of this variable length decoder constitutes a shortcoming thereof.
Accordingly, a high performance variable length decoder meeting these stringent requirements for a consumer HDTV decoder was developed and disclosed in U.S. Pat. No. 5,650,905. The throughput per clock cycle of the variable length decoder disclosed in that patent is adaptively increased for a selected group of code words in the Huffman encoded input bit stream which have a bit length less than a prescribed number, by decoding combinations of two or more code words from the selected group, during a single clock cycle, using a combination value look-up table. Since the code words in the selected group are the statistically most frequently occurring code words in the Huffman encoded input bit stream, the variable length decoder is able to process an entire picture at a reduced clock rate, without sacrificing throughput.
Thus, improved statistical performance is attained due to adaptive acceleration in processing code words in the selected group. In other words, the VLD disclosed in U.S. Pat. No. 5,650,905 applies parallelism in the processing of the shorter code words in the Huffman table, which are the actual cause of the high PSR.
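The idea of decoding combinations of short code words in a single cycle can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the code book, the threshold SHORT_LIMIT, the restriction to pairs, and the software table matching do not reproduce the actual circuit or tables of U.S. Pat. No. 5,650,905. Each loop iteration stands in for one clock cycle; a pair of short code words is decoded together when one is present at the current position, and a single code word is decoded otherwise.

```python
from itertools import product

# Hypothetical prefix code and threshold used only for illustration.
CODEBOOK = {"a": "0", "b": "10", "c": "110", "d": "111"}
SHORT_LIMIT = 3  # code words shorter than this form the selected group


def build_combo_table(codebook, short_limit):
    """Map each concatenated pair of short code words to (symbols, total length)."""
    short = {s: c for s, c in codebook.items() if len(c) < short_limit}
    combo = {}
    for (s1, c1), (s2, c2) in product(short.items(), repeat=2):
        combo[c1 + c2] = ((s1, s2), len(c1) + len(c2))
    return combo


def match(bits, pos, table):
    """Return the entry whose key starts at bits[pos], or None if none does."""
    for key, entry in table.items():
        if bits.startswith(key, pos):
            return entry
    return None


def combo_decode(bits, codebook, short_limit):
    """Decode up to two short code words per iteration; return (symbols, cycles)."""
    combo = build_combo_table(codebook, short_limit)
    single = {c: ((s,), len(c)) for s, c in codebook.items()}
    pos = 0
    cycles = 0
    symbols = []
    while pos < len(bits):
        entry = match(bits, pos, combo) or match(bits, pos, single)
        if entry is None:
            raise ValueError(f"no code word matches at bit {pos}")
        decoded, length = entry
        symbols.extend(decoded)
        pos += length
        cycles += 1
    return symbols, cycles


symbols, cycles = combo_decode("0101100111", CODEBOOK, SHORT_LIMIT)
print(symbols, cycles)
```

In this sketch, the ten-bit input holding five code words is decoded in four iterations instead of five, because the leading pair of short code words is resolved in one step. The gain depends on how often short code words occur back to back, which is why the benefit is statistical.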
Although the VLD disclosed in U.S. Pat. No. 5,650,905 constitutes a low-cost, high-quality VLD, there is room for further optimization and cost reduction, which is the purpose of the present invention.