The present invention relates generally to digital video decoders, and, more particularly, to an HDTV video decoder partitioned on a macroblock level, and a related method.
In digital video data transmission systems, video data is encoded prior to being transmitted to a receiver, which decodes the encoded digital video data. The decoded digital video data is then output to a subsequent signal processing stage. To increase the data throughput and memory efficiency of such systems, statistical compression algorithms are used to compress and encode the digital video data. One such compression algorithm is the Huffman coding algorithm. Compressing the data typically results in data streams segmented into variable length code words rather than fixed length code words. Variable length decoders decode the variable length code words comprising the compressed data stream.
There are several presently available methods for decoding a sequence of variable length code words. The most prevalent methods are the tree searching algorithm and the table look-up technique.
The tree searching algorithm uses a bit-by-bit search through a code tree to find the end and value of each code word in the input bit stream. The code tree has one leaf for each valid code word. The decoding process begins at the root of the code tree and proceeds bit by bit along different branches, depending upon the value of each successive bit in the bit stream. Eventually a leaf is reached and the end of the code word is detected. The code word is then segmented from the rest of the bit stream, and the value of the detected code word is looked up and output from the variable length decoder. Decoding a bit stream using the tree searching algorithm is too slow for many high-speed applications, since the decoding operation is performed at the bit rate rather than at the symbol rate. In particular, decoding a bit stream at the bit rate does not satisfy the peak symbol rate requirements of an HDTV decoder.
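The bit-by-bit tree search can be modeled in a few lines. This is a minimal sketch under stated assumptions: `CODE_TABLE` is a small hypothetical prefix code chosen for illustration (not an MPEG-2 table), and the function names are made up. Each loop iteration consumes exactly one input bit, which is why this approach runs at the bit rate rather than the symbol rate.

```python
from typing import Dict, List

# Hypothetical prefix code (NOT an MPEG-2 table): code word -> decoded value.
CODE_TABLE: Dict[str, str] = {"0": "A", "10": "B", "110": "C", "111": "D"}


def build_tree(table: Dict[str, str]) -> dict:
    """Build a binary code tree; inner nodes are dicts, leaves are values."""
    root: dict = {}
    for code, value in table.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = value  # leaf holding the decoded value
    return root


def tree_decode(bits: str, root: dict) -> List[str]:
    """Walk the tree one bit per step, emitting a value at each leaf."""
    out: List[str] = []
    node = root
    for bit in bits:                  # one iteration per input bit
        child = node[bit]
        if isinstance(child, str):    # leaf reached: end of code word
            out.append(child)
            node = root               # restart at the root
        else:
            node = child
    if node is not root:
        raise ValueError("bit stream ends in the middle of a code word")
    return out


tree = build_tree(CODE_TABLE)
print(tree_decode("0101100111", tree))  # prints ['A', 'B', 'C', 'A', 'D']
```

Decoding these five code words takes ten loop iterations, one per bit.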
To increase the data throughput of a variable length decoder, a table look-up decoder was developed, such as the one disclosed in U.S. Pat. No. 5,173,695, issued to Sun et al., the disclosure of which is herein incorporated by reference. The input of the table look-up decoder disclosed in the above-referenced patent is connected to the output of a rate buffer, which receives a variable-word-length encoded bit stream at its input and outputs, in parallel, sequences of bits equal in length to the maximum-length code word in the bit stream. These sequences are read into two cascaded latches. The sequences held in both latches are input to a barrel shifter, which provides, from its multi-bit input, a sliding decoding window to a table look-up decoder. A control signal directly shifts the position of the decoding window of the barrel shifter as each code word is detected.
To detect each code word, the initial bits in the decoding window are compared with the code word entries in the table look-up decoder. When a code word is detected, its length is added in an accumulator to the previously accumulated code word lengths, producing the control signal which directly shifts the decoding window by the number of bits in the just-decoded word. When all of the bits in the first latch have been decoded, the next bit sequence in the buffer is input to the second latch while the previous bit sequence in the second latch is transferred to the first latch. The decoding window is then shifted to the beginning of the next code word in the undecoded sequence. The shifting of the decoding window and the decoding of the code word can be done in one clock cycle. As a result, the table look-up decoder is capable of decoding one code word per clock cycle regardless of its bit length, thereby dramatically increasing the data throughput of the decoder relative to the previously available tree searching algorithm decoder.
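The table look-up principle can be illustrated with a behavioral software model. This is a hedged sketch, not the circuit of the Sun et al. patent: a Python string stands in for the rate buffer and latches, slicing stands in for the barrel shifter, and `CODE_TABLE` is a hypothetical prefix code (maximum length 3 bits). The look-up table is indexed by the full maximum-length window, so each loop iteration (one "clock cycle") decodes one complete code word regardless of its length, and the accumulator advances by the decoded length.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix code (NOT an MPEG-2 table): code word -> decoded value.
CODE_TABLE: Dict[str, str] = {"0": "A", "10": "B", "110": "C", "111": "D"}
MAX_LEN = max(len(c) for c in CODE_TABLE)


def build_lookup(table: Dict[str, str], max_len: int) -> Dict[str, Tuple[str, int]]:
    """Map every max_len-bit window to (decoded value, code word length)."""
    lookup: Dict[str, Tuple[str, int]] = {}
    for code, value in table.items():
        pad = max_len - len(code)
        for i in range(2 ** pad):
            suffix = format(i, f"0{pad}b") if pad else ""
            lookup[code + suffix] = (value, len(code))
    return lookup


def table_decode(bits: str, lookup: Dict[str, Tuple[str, int]],
                 max_len: int) -> Tuple[List[str], int]:
    """Decode one code word per iteration; return (values, cycles used)."""
    out: List[str] = []
    acc = 0                          # accumulator: start of decoding window
    cycles = 0
    padded = bits + "0" * max_len    # pad so the final window is full width
    while acc < len(bits):
        window = padded[acc:acc + max_len]   # barrel-shifter output
        value, length = lookup[window]
        out.append(value)
        acc += length                        # shift window by word length
        cycles += 1
    return out, cycles


lookup = build_lookup(CODE_TABLE, MAX_LEN)
print(table_decode("0101100111", lookup, MAX_LEN))
# prints (['A', 'B', 'C', 'A', 'D'], 5)
```

The same ten-bit input now takes five iterations, one per code word. The latch hand-over described above is not modeled.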
In consumer HDTV applications, however, where the peak symbol rates are in excess of 100 million code words per second, decoding the whole picture at the symbol rate with a single variable length decoder becomes impractical. In HDTV systems, the variable length decoder (VLD) is used to extract an entire picture from a rate buffer within the picture display time. The VLD must therefore decode words in the data stream at the peak symbol rate (PSR), which depends upon the display resolution and the display time. For HDTV systems which use the MPEG ("Moving Picture Experts Group") protocol, a VLD throughput in excess of 100 million code words per second is required.
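Because one picture must be extracted within one display time, the PSR is the worst-case number of code words per picture divided by the display time, that is, multiplied by the picture rate. The sketch below makes this concrete. The figures in the usage line are illustrative assumptions, not values from the text: a 1920x1088 coded frame in 4:2:0 format, a rough estimate of one code word per DCT coefficient (4 luma + 2 chroma 8x8 blocks per 16x16 macroblock), and 30 pictures per second. End-of-block, macroblock header and motion vector code words come on top of this count.

```python
def peak_symbol_rate(code_words_per_picture: int, pictures_per_second: int) -> int:
    """PSR = code words per picture / display time = words x picture rate."""
    return code_words_per_picture * pictures_per_second


def coefficient_code_words(width: int, height: int) -> int:
    """Rough coefficient code-word count for one 4:2:0 picture.

    Assumption: one code word per coefficient; each macroblock carries
    6 blocks x 64 coefficients = 384 coefficients.
    """
    macroblocks = (width // 16) * (height // 16)
    return macroblocks * 6 * 64


words = coefficient_code_words(1920, 1088)   # 8160 macroblocks -> 3133440 words
print(peak_symbol_rate(words, 30))           # prints 94003200
```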
In addition to the technical problems associated with implementing the VLD itself with such throughput, the high-speed VLD interface with the large capacity rate buffer is quite expensive with the currently available memory technology. The problem becomes more severe if price is an issue, since faster and more expensive memory devices such as static random access memories (SRAMs) or synchronous dynamic random access memories (SDRAMs) must be used, rather than slower and cheaper memory devices such as asynchronous DRAMs. Of course, the price of the memory is a particularly important consideration for a consumer product, such as an HDTV set.
In current implementations, HDTV video decoders are normally partitioned into multiple processing paths, using multiple VLDs to decode different portions of the picture in parallel. In such implementations, the VLD is one of the major bottlenecks, since it constitutes the point of transition between the compressed and decompressed domains. Because each partition of the picture may contain almost all of the picture information, multiple dedicated ping-pong buffers are required between the rate buffer and the VLDs, thereby dramatically increasing the amount of bit stream memory required for the system. For example, a partitioned decoding system having eight parallel VLDs may require eight ping-pong buffers, each one twice the size of the rate buffer, thereby increasing the amount of required buffer memory by a factor of sixteen over a system having a single VLD.
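The factor of sixteen in the eight-VLD example is simply the number of VLDs times the per-VLD ping-pong buffer size. A trivial sketch, with memory expressed in units of one rate buffer (the function name is hypothetical):

```python
def ping_pong_memory(num_vlds: int, rate_buffer_size: int) -> int:
    """Total dedicated ping-pong buffer memory for a partitioned decoder.

    Each VLD gets a ping-pong buffer of two halves, and each half must hold
    up to a full rate buffer, since one partition may carry almost all of
    the picture information.
    """
    return num_vlds * 2 * rate_buffer_size


print(ping_pong_memory(8, 1))  # prints 16 (rate-buffer units)
```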
In HDTV systems, the input bit stream is normally an MPEG digital video data stream which includes payload data and setup data. The payload data, which constitutes the overwhelming majority of the data (about 95% of the data), is represented by code words such as DCT (discrete cosine transform) coefficients and motion vectors, which are decoded using their respective look-up tables. The setup data, which constitutes the remaining portion of the data (about 5% of the data), is represented by singular code words which are decoded using different look-up tables.
Various techniques have been proposed by the present inventor in various copending patent applications in order to implement the HDTV video decoder using a single VLD rather than a partitioned HDTV video decoder using multiple VLDs. For example, a variable length decoder capable of pre-tagging the input bit stream and of parallel processing of contiguous code words of identical type was developed and disclosed in a co-pending U.S. patent application entitled "HIGH PERFORMANCE VARIABLE LENGTH DECODER WITH ENHANCED THROUGHPUT DUE TO TAGGING OF THE INPUT BIT STREAM AND PARALLEL PROCESSING OF CONTIGUOUS CODE WORDS," by M. Bakhmutsky (the present inventor), Ser. No. 08/580,405, filed Dec. 28, 1995, the teachings of which are herein incorporated by reference. Although the variable length decoder disclosed in the above-referenced application constitutes an excellent solution for both higher throughput and lower clock rate, the overhead of both the tree-searching pre-tagging circuit and the additional buffer memory of the same size as the rate buffer results in a product cost which may be prohibitive for at least some consumer HDTV decoders. Thus, the cost of this variable length decoder constitutes a shortcoming thereof.
Accordingly, a high performance variable length decoder meeting these stringent requirements for a consumer HDTV decoder was developed and disclosed in a co-pending U.S. patent application entitled "VARIABLE LENGTH DECODER WITH ADAPTIVE ACCELERATION IN PROCESSING OF HUFFMAN ENCODED BIT STREAMS", by M. Bakhmutsky (the present inventor), Ser. No. 08/580,407, filed Dec. 28, 1995, the teachings of which are herein incorporated by reference. The throughput per clock cycle of the variable length decoder disclosed in this application is adaptively increased for a selected group of code words in the Huffman encoded input bit stream which have a bit length less than a prescribed number, by decoding combinations of two or more code words from the selected group, during a single clock cycle, using a combination value look-up table. Since the code words in the selected group are the statistically most frequently occurring code words in the Huffman encoded input bit stream, the variable length decoder is able to process an entire picture at a reduced clock rate, without sacrificing throughput. Thus, improved statistical performance is attained due to adaptive acceleration in processing code words in the selected group.
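The adaptive acceleration principle, decoding a combination of short, frequent code words in one clock cycle through a combination value look-up table, can be modeled as follows. This is a hedged behavioral sketch, not the referenced application's design: the code table, the qualifying threshold `SHORT_MAX` (standing in for the "prescribed number") and the window width `WINDOW` are hypothetical, and only pairs, not longer combinations, are combined.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical prefix code (NOT an MPEG-2 table): code word -> decoded value.
CODE_TABLE: Dict[str, str] = {"0": "A", "10": "B", "110": "C", "111": "D"}
SHORT_MAX = 2   # hypothetical threshold: words of <= 2 bits qualify
WINDOW = 4      # hypothetical decoding-window width in bits

Entry = List[Tuple[str, int]]


def first_word(bits: str) -> Optional[Tuple[str, int]]:
    """Return (value, length) of the code word starting at bits[0], if complete."""
    for n in range(1, len(bits) + 1):
        if bits[:n] in CODE_TABLE:
            return CODE_TABLE[bits[:n]], n
    return None


def build_combination_table() -> Dict[str, Entry]:
    """Map each WINDOW-bit pattern to the word(s) decoded in one cycle."""
    table: Dict[str, Entry] = {}
    for pattern in product("01", repeat=WINDOW):
        window = "".join(pattern)
        first = first_word(window)
        assert first is not None          # WINDOW >= longest code word
        entry: Entry = [first]
        if first[1] <= SHORT_MAX:
            second = first_word(window[first[1]:])
            if second is not None and second[1] <= SHORT_MAX:
                entry.append(second)      # two short words in one cycle
        table[window] = entry
    return table


def combo_decode(bits: str, table: Dict[str, Entry]) -> Tuple[List[str], int]:
    """Decode using the combination table; return (values, cycles used)."""
    out: List[str] = []
    acc = cycles = 0
    padded = bits + "0" * WINDOW
    while acc < len(bits):
        for value, length in table[padded[acc:acc + WINDOW]]:
            if acc >= len(bits):
                break                     # never decode the padding as data
            out.append(value)
            acc += length
        cycles += 1
    return out, cycles


print(combo_decode("01000110", build_combination_table()))
# prints (['A', 'B', 'A', 'A', 'C'], 3)
```

Five code words are decoded in three cycles because two of the cycles each handle a pair of qualifying short words.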
However, while this statistical performance enhancement mechanism guarantees a higher average minimum code word length for an entire picture, it does not guarantee that picture elements smaller than the size of the entire picture may be processed with higher throughput. The inability to guarantee high performance with regard to local activity constitutes a shortcoming of this variable length decoder, since it could impair real-time picture processing. Good handling of local activity is vital for real-time picture processing performed without unnecessary overhead in picture memory.
A high-performance variable length decoder with two-word bit stream segmentation (and related method) which achieves high performance without the expense of high hardware complexity and additional memory is disclosed in a copending U.S. patent application entitled "HIGH PERFORMANCE VARIABLE LENGTH DECODER WITH TWO-WORD BIT STREAM SEGMENTATION AND RELATED METHOD", by Michael Bakhmutsky, Ser. No. 08/672,246, filed Jun. 26, 1996, the disclosure of which is incorporated herein by reference. The VLD disclosed in this co-pending application is capable of processing macroblocks in real-time at rates exceeding 100 million code words per second, thus satisfying the stringent requirements for use in contemporary digital HDTV video decoders, such as an MPEG-2 Main Profile, High Level compliant HDTV video decoder. Provided that the decoding window is wide enough to accommodate two maximum-size code words, the qualifying code words are guaranteed to be processed with double throughput at approximately one-half of the clock rate required for the conventional single-path VLD disclosed in U.S. Pat. No. 5,173,695, issued to Sun et al. The maximum-size qualifying code word (AC coefficient) in the MPEG-2 protocol is 24 bits long. Therefore, providing a 48-bit-wide decoding window will guarantee double throughput for all qualifying code words.
However, in the actual hardware implementation of the high-performance VLD with two-word bit stream segmentation disclosed in this co-pending application, the 48-bit-wide decoding window is disadvantageous, because it results in a reduced speed of operation and a higher gate count in silicon, and thus, is less economical than is desirable for many consumer applications. Although it might be possible to find a VLD implementation which constitutes an acceptable statistical trade-off amongst the width of the decoding window, the macroblock clock cycle allocation (VLD clock rate), and the acceptable frequency of VLD failures in those worst-case situations in which too many specific qualifying code word pairs are "broken", i.e., not parallel-processed or "pair-matched", due to their combined length exceeding the bit width of the limited-size decoding window, this VLD implementation would not be "failure-free" (i.e., immune to failures), and thus, would not meet the most stringent requirements of some contemporary digital HDTV video decoders.
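The window-width trade-off can be seen by counting clock cycles over a sequence of qualifying code word lengths. In this simplified model (an assumption, not the co-pending application's actual circuit), each cycle processes the next two code words when their combined length fits within the decoding window, and otherwise processes a single word, leaving that pair "broken". With a 48-bit window and the 24-bit maximum qualifying length stated above, no pair can be broken; a narrower window breaks every pair whose combined length exceeds it. The sample lengths are made up.

```python
from typing import List, Tuple

MAX_QUALIFYING_LEN = 24  # longest qualifying (AC coefficient) word, per the text


def two_word_cycles(lengths: List[int], window_bits: int) -> Tuple[int, int]:
    """Return (clock cycles, broken pairs) for two-word segmentation."""
    cycles = broken = 0
    i = 0
    while i < len(lengths):
        if i + 1 < len(lengths) and lengths[i] + lengths[i + 1] <= window_bits:
            i += 2              # pair-matched: two words in one cycle
        else:
            if i + 1 < len(lengths):
                broken += 1     # pair too long for the window
            i += 1              # single word this cycle
        cycles += 1
    return cycles, broken


lengths = [24, 24, 20, 10, 3, 5, 18, 17]   # hypothetical code word lengths
print(two_word_cycles(lengths, 2 * MAX_QUALIFYING_LEN))  # prints (4, 0)
print(two_word_cycles(lengths, 32))                      # prints (6, 3)
```

Every broken pair costs an extra cycle, which is why a narrower window cannot guarantee failure-free double throughput without a further mechanism.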
In order to overcome the limitations of the above-mentioned high-performance VLD with two-word bit stream segmentation, an improvement thereto which uses pair-match Huffman transcoding was developed by the present inventor and disclosed in co-pending U.S. patent application Ser. No. 08/749,039, which is a C-I-P of the aforementioned U.S. patent application Ser. No. 08/672,246, the disclosure of which is also incorporated herein by reference. The improved high-performance VLD disclosed in this C-I-P application utilizes a decoding window having a bit-width less than double the length of the maximum-length qualifying code word, while at the same time guaranteeing both double throughput and failure-free peak code word throughput on the macroblock level. Such a high-performance VLD with two-word bit stream segmentation provides a higher throughput, lower-cost, failure-free, "unbreakable" VLD architecture that can satisfy the most stringent requirements of the contemporary digital HDTV video decoders with minimal memory overhead.
However, while the above-described single-VLD HDTV video decoder implementations disclosed in the present inventor's above-referenced co-pending U.S. patent applications provide many significant advantages over the presently available technology, they do so at the cost of higher complexity and memory utilization than may be desired for some consumer HDTV applications, at least at the current level of VLSI technology. In this connection, it is desirable to implement a partitioned HDTV video decoder which utilizes less memory than some of the single-VLD implementations disclosed in the present inventor's co-pending applications discussed hereinabove. Further, it is desirable to synchronize the entire decoder to the same slow clock, and thereby eliminate complex multi-port access to the rate buffer. It is also desirable to reduce the rate buffer memory access speed requirement and to more efficiently utilize rate buffer memory space. Moreover, it is desirable to improve the speed performance and to reduce the gate count of the VLD. The macroblock-level partitioned HDTV video decoder of the present invention achieves each of these desired goals.
In general, partitioning an HDTV video decoder into multiple VLDs is difficult because the smallest unit of the encoded (compressed) digital video bit stream that is identifiable by a fixed length decoder is a slice. In accordance with the MPEG-2 coding protocol, which is the standard for consumer-level HDTV systems, a slice consists of a variable number of macroblocks. As such, the smallest unit of partitioning is normally a group of slices constituting a full raster, where the group of slices consists of a fixed number of macroblocks. Partitioning the HDTV video decoder in this manner imposes a requirement for a significant amount of memory to store these bit stream units (i.e., groups of slices), as well as a requirement for "concurrent" or multi-port access to dynamically changing data locations in the rate buffer. The present invention introduces a novel method of partitioning the HDTV video decoder in order to overcome these significant disadvantages of presently available partitioned HDTV video decoders, and to achieve the above-identified desired goals.
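A slice can be located without variable length decoding because every MPEG-2 slice begins with a byte-aligned start code: the prefix 0x00 0x00 0x01 followed by a byte in the range 0x01 to 0xAF. Macroblocks inside a slice carry no such marker, so finding them requires decoding the variable length code words that precede them. The sketch below finds slice boundaries with fixed-length byte comparisons only; the function name and the sample bytes are made up for illustration.

```python
from typing import List, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"
SLICE_FIRST, SLICE_LAST = 0x01, 0xAF   # MPEG-2 slice_start_code value range


def find_slice_starts(stream: bytes) -> List[Tuple[int, int]]:
    """Return (byte offset, start code value) for each slice start code."""
    starts: List[Tuple[int, int]] = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(stream):
        code = stream[pos + 3]                 # byte after the prefix
        if SLICE_FIRST <= code <= SLICE_LAST:
            starts.append((pos, code))
        pos = stream.find(START_CODE_PREFIX, pos + 3)
    return starts


# Hypothetical bytes: a picture start code (value 0x00), then two slices.
sample = (b"\x00\x00\x01\x00\x12\x34"
          b"\x00\x00\x01\x01\xab\xcd"
          b"\x00\x00\x01\x02\xef")
print(find_slice_starts(sample))  # prints [(6, 1), (12, 2)]
```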