This invention relates generally to the encoding and decoding of multimedia data, and more particularly to a decoder of audio and video data which has been encoded in accordance with the MPEG (Moving Picture Experts Group) standard for full-motion video.
A real time processing system for MPEG decoding needs to perform a given number of "simple" operations per second and has a processing clock whose maximum frequency is determined by the current state of the art of the semiconductor implementation technology. In addition, the processing system needs some memory for buffering and storage of input data, intermediate results, output data, and sometimes also instruction data.
The semiconductor implementation technology imposes a practical limit on the cost effective size of a semiconductor device. The amount of processing and the amount of memory needed determine whether one device can be used or multiple devices are needed. If multiple devices are needed, then there is an option either to divide the processing and the memory among the various devices or to dedicate one (or more) of the devices to memory only, and dedicate the rest of the devices mainly to processing with some memory on board.
The advantage of using memory-only devices is the opportunity to use general purpose memory devices, which are made in huge quantities and hence are low in price. The disadvantage is the amount of data transfer needed between the processing devices and the memory devices. In some cases, the total amount of memory needed, divided by the number of processing devices needed, is such that the amount of memory required in each processing device still exceeds the limits of a cost effective solution. In these cases, one (or more) devices dedicated to memory are needed. If the number of "simple" operations per second required is less than, or approximately equal to, the maximum processing clock frequency, then one device containing one processing unit can be used. If the number of "simple" operations per second required exceeds the maximum processing clock frequency, then one device with a number of processing units (not necessarily of the same function) can be used. If the number of processing units required is more than could be cost-effectively implemented within one device, then a number of devices are needed.
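The partitioning rules above can be sketched as a small calculation. This is only an illustrative sketch: the function name, its parameters, and the assumption that each processing unit completes one "simple" operation per clock cycle are hypothetical and are not taken from the text. It also applies a strict ceiling, whereas the text allows "approximately equal" as a judgment call.

```python
import math
from dataclasses import dataclass


@dataclass
class Partition:
    processing_units: int
    processing_devices: int
    needs_memory_devices: bool


def plan_partition(ops_per_sec, max_clock_hz, units_per_device,
                   memory_bytes, on_chip_memory_limit):
    """Hypothetical sizing rule following the reasoning in the text.

    Assumption: one processing unit performs one "simple" operation
    per clock cycle, so the unit count is ops / clock, rounded up.
    """
    units = max(1, math.ceil(ops_per_sec / max_clock_hz))
    # Units that do not fit cost-effectively in one device spill over
    # into additional processing devices.
    devices = math.ceil(units / units_per_device)
    # Memory is divided among the processing devices; if each share
    # still exceeds the cost-effective on-chip limit, dedicated memory
    # devices are needed.
    per_device_memory = memory_bytes / devices
    return Partition(units, devices, per_device_memory > on_chip_memory_limit)
```

For example, `plan_partition(100_000_000, 40_000_000, 4, 400_000, 16_000)` describes one device with three processing units plus external memory. All of these numbers are made up for illustration.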
If the number of data units for MPEG decoding, such as the Huffman coded "events" and reconstructed picture color components "samples" processed by one of the processing units, is much smaller than the max processing clock frequency, and if the "simple" operations are different from each other (e.g., a mix of arithmetic and logic operations with loops and repeated sequences), a processing unit structure similar to a general purpose processor, which is programmed by an instruction set from a program memory, should be considered. Such a processing unit is denoted herein by the name "processor".
The processing tasks of the decoder device for MPEG system and video decoding and for audio synchronization are the following:
a) Receive the system (or video only) bitstream. The data can enter the decoder at a constant bitrate or by demand.
b) Demultiplex the system bitstream, extract the specified video and serial data streams (e.g., audio) and write them in the coded data buffers.
c) Read the video stream from the video code buffer and decode it. The video decoding can be broken down to the following tasks:
1) Decoding of the various headers.
2) Decoding of each sample block (Huffman decoding) to retrieve the quantized coefficients data.
3) Descale and dequantize the coefficients.
4) Inverse DCT transform the dequantized coefficients.
5) Read one or two picture reference data blocks (as needed).
6) Calculate the prediction block and add it to the result of the inverse DCT transform of the dequantized coefficients.
7) Write the results in the decoded picture data buffer.
d) Read the decoded picture data from the decoded picture data buffer, post-process it (as needed, e.g., conversion from progressive to interlaced format or color conversion from Y, U and V to the color space needed for display) and output it timed to the video synchronization signals or video demand signals.
e) Read the serial coded data from the serial data code buffer, reformat it as necessary (e.g., parallel to serial conversion) and output it timed to achieve the synchronization specified in the system bitstream at a constant rate specified in the serial data stream.
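The core of video decoding task c), steps 3, 4 and 6 (dequantize, inverse DCT, add the prediction), can be sketched for a single 8*8 block as follows. This is a simplified illustration and not the decoder of the invention: the function names are hypothetical, the dequantization rule (level * weight * scale / 8, truncated toward zero) is a simplifying assumption that omits the exact MPEG arithmetic, and the inverse DCT is the direct floating point definition rather than a fast implementation.

```python
import math

N = 8  # blocks are 8*8 samples


def dequantize(levels, weights, scale):
    """Simplified dequantization (assumption, not the exact MPEG rule)."""
    return [[int(levels[v][u] * weights[v][u] * scale / 8) for u in range(N)]
            for v in range(N)]


def inverse_dct(coeffs):
    """Direct 8*8 inverse DCT; slow but easy to check."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = total / 4
    return out


def reconstruct_block(levels, weights, scale, prediction):
    """Add the prediction to the inverse DCT output and clamp to 8 bits."""
    residual = inverse_dct(dequantize(levels, weights, scale))
    return [[min(255, max(0, prediction[y][x] + round(residual[y][x])))
             for x in range(N)]
            for y in range(N)]
```

A block whose only nonzero coefficient is the DC term produces a flat residual, so the output is the prediction shifted by a constant.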
The five processing tasks described above are not naturally synchronized with one another within a picture decoding period; they are synchronized only once per picture decoding period. The MPEG decoding algorithm described above specifies several buffers for proper decoding. The first type of buffer is the coded bitstream buffer. If the decoder decodes video only, then one coded bitstream buffer is needed. If the decoder decodes the multiplexed system bitstreams, then the number of coded bitstream buffers needed is equal to the number of bitstreams synchronized by the decoder. The second type of buffer is the decoded picture buffer, which holds the reference data used in the decoding process. Two picture buffers are needed for this purpose. When the coded pictures are progressive (as is the case in MPEG 1 and some subsets of MPEG 2) and the decoder has to support conversion of the decoded picture to interlaced display, at least a third picture buffer is needed.
Even for constrained MPEG 1 video bitstreams, the size of the needed coded video bitstream buffer (typically about 40 Kbytes) and SIF size picture buffers (typically about 125 Kbytes per picture) precludes a cost effective solution that supports the needed buffers inside the decoder device, so that an external buffer, composed of one or more memory devices, completely controlled by the decoder, is a better solution.
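The buffer sizes quoted above can be checked with a short calculation. The sketch assumes a SIF picture of 240 lines by 352 samples (the text does not state the dimensions), 8 bits per sample, and chrominance components of half the height and half the width of the luminance component. The function names are illustrative.

```python
SIF_LINES = 240          # assumption: 352*240 SIF picture
SIF_SAMPLES = 352
CODED_BUFFER_BYTES = 40 * 1024   # "about 40 Kbytes" from the text


def picture_buffer_bytes(lines, samples_per_line):
    """Bytes for one decoded picture at 8 bits per sample."""
    y = lines * samples_per_line
    chroma = (lines // 2) * (samples_per_line // 2)  # one of U or V
    return y + 2 * chroma


def external_memory_bytes(num_picture_buffers):
    """Coded video buffer plus the requested number of picture buffers."""
    return (CODED_BUFFER_BYTES
            + num_picture_buffers * picture_buffer_bytes(SIF_LINES, SIF_SAMPLES))
```

Under these assumptions one picture occupies 126,720 bytes (about 124 Kbytes), and a video-only decoder with three picture buffers needs roughly 411 Kbytes of external memory.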
Of the common types of RAM devices (SRAM, VRAM and DRAM), the DRAM offers the most cost effective solution and indeed many of the decoders already implemented use external DRAM buffers. The requirements of the DRAM structure and mapping of the various buffers to the DRAM address space are described in copending application Ser. No. 08/245,465 filed May 18, 1995 for Dynamic Random Access Memory for MPEG Decoding.
MPEG and other processing requirements: A decoded picture is composed of three rectangular components: one (the Y component) is l lines by p samples by 8 bits, and the other two (the U and V components) are l/2 lines by p/2 samples by 8 bits.
The pictures are written in 8*8 sample blocks as they are decoded. Decoding proceeds by macroblock; each macroblock contains four Y blocks followed by one U block and then one V block.
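The write order within a macroblock can be illustrated by listing where each of its six blocks lands in the picture components. The sketch below uses a hypothetical helper name, and it assumes the usual MPEG ordering of the four Y blocks (top-left, top-right, bottom-left, bottom-right), which the text above does not spell out.

```python
def blocks_in_macroblock(mb_row, mb_col):
    """Yield (component, top_line, left_sample) for the six 8*8 blocks
    of one macroblock, in decoding order.

    A macroblock covers 16*16 Y samples and, because U and V are
    subsampled by two in each direction, one 8*8 block of each.
    """
    y_top, y_left = 16 * mb_row, 16 * mb_col
    # Assumed Y block order: top-left, top-right, bottom-left, bottom-right.
    for dy in (0, 8):
        for dx in (0, 8):
            yield ("Y", y_top + dy, y_left + dx)
    yield ("U", 8 * mb_row, 8 * mb_col)
    yield ("V", 8 * mb_row, 8 * mb_col)
```

Each tuple gives the origin of one 8*8 write into the corresponding component of the decoded picture buffer.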
For some macroblocks, decoding requires reference data from one reference picture. For other macroblocks, decoding requires reference data from two reference pictures. The data needed for the decoding of each block of those macroblocks is one 9*9 sample block, with origin at any sample of the component, from either one or both of the reference pictures.
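One way to see why a 9*9 block is read for an 8*8 prediction: with half-sample motion vectors, each predicted sample may be the rounded average of a sample and its right and/or lower neighbour, so one extra row and one extra column are needed. The sketch below illustrates this under that assumption. The function names are hypothetical, and picture-edge handling is omitted.

```python
def fetch_reference(component, top, left, size=9):
    """Read a size*size block with its origin at any sample of the component."""
    return [row[left:left + size] for row in component[top:top + size]]


def predict_block(ref9, half_x, half_y):
    """Form an 8*8 prediction from a 9*9 reference block.

    half_x / half_y are 0 or 1 and select half-sample interpolation
    in the horizontal / vertical direction.
    """
    out = []
    for y in range(8):
        row = []
        for x in range(8):
            a = ref9[y][x]
            b = ref9[y][x + half_x]
            c = ref9[y + half_y][x]
            d = ref9[y + half_y][x + half_x]
            # Rounded average; reduces to a, (a+b+1)//2 or (a+c+1)//2
            # when one or both half flags are zero.
            row.append((a + b + c + d + 2) // 4)
        out.append(row)
    return out


def average_predictions(p1, p2):
    """Combine predictions from two reference pictures."""
    return [[(p1[y][x] + p2[y][x] + 1) // 2 for x in range(8)] for y in range(8)]
```

For a macroblock that uses two reference pictures, `fetch_reference` and `predict_block` would be applied once per reference picture and the two results combined with `average_predictions`.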
For display, each of the three picture buffers (or only two, as the case may be), is read in raster scan order. The data of all three components are usually needed in parallel.
For MPEG 1 SIF size pictures, the sample rate (Y, U and V samples combined) is about 3.8 Msamples/sec. The number of operations needed for most of the processing tasks, apart from system code data handling, serial data handling and Huffman decoding, has a practically linear relationship with the size of the decoded picture.
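The 3.8 Msamples/sec figure can be reproduced with the arithmetic below. The picture dimensions and picture rates are assumptions, since the text gives only the resulting rate: 352*240 at 30 pictures per second, or 352*288 at 25 pictures per second. The function name is illustrative.

```python
def samples_per_second(lines, samples_per_line, pictures_per_second):
    """Combined Y, U and V sample rate for a picture with half-size chroma."""
    y = lines * samples_per_line
    chroma = (lines // 2) * (samples_per_line // 2)  # one of U or V
    return (y + 2 * chroma) * pictures_per_second


# Assumed SIF formats; both give the same combined rate.
RATE_30HZ = samples_per_second(240, 352, 30)
RATE_25HZ = samples_per_second(288, 352, 25)
```

Both assumed formats give 3,801,600 samples per second, which is consistent with the figure quoted above.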
The number of simple operations per second needed for MPEG 1 or main profile of MPEG 2 decoding is such that a single device with multiple processing units can be used. The choice of the number of the processing units within the device, their structure and function and their connectivity is the subject of this invention.