1. Field of the Invention
The present invention generally relates to the art of audio/video data compression and transmission, and more specifically to a statistically derived method and system for decoding Moving Picture Experts Group (MPEG) motion compensation and transform coded video data.
2. Description of the Related Art
Constant efforts are being made to make more effective use of the limited number of transmission channels currently available for delivering video and audio information and programming to an end user such as a home viewer of cable television. Various methodologies have thus been developed to achieve the effect of an increase in the number of transmission channels that can be broadcast within the frequency bandwidth that is currently allocated to a single video transmission channel. An increase in the number of available transmission channels provides cost reduction and increased broadcast capacity.
The number of separate channels that can be broadcast within the currently available transmission bandwidth can be increased by employing a process for compressing and decompressing video signals. Video and audio program signals are converted to a digital format, compressed, encoded and multiplexed in accordance with an established compression algorithm or methodology.
The compressed digital system signal, or bitstream, which includes a video portion, an audio portion, and other informational portions, is then transmitted to a receiver. Transmission may be over existing television channels, cable television channels, satellite communication channels, and the like.
A decoder is provided at the receiver to de-multiplex, decompress and decode the received system signal in accordance with the compression algorithm. The decoded video and audio information is then output to a display device such as a television monitor for presentation to the user.
Video and audio compression and encoding are performed by suitable encoders which implement a selected data compression algorithm that conforms to a recognized standard or specification agreed to among the senders and receivers of digital video signals. Highly efficient compression standards have been developed by the Moving Picture Experts Group, including MPEG 1 and MPEG 2. The MPEG standards enable several VCR-like viewing options such as Normal Forward Play, Slow Forward, Fast Forward, Fast Reverse, and Freeze.
The MPEG specification defines a hierarchical data structure in the video portion of the bitstream as illustrated in FIG. 1a.
A video sequence includes a sequence header, one or more groups of pictures, and an end-of-sequence code.
A group of pictures is a series of one or more pictures intended to allow random access into the sequence.
A picture is the primary coding unit of a video sequence. A picture consists of three rectangular matrices: one of luminance (Y) values and two of chrominance (Cb, Cr) values. The Y matrix has an even number of rows and columns. The Cb and Cr matrices are one-half the size of the Y matrix in each direction (horizontal and vertical). Thus, for every four luminance values, there are two associated chrominance values (one Cb value and one Cr value).
A slice is one or more contiguous macroblocks. Slices are important in the handling of errors. If the bitstream contains an error, the decoder can skip to the start of the next slice.
A macroblock is a 16 pixel × 16 line section of luminance components and the corresponding chrominance components. As illustrated in FIG. 1b, a macroblock includes four Y blocks, one Cb block and one Cr block. The numbers correspond to the ordering of the blocks in the data stream, with block 1 first.
A block is an 8 × 8 set of values of a luminance or chrominance component.
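The size relationships described above can be illustrated with a short sketch. It assumes a picture is held as a list of rows of integer samples, and it assumes the four Y blocks are ordered top-left, top-right, bottom-left, bottom-right (the ordering of FIG. 1b is not reproduced in the text). The function names are hypothetical and are chosen only for illustration.

```python
BLOCK_SIZE = 8        # a block is 8 x 8 values
MACROBLOCK_SIZE = 16  # a macroblock covers 16 pixels x 16 lines of luminance


def chroma_dimensions(luma_width, luma_height):
    """Return (width, height) of each chrominance matrix (Cb or Cr)."""
    if luma_width % 2 or luma_height % 2:
        raise ValueError("the Y matrix must have an even number of rows and columns")
    # Cb and Cr are one-half the size of Y in each direction.
    return luma_width // 2, luma_height // 2


def split_luma_macroblock(macroblock):
    """Split a 16 x 16 luminance macroblock into four 8 x 8 blocks.

    Assumed order: top-left, top-right, bottom-left, bottom-right.
    """
    if len(macroblock) != MACROBLOCK_SIZE or any(
        len(row) != MACROBLOCK_SIZE for row in macroblock
    ):
        raise ValueError("expected a 16 x 16 macroblock")
    blocks = []
    for top in (0, BLOCK_SIZE):
        for left in (0, BLOCK_SIZE):
            rows = macroblock[top:top + BLOCK_SIZE]
            blocks.append([row[left:left + BLOCK_SIZE] for row in rows])
    return blocks
```

For example, a 352 × 240 luminance matrix would have 176 × 120 chrominance matrices under this sketch, so each Cb or Cr matrix holds one quarter as many values as the Y matrix.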
The MPEG standard defines three main types of video pictures.
1. Intracoded pictures (I-pictures) which are coded without reference to any other pictures.
2. Predictive-coded pictures (P-pictures) which are coded using motion-compensated forward prediction from a previous I or P reference picture.
3. Bidirectional predictive-coded pictures (B-pictures) which are coded using interpolated motion compensation from a previous and a future I or P picture.
I pictures are coded using only the Discrete Cosine Transform (DCT), which converts spatial-domain sample values into frequency-domain coefficients for the purpose of achieving data compression.
The macroblock is the basic motion compensation unit for P and B pictures. Each macroblock is coded by computing a motion compensation vector which defines the displacement between the macroblock and the corresponding macroblock in the reference I or P picture(s) from which it is being predicted. If there is little or no motion, the motion compensation vector will not be transmitted.
A comparison macroblock is then generated by displacing the reference macroblock by the amount indicated by the motion compensation vector. The comparison macroblock is subtracted from the macroblock of the P or B picture that is being coded to produce an error signal which corresponds to the difference between them. The error signal is then coded using DCT (similar to an intracoded picture) and transmitted with the motion vector. If, however, the error signal is small or zero, no error component is transmitted.
Thus, a predictive coded macroblock (P or B) can consist of only a motion compensation component, only a transform (DCT) coded component, or both.
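The prediction and error computation described above can be sketched as follows. This is a minimal illustration, assuming whole-pixel motion vectors expressed as (rows, columns), pictures held as lists of rows, and no half-pixel interpolation or DCT coding of the error. All function names are hypothetical.

```python
def fetch_block(reference, top, left, size):
    """Copy a size x size region of the reference picture starting at (top, left)."""
    if top < 0 or left < 0 or top + size > len(reference) or left + size > len(reference[0]):
        raise ValueError("displaced block falls outside the reference picture")
    return [row[left:left + size] for row in reference[top:top + size]]


def prediction_error(current_block, reference, top, left, motion_vector):
    """Encoder side: subtract the displaced reference block from the current block."""
    dy, dx = motion_vector
    predicted = fetch_block(reference, top + dy, left + dx, len(current_block))
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current_block, predicted)
    ]


def reconstruct_block(reference, top, left, motion_vector, size, error=None):
    """Decoder side: displaced reference block plus the error signal, if any."""
    dy, dx = motion_vector
    predicted = fetch_block(reference, top + dy, left + dx, size)
    if error is None:
        # Only a motion compensation component was transmitted.
        return predicted
    return [
        [p + e for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(predicted, error)
    ]
```

Passing `error=None` corresponds to the case in which the error signal is small or zero and no error component is transmitted.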
After motion compensation and DCT coding are performed, the macroblock is quantized and Variable Length Coded (VLC) to further compress the data bitstream. The macroblocks are then assembled into slices, pictures, groups of pictures and video sequences, multiplexed with associated audio data, and transmitted to a user for decoding and presentation.
FIG. 2 illustrates a basic decoding system 10 for decoding an MPEG video data bitstream. The bitstream is de-multiplexed, Variable Length Decoded (VLD) by a VLD decoder 12, inverse quantized by an inverse quantizer 14, and any DCT coded blocks are subjected to Inverse Discrete Cosine Transformation (IDCT) decoding by an IDCT decoder 16. The pictures are then reconstructed by a reconstruction unit 18 and output as decoded pictures.
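The IDCT step performed on each 8 × 8 block can be illustrated by a direct evaluation of the standard two-dimensional inverse DCT definition. This is only a sketch: it is unoptimized, it returns floating point values without rounding or clamping, and it is not a description of the IDCT decoder 16 itself. The function name and the row/column layout of the coefficient array are assumptions.

```python
import math

N = 8  # block size


def _scale(k):
    """Normalization factor: 1/sqrt(2) for the DC index, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_8x8(coeffs):
    """Inverse DCT of an 8 x 8 block; coeffs[v][u] holds the (u, v) coefficient."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (
                        _scale(u) * _scale(v) * coeffs[v][u]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[y][x] = total / 4.0
    return out
```

With this normalization, a block whose only nonzero coefficient is the DC term decodes to a constant block equal to one-eighth of that coefficient.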
I and P pictures that are to be used as reference pictures for forward or backward prediction are output from the reconstruction unit 18 and stored in forward and backward picture stores (memories) 20 and 22 respectively.
I pictures that are not to be used for future prediction are output directly. The reconstruction unit 18 applies the motion compensation vector and error (DCT coded) data from a P picture to a reference picture stored in the forward picture store 20 to reconstruct the P picture. The reconstruction unit 18 uses both a forward picture and a backward picture from the stores 20 and 22 to interpolate a B picture based on its motion compensation and DCT coded information.
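The interpolation of a B picture block from a forward and a backward prediction can be sketched as an average of the two predictions, to which any decoded error data is then added. The rounding rule (halves rounded up) and the clamping to an 8-bit sample range are assumptions made for this illustration, and the function names are hypothetical.

```python
def interpolate_prediction(forward_block, backward_block):
    """Average two prediction blocks sample by sample, rounding halves up."""
    return [
        [(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(forward_block, backward_block)
    ]


def reconstruct_b_block(forward_block, backward_block, error=None):
    """Interpolated prediction plus optional error data, clamped to 0..255."""
    predicted = interpolate_prediction(forward_block, backward_block)
    if error is None:
        return predicted
    return [
        [max(0, min(255, p + e)) for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(predicted, error)
    ]
```

Here `forward_block` and `backward_block` stand for the motion-compensated blocks fetched from the forward and backward picture stores, respectively.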
A coded macroblock is illustrated in simplified form in FIG. 3, and includes a header, four luminance blocks (Y), one chrominance block Cb and one chrominance block Cr. The components of the header which are relevant to the present invention are illustrated in FIG. 4, and include a type designation (I, P or B), a new quantizer scale if the quantization scale is to be changed, a motion compensation vector and a coded block pattern.
Each block in a macroblock is DCT coded individually. If the error signal for a particular block is zero or very small, the block is not coded and is omitted from the bitstream. The coded block pattern indicates which blocks of the macroblock are included in DCT coded form in the bitstream. As discussed above, it is possible for a macroblock to include only a motion compensation vector and no DCT coded blocks.
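The role of the coded block pattern can be illustrated with a small sketch. It assumes the pattern is a 6-bit value in which the most significant bit corresponds to the first block in the data stream; the text above does not specify the bit layout, so this layout, the block labels, and the function name are assumptions for illustration.

```python
BLOCK_NAMES = ("Y1", "Y2", "Y3", "Y4", "Cb", "Cr")  # assumed stream order


def coded_blocks(coded_block_pattern):
    """Return the names of the blocks present in DCT coded form."""
    if not 0 <= coded_block_pattern < 64:
        raise ValueError("coded block pattern must fit in 6 bits")
    return [
        name
        for index, name in enumerate(BLOCK_NAMES)
        # Bit 5 (most significant) maps to the first block, bit 0 to the last.
        if coded_block_pattern & (1 << (5 - index))
    ]
```

A pattern of zero corresponds to a macroblock that carries only a motion compensation vector and no DCT coded blocks.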
Conventional reconstruction methods use two dedicated pipelines for parallel execution of the two major components: motion compensation (M) and DCT transform coded (I) data. Input and output data are temporarily stored in a memory, which is typically a Dynamic Random Access Memory (DRAM).
Although the amount of data and encoding type (there are a number of motion compensation and DCT encoding modes) can vary substantially for different macroblocks, the memory bandwidth allocation for the DRAM is fixed for all types of macroblocks. The memory bandwidth is the product of the memory word length (number of bits used for transferring data to and from the DRAM in parallel), and the memory access speed. The bandwidth allocation of the system per macroblock is determined by the worst case combination of M and I data.
In reality, however, the complex I and especially M modes are used in a low percentage of the macroblocks in a picture. It is virtually impossible to code a picture with the most complex mode in all macroblocks. Hence, a system with this fixed bandwidth allocation scheme suffers from low bandwidth utilization and an unnecessarily high system bandwidth requirement to accommodate instantaneous bandwidth bursts.
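The bandwidth relationship and the resulting under-utilization can be made concrete with a short calculation. The sketch below assumes that the demand of each macroblock and the fixed worst-case allocation are expressed in the same arbitrary units. The function names and all numeric values are hypothetical and are not taken from any measured system.

```python
def memory_bandwidth(word_length_bits, accesses_per_second):
    """Bandwidth in bits per second: word length times access speed."""
    return word_length_bits * accesses_per_second


def utilization(per_macroblock_demand, worst_case_allocation):
    """Fraction of the fixed per-macroblock allocation that is actually used."""
    if not per_macroblock_demand:
        raise ValueError("at least one macroblock is required")
    if worst_case_allocation <= 0:
        raise ValueError("allocation must be positive")
    if any(demand > worst_case_allocation for demand in per_macroblock_demand):
        raise ValueError("demand exceeds the worst-case allocation")
    # Every macroblock receives the same worst-case allocation.
    allocated = worst_case_allocation * len(per_macroblock_demand)
    return sum(per_macroblock_demand) / allocated
```

For instance, if three of four macroblocks each need 100 units and only one needs the worst-case 400 units, `utilization([100, 100, 100, 400], 400)` evaluates to 0.4375, so more than half of the allocated bandwidth goes unused in this hypothetical case.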