The present invention relates to video decoders, and more particularly, to a method and apparatus for decoding an encoded MPEG video data stream into raw video data.
MPEG Background
Moving Picture Experts Group ("MPEG") is a committee under the International Organization for Standardization ("ISO") and the International Electrotechnical Commission ("IEC") that develops industry standards for compressing and decompressing video and audio data. Two such standards that have been ratified by MPEG are called MPEG-1 and MPEG-2. MPEG-1 is documented in ISO/IEC publication 11172 and is fully incorporated herein by reference. MPEG-2 is disclosed in ISO/IEC publications 11172 and 13818, and is also incorporated herein by reference.
MPEG-1 was developed with the intent to play back compressed video and audio data either from a CD-ROM, or transfer compressed data at a combined coded bit rate of approximately 1.5 Mbits/sec. MPEG-1 approximates the perceptual quality of a consumer videotape (VHS). However, MPEG-1 was not intended for broadcast quality. Hence, MPEG-1 syntax was enhanced to provide efficient representation of interlaced broadcast video signals. This became MPEG-2.
MPEG-1 and MPEG-2 can be applied at a wide range of bit rates and sample rates. Typically MPEG-1 processes data at a Source Input Format (SIF) resolution of 352 × 240 pixels at 30 frames per second, at a bit rate less than 1.5 Mbits/sec. MPEG-2, developed to serve the requirements of the broadcast industry, typically processes 352 pixels × 240 lines at 30 frames/sec ("Low Level"), and 720 pixels/line × 480 lines at 30 frames/sec ("Main Level"), at a rate of approximately 5 Mbits/sec.
MPEG standards efficiently represent video image sequences as compactly coded data. MPEG standards describe decoding (reconstruction) processes by which encoded bits of a transmitted bit stream are mapped from compressed data to the original raw video signal data suitable for video display.
MPEG Encoding
MPEG encodes video sequences such that RGB color images are converted to YUV space, with a luminance channel, Y, and two chrominance channels, U and V. An MPEG bitstream is compressed by using three types of frames: I or intra frames, P or predicted frames, and B or bi-directional frames. I frames are typically the largest frames and contain enough information to serve as entry points into the bitstream. Predicted frames are based on a previous frame and are highly compressed. Bi-directional frames refer to both future and previous frames, and are the most highly compressed.
MPEG pictures can be simply intra-coded, with no motion compensation prediction involved, forward coded with pel prediction projected forward in time, backward coded with pel prediction backward in time, or bi-directionally coded, with reference to both forward and backward pictures. Pictures can be designated as I (formed with no prediction involved, as a still image from the image data originating at the source, e.g., a video camera), P (formed with prediction from forward pictures) or B (formed with prediction from a forward picture and/or a backward picture). An example of a display sequence for MPEG frames is as follows:
IBBPBBPBBPBBIBBPBBPB
Each MPEG picture is broken down into a series of slices and each slice is comprised of a series of adjacent macroblocks.
MPEG pictures can be progressive or interlaced. An interlaced group of pictures ("GOP") comprises field and/or frame pictures. For frame pictures, the macroblock prediction scheme is based upon fields (partial frames) or complete frames.
The MPEG encoder decides how many pictures will occur in a GOP, and how many B pictures will be interleaved between each pair of I and P pictures or pair of P pictures in the sequence. Because of picture dependencies, i.e., temporal compression, the order in which the frames are transmitted, stored or retrieved is not necessarily the video display order, but rather an order required by the decoder to properly decode pictures in the bitstream.
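By way of illustration only, the difference between display order and the order required by the decoder can be sketched as follows. The sketch assumes a simple rule that is not taken from the MPEG standards text: each I or P picture is sent ahead of the B pictures that precede it in display order, because those B pictures refer to it. The function name display_to_coded_order is hypothetical.

```python
def display_to_coded_order(display):
    """Reorder picture types from display order to one plausible coded order.

    Assumed rule (illustrative): B pictures are held back until the next
    anchor picture (I or P) has been emitted, since they depend on it.
    """
    coded = []
    pending_b = []  # B pictures waiting for their future anchor picture
    for index, picture_type in enumerate(display):
        if picture_type == "B":
            pending_b.append((index, picture_type))
        else:
            # Emit the anchor first, then the B pictures that refer to it.
            coded.append((index, picture_type))
            coded.extend(pending_b)
            pending_b = []
    # Trailing B pictures whose future anchor lies outside this excerpt.
    coded.extend(pending_b)
    return coded


# Display order I0 B1 B2 P3 B4 B5 P6 becomes I0 P3 B1 B2 P6 B4 B5.
labels = [f"{t}{i}" for i, t in display_to_coded_order("IBBPBBP")]
print(" ".join(labels))
```

A decoder receiving pictures in this coded order would need to re-order them back into display order before output.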
MPEG compression employs two fundamental techniques: motion compensation and spatial redundancy reduction. Motion compensation determines how predicted or bi-directional frames relate to their reference frames. A frame is divided into 16 × 16 pixel units called macroblocks. The macroblocks in one frame are compared to macroblocks of another frame, and similarities between the frames are not coded. If similar macroblocks shift position between frames, the movement is described by motion vectors, which are stored in the compressed MPEG stream.
The spatial redundancy technique reduces data by describing differences within corresponding macroblocks. Spatial compression is achieved by considering the frequency characteristics of a picture frame. The process uses discrete cosine transform ("DCT") coefficients that spatially track changes in color and brightness. The DCTs are performed on 8 × 8 pixel blocks. The blocks are transformed to the "DCT domain", where each entry in the transformed block is quantized with respect to a set of quantization tables. Zig-zag ordering and Huffman coding are used to transmit the quantized values.
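By way of illustration only, quantization and zig-zag ordering of an 8 × 8 block of DCT coefficients can be sketched as follows. The sketch assumes a single uniform quantization step with truncation toward zero, which is a simplification of the quantization tables referred to above, and it omits the subsequent entropy coding. All function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zig-zag order."""
    def key(pos):
        r, c = pos
        diagonal = r + c
        # Odd anti-diagonals run top to bottom, even ones bottom to top.
        return (diagonal, r if diagonal % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def quantize(block, step):
    """Divide every coefficient by one uniform step, truncating toward zero
    (assumption: a real encoder uses per-coefficient quantization tables)."""
    return [[int(value / step) for value in row] for row in block]


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# A block whose energy sits in the low-frequency corner.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 100, -35, 20
scanned = zigzag_scan(quantize(block, 16))
print(scanned[:4])  # [6, -2, 1, 0]
```

The zig-zag order places low-frequency coefficients first, so the long run of zeros at the end of the list is cheap to represent.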
MPEG Decoding
MPEG video decoders are known in the art. The video decoding process is generally the inverse of the video encoding process and is employed to reconstruct a motion picture sequence from a compressed and encoded bitstream. Generally, MPEG video bitstream data is decoded according to syntax defined by MPEG standards. The decoder must first identify the beginning of a coded picture, identify the type of picture, and then decode each individual macroblock within a particular picture.
Generally, encoded video data is received in a channel buffer, such as a rate buffer or a video buffer verifier ("VBV") buffer. The data is retrieved from the channel buffer by an MPEG decoder or reconstruction device that performs the decoding. The MPEG decoder performs inverse scanning to remove any zig-zag ordering and inverse quantization to de-quantize the data. Where frame or field DCTs are involved, the MPEG decoding process utilizes frame and field Inverse Discrete Cosine Transforms ("IDCTs") to decode the respective frame and field DCTs, and converts the encoded video signal from the frequency domain to the spatial domain to produce reconstructed raw video signal data.
The MPEG decoder also performs motion compensation using transmitted motion vectors to reconstruct temporally compressed pictures. When reference pictures such as I or P pictures are decoded, they are stored in a memory buffer. When a reconstructed picture becomes a reference or anchor picture, it replaces the oldest reference picture. When a temporally compressed picture, also referred to as a target frame, is received, such as a P or B picture, motion compensation is performed on the picture using neighboring decoded I or P reference pictures. The MPEG decoder examines motion vector data, determines the respective reference block in the reference picture, and accesses the reference block from the frame buffer.
After the decoder has Huffman decoded all the macroblocks, the resultant coefficient data is then inverse quantized and operated on by an IDCT process to transform macroblock data from the frequency domain to data in the spatial domain. Frames may need to be re-ordered before they are displayed, in accordance with their display order instead of their coding order. After the frames are re-ordered, they may then be displayed on an appropriate device.
FIG. 1 shows a block diagram of a typical MPEG decoding system, as is known in the art. Shown in FIG. 1 are an MPEG Demux 10, an MPEG video decoder 11 and an audio decoder 12. MPEG Demux 10 receives encoded MPEG bit stream data 13 that consists of video and audio data, and splits MPEG bit stream data 13 into MPEG video stream data 14 and MPEG audio stream data 16. MPEG video stream data 14 is input into MPEG video decoder 11, and MPEG audio stream data 16 is input into audio decoder 12. MPEG Demux 10 also extracts certain timing information 15, which is provided to video decoder 11 and audio decoder 12. Timing information 15 enables video decoder 11 and audio decoder 12 to synchronize an output video signal 17 (raw video signal data) from video decoder 11 with an output audio signal 18 (raw audio data) from audio decoder 12.
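By way of illustration only, inverse quantization followed by an 8 × 8 IDCT can be sketched as follows. The sketch assumes a single uniform quantization step (a simplification of the inverse quantization defined by the MPEG standards) and uses a direct, unoptimized evaluation of the textbook two-dimensional IDCT formula. The inverse scan is omitted, and the function names are hypothetical.

```python
import math

N = 8  # block dimension


def inverse_quantize(block, step):
    """Scale quantized coefficients back by one uniform step (assumption)."""
    return [[value * step for value in row] for row in block]


def idct_2d(coeffs):
    """Direct 8 x 8 inverse DCT: frequency domain to spatial domain."""
    def scale(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    pixels = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            pixels[x][y] = total / 4.0
    return pixels


# A block with only a DC coefficient reconstructs to a flat block.
quantized = [[0] * N for _ in range(N)]
quantized[0][0] = 5
pixels = idct_2d(inverse_quantize(quantized, 16))
print(round(pixels[0][0], 6), round(pixels[7][7], 6))  # 10.0 10.0
```

Practical decoders use fast, fixed-point IDCT implementations; the quadruple loop here is chosen only because it mirrors the formula directly.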
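By way of illustration only, locating a reference block with a motion vector and adding the decoded residual can be sketched as follows. The sketch assumes whole-pixel motion vectors, a single reference picture held as a list of rows, and 8-bit samples; half-pixel interpolation and bi-directional averaging are omitted. The function and parameter names are hypothetical.

```python
def motion_compensate(reference, residual, top, left, mv_y, mv_x):
    """Rebuild one block from a reference picture and a residual block.

    reference: 2-D list of samples of a previously decoded picture
    residual:  2-D list of decoded difference values for the target block
    top, left: position of the target block in the current picture
    mv_y, mv_x: whole-pixel motion vector (assumption: no half-pel)
    """
    size = len(residual)
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            # Displace the block position by the motion vector.
            predicted = reference[top + mv_y + r][left + mv_x + c]
            # Add the residual and clip to the 8-bit sample range.
            row.append(max(0, min(255, predicted + residual[r][c])))
        block.append(row)
    return block


# 4 x 4 reference picture and a 2 x 2 target block for brevity.
reference = [[r * 10 + c for c in range(4)] for r in range(4)]
residual = [[1, -1], [0, 300]]
print(motion_compensate(reference, residual, 0, 0, 1, 2))  # [[13, 12], [22, 255]]
```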
MPEG video decoders may have a core processor for reconstructing decoded MPEG video data into raw video signal data, and a co-processor ("VLD") for doing variable length decoding of the MPEG video data stream. A direct memory access controller ("DMA") either associated with or incorporated into a host computer, or associated with or incorporated into the MPEG video decoder, manages data transfer between the core processor, VLD and various memory buffers.
Current decoding processors such as those manufactured by Equator Technology Inc. ("ETI") process data on an individual block by block basis, rather than at the macroblock level. For component block by block decoding and transfer, the processing speed for an entire macroblock may be limited by data transfer speed. For example, if a data transfer mechanism is able to transfer 2 bytes per cycle, a macroblock with six (6) 8 × 8 blocks comprising 768 bytes of data will require 384 cycles, plus an additional "y" cycles of overhead delay per transfer set. Hence, block by block decoding slows the overall decoding process.
Currently, processing data block by block requires more DMA instructions than processing an entire macroblock of data at once. Also, conventional MPEG techniques wait multiple times for different DMA transfers, and the accumulated wait time slows the overall decoding process.
Also, current decoding techniques adversely impact parallelism between VLD and the core processor and use VLIW pipelines inefficiently. Furthermore, VLD can currently only detect errors and is not able to correct them.
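By way of illustration only, the transfer arithmetic in the foregoing example can be worked through as follows. The sketch assumes 2 bytes per coefficient (implied by 768 bytes for six 8 × 8 blocks), assumes the overhead "y" is incurred once per transfer, and uses a hypothetical value of 10 cycles for "y"; none of these values is asserted to describe any particular processor.

```python
import math


def transfer_cycles(total_bytes, bytes_per_cycle, transfers, overhead_per_transfer):
    """Cycles to move total_bytes, plus a fixed overhead for each transfer."""
    data_cycles = math.ceil(total_bytes / bytes_per_cycle)
    return data_cycles + transfers * overhead_per_transfer


BLOCKS_PER_MACROBLOCK = 6
COEFFS_PER_BLOCK = 8 * 8
BYTES_PER_COEFF = 2   # assumption implied by the 768-byte figure
Y = 10                # hypothetical overhead cycles per transfer

macroblock_bytes = BLOCKS_PER_MACROBLOCK * COEFFS_PER_BLOCK * BYTES_PER_COEFF

# Six separate block transfers versus one macroblock transfer.
block_by_block = transfer_cycles(macroblock_bytes, 2, BLOCKS_PER_MACROBLOCK, Y)
whole_macroblock = transfer_cycles(macroblock_bytes, 2, 1, Y)
print(macroblock_bytes, block_by_block, whole_macroblock)  # 768 444 394
```

Under these assumptions the data cycles are identical in both cases (384), and the difference comes entirely from paying the per-transfer overhead six times instead of once.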
Therefore, a decoding system is needed that can efficiently transfer data between VLD and core processor, and also optimally utilize the resources of both processors, and perform error recovery in the core processor.
The present invention addresses the foregoing drawbacks by providing an apparatus and method that synchronizes data exchange between a core processor that includes a very long instruction word (VLIW) processor, and a variable length decoder (VLD) of an MPEG video decoder, and enhances core processor and co-processor parallelism.
According to one aspect, the present invention provides an incoming compressed and encoded MPEG video bit stream to a video decoder on a picture by picture basis. The input MPEG video stream data is organized into pictures and slices, and the slices further include macroblocks. Thereafter, VLIW adds a fake slice start code and fake macroblock data at the end of each MPEG input picture, and VLD utilizes the fake slice start code and fake macroblock data to skip to a next picture. The fake macroblock data indicates an error to VLD, stopping the decoding process until the core processor reinitiates decoding of a selected slice.
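By way of illustration only, appending a fake slice start code and fake macroblock data to the bytes of one picture can be sketched as follows. MPEG start codes begin with the byte prefix 00 00 01, and slice start codes use the values 01 through AF for the following byte. The particular slice value and the fake macroblock bytes below are hypothetical placeholders, since the actual bit patterns are not given here, and the function names are likewise hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"
SLICE_CODE_MIN, SLICE_CODE_MAX = 0x01, 0xAF

FAKE_SLICE_CODE = 0xAF                # hypothetical choice within the slice range
FAKE_MACROBLOCK_DATA = b"\xff\xff"    # hypothetical placeholder bytes


def append_fake_slice(picture_bytes):
    """Return the picture data followed by a fake slice start code and
    fake macroblock data (placeholder values, for illustration only)."""
    return (picture_bytes + START_CODE_PREFIX
            + bytes([FAKE_SLICE_CODE]) + FAKE_MACROBLOCK_DATA)


def find_slice_start_codes(data):
    """Return byte offsets of all slice start codes found in data."""
    offsets = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        if SLICE_CODE_MIN <= data[pos + 3] <= SLICE_CODE_MAX:
            offsets.append(pos)
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return offsets


# A toy picture: picture start code (00), filler, one real slice, payload.
picture = b"\x00\x00\x01\x00" + b"\xaa\xbb" + b"\x00\x00\x01\x01" + b"\x12\x34"
padded = append_fake_slice(picture)
print(find_slice_start_codes(padded))  # [6, 12]
```

The last offset found marks the fake slice, so a scanner that reaches it has consumed every real slice of the picture.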
VLIW then provides the input MPEG coded data stream to VLD on a picture by picture basis. VLD decodes the header of a current macroblock and the video data of a previous macroblock whose header has been decoded. The encoded MPEG video data includes DCT coefficients.
Thereafter, VLD transfers the current decoded header along with the decoded DCT coefficients of a previously decoded macroblock to the core processor on a macroblock by macroblock basis. VLIW performs motion vector reconstruction based upon decoded header data, inverse discrete cosine transforms based upon the decoded DCT coefficients, and motion compensation based upon reference data of a previous macroblock(s), and converts the data into raw video data.
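By way of illustration only, the pairing of the header of a current macroblock with the DCT coefficients of the previous macroblock can be sketched as follows. The sketch assumes each macroblock is represented as a (header, coefficients) pair and assumes a final flush step that delivers the last macroblock's coefficients; the flush and the function name vld_output are hypothetical.

```python
def vld_output(macroblocks):
    """Yield (header of MB(i), coefficients of MB(i-1)) for each macroblock.

    macroblocks: iterable of (header, coefficients) pairs in decode order.
    The first item carries no coefficients, and a final item (assumed
    flush step) carries the last coefficients with no header.
    """
    previous_coeffs = None
    for header, coeffs in macroblocks:
        yield header, previous_coeffs
        previous_coeffs = coeffs
    if previous_coeffs is not None:
        yield None, previous_coeffs


stream = [("H0", "C0"), ("H1", "C1"), ("H2", "C2")]
for header, coeffs in vld_output(stream):
    # On each transfer the consumer can start motion vector reconstruction
    # for the new header while running the IDCT on the older coefficients.
    print(header, coeffs)
```

Because each transfer carries work for two different stages, the consumer always has a header to act on while coefficient data for the preceding macroblock is being processed.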
The present invention has numerous advantages over the existing art. The decoding of an entire macroblock of video data assists in maintaining continuous and efficient pipelined operation. Since a macroblock includes a macroblock header for a current macroblock and DCT coefficients for a previous macroblock, VLIW can easily locate data for motion vector reconstruction and compensation.
The foregoing aspects of the invention also simplify the decoding and reconstruction process because VLD decodes a macroblock header for a current macroblock, e.g., MB(i), stores the decoded header data with a macroblock already decoded, e.g., MB(i−1), and transfers the decoded header and macroblock data (DCTs) to a data cache for access by VLIW. This enables VLIW to acquire reference data for a macroblock prior to performing motion compensation and IDCTs, which reduces idle time and improves decoding efficiency. VLIW architecture also allows simultaneous data processing and data transfer, and hence improves parallelism. Furthermore, since VLIW controls VLD operations, error handling is streamlined, which improves performance.
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiments thereof in connection with the attached drawings.