Video encoding schemes, for either still or motion pictures, generally compress video data. Compression allows video data to be more efficiently transported over a network, conserving bandwidth, or more efficiently stored in computer memory.
FIG. 1 illustrates one way that video data is transmitted from a source 100 to a destination 120. The video data can be digital pixel data captured by a video camera, downloaded from a Web site, stored on a DVD, or generated using other means. The source 100 comprises a memory 101 coupled to an encoder 105. The encoder 105 is in turn coupled to a network 110, such as the Internet. The destination 120 comprises a decoder 121 coupled to the network 110 and to a memory 125. In operation, digital pixel data is first stored in the memory 101. The digital pixel data is compressed by the encoder 105 to generate compressed video data, which is transmitted over the network 110. The compressed video data is received and decompressed by the decoder 121 to recover the digital pixel data, which is stored in the memory 125. A display engine (not shown) can then read the digital pixel data stored in the memory 125 and display it on a display device (not shown), such as a computer monitor.
FIG. 2 shows a more detailed view of an encoder 130. The encoder 130 comprises a motion estimation or compensation component 131 coupled to a DCT and quantization component 133, which in turn is coupled to a Huffman encoder 137. The motion estimation or compensation component 131 receives digital pixel data and exploits redundancies in that data to compress it. The DCT and quantization component 133 performs two functions: First, it translates digital pixel data in the spatial domain into DCT coefficients in the frequency domain. Second, it quantizes the DCT coefficients, packages runs of DCT coefficients, and combines the packaged DCT coefficients with quantization headers to form macroblocks. As discussed below, macroblocks contain strings of bits that are the integer representations of the DCT coefficients. Macroblocks also contain headers, which contain information used to reconstruct digital pixel data. The Huffman encoder 137 receives the macroblocks and uses statistical encoding to compress them, forming a coded video bitstream containing compressed digital pixel data. To do so, the Huffman encoder 137 uses a table to map integer representations of the DCT coefficients (characters) in the macroblock into unique variable-length codes (VLCs), also called symbols. The VLCs are chosen so that, on average, the total length of the VLCs is less than the total length of the original macroblock. Thus, the coded bitstream is generally smaller than the original digital pixel data, so it requires less bandwidth when transmitted over a network and less space when stored in memory.
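The two functions of a DCT and quantization stage can be illustrated with a minimal sketch. The 8x8 block size, the orthonormal DCT-II scaling, the uniform step size of 10, and the function names dct_2d and quantize are all assumptions made for illustration; they are not taken from FIG. 2 or from any particular standard. The sketch shows only that a smooth block of pixels quantizes to a few non-zero integers.

```python
import math

N = 8  # block size; 8x8 is an illustrative assumption


def dct_2d(block):
    """Forward 2-D DCT-II: N x N spatial samples -> N x N frequency coefficients."""
    def scale(k):
        # Orthonormal scaling factors (an assumed convention)
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=10):
    """Uniform quantization: divide each coefficient by a step and round to an integer."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A smooth block: brightness rises gently from left to right.
block = [[100 + 2 * y for y in range(N)] for _ in range(N)]
levels = quantize(dct_2d(block))
nonzero = sum(1 for row in levels for value in row if value != 0)
print("non-zero quantized coefficients:", nonzero, "of", N * N)
```

Because most quantized coefficients are zero, the integers handed to the statistical encoder are highly repetitive, which is what makes short variable-length codes effective.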
FIG. 3 shows a more detailed view of a decoder 140. The decoder 140 comprises a Huffman decoder 141 coupled to a dequantization and inverse DCT component 142, which in turn is coupled to a motion estimation or compensation component 143. In operation, the Huffman decoder 141 receives a coded video bitstream, containing compressed digital pixel data, over a network or another transmission medium. The Huffman decoder 141 performs a function inverse to that performed by the Huffman encoder 137 (FIG. 2): it performs a table lookup to map symbols to characters and thus recovers macroblock data from the coded video bitstream. The macroblock data is then transmitted to the dequantization and inverse DCT component 142, which performs a function inverse to that of the DCT and quantization component 133 (FIG. 2) to generate digital pixel data. The motion estimation or compensation component 143 then performs motion compensation to recover the original digital pixel data. The recovered digital pixel data is then stored in a memory (not shown), from which it can be retrieved by a display engine for later display on a display device. Digital pixel data is usually stored as frames, which, when displayed sequentially, render a moving picture.
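The table lookup that maps symbols back to characters can be sketched as follows. The code table VLC_TABLE is hypothetical and chosen only so that no code is a prefix of another; it is not a table from the MPEG-4 standard or from any actual Huffman decoder, and the function names are likewise illustrative.

```python
# Hypothetical prefix-free table (character -> VLC); frequent values get short codes.
VLC_TABLE = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}


def vlc_encode(characters, table=VLC_TABLE):
    """Map each integer character to its variable-length code and concatenate."""
    return "".join(table[c] for c in characters)


def vlc_decode(bits, table=VLC_TABLE):
    """Invert the encoder: accumulate bits until they match a code, then emit its character."""
    lookup = {code: char for char, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            decoded.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bitstream ended in the middle of a symbol")
    return decoded


coeffs = [0, 0, 1, 0, -1, 0, 0, 2, 0, 0, 0, -2]
bits = vlc_encode(coeffs)
assert vlc_decode(bits) == coeffs
print(len(bits), "bits with VLCs versus", 3 * len(coeffs), "bits at a fixed 3 bits per character")
```

The decoder needs no separators between symbols because the table is prefix-free: the first match is always the intended symbol.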
Compression becomes increasingly important when new generations of video data are transmitted or stored. For example, under the MPEG-4 standard, standardized in “Information Technology—Coding of audio-visual objects—Part 2: Visual,” reference number ISO/IEC 14496-2:2001(E), incorporated herein by reference, video data can be packaged with audio data, computer-generated images, and other data. Under the MPEG-4 standard, separate video objects that together make up a scene can be transmitted separately, thus allowing users to manipulate video data by adding, deleting, or moving objects within a scene. Under the MPEG-4 standard, other information, such as that used to perform error recovery, is also transmitted with the video data.
This increased flexibility is achieved by transmitting extra data to an end user. Transmitting extra data increases the time it takes to decode the coded video data at a destination. For an end user at the destination to realize the added capabilities of video transmitted according to the MPEG-4 standard, especially for real-time applications, the coded video data must be decoded as quickly and efficiently as possible.
One video decoder, from Equator Technologies, Inc., employs two processors in an attempt to decrease the time it takes to decode coded video data to generate digital pixel data. FIG. 4 illustrates an Equator decoder 200, the MAP-CA DSP MPEG-4 video decoder, available from Equator Technologies, Inc., of Campbell, Calif. The Equator decoder 200 apportions tasks between the two processors in an attempt to decrease the total decoding time. As illustrated in FIG. 4, the Equator decoder 200 comprises (a) a variable-length encoder/decoder, the VLx coprocessor 210; and (b) a very long instruction word (VLIW) core central processing unit, the VLIW core 220. The VLx coprocessor 210 is coupled to a first DataStreamer controller buffer 230, which in turn is coupled to an input bitstream memory 231. The VLx coprocessor 210 is coupled to the VLIW core 220 by a second DataStreamer controller buffer 232. The VLx coprocessor 210 comprises a GetBits engine 211 coupled on one end to the first DataStreamer controller buffer 230 and on another end to a VLx CPU 212. The VLx CPU 212 is coupled to a VLx memory 213, which in turn is coupled to the second DataStreamer controller buffer 232. As illustrated in FIG. 4, the VLx memory 213 is partitioned into two frame buffers. The VLIW core 220 comprises a data cache 221 coupled on one end to the second DataStreamer controller buffer 232 and on another end to a VLIW core CPU 222. As illustrated in FIG. 4, the data cache 221 is partitioned into four frame buffers.
In operation, a coded video bitstream containing variable-length codes (VLCs) is stored in the input bitstream memory 231. The coded bitstream is then transferred to the first DataStreamer controller buffer 230, from which the GetBits engine 211 can transfer it to the VLx CPU 212. The VLx CPU 212 then (1) decodes the header symbols and stores the results in a header information buffer, (2) decodes the macroblock symbols and stores them in a macroblock information buffer, and (3) produces DCT coefficients. The second DataStreamer controller buffer 232 transfers the DCT coefficients to the data cache 221. The VLIW core CPU 222 then reads the DCT coefficients stored in the data cache 221 and performs inverse quantization, inverse DCT, motion compensation, and pixel additions to produce pictures for rendering on a display device, such as a personal computer monitor or an NTSC monitor.
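The division of labor described above, in which one processor produces coefficients and a second processor consumes them through an intermediate buffer, can be sketched in software as a two-stage producer and consumer. This is only an analogy under stated assumptions: the threads, the bounded queue standing in for the intermediate buffer, the function names, the step size, and the placeholder work done in each stage are hypothetical and do not describe the actual Equator hardware or firmware.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel marking the end of the coefficient stream


def symbol_stage(symbol_blocks, out_buffer):
    """Stand-in for the first processor: turn symbols into integer coefficients."""
    for block in symbol_blocks:
        coeffs = [int(symbol) for symbol in block]  # placeholder for real VLC decoding
        out_buffer.put(coeffs)                      # blocks when the buffer is full
    out_buffer.put(END_OF_STREAM)


def reconstruction_stage(in_buffer, results, step=10):
    """Stand-in for the second processor: inverse quantization only, for brevity."""
    while True:
        coeffs = in_buffer.get()
        if coeffs is END_OF_STREAM:
            break
        results.append([c * step for c in coeffs])


shared_buffer = queue.Queue(maxsize=2)  # small bounded buffer between the stages
results = []
blocks = [["86", "-4", "0"], ["3", "0", "1"]]

producer = threading.Thread(target=symbol_stage, args=(blocks, shared_buffer))
consumer = threading.Thread(target=reconstruction_stage, args=(shared_buffer, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)
```

In this arrangement the stages communicate only through the buffer and in one direction, so the first stage has no means of calling on the second stage for help, which is the kind of limitation at issue in such a design.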
The Equator decoder 200 is ill-equipped to efficiently recover digital pixel data. For example, if the VLx CPU 212 encounters an error while computing DCT coefficients, it cannot rely on the computational power of the VLIW core CPU 222 to help it recover from the error. Instead, the VLx CPU 212 must either (1) try to recover from the error itself or (2) let the VLIW core 220 handle the error by itself. The second alternative interrupts the VLIW core 220 from performing other high-level processing and does not involve the VLx CPU 212, which is generally better suited to process the DCT coefficients.
Accordingly, there is a need for an apparatus and method of decompressing video data without monopolizing a high-level processor that is ill-equipped to decode data quickly and efficiently.