The present invention relates to the field of video decompression devices, and is more specifically directed to methods and circuits for reducing the memory required during decompression by storing compressed information using discrete cosine transform (DCT) based techniques.
The size of a digital representation of uncompressed video images depends on the resolution and color depth of the image. A movie composed of a sequence of uncompressed video images and accompanying audio signals quickly becomes too large to fit entirely onto a conventional recording medium, such as a compact disk (CD). Moreover, transmitting such an uncompressed movie over a communication link is prohibitively expensive because of the excessive quantity of data to be transmitted.
It is therefore advantageous to compress video and audio sequences before they are transmitted or stored. A great deal of effort is being expended to develop systems to compress these sequences. There are several coding standards currently used that are based on the DCT algorithm, including MPEG-1, MPEG-2, H.261, and H.263. (MPEG is an acronym for “Moving Picture Experts Group”, a committee of the International Organization for Standardization, ISO.) The MPEG-1, MPEG-2, H.261 and H.263 standards include decompression protocols that describe how an encoded (i.e. compressed) bitstream is to be decoded (i.e. decompressed). The encoding can be done in any manner, as long as the resulting bitstream complies with the standard.
Video and/or audio compression devices (hereinafter encoders) are used to encode the video and/or audio sequence before the sequence is transmitted or stored. The resulting encoded bitstream is decoded by a video and/or audio decompression device (hereinafter decoder) before the video and/or audio sequence is output. However, a bitstream can only be decoded by a decoder if it complies with the standard used by the encoder. To be able to decode the bitstream on a large number of systems, it is advantageous to encode the video and/or audio sequences according to a well-accepted encoding/decoding standard. The MPEG standards are currently well-accepted standards for one-way communication. H.261 and H.263 are currently well-accepted standards for two-way communication, such as video telephony.
Once decoded, the video and audio sequences can be output on an electronic system dedicated to outputting video and audio, such as a television or a video cassette recorder (VCR) player, or on an electronic system where image display and audio are just one feature of the system, such as a computer. A decoder needs to be added to these electronic systems to allow them to decode the compressed bitstream into uncompressed data before it can be output. An encoder needs to be added to allow such electronic systems to compress video and/or audio sequences that are to be transmitted or stored. Both the encoder and decoder need to be added for two-way communication.
FIG. 1A shows a block diagram of the architecture of a typical decoder, such as an MPEG-2 decoder 10. The decoder 10 can be both a video and audio decoder or just a video decoder, where the audio portion of the decoder 10 can be performed in any known conventional way. The encoded bitstream is received by an input buffer, typically a first-in-first-out (FIFO) buffer 30, hereinafter FIFO 30, although the buffer can be any type of memory. The FIFO 30 buffers the incoming encoded bitstream as previously received data is being decoded.
The encoded bitstream for video contains compressed frames. A frame is a data structure representing the encoded data for one displayable image in the video sequence. This data structure consists of one two-dimensional array of luminance pixels, and two two-dimensional arrays of chrominance samples, i.e., color difference samples. The color difference samples are typically sampled at half the sampling rate of the luminance samples in both vertical and horizontal directions, producing a sampling mode of 4:2:0 (luminance:chrominance:chrominance). The color difference can also be sampled at other frequencies, however, for example at one-half the sampling rate of the luminance in the horizontal direction and the same sampling rate as the luminance in the vertical direction, producing a sampling mode of 4:2:2.
A frame is typically further subdivided into smaller subunits, such as macroblocks. A macroblock is a data structure having a 16×16 array of luminance samples and two 8×8 arrays of adjacent chrominance samples. The macroblock contains a header portion having motion compensation information and 4 block data structures. A block is the basic unit for DCT-based transform coding and is a data structure encoding an 8×8 sub-array of pixels. A macroblock represents four luminance blocks and two chrominance blocks.
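The relationship between a macroblock's 16×16 luminance array and its four 8×8 luminance blocks can be sketched as follows. This is only an illustration: the function name `split_luma_macroblock` and the list-of-lists layout are assumptions made for the example, not part of any standard or of the decoder described here.

```python
def split_luma_macroblock(mb):
    """Split a 16x16 luminance array into four 8x8 blocks.

    `mb` is a list of 16 rows of 16 samples (hypothetical layout).
    Blocks are returned in the order top-left, top-right,
    bottom-left, bottom-right.
    """
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16x16 array")
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([row[left:left + 8] for row in mb[top:top + 8]])
    return blocks


# Example: label every sample with its raster index, 0..255.
macroblock = [[r * 16 + c for c in range(16)] for r in range(16)]
luma_blocks = split_luma_macroblock(macroblock)
```

Together with the two 8×8 chrominance arrays, these four blocks account for the six blocks a macroblock represents in the 4:2:0 sampling mode.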
Both MPEG-1 and MPEG-2 support multiple types of coded frames: Intra (I) frames, Forward Predicted (P) frames, and Bidirectionally Predicted (B) frames. I frames contain only intrapicture coding. P and B frames may contain both intrapicture and interpicture coding. I and P frames are used as reference frames for interpicture coding.
In interpicture coding, the redundancy between two frames, the frame being decoded and a prediction frame, is eliminated as much as possible, and the residual differences between the two frames, i.e. interpicture prediction errors, are transmitted. Motion vectors are also transmitted in interpicture coding that uses motion compensation. The motion vectors describe how far, and in what direction, the macroblock has moved compared to the prediction macroblock. Interpicture coding requires the decoder 10 to have access to the previous and/or future images, i.e. the I and/or P frames, that contain information needed to decode or encode the current image. These previous and/or future images need to be stored and then used to decode the current image.
Intrapicture coding for I frames involves the reduction of redundancy between the original pixels in the frame using block-based DCT techniques, although other coding techniques can be used. For P and B frames, intrapicture coding involves using the same DCT-based techniques to remove redundancy between the interpicture prediction error pixels.
Referring again to FIG. 1A, the output of the FIFO 30 is coupled to a macroblock header parser 36. The header parser 36 parses the information into macroblocks, and then parses the macroblocks and sends the header portion of each macroblock to an address calculation circuit 96. The address calculation circuit 96 determines the type of prediction to be performed in order to determine which prediction frames a motion compensation engine will need to access. Using the motion vector information, the address calculation circuit 96 also determines the address in memory 160 of the prediction frame, and of the prediction macroblock within that frame, needed to decode the motion compensated prediction for the given macroblock.
The prediction macroblock is obtained from memory 160 and input into the half-pel filter 78, which is coupled to the address calculation circuit 96. Typically there is a DMA engine 162 in the decoder that controls all of the interfaces with the memory 160. The half-pel filter 78 performs vertical and horizontal half-pixel interpolation on the fetched prediction macroblock as dictated by the motion vectors. This yields the prediction macroblocks.
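Half-pixel interpolation of the kind performed by the half-pel filter 78 can be sketched as below. This is a minimal illustration, assuming the motion vector has already been added to the macroblock position so that the top-left corner is given in half-pixel units, and assuming the fetched area lies entirely inside the reference frame. The function name and data layout are hypothetical; rounding follows the usual average-with-round-up convention.

```python
def half_pel_predict(ref, x2, y2, width, height):
    """Fetch a width x height prediction block from `ref`.

    `ref` is a 2-D list of samples; (x2, y2) is the top-left corner
    in half-pixel units (hypothetical convention for this sketch).
    """
    x, y = x2 >> 1, y2 >> 1          # integer-pixel part
    half_x, half_y = x2 & 1, y2 & 1  # half-pixel flags
    out = []
    for j in range(height):
        row = []
        for i in range(width):
            a = ref[y + j][x + i]
            if half_x and half_y:
                # average of the four surrounding samples
                total = (a + ref[y + j][x + i + 1]
                         + ref[y + j + 1][x + i]
                         + ref[y + j + 1][x + i + 1])
                row.append((total + 2) // 4)
            elif half_x:
                # horizontal half-pixel position
                row.append((a + ref[y + j][x + i + 1] + 1) // 2)
            elif half_y:
                # vertical half-pixel position
                row.append((a + ref[y + j + 1][x + i] + 1) // 2)
            else:
                # integer position: plain copy
                row.append(a)
        out.append(row)
    return out


reference = [[0, 10], [20, 30]]
horizontal = half_pel_predict(reference, 1, 0, 1, 1)  # between 0 and 10
diagonal = half_pel_predict(reference, 1, 1, 1, 1)    # centre of all four
```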
As explained earlier, pixel blocks in I frames and prediction error pixel blocks in P or B frames are encoded using DCT-based techniques. In this approach, the pixels are transformed using the DCT into DCT coefficients. These coefficients are then quantized in accordance with quantization tables. The quantized DCT coefficients are then further encoded as variable length Huffman codes to maximize efficiency, with the most frequently repeated values given the smallest codes and the length of the codes increasing as the frequency of the values decreases, although codes other than Huffman codes can be used depending on the decompression protocol. The coefficients are ordered in a rectangular array format, with the largest value in the top left of the array and typically decreasing in value to the right and bottom of the array. To produce a serial data bitstream the array is re-ordered. The order of the serialization of the coefficients is in a zig-zag format starting in the top left corner of the array, i.e., if the array is thought of in a matrix format the order of the elements in zig-zag format is 11, 12, 21, 31, 22, 13, 14, etc., as shown in FIG. 1B. The quantization can be performed either before or after the zig-zag scan.
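The zig-zag ordering beginning 11, 12, 21, 31, 22, 13, 14 can be generated by walking the anti-diagonals of the array and alternating direction. The sketch below is one way to do this; the function names are illustrative assumptions, and positions are zero-based (row, column) pairs, so matrix element 11 is (0, 0).

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n array in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        # Listed with the row increasing: top-right to bottom-left.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            # Even diagonals are traversed bottom-left to top-right.
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Serialize a square 2-D array into a zig-zag ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def inverse_zigzag(coeffs, n=8):
    """Rebuild the n x n array from a zig-zag ordered list."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block


# Label each element by its matrix index (11, 12, ..., 88).
labels = [[10 * (r + 1) + (c + 1) for c in range(8)] for r in range(8)]
serial = zigzag_scan(labels)
```

Because low-frequency coefficients sit near the top left, this ordering tends to place the larger values first and leave runs of small or zero values at the end of the serial stream.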
Still referring to FIG. 1A, the header parser 36 sends the encoded block data structures to a block decoder 42. The block decoder 42 decodes variable length codes representing the encoded blocks and converts them into fixed length pulse code modulation (PCM) codes. These codes represent the DCT coefficients of the encoded blocks. The PCM codes are a serial representation of the 8×8 block array obtained in a zig-zag format. The inverse zig-zag scanner 54, connected to the block decoder 42, converts the serial representation of the 8×8 block array obtained in a zig-zag format to a rectangular 8×8 block array, which is passed to the inverse quantizer 48. The inverse quantizer 48 performs the inverse quantization based on the appropriate quantization tables and then passes that to the IDCT circuit 66. The IDCT circuit 66 performs the inverse DCT on its input block and produces the decompressed 8×8 block. The inventors have found that these circuits can be broken down into functional blocks.
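The last two stages of this chain, inverse quantization and the inverse DCT, can be sketched as follows. This is a deliberately simplified illustration: the inverse quantizer here only multiplies each level by its table entry, whereas real decompression protocols add scale factors and rounding rules, and the IDCT is the direct textbook formula rather than a fast hardware algorithm. All names are assumptions made for the example.

```python
import math

N = 8  # block size


def dequantize(levels, qtable):
    """Simplified inverse quantization: level times table entry."""
    return [[levels[u][v] * qtable[u][v] for v in range(N)]
            for u in range(N)]


def idct_8x8(coeffs):
    """Direct (unoptimized) two-dimensional 8x8 inverse DCT."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    pixels = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            pixels[x][y] = total / 4
    return pixels


# A block holding only a DC level decodes to a flat 8x8 block.
levels = [[0] * N for _ in range(N)]
levels[0][0] = 2
flat_table = [[4] * N for _ in range(N)]  # hypothetical quantization table
decoded = idct_8x8(dequantize(levels, flat_table))
```

In this example the dequantized DC coefficient is 8, which the inverse DCT spreads evenly so that every output sample is 1.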
The prediction macroblock and the interpicture prediction errors are summed in the summing circuit 72 and passed to the assembly unit 102. Because some frames in interpicture compression require access to future frames to be decoded, the required frames should be sent before the frame that requires them. In the MPEG-2 standard frames can require both past and future frames for decompression, and the compressed frames are not sent in the same order that they are displayed in the video sequence. The assembly unit 102 ensures that the information is placed in the correct place in memory to correspond to the frame being decompressed. The resulting decoded macroblock now needs to be stored in the memory 160 in the place designated for it by the assembly unit 102. All frames need to be stored in memory 160 because the decoded macroblock may not be the next macroblock that is to be sent to the display due to the storing and transmission format of the decompression protocol. In MPEG-2 and other decompression protocols that use interpicture compression, the frames are encoded based on past and future frames. Therefore, in order to decode the frames properly, the frames are not sent in order and need to be stored until they are to be displayed.
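The need to hold decoded frames until display time can be illustrated with a small reordering sketch. It assumes the common arrangement in which each reference frame (I or P) is transmitted ahead of the B frames that precede it in display order; the frame labels and the function name are hypothetical.

```python
def display_order(coded_frames):
    """Reorder frames from transmission order to display order.

    Each frame is a string whose first character is its type
    ('I', 'P' or 'B'). A reference frame is held back until the
    next reference frame arrives; B frames are output immediately.
    """
    output = []
    held_reference = None
    for frame in coded_frames:
        if frame[0] == "B":
            output.append(frame)
        else:
            if held_reference is not None:
                output.append(held_reference)
            held_reference = frame
    if held_reference is not None:
        output.append(held_reference)
    return output


# The digits give each frame's position in display order.
transmitted = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
displayed = display_order(transmitted)
```

Here P3 must stay in memory while B1 and B2 are decoded and displayed, which is why whole decoded frames have to be buffered.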
The memory requirements of the decoder 10 for a Phase Alternation Line (PAL) and National Television System Committee (NTSC) application, capable of supporting 16 Mbits PAL video signals, are typically broken down into the audio and MPEG-2 video requirements. When the audio is MPEG-1, the audio decoder requires 131,072 bits of memory. The MPEG-2 video memory 160 can be logically configured into buffers as follows:
A “Bit buffer,” which is a buffer for compressed data that the MPEG-2 standard fixes at 1.75 Mbits plus an extra amount, for example 835,584 bits, for a non-ideal decompression process;
An I frame buffer for a decompressed I-frame in a 4:2:0 format;
A P frame buffer for a decompressed P-frame in a 4:2:0 format;
A B frame buffer for a decompressed B-frame in a 4:2:0 format. The B frame buffer can be optimized to require a reduced amount of memory, that is, 0.7407 or 0.6111 of an I frame for PAL or NTSC systems respectively.
According to the present MPEG-2 standard technique, and regardless of which frame, i.e. I, P, or B, is concerned, each frame buffer may occupy an amount of memory given by the following table:
Taking a PAL system, which represents the most burdensome case, as a reference example, the total amount of memory required is given by:
1,835,008+835,584+4,976,640+4,976,640+(4,976,640*0.7407)=16,310,070 bits.
This calculation takes into account a 0.7407 optimization of the B-picture frame buffer.
Therefore a typical MPEG-2 decoder 10 requires 16 Mbits of memory to operate in the main profile at main level mode (MP at ML). This means that the decoder requires a 2 Mbyte memory 160. Memory 160 is dedicated to the MPEG decoder 10 and increases the price of the decoder 10. In current technology the cost of this additional dedicated memory 160 can be a significant percentage of the cost of the decoder.
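The PAL memory budget above can be reproduced with a short calculation. The sketch assumes a 720×576 PAL frame with 8 bits per sample; that resolution is not stated in the text, but it is the value that gives the 4,976,640-bit frame buffer used in the sum. The function and variable names are illustrative.

```python
def frame_bits_420(width, height, bits_per_sample=8):
    """Bits needed for one decompressed frame in 4:2:0 format."""
    luma_samples = width * height
    # Two chrominance arrays, each halved in both directions.
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits_per_sample


# Assumption: PAL frame of 720 x 576 samples.
pal_frame = frame_bits_420(720, 576)

bit_buffer = int(1.75 * 2**20)  # 1.75 Mbits fixed by the standard
extra_bits = 835_584            # allowance for a non-ideal process
b_frame_factor = 0.7407         # optimized B frame buffer (PAL)

total_bits = (bit_buffer + extra_bits
              + pal_frame               # I frame buffer
              + pal_frame               # P frame buffer
              + pal_frame * b_frame_factor)
total_mbytes = total_bits / 8 / 2**20
```

The total comes to roughly 16.3 million bits, a little under the 16 Mbit (2 Mbyte) memory size quoted above.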
Additionally, the decoder 10 should be able to access the memory 160 quickly enough to be able to operate in real time. This means that the decoder 10 should decode images fast enough so that any delay in decoding cannot be detected by a human viewer. A goal is to have the decoder 10 operate in real time without dropping so many frames that it becomes noticeable to the human viewer of the movie. If the decoder 10 does not operate in real time, the decoded movie would stop and wait periodically between images until the decoder 10 can get access to the memory to process the next image.
When the memory 160 used for data storage is on a separate chip from the decoder 10, the two chips must be electrically coupled. The input/output pins of the decoder 10 are coupled to the input/output pins of the memory 160 by external metal connections. This increases the amount of time it takes for the decoder 10 to read data out of the memory 160 and write data into the memory 160.
The present invention provides a method of reducing memory required for decompression of a compressed frame by storing frames in a compressed format using DCT-based techniques. The decoder includes a decoder module coupled to a DCT encoder module. The DCT encoder module has an output coupled to a memory. The stored DCT decoder module has an input coupled to the memory, and two outputs, one coupled to the decoder module and the other coupled to an output of the decoder.
In operation, the compressed frame is decompressed in the decoder module to obtain a decompressed frame. The decompressed frame is compressed in the DCT encoder module to obtain a recompressed frame. The recompressed frame is then stored in memory. In a DCT-based decoder, preferably this is only performed for frames having interpicture prediction errors. Most of the decoder module and all of the DCT encoder module can be by-passed for frames not having interpicture prediction errors. The recompressed frame is stored in the memory without having been decompressed. The digital representation of a compressed frame encoded using DCT techniques is much smaller than the digital representation of a decompressed frame and needs much less room in memory than the decompressed frame. Because the frames that are used in the decoding of other frames or that are displayed are stored in a compressed format, the decoder requires much less memory.
The reduction in the required memory allows the memory to be smaller and embedded in the decoder.
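The store-compressed, decompress-on-demand flow can be sketched in a few lines. This is only a structural illustration: the invention recompresses frames with DCT-based techniques, whereas the sketch substitutes the lossless `zlib` codec from the Python standard library as a stand-in so that the example is runnable. The class name and its methods are hypothetical.

```python
import zlib


class CompressedFrameStore:
    """Keeps frames in memory in compressed form only.

    `encode` and `decode` default to zlib as a placeholder; a real
    decoder would plug in its DCT encoder and stored DCT decoder.
    """

    def __init__(self, encode=zlib.compress, decode=zlib.decompress):
        self._encode = encode
        self._decode = decode
        self._frames = {}

    def store(self, frame_id, pixels):
        """Recompress a decompressed frame and keep only the result."""
        self._frames[frame_id] = self._encode(pixels)

    def fetch(self, frame_id):
        """Decompress a stored frame for prediction or display."""
        return self._decode(self._frames[frame_id])

    def stored_bytes(self):
        """Total memory occupied by the compressed frames."""
        return sum(len(data) for data in self._frames.values())


frame_store = CompressedFrameStore()
flat_frame = bytes(16 * 16)  # one flat 16x16 area of samples
frame_store.store("I0", flat_frame)
recovered = frame_store.fetch("I0")
```

Unlike this lossless stand-in, a DCT-based recompression is generally lossy, so a fetched frame would approximate rather than exactly equal the frame that was stored.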
In another embodiment of the invention, when the decoder is a DCT-based decoder the stored DCT decoder module can be eliminated and the DCT decoder module contained in the decoder module can be used to both decompress the compressed frame and to decompress the frames needed by the motion compensation engine, provided the DCT decoder module operates fast enough to perform both functions satisfactorily.
Another advantage of the present invention is the significant reduction of memory required by a decoder for decompression of images.
A further advantage of the present invention is the significant reduction in the cost of the decoder due to the decoder's lower memory requirement.
Another advantage of the present invention is that the memory needed by the decoder to decompress images can be embedded in the decoder, reducing the time the decoder takes to access the memory and decreasing overall system cost by eliminating external memory devices.
A further advantage is that the above advantages are achieved without a significant increase in die area of the decoder at current integration levels.