The invention relates to digital video decoding, and more particularly, to a method and apparatus for digital video decoding having reduced memory requirements.
The Moving Picture Experts Group (MPEG) MPEG-2 standard (ISO-13818) is utilized with video applications. The MPEG-2 standard describes an encoded and compressed bit-stream that has substantial bandwidth reduction. The compression is a subjective loss compression followed by a lossless compression. The encoded, compressed digital video data is subsequently decompressed and decoded by an MPEG-2 standard compliant decoder.
The MPEG-2 standard specifies a bit-stream from and a decoder for a very high compression technique that achieves overall image bit-stream compression not achievable with either intraframe coding alone or interframe coding alone, while preserving the random access advantages of pure intraframe coding. The combination of block based frequency domain intraframe encoding and interpolative/predictive interframe encoding of the MPEG-2 standard results in a combination of intraframe encoding advantages and interframe encoding advantages.
The MPEG-2 standard specifies predictive and interpolative interframe encoding and frequency domain intraframe encoding. Block based motion compensation is utilized for the reduction of temporal redundancy, and block based Discrete Cosine Transform based compression is utilized for the reduction of spatial redundancy. Under the MPEG-2 standard, motion compensation is achieved by predictive coding, interpolative coding, and variable length coded motion vectors. The information relative to motion is based on a 16×16 array of pixels and is transmitted with the spatial information. Motion information is compressed with Variable Length Codes, such as Huffman codes.
In general, there are some spatial similarities in chromatic, geometrical, or other characteristic values within a picture/image. In order to eliminate these spatial redundancies, it is required to identify important elements of the picture and to remove the redundant elements that are less important. For example, according to the MPEG-2 standard, a picture is compressed by eliminating the spatial redundancies by chrominance sampling, discrete cosine transform (DCT), and quantization. In addition, video data is actually formed by a continuous series of pictures, which are perceived as a moving picture due to the persistence of pictures in the vision of human eyes. Since the time interval between pictures is very short, the difference between neighboring pictures is very tiny and mostly appears as a change of location of visual objects. Therefore, the MPEG-2 standard eliminates temporal redundancies caused by the similarity between pictures to further compress the video data.
In order to eliminate the temporal redundancies mentioned above, a process referred to as motion compensation is employed in the MPEG-2 standard. Motion compensation relates to the redundancy between pictures. Before performing motion compensation, a current picture to be processed is typically divided into 16×16 pixel sized macroblocks (MB). For each current macroblock, a most similar prediction block of a reference picture is then determined by comparing the current macroblock with “candidate” macroblocks of a preceding picture or a succeeding picture. The most similar prediction block is treated as a reference block and the location difference between the current block and the reference block is then recorded as a motion vector. The above process of obtaining the motion vector is referred to as motion estimation. If the picture to which the reference block belongs is prior to the current picture, the process is called forward prediction. If the reference picture is posterior to the current picture, the process is called backward prediction. In addition, if the motion vector is obtained by referring both to a preceding picture and a succeeding picture of the current picture, the process is called bi-directional prediction. A commonly employed motion estimation method is a block-matching method. Because the reference block may not be completely the same with the current block, when using block-matching, it is required to calculate the difference between the current block and the reference block, which is also referred to as a prediction error. The prediction error is used for decoding the current block.
The MPEG 2 standard defines three encoding types for encoding pictures: intra encoding, predictive encoding, and bi-directionally predictive encoding. An intra-coded picture (I-picture) is encoded independently without using a preceding picture or a succeeding picture. A predictive encoded picture (P-picture) is encoded by referring to a preceding reference picture, wherein the preceding reference picture should be an I-picture or a P-picture. In addition, a bi-directionally predictive picture (B-picture) is encoded using both a preceding picture and a succeeding picture. Bi-directionally predictive pictures (B-pictures) have the highest degree of compression and require both a past picture and a future picture for reconstruction during decoding. It should also be noted that B-pictures are not used as reference pictures. Because I-pictures and P-pictures can be used as a reference to decode other pictures, the I-pictures and P-pictures are also referred to as reference pictures. As B-pictures are never used to decode other pictures, B-pictures are also referred to as non-reference pictures. Note that in other video compression standards such as SMPTE VC-1, B field pictures can be used as a reference to decode other pictures. Hence, the picture encoding types belonging to either reference pictures or non-reference pictures may vary according to different video compression standards.
As mentioned above, a picture is composed of a plurality of macro-blocks, and the picture is encoded macro-block by macro-block. Each macro-block has a corresponding motion type parameter representing its motion compensation type. In the MPEG 2 standard, for example, each macro-block in an I-picture is intra-coded. P-pictures can comprise intra-coded and forward motion compensated macro-blocks; and B-pictures can comprise intra-coded, forward motion compensated, backward motion compensated, and bi-directional motion compensated macro-blocks. As is well known in the art, an intra-coded macro-block is independently encoded without using other macro-blocks in a preceding picture or a succeeding picture. A forward motion compensated macro-block is encoded by using the forward prediction information of a most similar macro-block in the preceding picture. A bi-directional motion compensated macro-block is encoded by using the forward prediction information of a reference macro-block in the preceding picture and the backward prediction information of another reference macro-block in the succeeding picture. The formation of P-pictures from I-pictures, and the formation of B-pictures from a pair of past and future pictures are key features of the MPEG-2 standard.
FIG. 1 shows a conventional block-matching process of motion estimation. A current picture 120 is divided into blocks as shown in FIG. 1. Each block can be any size. For example, in the MPEG standard, the current picture 120 is typically divided into macro-blocks having 16×16 pixels. Each block in the current picture 120 is encoded in terms of its difference from a block in a preceding picture 110 or a succeeding picture 130. During the block-matching process of a current block 100, the current block 100 is compared with similar-sized “candidate” blocks within a search range 115 of the preceding picture 110 or within a search range 135 of the succeeding picture 130. The candidate block of the preceding picture 110 or the succeeding picture 130 that is determined to have the smallest difference with respect to the current block 100, e.g. a block 150 of the preceding picture 110, is selected as a reference block. The motion vectors and residues between the reference block 150 and the current block 100 are computed and coded. As a result, the current block 100 can be restored during decompression using the coding of the reference block 150 as well as the motion vectors and residues for the current block 100.
The motion compensation unit under the MPEG-2 Standard is the macroblock unit. The MPEG-2 standard sized macroblocks are 16×16 pixels. Motion information consists of one vector for forward predicted macroblocks, one vector for backward predicted macroblocks, and two vectors for bi-directionally predicted macroblocks. The motion information associated with each macroblock is coded differentially with respect to the motion information present in the reference macroblock. In this way a macroblock of pixels is predicted by a translation of a macroblock of pixels from a past or future picture. The difference between the source pixels and the predicted pixels is included in the corresponding bit-stream. That is, the output of the video encoder is a digital video bit-stream comprising encoded pictures that can be decoded by a decoder system.
FIG. 2 shows difference between the display order and the transmission order of pictures of the MPEG-2 standard. As mentioned, the MPEG-2 standard provides temporal redundancy reduction through the use of various predictive and interpolative tools. This is illustrated in FIG. 2 with the use of three different types of frames (also referred to as pictures): “I” intra-coded pictures, “P” predicted Pictures, and “B” bi-directional interpolated pictures. As shown in FIG. 2, in order to decode encoded pictures being P-pictures or B-pictures, the picture transmission order in the digital video bit-stream is not the same as the desired picture display order.
A decoder adds a correction term to the block of predicted pixels to produce the reconstructed block. Typically, a video decoder receives the digital video bit-stream and generates decoded digital video information, which is stored in an external memory area in frame buffers. As described above and illustrated in FIG. 2, each macroblock of a P-picture can be coded with respect to the closest previous I-picture, or with respect to the closest previous P-picture. That is, each macroblock of a B-picture can be coded by forward prediction from the closest past I-picture or P-picture, by backward prediction from the closest future I-picture or P-picture, or bi-directionally using both the closest past I-picture or P-picture and the closest future I-picture or P-picture. Therefore, in order to properly decode all the types of encoded pictures and display the digital video information, at least the following three frame buffers are required:                1. Past reference frame buffer        2. Future reference frame buffer        3. Decompressed B-frame buffer        
Each buffer must be large enough to hold a complete picture's worth of digital video data (e.g., 720×480 pixels for MPEG-2 Main Profile/Main Level). Additionally, as is well known by a person of ordinary skill in the art, both luma (short for luminance) data and chroma (short for chrominance) data require similar processing. In order to keep the cost of the video decoder products down, an important goal has been to reduce the amount of external memory (i.e., the size of the frame buffers) required to support the decode function.