The invention relates to image processing for encoding or decoding video signals, and more particularly, to methods of performing motion compensation or motion estimation having reduced memory requirements when encoding or decoding video.
The Moving Picture Experts Group (MPEG) standards, such as MPEG-2 (ISO/IEC 13818), are widely used in video applications. The MPEG-2 standard describes an encoded and compressed bit-stream that achieves a substantial bandwidth reduction. The compression is a subjective lossy compression followed by a lossless compression. The encoded, compressed digital video data is subsequently decompressed and decoded by an MPEG-2 standard compliant decoder.
The MPEG-2 standard specifies a bit-stream form encoded by a very high compression technique that achieves overall image bit-stream compression not achievable with either intraframe coding alone or interframe coding alone, while preserving the random access advantages of pure intraframe coding. The combination of block based frequency domain intraframe encoding and interpolative/predictive interframe encoding of the MPEG-2 standard results in a combination of intraframe encoding advantages and interframe encoding advantages.
The MPEG-2 standard specifies predictive and interpolative interframe encoding and frequency domain intraframe encoding. Block based motion estimation and motion compensation are utilized for the reduction of temporal redundancy, and block based Discrete Cosine Transform (DCT) based compression is utilized for the reduction of spatial redundancy. Under the MPEG-2 standard, motion compensation is achieved by predictive coding, interpolative coding, and Variable Length Coded (VLC) motion vectors. The information relative to motion is based on a 16×16 array of pixels and is transmitted with the spatial information. Motion information is compressed with Variable Length Codes, such as Huffman codes.
In general, there are spatial similarities in chromatic, geometrical, or other characteristic values within a picture/image. In order to eliminate these spatial redundancies, it is necessary to identify the important elements of the picture and to remove the redundant elements that are less important or are repeated. For example, according to the MPEG-2 standard, a picture is compressed by eliminating the spatial redundancies through chrominance subsampling, discrete cosine transform (DCT), and quantization. In addition, video data is formed by a continuous series of pictures, which are perceived as a moving picture due to the persistence of vision of the human eye. Since the time interval between pictures is very short, the difference between neighboring pictures is small and mostly appears as a change of location of visual objects. Therefore, the MPEG-2 standard eliminates temporal redundancies caused by the similarity between consecutive pictures to further compress the video data.
In order to eliminate the temporal redundancies mentioned above, a process referred to as motion estimation or motion compensation is employed in the MPEG-2 standard. Motion estimation and motion compensation relate to determining the redundancy between pictures. Before performing motion compensation, a current picture to be processed is typically divided into 16×16 pixel sized macro-blocks (MB). For each current macro-block, a most similar prediction block of a reference picture (which can be a preceding picture or a succeeding picture) is then determined by comparing the current macro-block with “candidate” macro-blocks of the reference picture. The most similar prediction block is treated as a reference block, and the location difference between the current block and the reference block is then recorded as a motion vector. The above process of obtaining the motion vector is referred to as motion estimation. If the picture to which the reference block belongs precedes the current picture, the process is called forward prediction. If the reference picture follows the current picture, the process is called backward prediction. In addition, if the motion vector is obtained by referring to both a preceding picture and a succeeding picture of the current picture, the process is called bi-directional prediction. A commonly employed motion estimation method is block matching. Because the reference block may not be identical to the current block, block matching also requires calculating the difference between the current block and the reference block, which is referred to as a prediction error. The prediction error is used for decoding the current block.
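The block-matching process described above can be sketched as an exhaustive search. This is a minimal illustration, not the method of any particular encoder. The sum of absolute differences (SAD) as the matching cost, the search radius, the integer-pixel motion vectors, and the names `sad` and `full_search` are all assumptions introduced here. Pictures are assumed to be lists of rows of luminance samples.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n=16, search=7):
    """Compare the current block at (bx, by) with every candidate block
    of `ref` within +/- `search` pixels and return the motion vector
    (dx, dy) of the best match together with its SAD cost."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost
```

The returned vector points from the current block position to the reference block. A cost of zero means the reference block is identical to the current block, so the prediction error is zero everywhere.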
The MPEG-2 standard defines three encoding types for encoding pictures: intra encoding, predictive encoding, and bi-directionally predictive encoding. An intra-coded picture (I picture) is encoded independently without using a preceding picture or a succeeding picture. A predictive encoded picture (P picture) is encoded by referring to a preceding reference picture, wherein the preceding reference picture should be an I picture or a P picture. In addition, a bi-directionally predictive picture (B picture) is encoded using both a preceding picture and a succeeding picture. B pictures have the highest degree of compression and require both a past picture and a future picture for reconstruction during decoding. I pictures and P pictures can be used as reference pictures to encode or decode other pictures. As B pictures are never used to decode other pictures, B pictures are also referred to as non-reference pictures. Note that in other video compression standards such as H.264, B pictures can be used as a reference to decode other pictures. Hence, whether a given picture encoding type is a reference picture or a non-reference picture may vary between video compression standards.
As mentioned above, a picture is composed of a plurality of macro-blocks, and the picture is encoded macro-block by macro-block. Each macro-block has a corresponding motion type parameter representing its motion compensation type. FIG. 1 shows a conventional block-matching process of motion estimation. A current picture 120 is divided into blocks. Each block can be any size. For example, in the MPEG standard, the current picture 120 is typically divided into macro-blocks having 16×16 pixels. Each interframe coded block in the current picture 120 is encoded in terms of its difference from a block in a preceding picture 110 or a succeeding picture 130. During the block-matching process of a current block 100, the current block 100 is compared with similar-sized “candidate” blocks within a search range 115 of the preceding picture 110 or within a search range 135 of the succeeding picture 130. The candidate block of the preceding picture 110 or the succeeding picture 130 that is determined to have the smallest difference with respect to the current block 100, e.g. a block 150 of the preceding picture 110, is selected as a reference block. The motion vectors and residues between the reference block 150 and the current block 100 are computed and coded. As a result, the current block 100 can be restored during decompression using the coding of the reference block 150 as well as the motion vectors and residues for the current block 100.
The basic unit for motion compensation under the MPEG-2 standard is a macro-block. The MPEG-2 standard sized macro-blocks are 16×16 pixels. Motion information consists of one vector for forward predicted macro-blocks, one vector for backward predicted macro-blocks, and two vectors for bi-directionally predicted macro-blocks. In this way, a macro-block of pixels is predicted by a translation of a macro-block of pixels from a past or future picture. The difference between the source pixels and the predicted pixels is included in the corresponding bit-stream. That is, the output of the video encoder is a digital video bit-stream comprising encoded pictures that can be decoded by a decoder system.
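The prediction-by-translation step above can be illustrated with a short sketch. It assumes integer-pixel motion vectors, 8-bit samples, and pictures stored as lists of rows. The helper names `fetch_block`, `average`, and `reconstruct` are hypothetical, and the round-half-up averaging used for bi-directional prediction is an assumption of this sketch rather than something stated in the text.

```python
def fetch_block(ref, x, y, mv, n=16):
    """Return the n x n block of `ref` displaced by motion vector
    mv = (dx, dy) from the block position (x, y)."""
    dx, dy = mv
    return [row[x + dx : x + dx + n] for row in ref[y + dy : y + dy + n]]


def average(fwd, bwd):
    """Combine a forward and a backward prediction sample by sample
    (round-half-up averaging, assumed here)."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_f, row_b)]
        for row_f, row_b in zip(fwd, bwd)
    ]


def reconstruct(pred, residual):
    """Add the transmitted difference to the predicted samples and
    clip the result to the 8-bit range 0..255."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(pred, residual)
    ]
```

A forward or backward predicted macro-block would use one call to `fetch_block` followed by `reconstruct`. A bi-directionally predicted macro-block would fetch one block from each reference picture, combine them with `average`, and then apply `reconstruct`.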
FIG. 2 shows a difference between the display order and the decoding order of pictures of the MPEG-2 standard. As mentioned, the MPEG-2 standard provides temporal redundancy reduction through the use of various predictive and interpolative tools. This is illustrated in FIG. 2 with the use of three different types of frames (also referred to as pictures): “I” intra-coded pictures, “P” predicted pictures, and “B” bi-directional interpolated pictures. As shown in FIG. 2, in order to encode or decode P pictures or B pictures, the picture transmission order in the digital video bit-stream is not the same as the desired picture display order.
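The reordering can be illustrated with a small sketch. Because a B picture needs its future reference picture to be decoded first, each run of B pictures is moved after the I picture or P picture that follows it in display order. The picture labels and the function name `display_to_decoding_order` are hypothetical and are not taken from FIG. 2.

```python
def display_to_decoding_order(display):
    """Reorder picture labels (e.g. "I0", "B1", "P3") from display order
    to a decoding order in which every B picture follows both of its
    reference pictures."""
    decoding, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            # Hold B pictures back until their future reference is emitted.
            pending_b.append(pic)
        else:
            # An I or P picture is emitted first, then the held B pictures.
            decoding.append(pic)
            decoding.extend(pending_b)
            pending_b = []
    # Trailing B pictures with no later reference are emitted last.
    decoding.extend(pending_b)
    return decoding


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_decoding_order(display))
# prints ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```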
A decoder adds a correction term to the block of predicted pixels to produce the reconstructed block. Typically, a video decoder receives the digital video bit-stream and generates decoded digital video information, which is stored in an external memory area in frame buffers. As described above and illustrated in FIG. 2, each macro-block of a P picture can be coded with respect to the closest previous I picture, or with respect to the closest previous P picture. Each macro-block of a B picture can be coded by forward prediction from the closest past I picture or P picture, by backward prediction from the closest future I picture or P picture, or bi-directionally using both the closest past I picture or P picture and the closest future I picture or P picture. Therefore, in order to properly decode all the types of encoded pictures and display the digital video information, at least the following three frame buffers are required:
1. Past reference frame buffer
2. Future reference frame buffer
3. Current frame buffer
Each buffer must be large enough to hold a complete picture of digital video data (e.g., 720×480 pixels for MPEG-2 Main Profile/Main Level). Additionally, as is well known by a person of ordinary skill in the art, both luminance data and chrominance data require similar processing. In order to keep the cost of the video decoder products down, an important goal has been to reduce the amount of memory (i.e., the size of the frame buffers) required to support the decode function.
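To give a sense of scale, the memory needed for the three frame buffers can be estimated as follows. The sketch assumes 8-bit samples and 4:2:0 chrominance subsampling (each of the two chrominance planes has half the luminance width and half the luminance height), and ignores any alignment or padding a real decoder might add. The function name `frame_bytes` is hypothetical.

```python
def frame_bytes(width, height):
    """Bytes needed for one 8-bit 4:2:0 picture: one full-size luminance
    plane plus two chrominance planes at half width and half height."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


one_frame = frame_bytes(720, 480)
three_frames = 3 * one_frame  # past reference, future reference, current
print(one_frame, three_frames)
# prints 518400 1555200
```

Under these assumptions, the three buffers together occupy roughly 1.5 MB for a 720×480 picture.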
For example, various related art methods reduce the memory required for decompression of a compressed frame by storing frame data in the frame buffers in a compressed format. During operation, the compressed frame is decompressed by the decoder module to obtain a decompressed frame. The decompressed frame is then compressed by an additional compression module to obtain a recompressed frame, which is stored in the memory. Because the frames that are used in the decoding of other frames or that are displayed are stored in a compressed format, the decoder system requires less memory. However, some drawbacks exist in the related art. Firstly, the recompressed reference frame does not allow easy random access to a prediction block within regions of the recompressed reference frames stored in the memory. Secondly, the additional recompression and decompression modules dramatically increase the hardware cost and power consumption of the decoder system. Additionally, these solutions have similar problems when used in a video encoder.