This invention relates generally to MPEG motion estimation methods, and in particular to an efficient and improved half pixel accuracy motion estimation prior to motion compensation.
With the advent of computer networks, the storage and transmission of multimedia content have become commonplace. In this environment, a number of compression techniques and standards have emerged to reconcile data-intensive media such as audio and video with the typically limited storage capacity of computers, and with the typically limited data rates of networks.
One such standard for digital audio/video compression has been developed by the Moving Picture Experts Group (MPEG) of the International Standards Organization. This standard was first promulgated as MPEG-1, and has undergone several revisions named MPEG-2 (broadcast-quality video), MPEG-3 (conceived as a standard for high-definition television, now canceled), and MPEG-4 (medium resolution videoconferencing with low frame rates in a sixty-four-kilobit-per-second channel). These standards are collectively referred to herein as MPEG.
MPEG employs single-frame compression based upon a two-dimensional discrete cosine transformation (“DCT”), and quantization of the resulting coefficients. In this respect, it resembles the Joint Photographic Experts Group (“JPEG”) still image compression standard. The MPEG standard provides further compression based upon temporal redundancy.
The MPEG standard is complex, particularly in view of the Constrained Parameter Bitstream (CPB) profile, which further defines the MPEG standard to ensure compatibility among particular implementations. However, since MPEG achieves high compression ratios, it is widely used. Even with the CPB profile, MPEG provides a significant amount of design flexibility. While this flexibility has focused attention on methods for achieving greater compression ratios in the video stream, and on ensuring that the video can be decoded at an adequate frame rate, there remains significant room for improvement at the encoding end of MPEG systems.
The known basic scheme is to predict motion from frame to frame in the temporal direction and in the spatial directions. The DCTs (Discrete Cosine Transforms) organize any redundancy in the spatial directions. The DCTs may be done on 8×8 blocks, and the motion prediction is done in the luminance (Y) channel on 16×16 blocks. In other words, given the 16×16 block in the current frame that is intended to be coded, the object is to look for a close match to that block in a previous or future frame (there are backward prediction modes where later frames are sent first to allow interpolating between frames). The DCT coefficients (of either the actual data or the difference between this block and the close match) are quantized, which means that they are divided by some value to drop bits off the lower end. Hopefully, many of the coefficients will end up being zero. The quantization can change for every macroblock (a macroblock is 16×16 of Y and the corresponding 8×8s in both U and V). The result of all this, which includes the DCT coefficients, the motion vectors and the quantization parameters, is Huffman coded, preferably using fixed tables. The DCT coefficients have a special Huffman table that is two-dimensional in that one code specifies a run-length of zeros and the other specifies a non-zero value that ended the run.
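The quantization and run-length steps described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: a single divisor stands in for the quantization parameters, the coefficients are taken as an already-ordered one-dimensional sequence, and the function names `quantize` and `run_level_pairs` are hypothetical. It is not the quantizer or the code tables defined by the MPEG standard.

```python
def quantize(coeffs, step):
    """Divide each coefficient by `step`, truncating toward zero.

    Assumption: one uniform divisor; this only illustrates how division
    drops the low-order bits of each coefficient.
    """
    return [int(c / step) for c in coeffs]


def run_level_pairs(quantized):
    """Convert a coefficient sequence into (zero_run, level) pairs.

    Each pair gives the number of zeros preceding a non-zero value and
    the non-zero value that ended the run. Trailing zeros produce no
    pair; a real coder would signal them with an end-of-block code.
    """
    pairs = []
    run = 0
    for q in quantized:
        if q == 0:
            run += 1
        else:
            pairs.append((run, q))
            run = 0
    return pairs
```

For example, `quantize([93, -41, 6, 0, 17, 3, 0, -2], 8)` gives `[11, -5, 0, 0, 2, 0, 0, 0]`, and `run_level_pairs` reduces that to `[(0, 11), (0, -5), (2, 2)]`. Each such pair would then be looked up in the two-dimensional Huffman table.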
As known in the art, there are three types of coded frames. There are I or intra frames. They are simply a frame coded as a still image, not using any past history. Then there are P or predicted frames. They are predicted from the most recently reconstructed I or P frame. Each macroblock in a P frame can either come with a vector and difference DCT coefficients for a close match in the last I or P, or it can just be intra coded (like in the I frames or P frames) if there was no good match.
Lastly, there are B (bi-directional) frames. They are predicted from the closest two I or P frames, one in the past and one in the future. It is desirable to search for matching blocks in those frames, and try different comparisons, e.g., the forward vector, the backward vector, and averaging the two blocks from the future and past frames, and subtracting that from the block being coded. If none of those will work, the block may be intra coded.
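The choice among the forward, backward, and averaged predictions for a B frame block can be sketched as follows. This is only an illustration: blocks are flattened to lists of pixel values, the averaging rounds upward on ties, and `choose_b_mode` and its `intra_threshold` parameter are hypothetical names introduced here to represent the "none of those will work" fallback.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_b_mode(cur, fwd, bwd, intra_threshold):
    """Pick the prediction with the smallest error for one block.

    cur: block being coded; fwd: candidate match from the past frame;
    bwd: candidate match from the future frame.
    intra_threshold is an assumed tuning value: if even the best
    prediction error exceeds it, the block is intra coded instead.
    """
    # Average of the past and future blocks (rounding is an assumption).
    avg = [(f + b + 1) // 2 for f, b in zip(fwd, bwd)]
    errors = {
        "forward": sad(cur, fwd),
        "backward": sad(cur, bwd),
        "average": sad(cur, avg),
    }
    mode = min(errors, key=errors.get)
    if errors[mode] > intra_threshold:
        return "intra", None
    return mode, errors[mode]
```

With `cur = [10, 20, 30, 40]`, `fwd = [10, 20, 30, 44]` and `bwd = [10, 20, 30, 36]`, the averaged block reproduces `cur` exactly, so the function returns `("average", 0)`.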
In video information processing, even though motion prediction accuracy historically has been governed by integer pel (or pixel) accuracy, more recently, motion prediction accuracy has been considerably improved by half pel motion estimation. It is known that video standards such as MPEG-1/2 and H.261/263 endorse the act of specifying motion vectors to half pixel accuracy.
An overview of the advantages of performing estimation using the half pixel method in the H.263/261 video compression standards can be found in “A Fast Software-only H 263 Video Coder” by Wei-Lien Hsu, published by Digital Equipment Corporation, Nashua, NH 03062, which is incorporated herein by reference.
It is to be noted that in H.263, although an interpolated image based on a reconstructed reference is created and used, a subsampling process is required to read the interpolated pixels. For assembly code implementations, subsampling an unaligned reference image is considerably expensive because it requires extra steps to load more data and to skip pixels. In video coding algorithms, reference images are generally unaligned. Furthermore, the reference image has to be created by enlarging the half pixel image four times, which involves an expensive calculation. The enlarged reference image may also have to be aligned for comparison with the current frame that is being estimated.
There is generally a need for a more efficient and economical method of motion estimation, preferably by an improved half pixel method.
The present invention consists in obtaining an improved form of half pixel accuracy in motion estimation with certain attendant advantages. Half pixel search is done in the present invention by a method of averaging, as explained in more detail hereinafter.
Traditional implementations of half pixel motion estimation/compensation are known to offer some advantages but are still associated with setbacks. Known half pixel motion estimation methods have disadvantages in that:
1. they require extra steps to down sample an interpolated reference image; or
2. it is necessary to perform pixel interpolation repeatedly during each calculation of the sum of absolute distortion (SAD).
In conventional MPEG video compression, bilinear interpolation is performed whenever the interpolated pixel is required. In other words, in conventional MPEG video compression, there is redundancy and lack of efficiency at least in the following three areas:
(i.) Motion estimation: In the motion estimation phase, inasmuch as the interpolated blocks overlap heavily, the interpolation of each pixel must be done repeatedly.
(ii.) Reconstruction: Since the reference images are interpolated (using the half pixel approach) during the motion estimation phase, it will be necessary to redo the interpolation for the video reconstruction phase.
(iii.) B frame calculations: Inasmuch as the interpolated reference images are NOT stored in conventional MPEG video compression, there is no prior basis available for B frame calculations. B frames in the same group are bi-directionally predicted, based on the same reference frames (I-P or P-P frames). For B frame calculations it is necessary to redo the interpolation, which can be avoided if the interpolated images are stored and used for B frame calculations.
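The redundancy in the motion estimation phase can be made concrete with a small sketch of the conventional approach, in which every half-pixel sample is interpolated at the moment it is needed. The rounding used (+1 before halving, +2 before dividing by four) and the names `sample_half_pel`, `sad_on_the_fly`, and `log` are assumptions made for illustration; coordinates are expressed in half-pixel units.

```python
def sample_half_pel(ref, hx, hy):
    """Return the reference value at half-pel coordinates (hx, hy).

    Even coordinates fall on integer pixels; odd ones lie halfway
    between neighbours and are bilinearly interpolated on demand.
    """
    x, y = hx // 2, hy // 2
    fx, fy = hx % 2, hy % 2
    a = ref[y][x]
    if not fx and not fy:
        return a
    if fx and not fy:
        return (a + ref[y][x + 1] + 1) // 2
    if fy and not fx:
        return (a + ref[y + 1][x] + 1) // 2
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4


def sad_on_the_fly(cur, ref, hx0, hy0, log):
    """SAD of block `cur` against the reference at half-pel origin (hx0, hy0).

    `log` records every half-pel position that was sampled, so repeated
    work across candidates can be counted afterwards.
    """
    total = 0
    for j, row in enumerate(cur):
        for i, c in enumerate(row):
            pos = (hx0 + 2 * i, hy0 + 2 * j)
            log.append(pos)
            total += abs(c - sample_half_pel(ref, pos[0], pos[1]))
    return total
```

Evaluating a 2×2 block at the two overlapping candidates (1, 0) and (3, 0) samples eight positions, of which only six are distinct; the two shared positions are interpolated twice. Over a full search window with 16×16 blocks the overlap, and therefore the repeated work, is far larger.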
The present invention optimizes the speed of half pixel accuracy motion estimation/compensation in two aspects:
a. To reduce memory traffic, an interpolated reference image is created before coding, so that it can be preloaded into a cache and can be used whenever needed without having to newly create it; and
b. To avoid redundant processing when subsampled interpolated data is accessed, the interpolated image (using a half pixel method) is partitioned into four areas.
The four areas are defined based on where, in a 2×2 square region, the pixels fall. The partitioned interpolated image is stored in four distinct predetermined areas, as described in more detail hereinafter. An implementation based on this methodology has been shown to improve the performance of MPEG-2 encoding by about 10%.
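One way to picture the four-area partition is the sketch below, which interpolates a reference image once and files each result by its position within the 2×2 region: integer, horizontal half, vertical half, or diagonal half. The key scheme `(fx, fy)`, the function name `build_half_pel_planes`, the rounding, and the choice to stop one pixel short of the right and bottom edges are all assumptions made for illustration, not details taken from the description.

```python
def build_half_pel_planes(ref):
    """Interpolate `ref` once and split the result into four planes.

    Returns a dict keyed by (fx, fy), the half-pel offset inside each
    2x2 region:
      (0, 0) integer pixels        (1, 0) horizontal half-pixels
      (0, 1) vertical half-pixels  (1, 1) diagonal half-pixels
    Each plane has (height - 1) rows and (width - 1) columns, because
    edge handling is left out of this sketch.
    """
    h, w = len(ref), len(ref[0])
    planes = {(0, 0): [], (1, 0): [], (0, 1): [], (1, 1): []}
    for y in range(h - 1):
        rows = {key: [] for key in planes}
        for x in range(w - 1):
            a, b = ref[y][x], ref[y][x + 1]
            c, d = ref[y + 1][x], ref[y + 1][x + 1]
            rows[(0, 0)].append(a)
            rows[(1, 0)].append((a + b + 1) // 2)
            rows[(0, 1)].append((a + c + 1) // 2)
            rows[(1, 1)].append((a + b + c + d + 2) // 4)
        for key in planes:
            planes[key].append(rows[key])
    return planes
```

Because each plane holds only one type of interpolated pixel, a block at any half-pixel position can be read from a single plane as consecutive values, with no need to step over every other pixel of an enlarged image.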
The invention in its broad form resides in a half pixel motion estimation and reconstruction method of the type wherein a block from a current video frame is compared with a selected interpolated reference video image, comprising the steps of: creating an interpolated reference image with half pixel accuracy from a reference video data frame before coding; partitioning the interpolated reference image into four areas and storing interpolated reference image pixels in four separate buffer memory areas; calculating four half-pixel values for the interpolated reference image by using bilinear interpolation methods based on pixel position; ensuring that the current video block and said best match have similar pixel positions; searching within the interpolated image to obtain a best interpolated match block for a current video block which is being estimated, by applying a vector to the current video block, recognizing that the vector has half pixel accuracy; completing motion estimation by calculating a block error as a result of said step of searching; and reconstructing estimated video data by assembling estimated video data without having to skip pixels.
The invention in another aspect resides in a motion estimation method for MPEG-2 encoding of the type which uses a half pixel block matching method, wherein a current data frame to be encoded is compared with a reference frame, comprising the steps of: creating an interpolated reference image before coding of incoming video information is done; dividing and partitioning the interpolated reference image into four areas constituting reference blocks wherein each area relates to one type of interpolated pixel; storing the partitioned reference image in a buffer in four distinct buffer regions; and to perform estimation, comparing an integer pixel block current frame from an incoming video information with a selected one of said four buffer regions, whereby a block error in the form of a sum of absolute distortion (SAD) is calculated for the motion estimation.
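The comparison of a current block against a selected one of the four buffer regions might look like the following sketch. It assumes the four regions are supplied as a dict keyed by the half-pixel offset `(fx, fy)` within a 2×2 region, that motion vectors are given in half-pixel units, and that the caller only proposes vectors that stay inside the planes. The names `sad_in_plane` and `best_half_pel_match` are hypothetical.

```python
def sad_in_plane(cur, plane, x0, y0):
    """SAD of `cur` against the block of `plane` whose top-left is (x0, y0)."""
    total = 0
    for j, row in enumerate(cur):
        # Contiguous read of one row: no pixels are skipped.
        ref_row = plane[y0 + j][x0:x0 + len(row)]
        total += sum(abs(c - r) for c, r in zip(row, ref_row))
    return total


def best_half_pel_match(cur, planes, bx, by, vectors):
    """Return ((mvx, mvy), sad) for the candidate vector with the least error.

    (bx, by): integer-pixel position of the current block.
    vectors: candidate motion vectors in half-pixel units.
    The low bit of each half-pel coordinate selects the plane; the
    remaining bits give the integer offset inside that plane.
    """
    best = None
    for mvx, mvy in vectors:
        hx, hy = 2 * bx + mvx, 2 * by + mvy
        plane = planes[(hx & 1, hy & 1)]
        err = sad_in_plane(cur, plane, hx >> 1, hy >> 1)
        if best is None or err < best[1]:
            best = ((mvx, mvy), err)
    return best
```

The design point this illustrates is that the half-pixel part of the vector is consumed entirely by the plane selection, so the inner SAD loop is identical to an integer-pixel comparison.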