This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Motion Compensated Prediction (MCP) is a technique used by video compression standards to reduce the size of an encoded bitstream. In MCP, a prediction for a current frame is formed using one or more previously coded frames, and only the difference between the original and prediction signals, representative of the current and predicted frames, is encoded and sent to a decoder. A prediction signal, representative of a prediction frame, is formed by first dividing a current frame into blocks, e.g., macroblocks, and searching for a best match in a reference frame for each block. In this way, the motion of a block relative to the reference frame is determined, and this motion information is coded into a bitstream as motion vectors. A decoder is able to reconstruct the exact prediction frame by decoding the motion vector data encoded in the bitstream.
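The block-matching search described above can be illustrated with a minimal sketch. It assumes a full search over full-pixel displacements and a sum-of-absolute-differences (SAD) cost, neither of which is mandated by any particular standard. The function names `sad`, `full_search`, and `residual`, and the representation of frames as lists of rows of integer samples, are hypothetical choices made only for this illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). SAD is an assumed cost."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Try every full-pixel displacement within +/- search_range that keeps the
    candidate block inside the reference frame; return (motion_vector, cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual(cur, ref, bx, by, mv, n):
    """Difference between the original block and its motion-compensated
    prediction; in MCP only this difference and the vector are coded."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)]
            for y in range(n)]
```

A decoder that receives the motion vector and the residual can rebuild the block by fetching the displaced block from its own copy of the reference frame and adding the residual to it.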
The motion vectors are not limited to having full-pixel accuracy, but could have fractional-pixel accuracy as well. That is, motion vectors can point to fractional-pixel positions/locations of the reference frame, where the fractional-pixel locations can refer to, for example, locations “in between” image pixels. In order to obtain samples at fractional-pixel locations, interpolation filters are used in the MCP process. Conventional video coding standards describe how a decoder can obtain samples at fractional-pixel accuracy by defining an interpolation filter. In MPEG-2, for example, motion vectors can have at most half-pixel accuracy, where the samples at half-pixel locations are obtained by a simple averaging of neighboring samples at full-pixel locations. The H.264/AVC video coding standard supports motion vectors with up to quarter-pixel accuracy. Furthermore, in the H.264/AVC video coding standard, half-pixel samples are obtained through the use of symmetric and separable 6-tap filters, while quarter-pixel samples are obtained by averaging the nearest half-pixel or full-pixel samples.
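The half-pixel and quarter-pixel interpolation just described can be sketched in one dimension as follows. The tap values (1, -5, 20, 20, -5, 1) with a divisor of 32 are those of the H.264/AVC luma half-sample filter. The restriction to a single row, the replication of edge samples, the 8-bit sample range, and the function names are assumptions made only to keep the illustration short.

```python
HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)  # symmetric 6-tap filter, gain 32


def clip8(value):
    """Clip to the 8-bit sample range (8-bit video is assumed here)."""
    return max(0, min(255, value))


def half_pel_6tap(row, i):
    """Half-pixel sample between row[i] and row[i + 1].
    Samples beyond the row ends are replicated (an assumption of this sketch)."""
    acc = 0
    for k, tap in enumerate(HALF_PEL_TAPS):
        j = min(max(i - 2 + k, 0), len(row) - 1)
        acc += tap * row[j]
    return clip8((acc + 16) >> 5)  # add 16 to round, then divide by 32


def average(a, b):
    """Rounded average of two samples, used for quarter-pixel positions here
    and for the simple half-pixel averaging described for MPEG-2."""
    return (a + b + 1) >> 1


def quarter_pel(row, i):
    """Quarter-pixel sample lying between row[i] and the half-pixel sample
    to its right."""
    return average(row[i], half_pel_6tap(row, i))
```

Because the filter is separable, a two-dimensional implementation would apply the same taps along rows and then along columns; that step is omitted here.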
It is known that better prediction in the MCP process could be obtained by using higher accuracy motion vectors. For example, using motion vectors with ⅛ (one-eighth) pixel accuracy can increase the coding efficiency of a video coding system. However, the conventional use of high accuracy motion vectors (e.g., in literature studies and as considered during the development of the H.264/AVC video coding standard) generally increases both encoding and decoding complexity, where the increase in complexity involves two factors. A first factor is that an encoder must perform additional motion estimation steps to check the candidate one-eighth pixel accuracy positions. A second factor is the need for both the encoder and decoder to perform additional and usually complex interpolation to obtain the one-eighth pixel samples.
For example, two different interpolation techniques for one-eighth pixel accuracy are described in the following references: T. Wedi, “⅛-pel motion vector resolution for H.26L”, ITU-T Q.15/SG16, doc. Q15-K-21, Portland, Oregon, USA, August 2000; and T. Wedi, “Complexity reduced motion compensated prediction with ⅛-pel displacement vector resolution”, ITU-T Q.6/SG16, doc. VCEG-L20, Eibsee, Germany, January 2001. A first conventional algorithm uses a three-stage interpolation process to obtain at least a one-eighth pixel sample in frame 106 as indicated in FIG. 1. At least a half-pixel and a quarter-pixel sample can be obtained using 6-tap (or 8-tap) filtering with regard to frames 102 and 104, respectively, and the one-eighth pixel sample can be obtained using bi-linear filtering. In particular, two instances of the 6-tap or 8-tap filtering can be utilized, e.g., Filter 108 and Filter 110, while a single instance of the bi-linear filtering, e.g., Filter 112, can be utilized. With this conventional approach, the interpolation complexity of both quarter-pixel and one-eighth pixel samples is significantly larger than that of the H.264/AVC video coding standard due to the need, for example, to perform at least two cascaded 6-tap interpolations applied with regard to a full frame 100. In addition, an encoder would need to store at least quarter-pixel upsampled data in a memory unit to perform efficient motion estimation.
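The three-stage cascade of the first conventional algorithm can be sketched in one dimension as follows. This is only an illustration of the structure (6-tap, then 6-tap, then bi-linear). The tap values are borrowed from the H.264/AVC half-sample filter as a stand-in, and the filters actually used in the cited references may differ. Edge replication, 8-bit samples, and all function names are likewise assumptions of this sketch.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # stand-in 6-tap filter, gain 32 (assumption)


def clip8(value):
    return max(0, min(255, value))


def interp_6tap(samples, i):
    """Sample midway between samples[i] and samples[i + 1], edges replicated."""
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), len(samples) - 1)
        acc += tap * samples[j]
    return clip8((acc + 16) >> 5)


def interp_bilinear(samples, i):
    """Rounded average of the two neighboring samples."""
    return (samples[i] + samples[i + 1] + 1) >> 1


def upsample2(samples, interp):
    """Double the resolution: keep every input sample and insert one
    interpolated sample between each pair of neighbors."""
    out = []
    for i in range(len(samples) - 1):
        out.append(samples[i])
        out.append(interp(samples, i))
    out.append(samples[-1])
    return out


def three_stage(full):
    """Full-pixel row -> half-pixel -> quarter-pixel -> one-eighth-pixel rows."""
    half = upsample2(full, interp_6tap)            # first 6-tap stage
    quarter = upsample2(half, interp_6tap)         # second, cascaded 6-tap stage
    eighth = upsample2(quarter, interp_bilinear)   # bi-linear stage
    return half, quarter, eighth
```

In this sketch the second stage filters the output of the first, so each new quarter-pixel sample depends on a wider span of full-pixel samples than a single 6-tap filter would cover, which is the source of the added interpolation cost noted above.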
A second algorithm involves using direct interpolation for quarter and one-eighth pixel samples to reduce the decoding complexity. Here, direct interpolation refers to quarter and one-eighth pixel samples being obtained using only integer samples. In this way, there is no need to perform operations with long cascading filters. This conventional algorithm has a similar decoding complexity to the H.264/AVC video coding standard. However, this algorithm has drawbacks with regard to encoder complexity. This is because the encoder needs to perform high-complexity interpolation for each candidate quarter and one-eighth pixel motion vector in a motion estimation stage. Performing such high-complexity interpolation increases the complexity of the encoding process by a considerable amount.
Yet another alternative is to pre-calculate quarter and one-eighth pixel samples before the frame encoding process and store them in memory. However, this approach significantly increases the memory required for the encoder. Therefore, a system and method are needed for utilizing high accuracy motion vectors that do not increase the complexity of the encoding and/or decoding processes, and that increase the coding efficiency.
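The memory cost of storing pre-calculated fractional samples can be estimated with simple arithmetic. The sketch below assumes a single luma plane, ignores frame borders, and uses a 352x288 (CIF) frame purely as an example size; the function name is hypothetical.

```python
def upsampled_sample_count(width, height, denom):
    """Approximate number of samples stored when every 1/denom-pixel position
    of a width x height plane is kept (border effects ignored)."""
    return (width * denom) * (height * denom)


full_pel = upsampled_sample_count(352, 288, 1)     # full-pixel samples only
quarter_pel = upsampled_sample_count(352, 288, 4)  # 16 times the full-pixel count
eighth_pel = upsampled_sample_count(352, 288, 8)   # 64 times the full-pixel count
```

Under these assumptions, moving from quarter-pixel to one-eighth-pixel storage multiplies the per-frame sample count by four, and by sixty-four relative to the full-pixel frame.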