1. Field of the Invention
The present invention relates to motion compensation for inter-frame prediction and, in particular, to fractional sample interpolation used in the motion compensation which achieves low complexity and high precision.
2. Description of the Related Art
Digital video requires a large amount of data to represent each and every frame of a digital video sequence (e.g., series of frames) in an uncompressed manner. It is not feasible for most applications to transmit uncompressed digital video across computer networks because of bandwidth limitations. In addition, uncompressed digital video requires a large amount of storage space. The digital video is normally encoded in some manner to reduce the storage requirements and reduce the bandwidth requirements.
One technique for encoding digital video is inter-frame prediction, or inter-prediction. Inter-prediction exploits temporal redundancies among different frames. Temporally adjacent frames of video typically include blocks of pixels that remain substantially the same. During the encoding process, a motion vector interrelates the movement of a block of pixels in one frame to a block of similar pixels in another frame. Accordingly, the system is not required to encode the block of pixels twice, but rather encodes the block of pixels once and provides a motion vector to predict the other block of pixels.
Another technique for encoding digital video is intra-frame prediction or intra-prediction. Intra-prediction encodes a frame or a portion thereof without reference to pixels in other frames. Intra-prediction exploits spatial redundancies among blocks of pixels within a frame. Because spatially adjacent blocks of pixels generally have similar attributes, the efficiency of the coding process is improved by referencing the spatial correlation between adjacent blocks. This correlation may be exploited by prediction of a target block based on prediction modes used in adjacent blocks.
In inter-prediction, a received picture is predicted based on motion estimation and compensation. Moving objects in video often appear from frame to frame, with all or part of each object relocated in subsequent frames. Despite those relocations, correlation among the sequence of the frames is high and gives rise to redundancy. This temporal redundancy can be reduced by comparing and relating the samples in the current frame to the location of the same object in the reference frames. Specifically, during motion estimation, the current frame or a partition thereof is compared with reference frames which may be temporally previous or forward of the current frame. A pattern of pixels within a search range set in the respective reference frame is compared with the pattern of pixels exhibited in the current frame until a reference frame is found which contains a pixel pattern best matching the pixel pattern in the current frame to be encoded. Based on the comparison results, an inter-frame displacement vector or a motion vector is estimated. Using the estimated motion vector, motion compensation yields a prediction of the current frame.
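The pattern comparison described above can be sketched as follows. This is only a minimal illustration, assuming an exhaustive (full) search and the sum of absolute differences (SAD) as the matching criterion, neither of which is prescribed by the text. The function names `sad` and `estimate_motion` and the toy frames are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def estimate_motion(cur, ref, cx, cy, size, search_range):
    """Full search: return (mvx, mvy, cost) with the lowest SAD in the search range."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (mvx, mvy, cost)
    return best


# Toy data (assumption): the current frame is the reference frame with its
# content displaced by 1 sample horizontally and 2 samples vertically.
ref = [[x * x + 3 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 2][x + 1] if y + 2 < 8 and x + 1 < 8 else 0 for x in range(8)]
       for y in range(8)]

mv = estimate_motion(cur, ref, cx=2, cy=2, size=2, search_range=2)
print(mv)  # (1, 2, 0)
```

The returned pair (1, 2) plays the role of the motion vector, and the zero cost indicates an exact match at integer-sample accuracy. Practical encoders typically replace the exhaustive loop with faster search strategies.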
The motion vector accuracy and coding efficiency can be increased by applying interpolation to the pixels in the reference picture, which are called samples at integer positions, or simply integer samples, to increase the resolution of the reference picture. Interpolation generates fractional samples between the integer samples, using the values of the integer samples. The more fractional samples are generated between the integer samples, the higher the resolution of the reference picture becomes, and the more precisely and accurately a fractional sample displacement can be compensated. For example, in order to accurately compensate a movement of a moving object which is a displacement of only half a pixel, at least half-pixel (pel) interpolation is needed. Motion estimation and compensation may be performed using a number of different block sizes. Individual motion vectors may be determined for partitions having 4×4, 4×8, 8×4, 8×8, 8×16, 16×8 or 16×16 pixels. The provision of small motion compensation partitions improves the ability to handle fine motion details.
H.264/AVC takes a 2-step approach and achieves motion compensation up to a quarter-pel resolution. In H.264/AVC, the first step uses a 6-tap filter to generate intermediate values at a half-pel resolution from the values of surrounding integer samples. In the second step, the values of integer samples and the intermediate values are averaged or the intermediate values are averaged among themselves to generate fractional samples at quarter-pel positions, or simply quarter-pel samples. In B slices, fractional samples from two predictions may further be averaged. Note, however, that multiple averaging operations, when cascaded, introduce rounding errors which adversely affect the accuracy and the efficiency of motion compensation. Proposals D321 and E242 of Joint Collaborative Team on Video Coding (JCT-VC) address the rounding error issue associated with bi-directional averaging. These documents propose that a rounding operation be limited to taking place at the last step of bi-directional averaging after two predictions are added.
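The two steps, and the rounding error that cascaded averaging can introduce, may be illustrated with the one-dimensional sketch below. It assumes the H.264/AVC luma half-pel taps (1, -5, 20, 20, -5, 1) with a rounding shift of 5 bits and 8-bit samples. The helper names are hypothetical, and `bipred_round_once` is a simplified illustration of deferring the rounding to the last step, not the exact procedure of D321 or E242.

```python
H264_HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)  # taps sum to 32


def clip8(value):
    """Clip to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-pel sample between row[i] and row[i + 1]; uses row[i-2 .. i+3]."""
    window = row[i - 2:i + 4]
    acc = sum(c * s for c, s in zip(H264_HALF_PEL_TAPS, window))
    return clip8((acc + 16) >> 5)


def avg_round(a, b):
    """Average of two samples with rounding, as used for quarter-pel samples."""
    return (a + b + 1) >> 1


def bipred_cascaded(a0, b0, a1, b1):
    """Round each quarter-pel prediction, then round their average again."""
    return avg_round(avg_round(a0, b0), avg_round(a1, b1))


def bipred_round_once(a0, b0, a1, b1):
    """Add everything first and round a single time at the end."""
    return (a0 + b0 + a1 + b1 + 2) >> 2


row = [10, 10, 10, 50, 90, 90, 90, 90]
h = half_pel(row, 3)      # first step: half-pel sample between 50 and 90
q = avg_round(row[3], h)  # second step: quarter-pel sample between 50 and h
print(h, q)               # 75 63

# The exact average of (0, 1, 0, 0) is 0.25, which should round to 0.
print(bipred_cascaded(0, 1, 0, 0), bipred_round_once(0, 1, 0, 0))  # 1 0
```

In the last line the cascaded version rounds up twice and returns 1, whereas rounding once after the addition returns 0.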
JCT-VC Draft E603 discloses the use of an 8-tap filter to achieve the quarter-pel resolution. In E603, some of the quarter-pel samples are derived by applying an 8-tap filter to the nearest integer samples and truncating the filtered results down to a predetermined bit depth. The rest of the quarter-pel samples are derived through two processes. In the first process, intermediate values are derived by applying the 8-tap filter to the nearest integer samples in the vertical direction. In the second process, the 8-tap filter is applied to the intermediate values in the horizontal direction and the filtered results are truncated to a predetermined bit depth. This 2-process approach is advantageous in that no fixed order is required for the vertical filtering and the horizontal filtering in the second process, and thus no signaling to a decoder is necessary regarding the order of the vertical filtering and the horizontal filtering in the second process. However, the motion compensation discussed in E603 requires the definition of additional filtering operations to generate the intermediate values. The filtering operation applied to the intermediate values is costly and entails high computational complexity, in particular for video data with a high bit depth.
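The two processes can be sketched as follows for a single sample located at a fractional position in both directions. This is only an illustration under stated assumptions: the tap values are the half-sample luma taps of HEVC, used here as a stand-in because the coefficients of E603 are not given in this description, and the names `filter8` and `interpolate_2d` are hypothetical.

```python
# Assumed taps (HEVC half-sample luma filter); they sum to 64, a gain of 6 bits.
TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def filter8(samples):
    """Apply the 8-tap filter to eight samples without any truncation."""
    return sum(c * s for c, s in zip(TAPS, samples))


def interpolate_2d(block):
    """block: 8x8 integer samples surrounding the fractional position.

    Returns the untruncated intermediate values and the final sample.
    """
    # First process: vertical filtering of each column, no truncation.
    intermediates = [filter8([block[y][x] for y in range(8)]) for x in range(8)]
    # Second process: horizontal filtering of the intermediate values, then
    # truncation by 12 bits to undo the gain of both filter passes.
    return intermediates, filter8(intermediates) >> 12


flat = [[100] * 8 for _ in range(8)]
intermediates, sample = interpolate_2d(flat)
print(intermediates[0], sample)  # 6400 100
```

For a flat 8-bit block of value 100, every intermediate value is 6400, so the second filter pass operates on operands far wider than the input samples.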
Further, in the motion compensation discussed in E603, the intermediate values are not truncated so as to assure the accuracy of the quarter-pel samples calculated therefrom. Thus, the bitwise precision of the calculated values is not constant during the motion compensation discussed in E603. At the end of the first process explained above, the precision of the resultant sample values is increased by an amount determined by the gain of the 8-tap filter. By applying the 8-tap filter to the intermediate values, the precision is then increased again by the same amount as in the first process before truncation to a predetermined precision. Therefore, twice as much truncation of the precision is needed in the second process as is needed in the first process in order to bring the precision back to the original bit depth at the end of the second process.
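The growth in precision can be made concrete with a short worst-case calculation. The sketch assumes 8-bit input samples and an 8-tap filter whose taps sum to 64 (the HEVC half-sample luma taps are used as a stand-in, since the E603 coefficients are not given here). The helper names are hypothetical.

```python
TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)  # assumed taps; they sum to 64


def gain_bits(taps):
    """Number of bits added by one filter pass (the gain is a power of two)."""
    return sum(taps).bit_length() - 1


def output_range(lo, hi, taps):
    """Worst-case (min, max) filter output for inputs within [lo, hi]."""
    top = sum(c * (hi if c > 0 else lo) for c in taps)
    bottom = sum(c * (lo if c > 0 else hi) for c in taps)
    return bottom, top


def signed_bits(lo, hi):
    """Bits needed to store every value in [lo, hi] in two's complement."""
    return max(hi.bit_length(), (-lo - 1).bit_length()) + 1


shift_one_pass = gain_bits(TAPS)       # truncation after a single filter pass
shift_two_passes = 2 * gain_bits(TAPS) # truncation after two cascaded passes

first = output_range(0, 255, TAPS)     # range after the first process
second = output_range(*first, TAPS)    # range after the second process

print(shift_one_pass, shift_two_passes)            # 6 12
print(first, signed_bits(*first))                  # (-6120, 22440) 16
print(second, signed_bits(*second))                # (-1077120, 2121600) 23
```

Under these assumptions the intermediate values of the first process still fit in 16 bits, but the untruncated results of the second process need 23 bits, and the required width grows further with the input bit depth.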