In a video encoding system using motion compensation, motion of the encoding target image is detected using a reference image which has already been encoded in the past and stored in a frame memory, and a motion compensated image (predicted image) is created from the reference image, using the detected motion vector. In this case, accuracy of the reference image is changed from an accuracy in pixel units that originally existed in the reference image (integral pixel accuracy) to an accuracy in pixel units located between adjacent pixels in the reference image (fractional pixel accuracy), so that the motion of the encoding target image can be compensated at high accuracy, and the encoding efficiency can be improved.
In the case of the H.264 encoding system, cited in International Telecommunication Union, “Advanced Video Coding for Generic Audio Visual Services”, high encoding efficiency is implemented by performing motion detection and motion compensation using a reference image at ¼ pixel accuracy.
Specifically, a reference image with ½ pixel accuracy is generated by applying a 6-tap filter with coefficients of (1, −5, 20, 20, −5, 1)/32 to a reference image with integral pixel accuracy. Then a reference image with ¼ pixel accuracy is generated by applying a 2-tap averaging filter with coefficients of (1, 1)/2 to the reference image with ½ pixel accuracy.
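As a minimal sketch of the two filters described above, the following Python fragment applies them to a one-dimensional row of integral pixels. The function names `half_pel` and `quarter_pel` and the sample values are assumptions made for illustration, and exact fractions are used instead of the rounding and clipping that an actual codec performs.

```python
from fractions import Fraction

SIX_TAP = (1, -5, 20, 20, -5, 1)  # coefficients from the text, divided by 32


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1].

    Needs two integral pixels to the left of row[i] and two to the
    right of row[i + 1], i.e. six pixels in total.
    """
    window = row[i - 2:i + 4]
    if i < 2 or len(window) != 6:
        raise IndexError("six integral pixels are required around position i")
    return Fraction(sum(c * p for c, p in zip(SIX_TAP, window)), 32)


def quarter_pel(x, y):
    """Quarter-pixel sample: 2-tap average (1, 1)/2 of two neighbouring samples."""
    return Fraction(x + y, 2)


row = [10, 10, 10, 20, 20, 20]   # hypothetical pixel values containing an edge
h = half_pel(row, 2)             # sample midway between row[2] and row[3]
q = quarter_pel(row[2], h)       # sample midway between row[2] and h
```

Because the six coefficients sum to 32, a flat region of the image is left unchanged by the half-pixel filter.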
A method for generating a reference image with ¼ pixel accuracy according to the H.264 encoding system will be described in detail with reference to FIG. 1. FIG. 1 is a diagram depicting an arrangement of pixels in a reference image with ¼ pixel accuracy. A ½ pixel signal in a mid-position of two integral pixel signals in the horizontal direction is generated by a 6-tap filter in the horizontal direction. For example, pixel b is calculated as in the following Expression (1), by applying the 6-tap filter in the horizontal direction to integral pixels E, F, G, H, I and J.
b=(E−5F+20G+20H−5I+J)/32  (1)
A ½ pixel signal in a mid-position of two integral pixel signals in the vertical direction is generated by a 6-tap filter in the vertical direction. For example, pixel h is calculated as in the following Expression (2), by applying the 6-tap filter in the vertical direction to integral pixels A, C, G, M, R and T.
h=(A−5C+20G+20M−5R+T)/32  (2)
A ½ pixel signal in a mid-position of four integral pixel signals is generated by using the 6-tap filter in both the horizontal and vertical directions. For example, pixel j is calculated as in the following Expression (3), by generating ½ pixel signals aa, bb, b, s, gg and hh using the 6-tap filter in the horizontal direction, and then applying the 6-tap filter in the vertical direction to these signals.
j=(aa−5bb+20b+20s−5gg+hh)/32  (3)
Alternatively, pixel j may be generated as in the following Expression (4), by generating ½ pixel signals cc, dd, h, m, ee and ff by vertical filtering, and then performing horizontal filtering on these signals.
j=(cc−5dd+20h+20m−5ee+ff)/32  (4)
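The two ways of computing pixel j can be compared with a short sketch. It follows Expressions (3) and (4) literally with exact fractions, so no intermediate rounding takes place; the 6x6 block of pixel values and the function names are hypothetical. Under that assumption both filtering orders give the same value, since each is the same weighted sum of the 36 integral pixels.

```python
from fractions import Fraction

TAPS = (1, -5, 20, 20, -5, 1)


def six_tap(samples):
    """Apply the (1, -5, 20, 20, -5, 1)/32 filter to six samples."""
    if len(samples) != 6:
        raise ValueError("exactly six samples are required")
    return Fraction(sum(c * s for c, s in zip(TAPS, samples)), 32)


def j_rows_first(block):
    """Expression (3): filter each row horizontally, then filter the results vertically."""
    horizontal = [six_tap(row) for row in block]   # corresponds to aa, bb, b, s, gg, hh
    return six_tap(horizontal)


def j_cols_first(block):
    """Expression (4): filter each column vertically, then filter the results horizontally."""
    vertical = [six_tap(col) for col in zip(*block)]   # corresponds to cc, dd, h, m, ee, ff
    return six_tap(vertical)


# Hypothetical 6x6 neighbourhood of integral pixels surrounding position j.
block = [[(3 * r + 5 * c + r * c) % 17 for c in range(6)] for r in range(6)]
j_from_rows = j_rows_first(block)
j_from_cols = j_cols_first(block)
```

If the intermediate ½ pixel signals were rounded to integers before the second filtering pass, the two orders would no longer be guaranteed to agree.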
After all the ½ pixel signals are calculated, ¼ pixel signals are generated using an averaging filter. Pixels a, c, i and k in FIG. 1 are generated by applying an averaging filter in the horizontal direction to adjacent integral pixel signals or ½ pixel signals. For example, pixel a is calculated as in the following Expression (5).
a=(G+b)/2  (5)
Pixels d, f, n and q are generated by applying an averaging filter in the vertical direction to adjacent integral pixel signals or ½ pixel signals. For example, pixel f is calculated as in the following Expression (6).
f=(b+j)/2  (6)
Pixels e, g, p and r are calculated using an averaging filter in a diagonal direction. For example, pixel r is calculated as in the following Expression (7).
r=(m+s)/2  (7)
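Expressions (5) to (7) can be illustrated with a small sketch. The sample values are hypothetical, the dictionary keys follow the pixel labels of FIG. 1, and exact fractions are used so that no rounding rule has to be assumed.

```python
from fractions import Fraction


def average(x, y):
    """2-tap averaging filter (1, 1)/2."""
    return Fraction(x + y, 2)


# Hypothetical values for one integral pixel (G) and four 1/2 pixel signals.
samples = {"G": 100, "b": 104, "j": 110, "m": 108, "s": 112}

quarter = {
    "a": average(samples["G"], samples["b"]),  # Expression (5), horizontal
    "f": average(samples["b"], samples["j"]),  # Expression (6), vertical
    "r": average(samples["m"], samples["s"]),  # Expression (7), diagonal
}
```

Each ¼ pixel signal depends only on its two nearest integral or ½ pixel neighbours, which is why all ½ pixel signals must be available first.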
In this way, according to the H.264 encoding system, a reference image with ¼ pixel accuracy is always generated from a reference image with an integral pixel accuracy by using a fixed 6-tap filter and a 2-tap averaging filter.
On the other hand, because the amount of motion and the frequency characteristics of a video image differ from frame to frame, it is desirable to generate a reference image with fractional pixel accuracy using a different filter depending on the frame.
The following Non-patent Document 1 discloses that a reference image with ¼ pixel accuracy is generated by using a different filter depending on the frame. In concrete terms, a two-dimensional 6-tap filter, whose symmetry in the horizontal and vertical directions is limited, is provided for each position with fractional pixel accuracy (positions a, b, c, d, e, f, g, h, i, j, k, n, p, q and r in FIG. 1), and a reference image with ¼ pixel accuracy is directly generated by applying each filter to the reference image with integral pixel accuracy. In this case, the filter for generating a reference image with ¼ pixel accuracy is changed for each frame, and therefore information on 54 filter coefficients must be encoded and decoded for each frame.
In the following Non-patent Document 2, by contrast, a filter for generating a reference image with ½ pixel accuracy is changed for each frame. In concrete terms, a reference image with ½ pixel accuracy is generated from a reference image with integral pixel accuracy by using a one-dimensional symmetrical 6-tap filter with filter coefficients (a1, a2, a3, a3, a2, a1). Since the filter coefficients for generating a reference image with ½ pixel accuracy differ depending on the frame, information on three filter coefficients (a1, a2, a3) must be encoded and decoded for each frame.
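The symmetric filter structure described for Non-patent Document 2 can be sketched as follows. This is only an illustration of how three coefficients determine six taps, not the method of that document: the function names, the pixel values and the per-frame coefficients `frame_coeffs` are hypothetical. The fixed H.264 filter is included for comparison, since it is itself symmetric.

```python
from fractions import Fraction


def expand_symmetric(a1, a2, a3):
    """Expand the three transmitted coefficients into the six symmetric taps."""
    return (a1, a2, a3, a3, a2, a1)


def half_pel(window, coeffs):
    """Half-pixel sample at the centre of six consecutive integral pixels."""
    if len(window) != 6:
        raise ValueError("exactly six integral pixels are required")
    taps = expand_symmetric(*coeffs)
    return sum(t * p for t, p in zip(taps, window))


# The fixed H.264 filter (1, -5, 20, 20, -5, 1)/32 written in symmetric form.
H264_COEFFS = (Fraction(1, 32), Fraction(-5, 32), Fraction(20, 32))

# Hypothetical coefficients that an encoder might select for one frame.
frame_coeffs = (Fraction(3, 64), Fraction(-11, 64), Fraction(40, 64))

window = [10, 12, 15, 30, 28, 25]          # hypothetical integral pixels
fixed = half_pel(window, H264_COEFFS)      # result with the fixed filter
adaptive = half_pel(window, frame_coeffs)  # result with the per-frame filter
```

Only the three values in `frame_coeffs` would need to be signalled per frame in this sketch; the decoder can rebuild the remaining taps from the symmetry.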
Non-patent Document 1: Y. Vatis, B. Edler, D. Nguyen, J. Ostermann, “Motion- and Aliasing-Compensated Prediction Using a Two-Dimensional Non-Separable Adaptive Wiener Interpolation Filter,” Proc. ICIP 2005, IEEE International Conference on Image Processing, Genova, Italy, September 2005
Non-patent Document 2: T. Wedi, “Adaptive Interpolation Filter for Motion Compensated Hybrid Video Coding,” Picture Coding Symposium (PCS 2001), 2001