In video coding, inter-frame prediction (motion compensation) coding performs prediction between different frames. It refers to an already decoded frame, determines a motion vector that minimizes the prediction error energy, and applies an orthogonal transform to the resulting prediction error signal (also called a residual signal). The transformed signal is then quantized and entropy-coded, finally yielding binary data, i.e., a bitstream. To increase coding efficiency, it is essential to reduce the prediction error energy, so a prediction scheme with high prediction accuracy is required.
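The motion search described above can be sketched as a full search over integer displacements. This is a minimal illustration only. It assumes integer-pel vectors, a square block, the sum of squared differences (SSE) as the prediction error energy, and frames stored as lists of rows. The helper names (`sse`, `get_block`, `full_search`) are hypothetical and are not taken from any standard or reference software.

```python
def sse(block_a, block_b):
    """Prediction error energy: sum of squared differences of two blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur, ref, top, left, size, search_range):
    """Find the integer motion vector (dy, dx) within +/-search_range that
    minimizes the SSE between a block of `cur` and the displaced block of
    `ref`. Returns ((dy, dx), sse)."""
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would read outside the reference frame.
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            cost = sse(target, get_block(ref, y, x, size))
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


# Toy example: the current frame is the reference shifted by (1, 2).
ref = [[100 * y + x * x for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] for x in range(10)] for y in range(10)]
print(full_search(cur, ref, top=2, left=2, size=3, search_range=2))  # ((1, 2), 0)
```

Real encoders use fast search patterns and fractional refinement rather than an exhaustive integer search. The exhaustive version is shown only because it states the minimization most directly.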
Many tools for increasing the accuracy of inter-frame prediction have been introduced into video coding standards. For example, when there is occlusion in the nearest frame, referring to a frame that is somewhat more distant in time can reduce the prediction error energy further, so H.264/AVC allows multiple frames to be referenced. This tool is called multiple reference frame prediction. In addition, to deal with motion of complex shapes, the block size can be subdivided into 16×8, 8×16, 8×4, 4×8, and 4×4 in addition to 16×16 and 8×8. This tool is called variable block size prediction.
Similarly, ½-accuracy pixels are interpolated from the integer-accuracy pixels of a reference frame using a 6-tap filter, and ¼-accuracy pixels are then generated from these pixels by linear interpolation. This makes accurate prediction possible for motion with fractional accuracy. This tool is called ¼ pixel accuracy prediction.
To develop a next-generation video coding standard that provides higher coding efficiency than H.264/AVC, the International Organization for Standardization/International Electrotechnical Commission "Moving Picture Experts Group" (ISO/IEC "MPEG") and the International Telecommunication Union-Telecommunication Standardization Sector "Video Coding Experts Group" (ITU-T "VCEG") jointly established a team (Joint Collaborative Team on Video Coding: JCT-VC). The next-generation standard is called High Efficiency Video Coding (HEVC). Novel coding technologies are being proposed from all over the world, and they are under discussion at the JCT-VC meetings.
Many of these proposals relate to inter-frame prediction (motion compensation). The HEVC reference software (HEVC Test Model: HM) employs tools that improve the prediction efficiency of motion vectors and tools that extend the block size to 16×16 or larger.
Tools for increasing the interpolation accuracy of fractional-accuracy pixels have also been proposed. Among them, the DCT-based interpolation filter (DCT-IF) is highly effective and has been adopted in HM. Its interpolation filter coefficients are derived from the basis functions of the discrete cosine transform (DCT). To increase interpolation accuracy further, interpolation filters that adaptively change their coefficients on a frame-by-frame basis have also been proposed. These are called adaptive interpolation filters (AIFs). The adaptive interpolation filter is highly effective at improving coding efficiency. It is also adopted in the reference software for next-generation video coding (Key Technical Area: KTA), which was developed under the leadership of VCEG. Because interpolation filters contribute substantially to coding efficiency, improving their performance is a very promising area.
Conventional interpolation filters will be described in greater detail.
[Fixed Interpolation]
FIG. 8 is a diagram illustrating the interpolation method for a fractional-accuracy pixel in H.264/AVC. As shown in FIG. 8, a ½ pixel position in the horizontal direction is interpolated from six integer pixels in total: three on the left side of the interpolation target pixel and three on the right side. In the vertical direction, six integer pixels are likewise used: three above and three below. The filter coefficients are (1, −5, 20, 20, −5, 1)/32. After the ½ pixel positions have been interpolated, the ¼ pixel positions are interpolated using a mean filter of [½, ½]. Because all ½ pixel positions must be interpolated, the computational complexity is high. However, the interpolation performance is also high, which improves coding efficiency. Non-Patent Document 1 and other documents disclose this interpolation technology using a fixed filter.
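The H.264/AVC-style interpolation above can be sketched in one dimension, for a single row of 8-bit samples. Several details are assumptions of this sketch: the rounding offset (+16 before the shift by 5), the clipping to [0, 255], and the edge padding by index clamping. These follow common H.264/AVC practice, but the text above specifies only the coefficients and the mean filter. `half_pel` and `quarter_pel` are hypothetical names.

```python
def clip8(v):
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-pel sample between row[i] and row[i + 1].

    Applies the 6-tap filter (1, -5, 20, 20, -5, 1)/32 to row[i-2] .. row[i+3];
    indices outside the row are clamped to the nearest edge sample.
    """
    taps = (1, -5, 20, 20, -5, 1)
    n = len(row)
    acc = sum(t * row[min(max(i - 2 + k, 0), n - 1)] for k, t in enumerate(taps))
    return clip8((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(row, i):
    """Quarter-pel sample between row[i] and the half-pel to its right,
    using the [1/2, 1/2] mean filter with rounding."""
    return (row[i] + half_pel(row, i) + 1) >> 1


row = [10, 20, 30, 40, 50, 60, 70, 80]
print(half_pel(row, 3), quarter_pel(row, 3))  # 45 43
```

On a linear ramp the half-pel value is the exact midpoint. Across a sharp edge, the negative taps can push the filter output below 0 or above 255, which is why the result is clipped.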
Filters which use the same coefficient values for all the input pictures and for all the frames, such as a one-dimensional 6-tap filter of H.264/AVC, are called fixed interpolation filters.
To further improve on the interpolation filter adopted in H.264/AVC, the HEVC reference software HM adopts a DCT-based interpolation filter (DCT-IF). FIG. 9 illustrates how a fractional-accuracy pixel is interpolated by the DCT-based interpolation filter. As shown in FIG. 9, p denotes the interpolation target pixel at a fractional-accuracy position, and px denotes an integer position pixel. α (0≦α≦1) is a parameter indicating the position of p between integer position pixels. The number of integer position pixels used for interpolation, i.e., the tap length, is 2M, where M is an integer greater than or equal to 1. From the definition of the DCT, Equation (1) holds.
[Equation 1]

$$C_k = \frac{1}{M} \sum_{l=-M+1}^{M} p(l) \cos\left( \frac{(2l - 1 + 2M)\, k \pi}{4M} \right) \tag{1}$$
Moreover, from the definition of the inverse DCT, Equation (2) holds.
[Equation 2]

$$p(x) = \frac{C_0}{2} + \sum_{k=1}^{2M-1} C_k \cos\left( \frac{\pi (2x - 1 + 2M)\, k}{4M} \right) \tag{2}$$
Treating x as a continuous position and substituting the fractional position α for x gives the following Equation (3) for interpolating a pixel at position α.
[Equation 3]

$$p(\alpha) = \frac{C_0}{2} + \sum_{k=1}^{2M-1} C_k \cos\left( \frac{\pi (2\alpha - 1 + 2M)\, k}{4M} \right) \tag{3}$$
By Equation (3), the coefficients are uniquely determined once the tap length 2M and the interpolation target position α are fixed. Examples of interpolation filters obtained in this way are collected in Table 1 and Table 2. Details are disclosed in Non-Patent Document 2.
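Substituting Equation (1) into Equation (3) shows explicitly why the coefficients depend only on M and α. The interpolated value is a weighted sum of the 2M integer-position pixels. In the rearrangement below, f_l(α) is a symbol introduced here for illustration, not notation from the source.

$$p(\alpha) = \sum_{l=-M+1}^{M} f_l(\alpha)\, p(l), \qquad f_l(\alpha) = \frac{1}{M}\left[ \frac{1}{2} + \sum_{k=1}^{2M-1} \cos\left( \frac{(2l - 1 + 2M)\, k\pi}{4M} \right) \cos\left( \frac{(2\alpha - 1 + 2M)\, k\pi}{4M} \right) \right]$$

Take 2M = 6 and α = 1/2 as an example. The second cosine becomes cos(kπ/2), which vanishes for odd k, so only k = 2 and k = 4 remain. This gives f_0(1/2) = (1/3)(1/2 + √3/2 + 1/2) ≈ 0.622, or about 159.2 on a scale of 256.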
TABLE 1

| Fractional position α | Filter coefficient values (6-tap filter, 2M = 6) |
|---|---|
| −1/12 | {−4, 19, 254, −19, 8, −2} |
| 1/12 | {4, −16, 252, 22, −8, 2} |
| 1/6 | {6, −28, 242, 48, −17, 5} |
| 1/4 | {9, −37, 227, 75, −25, 7} |
| 2/6 | {11, −42, 208, 103, −33, 9} |
| 5/12 | {12, −44, 184, 132, −39, 11} |
| 1/2 | {11, −43, 160, 160, −43, 11} |
| 7/12 | {11, −39, 132, 184, −44, 12} |
| 2/3 | {9, −33, 103, 208, −42, 11} |
| 3/4 | {7, −25, 75, 227, −37, 9} |
| 5/6 | {5, −17, 48, 242, −28, 6} |
TABLE 2

| Fractional position α | Filter coefficient values (12-tap filter, 2M = 12) |
|---|---|
| −1/12 | {1, −3, 5, −10, 22, 253, −19, 10, −6, 4, −2, 1} |
| 1/12 | {−1, 3, −5, 9, −19, 253, 23, −10, 6, −4, 2, −1} |
| 1/6 | {−2, 5, −9, 16, −34, 244, 49, −21, 12, −7, 4, −1} |
| 1/4 | {−1, 6, −12, 21, −43, 229, 75, −30, 17, −10, 5, −1} |
| 2/6 | {−3, 8, −15, 26, −50, 211, 105, −40, 22, −13, 7, −2} |
| 5/12 | {−3, 9, −16, 28, −53, 188, 134, −47, 26, −15, 8, −3} |
| 1/2 | {−2, 7, −15, 28, −52, 162, 162, −52, 28, −15, 7, −2} |
| 7/12 | {−3, 8, −15, 26, −47, 134, 188, −53, 28, −16, 9, −3} |
| 2/3 | {−2, 7, −13, 22, −40, 105, 211, −50, 26, −15, 8, −3} |
| 3/4 | {−1, 5, −10, 17, −30, 75, 229, −43, 21, −12, 6, −1} |
| 5/6 | {−1, 4, −7, 12, −21, 49, 244, −34, 16, −9, 5, −2} |
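The tabulated coefficients can be approximated directly from Equations (1) and (3). The sketch below evaluates, for l = −M+1 to M, the real-valued weight that multiplies p(l) once Equation (1) is substituted into Equation (3). It then scales these weights by 256 with plain rounding. The scale of 256 is inferred from the fact that every row of Tables 1 and 2 sums to 256. Any further adjustment used to produce the published integers, such as restoring that sum after rounding, is not described here and is not reproduced, so individual values may differ from the tables by one. The function names are hypothetical.

```python
import math


def dct_if_coefficients(alpha, m):
    """Real-valued weights multiplying p(l), l = -M+1 .. M, obtained by
    substituting Equation (1) into Equation (3)."""
    coeffs = []
    for l in range(-m + 1, m + 1):
        acc = 0.5  # the C_0 / 2 term contributes 1/2 (times 1/M below)
        for k in range(1, 2 * m):
            acc += (math.cos((2 * l - 1 + 2 * m) * k * math.pi / (4 * m))
                    * math.cos((2 * alpha - 1 + 2 * m) * k * math.pi / (4 * m)))
        coeffs.append(acc / m)
    return coeffs


def to_integer(coeffs, scale=256):
    """Scale to integers by plain rounding (no correction of the row sum)."""
    return [round(c * scale) for c in coeffs]


print(to_integer(dct_if_coefficients(0.25, 3)))  # [9, -37, 227, 75, -25, 7]
print(to_integer(dct_if_coefficients(0.5, 3)))   # [11, -43, 159, 159, -43, 11]
```

With 2M = 6, the α = 1/4 output matches the 1/4 row of Table 1. The α = 1/2 output has centre taps of 159 instead of 160, so that row sums to 254. The real-valued weights always sum to 1, because the non-DC DCT basis vectors sum to zero over the 2M samples.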
DCT-based interpolation filters can handle any filter length and any interpolation accuracy while delivering high interpolation performance. For this reason they are adopted in the HEVC test model HM.
[Adaptive Interpolation]
In H.264/AVC, the filter coefficient values are constant. They do not depend on the conditions of the input picture (the type of sequence, the picture size, and the frame rate) or on the coding conditions (the block size, the group of pictures (GOP) structure, and the quantization parameters (QP)). Fixed coefficient values do not take time-varying effects into account, such as aliasing, quantization error, motion estimation error, and camera noise. Consequently, the achievable improvement in coding efficiency is considered to be limited. Non-Patent Document 3 therefore proposes a scheme that adaptively changes the interpolation filter coefficients, called a non-separable adaptive interpolation filter.
Non-Patent Document 3 assumes a two-dimensional interpolation filter (6×6 = 36 filter coefficients in total) and determines the filter coefficients so as to minimize the prediction error energy. This achieves higher coding efficiency than the one-dimensional 6-tap fixed interpolation filter of H.264/AVC. However, determining the filter coefficients is computationally very expensive, and Non-Patent Document 4 proposes a way to reduce this cost.
The technique introduced in Non-Patent Document 4 is called a separable adaptive interpolation filter (SAIF), and it uses a one-dimensional 6-tap interpolation filter rather than a two-dimensional interpolation filter.
FIG. 10A to FIG. 10C are diagrams illustrating the method for interpolating fractional-accuracy pixels in the separable adaptive interpolation filter (SAIF). First, as shown by step 1 in FIG. 10B, the pixels in the horizontal direction (a, b, and c) are interpolated. Integer-accuracy pixels C1 to C6 are used to determine the filter coefficients. The horizontal filter coefficients that minimize the prediction error energy function E_h^2 of Equation (4) are determined analytically by the well-known least-squares method (see Non-Patent Document 3).
[Equation 4]

$$E_h^2 = \sum_{x,y} \Bigl( S_{x,y} - \sum_{c_i} w_{c_i} \cdot P_{\tilde{x}+c_i,\,\tilde{y}} \Bigr)^2 \tag{4}$$
Here, S denotes the original picture, P denotes an already decoded reference picture, and x and y denote the horizontal and vertical positions in the picture. The symbol x̃ (a tilde placed above x; the same notation applies to the other symbols) satisfies x̃ = x + MVx − FilterOffset. MVx is the horizontal component of a previously obtained motion vector, and FilterOffset is an adjustment offset equal to half the horizontal filter length. In the vertical direction, ỹ = y + MVy, where MVy is the vertical component of the motion vector. w_{c_i} denotes the horizontal filter coefficient for tap c_i (0≦c_i<6) to be determined.
Setting the partial derivative of E_h^2 in Equation (4) with respect to each coefficient to zero gives as many linear equations as there are filter coefficients. The minimization is carried out independently for each horizontal fractional-pixel position. It yields three groups of 6-tap filter coefficients, which are used to interpolate the fractional-accuracy pixels a, b, and c.
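The horizontal least-squares step can be sketched as follows. For one fractional position, the code accumulates the normal equations of Equation (4), a 6×6 matrix of tap products and a right-hand side of target-times-tap products, and solves them. The sketch makes several assumptions: a single integer motion vector (MVx, MVy) shared by all pixels, plain Python lists for pictures, and skipping of pixels whose 6-tap support falls outside the reference. It also uses a small Gaussian-elimination solver rather than any particular library. The function names and the synthetic "true" filter are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def saif_horizontal(S, P, mvx, mvy, taps=6):
    """Horizontal filter w_{c_i} minimizing E_h^2 of Equation (4) for one
    fractional position, via the normal equations (sum p p^T) w = sum S p."""
    offset = taps // 2  # FilterOffset: half the horizontal filter length
    a = [[0.0] * taps for _ in range(taps)]
    b = [0.0] * taps
    for y in range(len(S)):
        for x in range(len(S[0])):
            xt, yt = x + mvx - offset, y + mvy  # x~ and y~
            if not (0 <= yt < len(P) and 0 <= xt and xt + taps <= len(P[0])):
                continue  # support leaves the reference picture: skip
            p = P[yt][xt:xt + taps]  # P_{x~ + c_i, y~} for c_i = 0 .. taps-1
            for i in range(taps):
                b[i] += S[y][x] * p[i]
                for j in range(taps):
                    a[i][j] += p[i] * p[j]
    return solve(a, b)


# Synthetic check: build S from a known (hypothetical) filter and recover it.
rng = random.Random(1)
P = [[rng.randint(0, 255) for _ in range(24)] for _ in range(12)]
true_w = [0.02, -0.08, 0.6, 0.55, -0.12, 0.03]
mvx, mvy = 2, 1
S = [[0.0] * 16 for _ in range(10)]
for y in range(10):
    for x in range(1, 16):  # x = 0 lacks full support and is skipped anyway
        xt = x + mvx - 3
        S[y][x] = sum(wi * P[y + mvy][xt + i] for i, wi in enumerate(true_w))
w = saif_horizontal(S, P, mvx, mvy)
print([round(v, 4) for v in w])  # close to true_w
```

The vertical step of Equation (5) has the same structure. It uses the horizontally interpolated picture in place of P and shifts the window along y instead of x.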
After horizontal interpolation is complete, vertical interpolation is performed, as shown by step 2 in FIG. 10C. The vertical filter coefficients are determined by solving a linear problem similar to the horizontal one. Specifically, the vertical filter coefficients that minimize the prediction error energy function E_v^2 of Equation (5) are determined analytically.
[Equation 5]

$$E_v^2 = \sum_{x,y} \Bigl( S_{x,y} - \sum_{c_j} w_{c_j} \cdot \hat{P}_{\tilde{x},\,\tilde{y}+c_j} \Bigr)^2 \tag{5}$$
Here, S denotes the original picture, and P̂ (a circumflex placed above P) denotes a picture that has been decoded and then interpolated in the horizontal direction. x and y denote the horizontal and vertical positions in the picture. x̃ is given by 4·(x + MVx), where MVx is the horizontal component of a motion vector rounded to the nearest whole number. In the vertical direction, ỹ = y + MVy − FilterOffset, where MVy is the vertical component of the motion vector and FilterOffset is an adjustment offset equal to half the filter length. w_{c_j} denotes the vertical filter coefficient for tap c_j (0≦c_j<6) to be determined.
The minimization is performed independently for each of the remaining fractional-accuracy pixel positions. It yields 12 groups of 6-tap filter coefficients, which are used to interpolate those fractional-accuracy pixels.
Consequently, 90 (= 6 × 15) filter coefficients in total must be encoded and transmitted to the decoder. This overhead is especially large in low-resolution coding, so the number of transmitted coefficients is reduced by exploiting filter symmetry. As shown in FIG. 10A, b, h, i, j, and k lie at the centre between integer-accuracy pixels along their interpolation direction. In the horizontal direction, the coefficients used for the three left-hand points can therefore be reversed and applied to the three right-hand points. Likewise, in the vertical direction, the coefficients used for the three upper points can be reversed and applied to the three lower points (c1=c6, c2=c5, and c3=c4).
In addition, d and l are symmetric about h, so reversed filter coefficients can be used for them. Transmitting the 6 coefficients for d is sufficient, because their values can be applied to l with c(d)1=c(l)6, c(d)2=c(l)5, c(d)3=c(l)4, c(d)4=c(l)3, c(d)5=c(l)2, and c(d)6=c(l)1. The same symmetry is used for e and m, f and n, and g and o. The same reasoning holds for a and c. However, the horizontal result affects the subsequent vertical interpolation, so a and c are transmitted separately without using symmetry. With the symmetry described above, 51 filter coefficients are transmitted per frame: 15 for the horizontal direction and 36 for the vertical direction.
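The count of 51 transmitted coefficients can be checked with a short tally. The sketch encodes only the symmetry rules stated above: symmetric filters at b, h, i, j, and k; mirrored pairs d/l, e/m, f/n, and g/o; and a and c sent separately. The constant and function names are illustrative.

```python
TAPS = 6
HORIZONTAL = ["a", "b", "c"]
VERTICAL = list("defghijklmno")
# Centred positions: c1=c6, c2=c5, c3=c4, so only half the taps are sent.
SYMMETRIC = {"b", "h", "i", "j", "k"}
# Vertical pairs mirrored about h: c(second)_i = c(first)_{7-i}.
MIRROR_PAIRS = [("d", "l"), ("e", "m"), ("f", "n"), ("g", "o")]


def mirror(coeffs):
    """Coefficients of the partner position are the reversed coefficients."""
    return coeffs[::-1]


def transmitted(positions):
    """Number of coefficients sent for the given positions under the rules above."""
    derived = {second for _, second in MIRROR_PAIRS}
    count = 0
    for pos in positions:
        if pos in derived:
            continue  # obtained with mirror() from its partner
        count += TAPS // 2 if pos in SYMMETRIC else TAPS
    return count


print(TAPS * (len(HORIZONTAL) + len(VERTICAL)))         # 90 without symmetry
print(transmitted(HORIZONTAL), transmitted(VERTICAL))  # 15 36
```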
In the adaptive interpolation filter of Non-Patent Document 4 described above, the prediction error energy is always minimized over a whole frame, so one set of 51 filter coefficients is determined per frame. Suppose the frame to be encoded consists broadly of two (or more) types of texture regions, A and B. The optimum filter coefficients are then a single set that takes both (all) textures into account. For example, region A may essentially call for distinctive filter coefficients only in the vertical direction, and region B only in the horizontal direction. In that case, the derived filter coefficients are an average of the two.
Non-Patent Document 5 proposes dividing a picture into regions according to its local properties and generating interpolation filter coefficients for each region, rather than using one set of filter coefficients (51 coefficients) per frame. This reduces the prediction error energy and improves coding efficiency.
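A toy one-dimensional experiment can illustrate both the averaging effect of a frame-level filter and the benefit of per-region filters. It makes several assumptions. Two hypothetical 6-tap filters, f_a and f_b, stand in for the ideal filters of regions A and B. Both regions use the same randomly generated reference windows. This makes the frame-level least-squares solution exactly the mean of the two filters; with differing reference statistics it would be a matrix-weighted combination. The prediction error is measured as a sum of squared errors. This is not the region-division method of Non-Patent Document 5. It only illustrates why dividing a frame into regions can help.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict(f, window):
    """Interpolated value: weighted sum of the reference window."""
    return sum(fi * wi for fi, wi in zip(f, window))


def least_squares_filter(samples, taps=6):
    """Filter minimizing sum (target - f . window)^2 over (window, target) pairs."""
    a = [[sum(w[i] * w[j] for w, _ in samples) for j in range(taps)]
         for i in range(taps)]
    b = [sum(w[i] * t for w, t in samples) for i in range(taps)]
    return solve(a, b)


def energy(f, samples):
    """Prediction error energy (sum of squared errors) of filter f."""
    return sum((t - predict(f, w)) ** 2 for w, t in samples)


rng = random.Random(7)
windows = [[rng.randint(0, 255) for _ in range(6)] for _ in range(40)]
f_a = [0.0, -0.1, 0.7, 0.5, -0.1, 0.0]      # hypothetical ideal filter, region A
f_b = [0.05, -0.15, 0.5, 0.7, -0.15, 0.05]  # hypothetical ideal filter, region B
region_a = [(w, predict(f_a, w)) for w in windows]
region_b = [(w, predict(f_b, w)) for w in windows]  # same reference windows

frame_f = least_squares_filter(region_a + region_b)  # one filter for the frame
frame_e = energy(frame_f, region_a + region_b)
region_e = (energy(least_squares_filter(region_a), region_a)
            + energy(least_squares_filter(region_b), region_b))
print([round(v, 3) for v in frame_f])  # approximately the mean of f_a and f_b
print(frame_e > region_e)              # True: per-region filters fit better
```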
To improve the performance of the adaptive interpolation filter of Non-Patent Document 4, another technique has also been proposed (see Non-Patent Document 6). It groups interpolation positions and, for each group, selects either a fixed interpolation filter or an adaptive interpolation filter so as to reduce the prediction error energy. The interpolated picture is then generated with the selected filters.
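The per-group decision can be sketched as choosing, for each group of interpolation positions, whichever filter gives the smaller prediction error energy. This sketch compares only the sum of squared errors, as the text describes. A practical encoder would likely also account for the cost of transmitting adaptive coefficients. The grouping, the adaptive filter values, and all names are hypothetical and are not those of Non-Patent Document 6. The fixed filter is the H.264/AVC half-pel filter quoted earlier.

```python
def predict(f, window):
    """Interpolated value: weighted sum of the reference window."""
    return sum(c * v for c, v in zip(f, window))


def energy(f, samples):
    """Prediction error energy (sum of squared errors) over (window, target) pairs."""
    return sum((t - predict(f, w)) ** 2 for w, t in samples)


def choose_per_group(groups, fixed_filter, adaptive_filters):
    """For each group, keep whichever filter yields the lower error energy.

    groups: dict group name -> list of (window, target) samples
    adaptive_filters: dict group name -> adaptive coefficients for that group
    Returns dict group name -> ("fixed" | "adaptive", energy).
    """
    choice = {}
    for name, samples in groups.items():
        e_fixed = energy(fixed_filter, samples)
        e_adaptive = energy(adaptive_filters[name], samples)
        choice[name] = (("adaptive", e_adaptive) if e_adaptive < e_fixed
                        else ("fixed", e_fixed))
    return choice


fixed = [c / 32 for c in (1, -5, 20, 20, -5, 1)]  # H.264/AVC half-pel filter
ada = [0.02, -0.1, 0.58, 0.58, -0.1, 0.02]        # hypothetical adaptive filter
adaptive = {"g1": ada, "g2": ada}
windows = [[10, 20, 30, 40, 50, 60], [5, 9, 200, 30, 7, 1]]
groups = {"g1": [(w, predict(fixed, w)) for w in windows],  # fixed fits exactly
          "g2": [(w, predict(ada, w)) for w in windows]}    # adaptive fits exactly
print({k: v[0] for k, v in choose_per_group(groups, fixed, adaptive).items()})
# {'g1': 'fixed', 'g2': 'adaptive'}
```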