The present invention relates to digital processing systems of video images, and in particular, to systems for decoding sequences of compressed pictures by motion prediction and motion compensation algorithms, and to a method of motion estimation.
Spatio-temporal recursive estimation was first disclosed in G. de Haan, P. W. A. C. Biezen, H. Huijgen, O. A. Ojo, "True motion estimation with 3-D recursive search block matching", IEEE Trans. Circuits and Systems for Video Technology, Vol. 3, October 1993, pp. 368-379. Instead of computing many matching errors, this approach selects, among the vectors already associated with the macroblocks neighboring the current macroblock, those that guarantee the best spatial coherence of the vector field; such vectors are called "spatial predictors". "Temporal predictors", that is, vectors calculated for preceding pairs of pictures, can also be used to obtain a temporally coherent vector field.
Following this approach, a new algorithm has been realized for H.263 applications, described in European Patent Applications 97402763.3 and 98200461.6.
Basic Principles
In block-matching motion estimation algorithms, a displacement vector (often improperly called a motion vector) $\mathbf{d}(\mathbf{b}_c, t)$ is assigned to the center
$$\mathbf{b}_c = (x_c, y_c)^{tr}$$
of a block of pixels $B(\mathbf{b}_c)$ of the current picture $I(\mathbf{x}, t)$, where $tr$ denotes transposition. The vector is assigned if $B(\mathbf{b}_c)$ has a good correlation, or match, with a similar block inside a search area $SA(\mathbf{b}_c)$, also centered on $\mathbf{b}_c$ but belonging to the preceding picture $I(\mathbf{x}, t - T)$. Here $T$ is the time interval between the two pictures being coded; it is an integer multiple $n$ of the picture period $T_q$, which is 40 ms for PAL sequences and about 33.3 ms for NTSC sequences. The center of the similar block is shifted with respect to $\mathbf{b}_c$ by the motion vector $\mathbf{d}(\mathbf{b}_c, t)$.
To find $\mathbf{d}(\mathbf{b}_c, t)$, a number of candidate vectors $\mathbf{C}$ are tested, measuring for each one an error $e(\mathbf{C}, \mathbf{b}_c, t)$ that quantifies the similarity between the blocks being compared. FIG. 1 illustrates this procedure.
The pixels inside block $B(\mathbf{b}_c)$ occupy the positions
$$x_c - X/2 \le x \le x_c + X/2, \qquad y_c - Y/2 \le y \le y_c + Y/2$$
where $X$ is the block width and $Y$ is the block height (both equal to 16 for MPEG-1, MPEG-2 and H.263), and $\mathbf{x} = (x, y)^{tr}$ is the spatial position in the picture.
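Candidate vectors are selected from the set of candidates $CS(\mathbf{b}_c, t)$, defined by:
$$CS(\mathbf{b}_c, t) = \left\{ \mathbf{d}\left(\mathbf{b}_c - \begin{pmatrix} X \\ Y \end{pmatrix}, t\right) + \mathbf{U}_1(\mathbf{b}_c),\ \mathbf{d}\left(\mathbf{b}_c - \begin{pmatrix} -X \\ Y \end{pmatrix}, t\right) + \mathbf{U}_2(\mathbf{b}_c),\ \mathbf{d}\left(\mathbf{b}_c - \begin{pmatrix} 0 \\ -2Y \end{pmatrix}, t - T\right) \right\} \qquad (1)$$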
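The candidate set of Eq. (1) can be sketched in Python as follows. The vector fields are assumed to be stored as dictionaries keyed by macroblock grid position (col, row), so that one grid step corresponds to X or Y pixels; the function name `candidate_set` and the choice of a zero vector for unavailable neighbours (for example at picture borders) are assumptions of this sketch rather than details taken from the method.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, float]           # (horizontal, vertical) displacement
Field = Dict[Tuple[int, int], Vector]  # macroblock grid position (col, row) -> vector


def candidate_set(col: int, row: int, current: Field, previous: Field,
                  u1: Vector, u2: Vector) -> List[Vector]:
    """Candidate set CS(b_c, t) of Eq. (1) for the macroblock at (col, row).

    b_c - (X, Y)   -> (col - 1, row - 1): spatial predictor, upper left
    b_c - (-X, Y)  -> (col + 1, row - 1): spatial predictor, upper right
    b_c - (0, -2Y) -> (col, row + 2):     temporal predictor from time t - T
    Neighbours that are not available default to the zero vector (assumption).
    """
    zero: Vector = (0.0, 0.0)
    s1 = current.get((col - 1, row - 1), zero)   # already estimated at time t
    s2 = current.get((col + 1, row - 1), zero)   # already estimated at time t
    tp = previous.get((col, row + 2), zero)      # estimated for the previous pair
    return [(s1[0] + u1[0], s1[1] + u1[1]),
            (s2[0] + u2[0], s2[1] + u2[1]),
            tp]
```

Only the two spatial predictors receive an update; the temporal predictor is used as is, exactly as in Eq. (1).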
where the update vectors $\mathbf{U}_1(\mathbf{b}_c)$ and $\mathbf{U}_2(\mathbf{b}_c)$ are randomly selected from a set of updates $US$, defined by:
$$US(\mathbf{b}_c) = US_i(\mathbf{b}_c) \cup US_f(\mathbf{b}_c)$$
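where the integer updates $US_i(\mathbf{b}_c)$ are given by:
$$US_i(\mathbf{b}_c) = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ -2 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \begin{pmatrix} -2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 3 \end{pmatrix}, \begin{pmatrix} 0 \\ -3 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \end{pmatrix}, \begin{pmatrix} -3 \\ 0 \end{pmatrix} \right\} \qquad (2)$$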
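The update vectors $US_f(\mathbf{b}_c)$, needed to achieve ½-pixel accuracy, are given by:
$$US_f(\mathbf{b}_c) = \left\{ \begin{pmatrix} 0 \\ \tfrac{1}{2} \end{pmatrix}, \begin{pmatrix} 0 \\ -\tfrac{1}{2} \end{pmatrix}, \begin{pmatrix} \tfrac{1}{2} \\ 0 \end{pmatrix}, \begin{pmatrix} -\tfrac{1}{2} \\ 0 \end{pmatrix} \right\} \qquad (3)$$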
Both $\mathbf{U}_1(\mathbf{b}_c)$ and $\mathbf{U}_2(\mathbf{b}_c)$ can therefore be the zero update vector $(0, 0)^{tr}$, which belongs to $US_i$.
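A minimal sketch of the update sets of Eqs. (2) and (3) and of the random choice of U1 and U2 follows. Drawing both updates independently and uniformly from US is an assumption made here for illustration, since the text only states that they are selected randomly; the function name `draw_updates` is hypothetical.

```python
import random
from typing import List, Tuple

Vector = Tuple[float, float]  # (horizontal, vertical) update in pixels

# Integer updates US_i of Eq. (2): the zero vector plus +/-1, +/-2, +/-3 on each
# axis, listed in the same order as the equation.
US_I: List[Vector] = [(0.0, 0.0)] + [
    v for k in (1.0, 2.0, 3.0)
    for v in ((0.0, k), (0.0, -k), (k, 0.0), (-k, 0.0))
]

# Half-pixel updates US_f of Eq. (3).
US_F: List[Vector] = [(0.0, 0.5), (0.0, -0.5), (0.5, 0.0), (-0.5, 0.0)]

# US = US_i ∪ US_f; the two lists share no element, so concatenation is the union.
US: List[Vector] = US_I + US_F


def draw_updates(rng: random.Random) -> Tuple[Vector, Vector]:
    """Pick U1 and U2 independently and uniformly from US (assumed distribution)."""
    return rng.choice(US), rng.choice(US)
```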
These equations show that the candidate set consists of spatial and temporal predictors, taken from a 3-D neighborhood, to which update vectors are added. This procedure implicitly makes the vector field spatially and temporally consistent. The updating process consists of adding, from time to time, an update to one of the two spatial predictors. FIG. 2 shows the positions of the spatial and spatio-temporal predictors with respect to the current block.
The displacement vector $\mathbf{d}(\mathbf{b}_c, t)$ resulting from the block-matching process is the candidate vector $\mathbf{C}$ that yields the lowest value of the error function $e(\mathbf{C}, \mathbf{b}_c, t)$:
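$$\mathbf{d}(\mathbf{b}_c, t) = \left\{ \mathbf{C} \in CS \;\middle|\; e(\mathbf{C}, \mathbf{b}_c, t) \le e(\mathbf{V}, \mathbf{b}_c, t) \ \ \forall\, \mathbf{V} \in CS(\mathbf{b}_c, t) \right\} \qquad (4)$$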
The error function is based on the difference between the luminance values of the current block in the current picture $I(\mathbf{x}, t)$ and those of the block shifted by $\mathbf{C}$ in the preceding picture, $I(\mathbf{x} - \mathbf{C}, t - T)$, summed over the block $B(\mathbf{b}_c)$. A typical choice, also adopted here, is the sum of absolute differences (SAD). The error function is then:
$$e(\mathbf{C}, \mathbf{b}_c, t) = SAD(\mathbf{C}, \mathbf{b}_c, t) = \sum_{\mathbf{x} \in B(\mathbf{b}_c)} \left| I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{C}, t - T) \right| \qquad (5)$$
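To make Eqs. (4) and (5) concrete, the Python sketch below computes the SAD for one block and picks the candidate with the lowest error. It assumes integer vectors only (½-pixel candidates would additionally require interpolating $I(\mathbf{x} - \mathbf{C}, t - T)$), addresses the block by its top-left corner rather than by its center $\mathbf{b}_c$, and clamps displaced positions to the picture border; the names `sad` and `best_vector` are illustrative.

```python
from typing import List, Sequence, Tuple

Picture = List[List[int]]  # luminance samples indexed as picture[y][x]
Vector = Tuple[int, int]   # integer (horizontal, vertical) displacement


def sad(cur: Picture, prev: Picture, x0: int, y0: int,
        c: Vector, size: int = 16) -> int:
    """Eq. (5): sum of |I(x, t) - I(x - C, t - T)| over a size x size block.

    The block is addressed by its top-left corner (x0, y0); displaced
    positions falling outside the previous picture are clamped to its border.
    """
    height, width = len(prev), len(prev[0])
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            ys = min(max(y - c[1], 0), height - 1)
            xs = min(max(x - c[0], 0), width - 1)
            total += abs(cur[y][x] - prev[ys][xs])
    return total


def best_vector(cur: Picture, prev: Picture, x0: int, y0: int,
                candidates: Sequence[Vector], size: int = 16) -> Vector:
    """Eq. (4): the candidate with the lowest SAD; the first one wins a tie."""
    return min(candidates, key=lambda c: sad(cur, prev, x0, y0, c, size))
```

For example, if the current picture is the preceding one shifted one pixel to the right, the candidate (1, 0) gives a SAD of zero and is selected.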
The distribution of the updates could be adapted to the calculated errors, as is done in SLIMPEG, so that the lower the errors, the more concentrated the distribution, and the greater the errors, the more dispersed the distribution. However, such a distribution depends strongly on the sequences used to compute it. In H.263 the bit rate, and above all the frame rate, are highly variable (the frame rate is constant in MPEG-2 but not in H.263), so it would be very difficult to define appropriate "training" sequences.
Iterative Estimation
To further improve the consistency of the vector field, motion estimation is iterated $m$ times ($m \le 4$) on the same pair of pictures, using the vectors calculated in the preceding iteration as temporal predictors for the current iteration. During the first and third iterations, the pictures are scanned in raster order, from top to bottom and from left to right (as they are usually displayed on a TV screen). During the second and fourth iterations, they are scanned in the opposite direction, from bottom to top and from right to left. This is possible because the pictures are stored in an SDRAM; if they were stored in a FIFO, only raster scanning would be possible.
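The alternating scan direction can be expressed as a visiting order over macroblock grid positions. The following Python sketch (function name `scan_order` hypothetical, iterations numbered from 1) returns raster order for odd iterations and the reverse order for even ones, which presupposes random access to the stored pictures, as with the SDRAM mentioned above.

```python
from typing import Iterator, List, Tuple


def scan_order(cols: int, rows: int, iteration: int) -> Iterator[Tuple[int, int]]:
    """Order in which macroblocks (col, row) are visited in a given iteration (1-based).

    Odd iterations: top to bottom, left to right (raster order).
    Even iterations: bottom to top, right to left (reverse raster order).
    """
    cells: List[Tuple[int, int]] = [(c, r) for r in range(rows) for c in range(cols)]
    return iter(cells) if iteration % 2 == 1 else reversed(cells)
```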
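Candidate vectors are selected from a new candidate set $CS'(\mathbf{b}_c, t)$, defined by:
$$CS'(\mathbf{b}_c, t) = \left\{ \mathbf{d}\left(\mathbf{b}_c - \begin{pmatrix} X \\ (-1)^{i+1} Y \end{pmatrix}, t\right) + \mathbf{U}_1(\mathbf{b}_c),\ \mathbf{d}\left(\mathbf{b}_c - \begin{pmatrix} -X \\ (-1)^{i+1} Y \end{pmatrix}, t\right) + \mathbf{U}_2(\mathbf{b}_c),\ \mathbf{d}\left(\mathbf{b}_c - \begin{pmatrix} 0 \\ (-1)^{i}\, 2Y \end{pmatrix}, t - T/i\right) \right\}$$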
where $i$ is the number of the current iteration and $\mathbf{d}(\mathbf{b}_c, t - T/i)$ is the vector $\mathbf{d}$ from the preceding iteration $(i - 1)$ or, for the first iteration on a new pair of pictures, from the last iteration on the preceding pair. For $i = 1$ the set reduces to $CS(\mathbf{b}_c, t)$ of Eq. (1). The computational complexity of this estimator remains practically constant when the frame rate changes, because the number of iterations for each pair of pictures grows with the time interval between them. The case $n \ge 5$ is not considered, because the bit-rate controller is assumed to be unlikely to skip 4 consecutive pictures (although it could happen).
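The sign factors in $CS'(\mathbf{b}_c, t)$ mirror the predictor positions vertically on even iterations, so that the spatial predictors always come from macroblocks already visited in the current scan. A short Python sketch, using macroblock grid coordinates and hypothetical function names, makes this explicit; for $i = 1$ it reproduces the offsets of Eq. (1).

```python
from typing import List, Tuple


def predictor_offsets(i: int) -> List[Tuple[int, int]]:
    """Offsets, in macroblock units, subtracted from b_c in CS'(b_c, t) at iteration i.

    Spatial: (1, s) and (-1, s); temporal: (0, -2s), with s = (-1) ** (i + 1),
    since (-1) ** i == -(-1) ** (i + 1).
    """
    s = (-1) ** (i + 1)
    return [(1, s), (-1, s), (0, -2 * s)]


def predictor_positions(col: int, row: int, i: int) -> List[Tuple[int, int]]:
    """Grid positions of the two spatial predictors and the temporal predictor."""
    return [(col - dx, row - dy) for dx, dy in predictor_offsets(i)]
```

On odd iterations the spatial predictors lie in the row above and the temporal predictor two rows below; on even iterations above and below are swapped, matching the reversed scan.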
Macroblocks Undersampling
The computational complexity of motion estimation can be reduced by halving the number of macroblocks on which vectors are calculated, using the technique described in G. de Haan, P. W. A. C. Biezen, "Sub-pixel motion estimation with 3-D recursive block-matching", Signal Processing: Image Communication 6 (1995), pp. 485-498. The grid of undersampled macroblocks typically has a quincunx pattern, as depicted in FIG. 3.
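A quincunx grid of macroblocks can be described by the parity of the grid coordinates. In the sketch below, the `phase` argument (an assumption of this illustration, as are the function names) selects one of the two complementary grids, which allows the grid to be switched from one iteration to the next as described below for the median interpolation.

```python
from typing import List


def is_estimated(col: int, row: int, phase: int) -> bool:
    """True if macroblock (col, row) lies on the quincunx grid selected by phase (0 or 1)."""
    return (col + row + phase) % 2 == 0


def grid_map(cols: int, rows: int, phase: int) -> List[str]:
    """Text picture of the grid: 'E' = vector estimated, '.' = vector interpolated."""
    return ["".join("E" if is_estimated(c, r, phase) else "." for c in range(cols))
            for r in range(rows)]
```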
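If the vector $\mathbf{d}_m = \mathbf{d}(\mathbf{b}_c, t)$ is missing, it can be calculated from the adjacent available vectors $\mathbf{d}_a$ as follows:
$$\mathbf{d}_m = \operatorname{median}(\mathbf{d}_l, \mathbf{d}_r, \mathbf{d}_{av}) \qquad (6)$$
where
$$\mathbf{d}_l = \mathbf{d}_a\left(\mathbf{b}_c - \begin{pmatrix} X \\ 0 \end{pmatrix}, t\right), \qquad \mathbf{d}_r = \mathbf{d}_a\left(\mathbf{b}_c + \begin{pmatrix} X \\ 0 \end{pmatrix}, t\right)$$
$$\mathbf{d}_{av} = \tfrac{1}{2}(\mathbf{d}_t + \mathbf{d}_b)$$
and
$$\mathbf{d}_t = \mathbf{d}_a\left(\mathbf{b}_c - \begin{pmatrix} 0 \\ Y \end{pmatrix}, t\right), \qquad \mathbf{d}_b = \mathbf{d}_a\left(\mathbf{b}_c + \begin{pmatrix} 0 \\ Y \end{pmatrix}, t\right)$$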
The median interpolating filter acts separately on the horizontal and vertical components of the adjacent vectors. The undersampling grid is changed from one iteration to the next, so that the vectors that were interpolated in the preceding iteration are calculated more accurately.
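Eq. (6) can be sketched in Python as a component-wise median, assuming the vector field is a dictionary keyed by macroblock grid position and that the missing macroblock is not on the picture border (on a quincunx grid its four neighbours are then all estimated). Border handling is left out, and the function name `interpolate_missing` is hypothetical.

```python
from statistics import median
from typing import Dict, Tuple

Vector = Tuple[float, float]
Field = Dict[Tuple[int, int], Vector]  # macroblock (col, row) -> estimated vector


def interpolate_missing(field: Field, col: int, row: int) -> Vector:
    """Eq. (6): d_m = median(d_l, d_r, d_av), applied to each component separately."""
    d_l = field[(col - 1, row)]   # left neighbour,   b_c - (X, 0)
    d_r = field[(col + 1, row)]   # right neighbour,  b_c + (X, 0)
    d_t = field[(col, row - 1)]   # top neighbour,    b_c - (0, Y)
    d_b = field[(col, row + 1)]   # bottom neighbour, b_c + (0, Y)
    d_av = ((d_t[0] + d_b[0]) / 2, (d_t[1] + d_b[1]) / 2)
    return (median([d_l[0], d_r[0], d_av[0]]),
            median([d_l[1], d_r[1], d_av[1]]))
```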
The H.263 Standard
The H.263 standard is very similar to the older MPEG-2 standard: it uses the usual hybrid DPCM/DCT video coding scheme with temporal prediction. H.263 nevertheless introduces a few novelties, notably the three optional modes known as APM, UMV and PB-frames, which allow motion estimation on 8×8 blocks instead of 16×16 blocks, and a more sophisticated motion-compensated interpolation. This document considers one application of H.263: motion estimation and temporal prediction for P-frames.
The reference H.263 coder, known as TMN5 and produced by Telenor, uses a "full search" block-matching estimator with a 30×30-pixel search window centered on the corresponding macroblock of the preceding frame, around which the search proceeds in a spiral. This estimation technique is very burdensome, because the SAD must be calculated for each of the 30×30 possible positions. In real-time applications such as videophone and videoconferencing, for which H.263 was developed, reducing the computational complexity is very important, even at the cost of an acceptable decrease in picture quality.
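A rough count of the absolute differences needed per 16×16 macroblock shows the scale of the problem. The sketch below counts only SAD terms and ignores all other overhead (candidate generation, memory access, sub-pixel interpolation); the figure of three candidates per iteration is taken from Eq. (1), and the helper name `sad_terms` is hypothetical.

```python
def sad_terms(candidates: int, block: int = 16) -> int:
    """Absolute differences per macroblock when `candidates` vectors are tested."""
    return candidates * block * block


full_search = sad_terms(30 * 30)  # every position of the 30x30 window
recursive = sad_terms(3)          # the three candidates of Eq. (1), one iteration
print(full_search, recursive, full_search // recursive)  # prints 230400 768 300
```

Under these assumptions the full search needs 300 times more SAD terms than one iteration of the recursive estimator, and still 75 times more than the maximum of four iterations.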
An object of the present invention is to greatly reduce the computational complexity while keeping the picture quality substantially undegraded at the same compression.
It has been found that a non-negligible reduction in computational complexity can be achieved at the cost of a slight increase in bit rate, caused by a larger prediction error, while also benefiting from a shorter processing delay.
As is well known, a method of motion estimation in an encoding system based on the prediction of motion-compensated pictures comprises identifying the best predictor macroblock, or predictor vector, among pre-established candidate predictor macroblocks. These candidates are chosen both among the macroblocks spatially distributed around or near the macroblock under estimation in the same frame and preceding it in scanning order, and among the macroblocks spatially/temporally distributed around or near the macroblock under estimation and following it in scanning order in the picture frames that immediately precede in time the frame of the macroblock under estimation.
A comparison algorithm compares the pixel values of each candidate predictor macroblock with those in homologous positions of a reference macroblock, which lies in a reference frame of the current sequence of picture frames at the same position as the macroblock being estimated, and evaluates a pre-established cost function for each comparison. The best predictor is the one yielding the minimum value of the cost function. The algorithm may also add to each candidate predictor an update vector, smaller than the macroblock dimensions and chosen among a plurality of pre-established update vectors, to speed up the convergence of the algorithm.
A method aspect of the invention includes comparing the candidate predictor macroblocks with one another and retaining only those whose vector components differ, so that the comparison with the reference macroblock, and hence the calculation of the cost function, is carried out only once for each distinct candidate. According to a preferred embodiment, the cost function for each candidate predictor macroblock is calculated on a reduced number of pixels of the macroblock, selected according to a checkerboard undersampling scheme that retains down to ¼ of the pixels.
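The two savings described in this paragraph can be sketched in Python: duplicate candidates are removed before any SAD is computed, and the SAD is evaluated on a subsampled pixel pattern. The text does not specify the exact ¼ checkerboard pattern, so the one used here (even rows only, with the column phase alternating between kept rows) is a hypothetical choice, as are the function names; the sketch also assumes that all displaced positions lie inside the preceding picture.

```python
from typing import List, Sequence, Tuple

Picture = List[List[int]]  # luminance samples indexed as picture[y][x]
Vector = Tuple[int, int]   # integer (horizontal, vertical) displacement


def unique_candidates(candidates: Sequence[Vector]) -> List[Vector]:
    """Keep one copy of each distinct candidate, preserving the original order."""
    return list(dict.fromkeys(candidates))


def quarter_pattern(size: int = 16) -> List[Tuple[int, int]]:
    """Hypothetical 1/4 pattern: even rows, every other column, phase alternating by row."""
    return [(x, y)
            for y in range(0, size, 2)
            for x in range((y // 2) % 2, size, 2)]


def sad_subsampled(cur: Picture, prev: Picture, x0: int, y0: int,
                   c: Vector, size: int = 16) -> int:
    """SAD of Eq. (5) restricted to the pixels of quarter_pattern(size)."""
    total = 0
    for dx, dy in quarter_pattern(size):
        x, y = x0 + dx, y0 + dy
        total += abs(cur[y][x] - prev[y - c[1]][x - c[0]])
    return total
```

For a 16×16 macroblock the pattern keeps 64 of the 256 pixels, and identical candidates, which often arise when neighbouring vectors agree, cost only one SAD evaluation.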
A further reduction in computational complexity is obtained by carrying out motion estimation on a reduced number of macroblocks in each frame, selected according to a first and a second checkerboard selection scheme for temporally consecutive frames of the sequence, and by applying median filtering to determine the vector of each deselected macroblock of the current frame as the median of the vectors of its horizontally adjacent macroblocks and the average of the vectors of the macroblocks vertically adjacent to it.