The present invention relates to digital video processing, and more particularly to the analysis of a video sequence for motion estimation. Motion estimation looks for temporal correlations in a video sequence to associate directions of regularity to pixels of a frame according to the apparent movement of objects in the video sequence. It can be useful, for example, in the field of super-resolution video processing. Super-resolution video processing methods are used in various applications including super-resolution interpolation (such as frame-rate conversion, super-resolution video scaling and deinterlacing) and reduction of compression artifacts and/or noise.
In digital systems, a video sequence is typically represented as an array of pixel values $I_t(x)$, where t is an integer time index and x is a 2-dimensional integer index (x1, x2) representing the position of a pixel in the image. The pixel values can, for example, be single numbers (e.g. gray-scale values) or triplets representing color coordinates in a color space (such as RGB, YUV, YCbCr, etc.).
In the block matching technique, estimating the motion at a pixel x = (x1, x2) and at time t+α (0 ≤ α ≤ 1) typically consists of identifying the displacement v = (v1, v2), also referred to as the motion vector or direction of regularity, that minimizes a matching energy $E_{x,t+\alpha}(v)$ over a spatial window W, which is a set of offsets d = (d1, d2). The matching energy can take the form:
$$E_{x,t+\alpha}(v) = \sum_{d \in W} g_{x+d,\,t+\alpha}(v) \qquad (1)$$

where

$$g_{x+d,\,t+\alpha}(v) = f\Big[ I_t(x + d - \alpha \cdot v) - I_{t+1}\big(x + d + (1-\alpha) \cdot v\big) \Big] \qquad (2)$$

and f is some measure function. For example, the so-called L1-energy is defined by f[z] = |z| and the so-called L2-energy, or Euclidean distance, is defined by f[z] = z². In the optimization process, the displacements v = (v1, v2) are selected from a limited set Ω of candidate displacements in order to reduce the computation load.
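The search defined by equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the invention: frames are gray-scale NumPy arrays, non-integer positions such as x + d − α·v are sampled by bilinear interpolation with clamping at the frame borders (the text specifies neither choice), W is a square window, and Ω is an explicit list of integer candidates. The helper names (`sample`, `matching_energy`, `square_window`, `estimate_motion`, `l1_energy`, `l2_energy`) are hypothetical.

```python
import numpy as np


def sample(frame, p):
    """Bilinearly interpolate a 2-D frame at a real-valued position p = (p1, p2).
    Positions are clamped to the frame (an assumed border policy)."""
    h, w = frame.shape
    p1 = min(max(p[0], 0.0), h - 1.0)
    p2 = min(max(p[1], 0.0), w - 1.0)
    i0, j0 = int(np.floor(p1)), int(np.floor(p2))
    i1, j1 = min(i0 + 1, h - 1), min(j0 + 1, w - 1)
    a, b = p1 - i0, p2 - j0  # fractional offsets along each axis
    return ((1 - a) * (1 - b) * frame[i0, j0] + (1 - a) * b * frame[i0, j1]
            + a * (1 - b) * frame[i1, j0] + a * b * frame[i1, j1])


def l1_energy(z):
    """f[z] = |z|"""
    return abs(z)


def l2_energy(z):
    """f[z] = z^2"""
    return z * z


def matching_energy(I_t, I_t1, x, alpha, v, window, f):
    """E_{x,t+alpha}(v): sum over d in W of
    f[I_t(x + d - alpha*v) - I_{t+1}(x + d + (1 - alpha)*v)]."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    total = 0.0
    for d in window:
        d = np.asarray(d, dtype=float)
        z = sample(I_t, x + d - alpha * v) - sample(I_t1, x + d + (1 - alpha) * v)
        total += f(z)
    return total


def square_window(radius):
    """Offsets d = (d1, d2) of a (2*radius+1) x (2*radius+1) window W."""
    return [(d1, d2) for d1 in range(-radius, radius + 1)
            for d2 in range(-radius, radius + 1)]


def estimate_motion(I_t, I_t1, x, alpha, candidates, window, f):
    """Exhaustive search: the candidate v in Omega minimizing E_{x,t+alpha}(v)."""
    return min(candidates,
               key=lambda v: matching_energy(I_t, I_t1, x, alpha, v, window, f))
```

For instance, if $I_{t+1}$ is $I_t$ translated by (1, 2) pixels, e.g. `np.roll(I_t, shift=(1, 2), axis=(0, 1))` on a random frame, then at an interior pixel the candidate (1, 2) gives zero energy and is selected with either measure function. Each pixel costs |Ω|·|W| evaluations of f, which is why Ω is kept small.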
The estimation of temporal correlation in a video sequence requires regularization, which is done by making assumptions on the spatial regularity of the motion field. For example, it is assumed that the vector field is uniform over a spatial window. Such an assumption is valid where the motion is uniform, but not in zones where motion transitions take place, i.e. close to occlusion/disocclusion boundaries. In such zones, a naive spatial regularization produces inaccurate estimates, yielding artifacts such as halos.
FIG. 1 shows part of an image with a foreground object F and a background object B having an occlusion boundary Z between them. For example, object F moves towards object B so that part of object B visible in frame t is covered by part of object F in frame t+1. Similar considerations apply to disocclusions, in which part of a background object masked by a foreground object in frame t is uncovered in frame t+1. Using a spatial window W, a displacement can easily be identified at pixels a and b shown in FIG. 1, because the window W centered on either a or b is fully contained in one of the moving objects and the speed of each object is generally uniform over such a window. However, a pixel such as c, close to the occlusion boundary Z, may give rise to ambiguities because the motion is not uniform over the window W centered on c.
In the design of a motion estimation system, there is a tradeoff in the choice of the size of the spatial windows. With smaller windows (like window W′ shown around pixel c in FIG. 1), the areas where motion discontinuities degrade the quality of the estimation are smaller. However, smaller windows are not without problems: they provide less stable measurements and can thus introduce noise.
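The area side of this tradeoff can be quantified on a toy example. The sketch below assumes a hypothetical 64 × 64 motion label map whose left half is background B and right half is foreground F, separated by a single vertical boundary Z, and counts the pixels whose square window contains both motions, i.e. where the uniform-motion assumption fails. It does not model the noise side of the tradeoff; `ambiguous_mask` is a hypothetical helper name.

```python
import numpy as np


def ambiguous_mask(labels, radius):
    """True where the (2*radius+1)^2 window centred on a pixel (clipped at the
    image borders) contains more than one motion label."""
    h, w = labels.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            win = labels[max(i - radius, 0):i + radius + 1,
                         max(j - radius, 0):j + radius + 1]
            mask[i, j] = win.min() != win.max()
    return mask


# Assumed toy scene: label 0 = background B, label 1 = foreground F.
labels = np.zeros((64, 64), dtype=int)
labels[:, 32:] = 1

for r in (1, 2, 4, 8):
    # The ambiguous band is 2*r columns wide: 128, 256, 512, 1024 pixels.
    print(r, int(ambiguous_mask(labels, r).sum()))
```

The ambiguous band grows linearly with the window radius, while the number of samples in each window, and hence the stability of the measurement, grows quadratically.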
There is a need for a technique that would combine the robustness of motion estimation obtained with large spatial windows and the improved estimations made with smaller windows near occlusion boundaries.