The present invention relates to digital video processing. It is applicable, in particular, in the field of super-resolution video processing. Super-resolution video processing methods are used in various applications including super-resolution interpolation (such as frame-rate conversion, super-resolution video scaling and deinterlacing) and reduction of compression artifacts and/or noise.
In digital systems, a video sequence is typically represented as an array of pixel values I_t(x) where t is an integer time index, and x is a 2-dimensional integer index (x_1, x_2) representing the position of a pixel in the image. The pixel values can for example be single numbers (e.g. gray scale values), or triplets representing color coordinates in a color space (such as RGB, YUV, YCbCr, etc.).
Super-resolution video processing methods consist in computing new pixel values (for interpolation) or new values of existing pixels (for noise reduction) by combining pixel values of several temporally adjacent video frames.
WO 2007/115583 A1 discloses a super-resolution video processing method which exhibits very few artifacts. The method consists in selecting for each new pixel to be calculated an interpolator best suited for computing that pixel. For certain particular sequences, however, it may be necessary to enhance the method by increasing the total number of interpolators considered. The quality is increased but at the cost of a higher complexity.
In video interpolation applications, known techniques are motion adaptive or motion compensated.
Motion-adaptive video deinterlacing only provides full resolution deinterlaced frames when the video is not moving. Otherwise, the deinterlaced frames exhibit jagged contours or lower resolution textures, and flicker. An example of an advanced motion adaptive technique is described in U.S. Pat. No. 5,428,398.
Motion-compensated techniques are known to reach better quality levels, at the expense of being less robust and displaying in some cases substantially worse artifacts than motion-adaptive techniques. This happens in particular at locations of the video where motion estimation does not work well, like occlusions, transparent objects, or shadows. An example of a motion-compensated deinterlacing technique is described in U.S. Pat. No. 6,940,557.
A standard way to perform frame-rate conversion includes estimating motion between two frames to compute a dense motion field, and computing new frames with motion-compensated interpolation. For the same reasons as above, frame-rate conversion based on such steps has a number of drawbacks. Dense motion estimation fails on periodic patterns, on contours or on flat areas.
A popular technique for motion estimation is referred to as “block matching”. In the block matching technique, estimating the motion at x and t consists in minimizing a matching energy E_x(v) over a window W which is a set of offsets d = (d_1, d_2). A possible form of the matching energy (L1-energy) is
$$E_x(v) = \sum_{d \in W} \left| I_t(x+d) - I_{t+1}(x+d+v) \right|.$$
Another form frequently used is the L2-energy or Euclidean distance:
$$E_x(v) = \sum_{d \in W} \left| I_t(x+d) - I_{t+1}(x+d+v) \right|^2.$$
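As an illustration only, the two matching energies and the minimization over candidate displacements can be sketched in plain Python. The function names, the 3x3 window, the candidate range and the toy frames are assumptions made for this sketch and are not taken from any cited document; frames are lists of rows indexed as frame[x1][x2].

```python
def matching_energy(frame_t, frame_t1, x, v, window, power=1):
    """E_x(v): sum over d in W of |I_t(x+d) - I_{t+1}(x+d+v)| ** power.

    power=1 gives the L1-energy, power=2 the L2-energy.
    The caller must keep every index inside both frames.
    """
    total = 0
    for d1, d2 in window:
        a = frame_t[x[0] + d1][x[1] + d2]
        b = frame_t1[x[0] + d1 + v[0]][x[1] + d2 + v[1]]
        total += abs(a - b) ** power
    return total


def best_displacement(frame_t, frame_t1, x, window, candidates, power=1):
    """Return the candidate displacement v that minimizes E_x(v)."""
    return min(
        candidates,
        key=lambda v: matching_energy(frame_t, frame_t1, x, v, window, power),
    )


def shifted(frame, v):
    """Hypothetical helper: build I_{t+1} with I_{t+1}(x+v) = I_t(x), zero elsewhere."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = i - v[0], j - v[1]
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = frame[si][sj]
    return out


# Toy data (assumption): a small three-pixel feature moving by v = (1, 2).
frame_t = [[0] * 8 for _ in range(8)]
frame_t[3][3], frame_t[3][4], frame_t[4][3] = 9, 5, 2
frame_t1 = shifted(frame_t, (1, 2))

window = [(d1, d2) for d1 in (-1, 0, 1) for d2 in (-1, 0, 1)]
candidates = [(v1, v2) for v1 in range(-2, 3) for v2 in range(-2, 3)]
estimate = best_displacement(frame_t, frame_t1, (3, 3), window, candidates)
```

In this toy case the energy vanishes only at the true displacement, so both energies select the same vector. On real video several candidates can have similar energies, which is the ambiguity discussed in the following paragraphs.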
Block matching is well suited for motion compensation in video compression schemes such as MPEG, which make use of block-based transforms. If the matching algorithm matches two windows of images that are similar, but do not represent the same object (e.g. matching the first ‘e’ with the second ‘e’ in an image of the word “sweet”), compression efficiency is not impaired. However, when doing video interpolation, matching groups of pixels which do not actually correspond to the same object leads to interpolation artifacts, because the interpolated pixels will reflect an “incorrect motion” due to spatial correlation in the objects appearing in the images.
Block matching methods are computationally intensive, in proportion to the number of possible displacements that are actually considered for each pixel. In video compression again, “fast” block matching strategies consist in limiting the range of possible displacements using predetermined motion subsets. This is not acceptable in video interpolation where using a displacement vector that is too inaccurate leads to blurry interpolated images or to artifacts.
To circumvent these problems in motion estimation, several methods have been developed. A first set of methods imposes a smoothness constraint on the motion field, i.e. requires that pixels that are close to one another have motion vectors that are close. This can be achieved with multiscale motion estimation, or recursive block matching. Another type of method designed to solve this issue is phase correlation.
U.S. Pat. No. 5,742,710 discloses an approach based on multiscale block-matching. In the 2-scale case, block matching is performed between copies of I_t and I_{t+1} that have been reduced in size by a factor of 2 in each dimension (i.e. four times fewer pixels) and the resulting displacement map is then refined to obtain a resolution twice as fine. The refinement process is a search of limited range around the coarse scale results. As a result, the cost of the displacement search is reduced because full range searches are done only on smaller images. The resulting displacement field is also smoother because it is a refinement of a low resolution map. However, the motion in a scene cannot be accurately accounted for by a smooth displacement map: the motion field is inherently discontinuous, in particular around object occlusions. Enforcing a displacement map smoothness constraint is not an appropriate way to address the robustness issue.
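The 2-scale idea can be illustrated with a generic coarse-to-fine search. This is a minimal sketch under stated assumptions, not the method of U.S. Pat. No. 5,742,710: the 2x2 averaging used for downsampling, the border clamping, the L1 energy, the search ranges and all function names are choices made for the example.

```python
def downsample(frame):
    """Halve each dimension by averaging 2x2 blocks (assumed reduction filter)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [
        [
            (frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
             + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4
            for j in range(w)
        ]
        for i in range(h)
    ]


def px(frame, i, j):
    """Pixel access with indices clamped to the frame borders."""
    h, w = len(frame), len(frame[0])
    return frame[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]


def energy(f0, f1, x, v, radius):
    """L1 matching energy over a (2*radius+1)^2 window centered on x."""
    return sum(
        abs(px(f0, x[0] + d1, x[1] + d2) - px(f1, x[0] + d1 + v[0], x[1] + d2 + v[1]))
        for d1 in range(-radius, radius + 1)
        for d2 in range(-radius, radius + 1)
    )


def search(f0, f1, x, center, reach, radius):
    """Best displacement among those within +/- reach of center."""
    cands = [
        (center[0] + a, center[1] + b)
        for a in range(-reach, reach + 1)
        for b in range(-reach, reach + 1)
    ]
    return min(cands, key=lambda v: energy(f0, f1, x, v, radius))


def two_scale_displacement(f0, f1, x, coarse_reach=2, radius=1):
    """Full-range search on half-size frames, then a +/-1 refinement at full size."""
    c0, c1 = downsample(f0), downsample(f1)
    vc = search(c0, c1, (x[0] // 2, x[1] // 2), (0, 0), coarse_reach, radius)
    return search(f0, f1, x, (2 * vc[0], 2 * vc[1]), 1, radius)


# Toy data (assumption): two adjacent 2x2 blobs moving by (2, 4).
f0 = [[0] * 16 for _ in range(16)]
f1 = [[0] * 16 for _ in range(16)]
for i in (6, 7):
    for j in (6, 7):
        f0[i][j] = 8
        f1[i + 2][j + 4] = 8
    for j in (8, 9):
        f0[i][j] = 4
        f1[i + 2][j + 4] = 4
estimate = two_scale_displacement(f0, f1, (6, 6))
```

With these settings the search evaluates 25 candidates on the half-size frames plus 9 at full size, whereas a single-scale search covering the same range of plus or minus 5 pixels would evaluate 121 candidates per pixel.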
Another method that handles this problem in a similar way is recursive block matching as disclosed in “True-Motion with 3D Recursive Search Block Matching”, G. De Haan et al., IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 5, October 1993, pp. 368-379. This method significantly reduces the cost of computing a motion map, but it can still be misled by periodic patterns or even occlusions.
GB-A-2 188 510 discloses a so-called phase correlation method in which a displacement energy map is computed over a large image window for a set of candidate displacements. This map can be computed efficiently using a fast Fourier transform. A subset of displacements corresponding to peak values in the energy map is determined as including the most representative displacements over this window. Then block matching is performed pixelwise as a second step, considering only this subset of displacements.
This method reduces the complexity of motion estimation, and is also able to detect discontinuous motion maps. With the phase correlation technique, the motion map is also regularized and constrained, but in a way very different from spatial regularization. Instead of imposing a local smoothness of the motion map, phase correlation limits the set of different possible vectors in a motion map to a fixed number.
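The first step of such a scheme, selecting a small set of candidate displacements from the peaks of a correlation surface, can be sketched as follows. This is a generic textbook form of phase correlation, not a reproduction of GB-A-2 188 510. To stay self-contained it uses a naive discrete Fourier transform in place of a fast Fourier transform, and it assumes a square window with circular (wrap-around) motion; all names are hypothetical.

```python
import cmath


def dft1(seq, inverse=False):
    """Naive 1-D DFT (stand-in for an FFT); the inverse is scaled by 1/n."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(seq[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [z / n for z in out] if inverse else out


def dft2(img, inverse=False):
    """Separable 2-D DFT: transform the rows, then the columns."""
    rows = [dft1(r, inverse) for r in img]
    cols = [dft1(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def phase_correlation(f0, f1, eps=1e-9):
    """Correlation surface whose peaks sit at the dominant displacements."""
    F0, F1 = dft2(f0), dft2(f1)
    cross = [[a.conjugate() * b for a, b in zip(r0, r1)] for r0, r1 in zip(F0, F1)]
    # Keep only the phase of each frequency component.
    unit = [[c / (abs(c) + eps) for c in row] for row in cross]
    return [[z.real for z in row] for row in dft2(unit, inverse=True)]


def top_displacements(surface, k):
    """The k highest peaks, converted from array indices to signed displacements."""
    n1, n2 = len(surface), len(surface[0])
    ranked = sorted(
        ((surface[i][j], i, j) for i in range(n1) for j in range(n2)),
        reverse=True,
    )

    def signed(idx, n):
        return idx - n if idx > n // 2 else idx

    return [(signed(i, n1), signed(j, n2)) for _, i, j in ranked[:k]]


def circular_shift(frame, v):
    """Hypothetical helper: I_{t+1}(x) = I_t(x - v) with wrap-around."""
    h, w = len(frame), len(frame[0])
    return [[frame[(i - v[0]) % h][(j - v[1]) % w] for j in range(w)] for i in range(h)]


# Toy data (assumption): one 8x8 window undergoing a single global motion.
frame_t = [[0] * 8 for _ in range(8)]
frame_t[3][3], frame_t[3][4], frame_t[4][3] = 9, 5, 2
frame_t1 = circular_shift(frame_t, (1, 2))
subset = top_displacements(phase_correlation(frame_t, frame_t1), k=3)
```

The first entry of subset is the dominant displacement of the window. In a two-step scheme the pixelwise block matching would then be restricted to the vectors in subset instead of a full search range.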
However, phase correlation still requires relatively complex computations based on 2-dimensional fast Fourier transforms that are expensive to implement in hardware. Also, the method selects motion vectors on the basis of individual merit, assessed by their phase correlation. It therefore has a limited ability to provide a minimal set of motion vectors. Indeed, when a moving pattern has a periodic structure or is translation-invariant, several vectors have comparable merit values, and phase correlation is not able to arbitrate between them. The resulting motion-compensated video interpolation process is thus of suboptimal robustness. This also has a cost in terms of complexity because, for all pixels, more candidate motion vectors are considered than necessary.
Other classes of approaches include selecting a first subset of displacements by computing low-complexity matching energies on candidate vectors. This can reduce the computational complexity to some extent, but it is not an appropriate way to make the motion-compensated interpolation more reliable.
Classical and still popular methods for noise reduction in video sequences include motion-compensated recursive or non-recursive temporal filtering. See, e.g., “Noise reduction in Image Sequences Using Motion-Compensated Temporal Filtering”, E. Dubois and S. Sabri, IEEE Transactions on Communications, Vol. COM-32, No. 7, July 1984, pp. 826-832. This consists in estimating motion between a frame and a preceding frame, and filtering the video sequence along the estimated motion with a temporal filter.
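A recursive variant of such a filter can be sketched as below. This is a minimal illustration under assumptions, not the filter of the cited paper: the first-order recursion y_t(x) = (1 - alpha) I_t(x) + alpha y_{t-1}(x - v_t(x)), the fixed weight alpha, the border clamping and the representation of the motion field as one vector per pixel are all choices made for the example.

```python
def clamp(i, n):
    """Clamp index i to the valid range [0, n - 1]."""
    return min(max(i, 0), n - 1)


def recursive_temporal_filter(frames, motion_fields, alpha=0.5):
    """Filter each pixel along its estimated motion trajectory.

    frames[t][i][j] is I_t at position (i, j).
    motion_fields[t - 1][i][j] is the vector v such that pixel (i, j) of
    frame t corresponds to pixel (i - v[0], j - v[1]) of frame t - 1.
    alpha is the weight given to the previous filtered frame.
    """
    h, w = len(frames[0]), len(frames[0][0])
    prev = [[float(p) for p in row] for row in frames[0]]
    outputs = [prev]
    for t in range(1, len(frames)):
        field = motion_fields[t - 1]
        cur = []
        for i in range(h):
            row = []
            for j in range(w):
                v1, v2 = field[i][j]
                past = prev[clamp(i - v1, h)][clamp(j - v2, w)]
                row.append((1 - alpha) * frames[t][i][j] + alpha * past)
            cur.append(row)
        outputs.append(cur)
        prev = cur
    return outputs


def uniform_field(h, w, v):
    """Hypothetical helper: the same motion vector at every pixel."""
    return [[v] * w for _ in range(h)]


# Toy data (assumption): one bright pixel moving one column to the right.
a = [[0] * 4 for _ in range(3)]
b = [[0] * 4 for _ in range(3)]
a[1][1] = 8
b[1][2] = 8
compensated = recursive_temporal_filter([a, b], [uniform_field(3, 4, (0, 1))])
uncompensated = recursive_temporal_filter([a, b], [uniform_field(3, 4, (0, 0))])
```

With the correct motion the moving pixel keeps its value of 8 in the second output frame. With a zero motion field the same filter spreads it over two positions at value 4, which shows why the quality of the motion estimate matters for this kind of noise reduction.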
Other known methods use motion-compensated 3D wavelet transforms. See “Three-Dimensional Embedded Subband Coding with Optimized Truncation (3D-ESCOT)”, Xu, et al., Applied and Computational Harmonic Analysis, Vol. 10, 2001, pp. 290-315. The motion-compensated 3D wavelet transform described in this paper can be used for noise reduction, by performing a wavelet thresholding on this 3D transform. The limitation of such an approach, which uses a lifting-based wavelet transform along motion threads, is its very high sensitivity to the corruption of the motion map by noise.
WO 2007/059795 A1 describes a super-resolution processing method that can be used for long-range noise reduction or super-resolution scaling. The method is based on a bandlet transform using multiscale grouping of wavelet coefficients. This representation is much more appropriate for noise reduction or super-resolution scaling than the 3D transform described in the 3D-ESCOT paper. The multiscale grouping performs a variable range image registration that can be computed for example with block matching or any state-of-the-art image registration process. For both super-resolution scaling and noise reduction, it is important that the image registration map used is not corrupted by noise or by aliasing artifacts.
Whatever the application (interpolation or noise reduction), using a motion-compensated approach with a dense flow field has limitations: the aperture problem, and the irrelevance of a single motion model for contents with transparent objects or shadows. Analyzing the local invariance structure of video by detecting at each pixel one or more directions of regularity of the video signal in space and time, as described in WO 2007/115583 A1, provides a more general and robust way to do video interpolation. There is thus a need for a technique which makes it possible to detect such directions in an efficient way and with enhanced robustness.
An object of the present invention is to propose a method useful for detecting directions of regularity in an input video stream with high accuracy and high robustness. In particular, in super-resolution video interpolation, it is desired to avoid artifacts usually caused by incoherent interpolation directions. In video noise reduction, it is desired to select averaging directions that are not corrupted by noise.
Another object is to reduce substantially the implementation complexity of the super-resolution interpolation or noise reduction processing.