The present invention relates to the digital processing of moving images, and more particularly to techniques for estimating motion between successive images of a sequence.
Most video coding schemes (in particular MPEG-1, MPEG-2, MPEG-4 and ITU-T H.26x) represent motion by translations applied to a block partition of the images. This motion model causes numerous problems. It is largely responsible for the blocking artefacts often visible after decoding with current video coding schemes, and it is poorly suited to certain types of motion (zooms, rotations, etc.).
Other motion representations have been proposed to alleviate these defects, among them “active meshes”. In this representation, the motion is described by a set of values defined at the nodes of a mesh placed over an image. An interpolation technique is then used to deduce, from the values stored at the nodes of this mesh, a motion vector at any point of the image. Typically, this is a Lagrange-type interpolation, that is to say the motion vector assigned to a point of the image is an affine function of the vectors calculated for the neighboring nodes.
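To make the interpolation concrete, the sketch below assumes triangular mesh cells and first-order (P1) Lagrange interpolation, in which the motion vector at a point is the barycentric-weighted combination of the three vertex vectors. The triangular cells and the function names `barycentric` and `interpolate_motion` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    # Solve p = a + l1*(b - a) + l2*(c - a) for (l1, l2).
    m = np.column_stack((b - a, c - a))
    l1, l2 = np.linalg.solve(m, p - a)
    return np.array([1.0 - l1 - l2, l1, l2])

def interpolate_motion(p, nodes, vectors):
    """P1 (affine) Lagrange interpolation of node motion vectors at point p.

    nodes:   (3, 2) array of triangle vertex positions (assumed triangular cell)
    vectors: (3, 2) array of motion vectors stored at those vertices
    """
    w = barycentric(np.asarray(p, float), *np.asarray(nodes, float))
    # The weights sum to 1, so the result is an affine combination of the vectors.
    return w @ np.asarray(vectors, float)

nodes = [[0, 0], [1, 0], [0, 1]]
vectors = [[1, 0], [3, 0], [1, 2]]
print(interpolate_motion((0.25, 0.25), nodes, vectors))  # [1.5 0.5]
```

Within each cell the resulting field is affine, so it is continuous across shared edges; this is precisely the property that makes a single global mesh struggle with real motion discontinuities.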
It is thus possible to replace the motion compensation mode of an MPEG-type or other video coder with a mesh-based motion compensation mode. Meshes can also be used to decorrelate the motion and texture information of a video sequence, so as to obtain a coding scheme of the analysis-synthesis type.
Active meshes thus offer both richer motion models and the possibility of improved coding efficiency, owing to more efficient coding of the motion information, in particular when hierarchical meshes are used (see for example WO 00/14969).
Deformable meshes, however, define a continuous representation of the motion field, whereas the real motion in a video sequence is generally discontinuous. When several planes or objects overlap in a scene, occlusion and exposure (disocclusion) zones appear, producing lines of discontinuity.
Modeling such artefacts with a global mesh, as opposed to meshes segmented according to the video objects that make up the scene, is a difficulty that cannot be overcome without modifying the representation model. The aim is to eliminate the resulting visual degradation, and to limit its effect on the analysis, by determining the zones of discontinuity.
Typically, this type of disturbance of the real motion field leads to mesh cell inversions (cells whose orientation flips, i.e. that fold over) in its meshed representation.
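A common geometric way to detect an inversion, sketched here under the assumption of a triangular mesh, is to compare the sign of each cell's signed area before and after the node displacements are applied: a sign change means the cell has folded over. The function names and the triangle index layout are hypothetical.

```python
import numpy as np

def signed_area(a, b, c):
    """Signed area of triangle (a, b, c); positive when counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def inverted_cells(nodes, motion, triangles):
    """Indices of triangles whose orientation flips once motion is applied.

    nodes:     (N, 2) node positions at time t1
    motion:    (N, 2) motion vectors estimated at the nodes
    triangles: list of (i, j, l) node-index triples (assumed triangular cells)
    """
    nodes = np.asarray(nodes, float)
    moved = nodes + np.asarray(motion, float)
    flipped = []
    for k, (i, j, l) in enumerate(triangles):
        before = signed_area(nodes[i], nodes[j], nodes[l])
        after = signed_area(moved[i], moved[j], moved[l])
        if before * after <= 0:  # sign change, or cell collapsed to zero area
            flipped.append(k)
    return flipped

# Unit square split into two triangles; node 1 is pushed up past node 2.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
tris = [(0, 1, 2), (0, 2, 3)]
motion = [(0, 0), (0, 1.5), (0, 0), (0, 0)]
print(inverted_cells(square, motion, tris))  # [0]
```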
Post-processing techniques can be used to address this problem. One of them proceeds by a posteriori correction: the motion vectors are applied as the calculation produces them, the defective ones are detected, and their values are then corrected. Another proceeds iteratively, adding a fraction of the anticipated displacement to the nodes at each iteration in such a way that no inversion occurs, and continuing the iterations until the process converges.
Post-processing techniques act only after the motion estimation has been carried out. The result is therefore sub-optimal, since the motion vectors are corrected without regard to their contribution to minimizing the prediction error.
An improvement consists in optimizing the motion field while taking non-inversion constraints into account within the optimization process itself. To this end, the motion estimation adds to the quadratic prediction error an augmented Lagrangian term that corrects the deformation of mesh cells whose area approaches zero. This technique does yield the optimal solution, but only on condition that this solution is a continuous field, whereas video sequences are usually discontinuous in nature.
Another technique, introduced in WO 01/43446, consists in identifying the discontinuity zones so as to restore them, by monitoring the appearance or disappearance of objects. A first motion estimation is performed between two successive instants t1 and t2 without preventing mesh cell inversions. The discontinuity zones are then detected by locating, with geometric criteria, the inversions present on completion of this first calculation. A new motion estimation is then performed between t1 and t2, excluding from the optimization criterion the contributions of the defective zones (those containing at least one inversion), so as to minimize the prediction error between the two images considered. This re-optimization yields the optimal motion vectors for the continuous zone (where a bijection exists between t1 and t2), and thus avoids the disturbance that the discontinuity zones caused in the motion vectors obtained by the first optimization. The defective zones are instead handled by a frequency-domain or spatial approximation with image compression, and they are excluded from the optimization method based on video object tracking.
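The key change in the second estimation pass is the optimization criterion: the quadratic prediction error is summed only over pixels outside the defective cells. A minimal sketch of such a masked criterion follows; the per-pixel cell map `cell_of_pixel` and the function name are hypothetical inputs chosen for illustration, and the motion optimizer that would minimize this criterion is not shown.

```python
import numpy as np

def masked_prediction_error(current, predicted, cell_of_pixel, defective_cells):
    """Quadratic prediction error restricted to pixels outside defective cells.

    current, predicted: images at t2 and the motion-compensated prediction of t2
    cell_of_pixel: integer array, same shape as the images, giving the index of
        the mesh cell containing each pixel (hypothetical input)
    defective_cells: indices of cells containing at least one inversion
    """
    keep = ~np.isin(cell_of_pixel, list(defective_cells))
    residual = np.asarray(current, float) - np.asarray(predicted, float)
    return float(np.sum(residual[keep] ** 2))

cur = np.ones((2, 2))
pred = np.zeros((2, 2))
cells = np.array([[0, 0], [1, 1]])
print(masked_prediction_error(cur, pred, cells, {1}))    # 2.0
print(masked_prediction_error(cur, pred, cells, set()))  # 4.0
```

Excluding rather than down-weighting the defective pixels means the continuous zone's vectors are fitted without any influence from the occlusion and exposure areas.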
The various known techniques attempt to make a discontinuous motion field continuous by imposing, in the discontinuous zones, a motion calculated from the continuous zones. This produces false motion and poor temporal prediction of the texture in the discontinuous zones, and therefore an increased coding cost.
The technique that excludes the discontinuous zones imposes no motion in these zones and codes them separately. However, when there are many discontinuous zones, each of them must be coded separately, which incurs overhead in coding the headers of the corresponding streams. Moreover, in a scalable coding framework this technique is relatively expensive.
An object of the invention is to estimate the motion of a video sequence using a 2D mesh, and to represent this motion in a discontinuous manner so as to best represent the real motion field.