The concept of motion between two images of a sequence is well known in the scientific and technological community working on image processing. It is defined as the 2D vector characterizing the difference in position between a pixel of one image and the homologous pixel in the other image, it being assumed that these pixels correspond to the same physical point in each of the two images. The assignment of a shift vector to each pixel in the image therefore defines a dense motion field in the image, as shown in FIG. 1. The set 10 of the N*M vectors obtained for an image of size N*M is called the motion field, or optical flow, between two images 11 and 12 respectively corresponding to the instants t and t+1.
The estimation of the motion between two images (the computation of the field of shift or of disparities between two images) can be applied in all kinds of imaging and image processing. The following can be mentioned as non-restrictive examples:
    video encoding: the motion field is used to predict an image from previously decoded images. The mesh representation of such a field is defined especially in different standards such as the MPEG-4 standard.
    medical imaging: the analysis of the motion of the human body, the heart, etc.
    the tracking of objects (for example in road traffic control).
The invention can be applied especially to the processing of 2D images, but also to the processing of images representing multidimensional objects (especially 3D objects).
At present, there are several techniques for estimating motion in image sequences.
In a first technique, a dense field of motion is computed for the image, and a shift vector is applied to each pixel.
A second technique is aimed at computing one shift vector per rectangular block of the image.
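By way of a purely illustrative sketch, the block-based technique can be pictured as follows. The text above does not say how the shift vector of each block is obtained; the exhaustive search minimizing a sum of absolute differences used here is an assumption, as are the names `block_sad` and `estimate_block_motion` and the parameters `size` and `search`. Images are taken to be lists of rows of grey levels.

```python
def block_sad(cur, ref, top, left, size, dy, dx):
    """Sum of absolute differences between a block of `cur` and the
    block of `ref` displaced by (dy, dx)."""
    total = 0
    for y in range(top, top + size):
        for x in range(left, left + size):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def estimate_block_motion(cur, ref, size, search):
    """Return {(top, left): (dy, dx)}: one shift vector per size x size block.

    Assumption: exhaustive search over shifts in [-search, +search],
    keeping the first candidate with the lowest cost.
    """
    height, width = len(cur), len(cur[0])
    field = {}
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            best_cost, best_shift = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates whose block leaves the reference image.
                    inside_rows = 0 <= top + dy and top + dy + size <= height
                    inside_cols = 0 <= left + dx and left + dx + size <= width
                    if not (inside_rows and inside_cols):
                        continue
                    cost = block_sad(cur, ref, top, left, size, dy, dx)
                    if best_cost is None or cost < best_cost:
                        best_cost, best_shift = cost, (dy, dx)
            field[(top, left)] = best_shift
    return field
```

In this sketch every pixel of a block shares the same vector, which is what distinguishes the approach from the dense field of the first technique.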
A third technique is based on the computation of a mathematical model of the motion by region, possibly in arbitrary form, of the image. Such a technique therefore also implements a segmentation of the image into regions, corresponding for example to the different video objects constituting the illustrated scene.
A last technique, described for example in the patent application FR 2 783 123 entitled “Procédé d'estimation du mouvement entre deux images” (Method for the estimation of motion between two images), consists in computing a field defined by a model of finite elements. A mesh is associated with the image, and an interpolation or extrapolation function is used to compute the value of the field at each pixel of the image, within each of the mesh points, as shown in FIG. 2.
This method relies on a differential approach, which determines the motion parameters by optimizing a mathematical quality criterion (for example a quadratic error between the image and its value as predicted by motion compensation). Such an optimization is achieved by means of a method of differential mathematical optimization, in the case of criteria possessing the necessary mathematical properties (for example, a differentiable criterion).
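As an illustration only, the quadratic criterion mentioned above can be evaluated as in the following sketch. It only computes the criterion for a given field and does not perform the differential optimization itself. The integer-valued shifts, the clamping of coordinates at the image border and the function names are assumptions made for the sake of a short example.

```python
def motion_compensated_prediction(ref, field):
    """Predict an image from `ref` using one (dy, dx) shift per pixel.

    Assumptions: integer shifts, coordinates clamped to the image border.
    """
    height, width = len(ref), len(ref[0])
    predicted = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = field[y][x]
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            row.append(ref[sy][sx])
        predicted.append(row)
    return predicted


def quadratic_error(cur, ref, field):
    """Sum of squared differences between `cur` and its prediction."""
    predicted = motion_compensated_prediction(ref, field)
    return sum(
        (c - p) ** 2
        for cur_row, pred_row in zip(cur, predicted)
        for c, p in zip(cur_row, pred_row)
    )
```

A motion estimator of the kind described would search for the field (or, for a mesh, the nodal vectors) that minimizes this value.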
Thus, for example, a triangular mesh is associated with the image 21. The value of the motion vectors of the vertices referenced 22, 23 and 24 of the mesh is optimized. Then an interpolation (or an extrapolation) is made in order to determine the motion vectors of the pixels that do not correspond to vertices of the mesh, such as the points referenced 25 and 26 for example.
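The interpolation step can be sketched as follows, under the assumption that the interpolation function is affine within each triangle (barycentric weighting of the three nodal vectors); the text above does not specify which function is used. The names `barycentric_weights` and `interpolate_motion` are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_motion(p, vertices, vectors):
    """Motion vector at p from the vectors of the three triangle vertices.

    Inside the triangle all weights lie in [0, 1] (interpolation);
    outside, some weights are negative (extrapolation).
    """
    weights = barycentric_weights(p, *vertices)
    vx = sum(w * v[0] for w, v in zip(weights, vectors))
    vy = sum(w * v[1] for w, v in zip(weights, vectors))
    return vx, vy
```

For example, with vertices (0, 0), (4, 0), (0, 4) carrying the vectors (2, 0), (0, 2), (0, 0), the point (1, 1) receives the weights 0.5, 0.25, 0.25 and therefore the vector (1.0, 0.5).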
A motion estimation method of this kind, relying on the implementation of a hierarchical mesh representing the image to be encoded, is especially suited to the gradual transmission of the data, and therefore represents an advantageous solution to the problems of bandwidth saturation and of adaptability to communications networks of various kinds and capacities.
However, such a method comes up against certain difficulties during motion tracking.
Thus, one drawback of this prior art technique is that it gives rise to the shifting of the mesh points, leading to either crowding or uncovering in certain regions of the image, as shown in FIG. 5. Indeed, the deformable meshes define a continuous representation of a motion field (that is, the mesh follows the objects moving in the scene), while the real motion of a video sequence is discontinuous in nature. Consequently, when an object shifts for example from left to right in the image 51, the left-hand region 52, uncovered by the shifting of the object, represents a new piece of information, and is therefore no longer meshed. The right-hand region 53, for its part, undergoes a crowding of mesh points. Similarly, when different planes and objects overlap in a scene, regions of concealment appear, generating lines of discontinuity.
Now, apart from the information on motion, the vertices of the mesh are also carriers of photometric and/or colorimetric information used to generate a photometric and/or colorimetric field, and therefore to achieve an approximation of the image. The appearance of non-meshed uncovering regions therefore corresponds to the entry, into the image, of regions whose photometry and/or colorimetry cannot be approximated; in other words, these are black regions.
The solutions generally proposed to compensate for the appearance of non-meshed uncovering regions, or crowding regions, characterized by a redundancy of information, consist in forcing the nodes of the edge of the image (for example the node referenced 511) to remain still.
One drawback of this prior art technique is that the approximation of the entering regions is of mediocre quality, owing to the stretching of the associated mesh points.
Another drawback of this prior art technique is that the motion estimation is biased by the immobility constraint imposed on the nodes of the edges of the image.
This prior art technique also has the drawback of giving rise to an excessively costly over-approximation of the regions going out of the image, corresponding to the mesh point crowding regions. Indeed, in the crowding regions, an over-representation of the motion field is obtained. This is because an excessively great number of mesh points is used to obtain the approximation of a limited portion of the motion field. Such an over-representation does not harm the quality of approximation of the motion field, but gives rise to extra transmission costs.
In another solution envisaged to compensate for the appearance of uncovering regions or crowded regions of the mesh, new pixels are inserted in the black regions and a constrained Delaunay mesh is constructed.
One drawback of this prior art technique is that it is costly in terms of transmission bit rate and/or of information storage.
Another problem related to the disturbances of the motion field is the appearance of reversal of the mesh points of the hierarchical mesh representing the image, as shown in FIG. 3, when certain mesh points 31, 32 shift with respect to one another along antagonistic motion vectors 33, 34.
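One common way of detecting such a reversal, given here as an assumption rather than as the method of the cited art, is to compare the sign of the signed area of each triangle before and after the nodal motion vectors are applied: a change of sign (or a zero area) indicates that the triangle has turned over or collapsed. The names `signed_area` and `reversed_triangles` are hypothetical.

```python
def signed_area(a, b, c):
    """Signed area of triangle (a, b, c); positive if counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def reversed_triangles(nodes, triangles, vectors):
    """Indices of triangles whose orientation changes once each node
    has been displaced by its motion vector."""
    flipped = []
    for index, triangle in enumerate(triangles):
        before = signed_area(*(nodes[n] for n in triangle))
        moved = [
            (nodes[n][0] + vectors[n][0], nodes[n][1] + vectors[n][1])
            for n in triangle
        ]
        after = signed_area(*moved)
        # Opposite signs mean a reversal; zero means a degenerate triangle.
        if before * after <= 0:
            flipped.append(index)
    return flipped
```

In this sketch `triangles` is a list of triples of indices into `nodes`, and `vectors` holds one (dx, dy) displacement per node.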
Several methods have been envisaged to overcome such mesh point reversals.
Placing constraints on the different mesh points has been considered, so as to prohibit the reversal phenomena.
The performance of a post-processing operation on the mesh has also been considered. This could be done in two distinct modes. In a first embodiment of a post-processing operation, the motion vectors as estimated are applied to the different nodes of the mesh. Then the motion vectors that have led to the appearance of defects in the mesh are detected, and finally their value is corrected, so as to compensate for the mesh reversal phenomena.
A second embodiment of a post-processing operation consists of an iterative method: at each iteration, a part of the estimated shift is applied to each of the nodes of the mesh, so as not to generate any mesh point reversal. The iterations are then repeated until a convergence of the method is obtained.
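The iterative mode can be pictured with the following sketch. The text above states only that a part of the estimated shift is applied at each iteration without creating a reversal; the way that part is chosen here (a fraction of the remaining shift, halved until no triangle changes orientation) is an assumption, as are the function names and the parameters `max_iterations` and `tolerance`.

```python
def signed_area(a, b, c):
    """Signed area of triangle (a, b, c); positive if counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def apply_motion_iteratively(nodes, triangles, vectors,
                             max_iterations=50, tolerance=1e-6):
    """Move the nodes towards nodes + vectors without reversing any triangle."""
    positions = [(float(x), float(y)) for x, y in nodes]
    targets = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(nodes, vectors)]
    reference = [signed_area(*(positions[n] for n in t)) for t in triangles]

    for _ in range(max_iterations):
        remaining = [(tx - px, ty - py)
                     for (tx, ty), (px, py) in zip(targets, positions)]
        if max(abs(c) for r in remaining for c in r) < tolerance:
            break  # every node has reached its estimated position
        fraction = 1.0
        while fraction > tolerance:
            candidate = [(px + fraction * rx, py + fraction * ry)
                         for (px, py), (rx, ry) in zip(positions, remaining)]
            areas = [signed_area(*(candidate[n] for n in t)) for t in triangles]
            if all(ref * area > 0 for ref, area in zip(reference, areas)):
                break  # this partial shift keeps every orientation
            fraction /= 2.0
        else:
            break  # no admissible partial shift remains: converged
        positions = candidate
    return positions
```

When the estimated vectors cause no reversal, the full shift is applied in a single iteration; otherwise the nodes stop just short of the configuration in which a triangle would turn over.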
However, since the post-processing methods act after the motion vectors of the different nodes of the mesh have been estimated, they do not provide for optimum management of the mesh point reversals. Indeed, in the post-processing methods, the motion vectors are corrected independently of their contribution to the minimizing of the prediction error (for example the minimizing of the quadratic error between the image and its predicted value by motion compensation).
The technique described in the French patent application FR 99 15568, “Procédé d'estimation de mouvement entre deux images avec gestion de retournements de mailles et procédé de décodage correspondant” (Method for the estimation of motion between two images with management of mesh point reversal and corresponding method of decoding) proposes a solution to the problem of reversal generated by the motion estimator. This solution relies on the implementation of a hierarchical mesh.
Such a technique consists in carrying out an initial optimization of the motion vectors of the mesh, letting the motion estimator create reversals, if any, between two successive instants t1 and t2, so as to detect the regions of discontinuity thus generated. The method then consists in making a further motion estimation between the instants t1 and t2, excluding the defective regions (namely the regions containing at least one mesh point reversal), in order to minimize the prediction error between the two images corresponding to the instants t1 and t2 considered.
This new estimation is used to determine the optimal motion vectors for the continuous region of the image (that is, assuming a bijection between t1 and t2) and thus prevent the values of the motion vectors obtained during the initial optimization from being disturbed by the existence of regions of discontinuity. The defective regions are then either approximated by a frequency method or spatial method, for example when the method is applied to image compression, or definitively excluded from the optimization method, for example when the technique is applied to the tracking of video objects.
One drawback of this prior art technique is that it cannot be used to manage the appearance of mesh point uncovering regions or mesh point crowding regions during a translation of an object within the image, as described here above.
It is a goal of the invention in particular to overcome these drawbacks of the prior art.
More specifically, it is a goal of the invention to provide a technique for the encoding of images represented by means of a mesh, enabling an estimation of and a compensation for the motion within the image.
It is another goal of the invention to implement an image encoding technique by which it is possible to obtain a good approximation of the photometric and/or colorimetric surface of the image, especially for the highly textured zones.
It is yet another goal of the invention to provide a simple and robust technique for the encoding of images.
It is also a goal of the invention to implement an image encoding technique adapted to all the fields in which the motion is estimated by means of meshes, and especially to the field of video encoding (such as video encoding according to the MPEG-4 and H.263+ standards for example).
It is another goal of the invention to provide an image encoding technique with reduced cost of information transmission.
It is yet another goal of the invention to implement an image encoding technique providing for the visual fluidity of the shifting of the image.
It is yet another goal of the invention to provide an image encoding technique enabling the efficient management of the appearance of uncovering zones and/or crowding zones during the shifting of the constituent objects of the image.
It is also a goal of the invention to implement an image encoding technique making it possible to manage the phenomena of reversal and concealment of mesh points.
It is another goal of the invention to provide an image encoding technique making it possible to reduce the transmission and/or the storage of the motion vectors for the mesh point crowding regions.
It is yet another goal of the invention to implement an image encoding technique making it possible to ensure the constancy of the ratio between the information transmission bit rate and the image distortion rate.