The invention relates to a method for segmenting a video image based on elementary objects.
At present, the procedures for segmenting video images into elementary objects that derive from computer vision processes are entirely unable to reproduce the functioning of the human visual and cognitive system. Specifically, the image obtained by implementing these processes is either under-segmented or over-segmented. In neither case do these procedures automatically reproduce the ideal segmentation carried out by a human operator.
Nevertheless, numerous applications rely on segmentation, which, to be ideal, ought to be robust, fast, discriminating and not specific to a particular field of application. More particularly, automatically following or computing the trace of an object over time in a succession of video images, with a view to acquiring and tracking it, remains a completely open problem, all the more so when the object may deform over time through complex natural or artificial transformations such as “morphing”.
Among the image segmentation procedures proposed hitherto, several families are customarily distinguished.
A first family corresponds to the conventional segmentation procedures based on filtering, mathematical morphology, region growing, partitioning of color histograms, or Markov procedures. These automatic procedures are applied to an image, but the results obtained depend strongly on the particular content of the image and are sensitive to its texture. They do not allow the image to be segmented into elementary objects, in so far as it is difficult to retrieve the contours of an object of interest. The images are over-segmented, and the detected contours do not all form a closed list, which would substantially guarantee the integrity of the contour of the object of interest and hence its segmentation. The results vary widely from one procedure to another and are not very robust: two very similar images may yield very different segmentations and, conversely, one and the same image may yield very different segmentations with two procedures.
A second family groups together procedures based on mathematical morphology that try to remedy the problems and drawbacks of the first family using processes based on a tree structure, namely a binary partition tree that characterizes the content of the images. Such a tree structure, describing the spatial organization of the image, is obtained by iteratively merging neighboring regions according to a homogeneity criterion until a single region is obtained. The tree is constructed by preserving the trace of the merged regions at each iteration of the process. This procedure makes it possible to mark regions of interest manually on the original image and to retrieve the nodes corresponding to this marking from the partition tree. The drawbacks of the procedures of this family are that the entire image is segmented, that prior knowledge of the number of regions constituting the object is necessary, and that the contours of the object which are obtained are not accurate enough or are not the right ones. Specifically, the object of interest often straddles several regions, in which case the contours of the object do not correspond to the contours of these regions.
A third family groups together statistical procedures based on Markov fields. These procedures label the regions of the image according to a criterion to be maximized. They can take into account a wide range of a priori information about the image and are particularly suited to satellite images composed of textured, juxtaposed zones.
A fourth family relates to active contour procedures, also called snakes. In this type of procedure, described in the article entitled “Snakes: Active Contour Models”, published by M. KASS, A. WITKIN and D. TERZOPOULOS in the International Journal of Computer Vision, vol. 1, pp. 321–331, 1988, the principle consists in iteratively deforming an initial curve until it fits the contour of the object, by minimizing an energy functional.
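The merging loop described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited work: regions are assumed to be summarized by a mean grey level and a pixel count, the region adjacency graph is assumed to be connected, and the homogeneity criterion (absolute difference of mean grey levels) and the name `build_partition_tree` are hypothetical. Each merge is recorded as a (parent, left, right) tuple, which is the trace from which the tree can be rebuilt.

```python
from itertools import count


def build_partition_tree(means, sizes, edges):
    """Binary partition tree by iterative merging of neighbouring regions (sketch).

    means, sizes: dicts mapping region id -> mean grey level / pixel count.
    edges: iterable of (a, b) pairs of adjacent region ids (graph assumed connected).
    Returns the merges as (parent, left, right) tuples; the last parent is the root.
    New node ids continue after the largest initial id.
    """
    means, sizes = dict(means), dict(sizes)
    adjacency = {r: set() for r in means}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    new_ids = count(max(means) + 1)
    merges = []
    while len(means) > 1:
        # Hypothetical homogeneity criterion: absolute difference of mean grey levels.
        a, b = min(
            ((a, b) for a in adjacency for b in adjacency[a] if a < b),
            key=lambda p: abs(means[p[0]] - means[p[1]]),
        )
        parent = next(new_ids)
        size = sizes[a] + sizes[b]
        means[parent] = (means[a] * sizes[a] + means[b] * sizes[b]) / size
        sizes[parent] = size
        # The merged region inherits the neighbours of both children.
        adjacency[parent] = (adjacency[a] | adjacency[b]) - {a, b}
        for n in adjacency[parent]:
            adjacency[n] -= {a, b}
            adjacency[n].add(parent)
        for r in (a, b):
            del means[r], sizes[r], adjacency[r]
        merges.append((parent, a, b))  # trace of the merge, kept for the tree
    return merges
```

For four regions in a row with means 10, 12, 50 and 55, the two dark regions are merged first, then the two bright ones, and the two resulting nodes finally form the root.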
This energy is composed of two terms:
- the internal energy of the contour, which depends on the intrinsic or geometric properties of the active contour, such as length and curvature. This internal energy term causes the active contour to contract around the object and displaces its nodes in a direction that locally minimizes the energy;
- the energy external to the contour, which corresponds to a data term. This external energy term is generally linked with the contours present in the image and slows down the contraction of the active contour around these contours.
It should be noted in particular that this family of procedures requires a priori knowledge of the contours present in the image, which can itself be obtained only by a prior analysis of the image.
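For a closed contour sampled at nodes v_i, the two energy terms can be discretized with finite differences, as sketched below. The weights `alpha` and `beta`, the function name `snake_energy` and the `edge_strength` callable are hypothetical; the external term follows the common choice of minus the squared edge strength (for example the gradient magnitude), so that the energy is lower on strong edges. Minimizing this energy, for example greedily node by node, is what deforms the contour.

```python
def snake_energy(points, alpha, beta, edge_strength, w_edge=1.0):
    """Discrete active-contour energy of a closed polygon (sketch).

    points: list of (x, y) nodes, implicitly closed.
    alpha, beta: hypothetical weights of the stretching and bending terms.
    edge_strength: callable (x, y) -> edge magnitude at that position.
    """
    n = len(points)
    internal = external = 0.0
    for i, (x, y) in enumerate(points):
        xp, yp = points[i - 1]          # previous node (index -1 wraps around)
        xn, yn = points[(i + 1) % n]    # next node (wraps around)
        # First-order term: penalizes stretching, i.e. contour length.
        internal += 0.5 * alpha * ((xn - x) ** 2 + (yn - y) ** 2)
        # Second-order term: penalizes bending, i.e. discrete curvature.
        internal += 0.5 * beta * ((xn - 2 * x + xp) ** 2 + (yn - 2 * y + yp) ** 2)
        # External term: more negative where the image has strong edges.
        external -= w_edge * edge_strength(x, y) ** 2
    return internal + external
```

For the unit square with `alpha = 2` and `beta = 1`, each node contributes 1 for stretching and 1 for bending, so the internal energy is 8; a constant edge strength of 1 lowers it to 4.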
A fifth family of procedures is a development of the previous family in which, as regards the external forces applied to the active contour, the model behaves like a balloon that inflates under the effect of these forces and stops when it encounters marked or predefined contours. The active contour can thus overstep weakly marked contours. Other developments have proposed the use of deformable geometric active contours. These developments use level sets, which automatically handle changes in the topology of the active contour. However, the procedures of this family necessarily require an initialization close to the final solution, that is to say to the natural contour of the object, in order for the algorithm to converge well.
A sixth family of procedures is based on defining regions of the image by prior estimation of these regions and of the background of the image. The evolution equation of the active contour is generally obtained by differentiating a criterion in the sense of distributions. This criterion depends on constraints relating to two sets: the background of the image and the moving objects. The evolution equation can comprise the following three terms:
- a data term;
- a hyperbolic term, allowing adaptation to the shape of the objects; and
- a parabolic term, stabilizing the solution by smoothing the contours.
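A single explicit step of a balloon-type contour can be sketched as below, under stated assumptions: the contour is a counter-clockwise polygon, each node moves by a fixed hypothetical `pressure` along the outward normal estimated from its two neighbours, and a caller-supplied `is_edge` test stands in for the marked or predefined contours at which a node stops. A complete model would also apply the internal and image forces described for the previous family.

```python
import math


def balloon_step(points, pressure, is_edge):
    """One inflation step of a balloon-type active contour (sketch).

    points: closed polygon, counter-clockwise, as (x, y) tuples.
    pressure: hypothetical step length along the outward normal.
    is_edge: callable (x, y) -> True where a marked contour stops the node.
    """
    n = len(points)
    moved = []
    for i, (x, y) in enumerate(points):
        if is_edge(x, y):
            moved.append((x, y))  # node has reached a marked contour: it stays put
            continue
        xp, yp = points[i - 1]
        xn, yn = points[(i + 1) % n]
        # Tangent from the two neighbours; for a CCW polygon, (ty, -tx) points outward.
        tx, ty = xn - xp, yn - yp
        norm = math.hypot(tx, ty) or 1.0
        moved.append((x + pressure * ty / norm, y - pressure * tx / norm))
    return moved
```

Iterating this step inflates the contour until every node has met an edge, which is why the balloon can cross weakly marked contours that `is_edge` does not report.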
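As an illustration only, an evolution equation combining the three terms is often written in the generic form below. Here Γ is the active contour, N its inward unit normal, κ its curvature (positive where the contour is convex), F_data a speed derived from the object/background criterion, ν a constant speed giving the hyperbolic term and ε > 0 the weight of the parabolic, curvature-driven smoothing term. These symbols are our own, not those of the cited procedures, and sign conventions vary between authors.

$$
\frac{\partial \Gamma}{\partial t} = \big( F_{\mathrm{data}} + \nu + \varepsilon\,\kappa \big)\,\mathbf{N}
$$

With this convention, the term εκN alone is the curve-shortening flow, which smooths the contour, while the sign of F_data + ν decides locally whether the contour contracts or dilates.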
The direction of motion of the active contour varies over time, allowing the active contour to dilate or, conversely, to contract at certain nodes. However, these procedures require the background of the image to be labeled, and the execution time, of the order of several minutes, remains too long for dynamic applications involving moving objects in video images.
As regards procedures for following objects in images, also known as tracking procedures, several families are currently proposed.
A first family calls upon a meshing technique. In a first procedure of this family, a hierarchical meshing structure successively estimates the dominant motion of the object and then its internal motions. A hierarchy of meshes is generated from the mask of the object, which defines a polygonal envelope of the object. Before the hierarchical cycle of motion estimation begins, an affine global model initializing the coarse mesh of the hierarchy is estimated. This estimation is then propagated to the finest levels, where a global estimation is carried out. It sometimes happens that a node strays from the natural contour of the object and attaches itself to the background of the scene, dragging its neighboring nodes with it. This dragging is linked to a temporal accumulation of node positioning errors, since only the initial segmentation is available during optimization. To remedy it, a solution has been proposed that consists in additionally injecting a procedure similar to the active contour procedure. Active contours are generated from the finest mesh of the hierarchical cycle and evolve over the contours derived from the segmented current image. These active contours are injected after the first motion estimation so as to constrain the vertices of the mesh edges to reposition themselves on the outer contours of the object. This solution has not, however, been adopted, since the mesh structure then becomes very complex to use.
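The affine global model mentioned above has six parameters. The source does not say how it is estimated; one common option, assumed here, is a least-squares fit to point correspondences such as node displacements. The sketch solves the two 3 x 3 normal-equation systems with a small Gaussian elimination so that it needs no external library; `estimate_affine` and `solve3` are hypothetical names.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def estimate_affine(src, dst):
    """Least-squares affine model mapping src points onto dst points (sketch).

    Returns (a, b) with x' = a[0] + a[1]*x + a[2]*y and y' = b[0] + b[1]*x + b[2]*y.
    """
    # Normal equations (Phi^T Phi) p = Phi^T t, with one row phi = (1, x, y) per point.
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        phi = (1.0, x, y)
        for i in range(3):
            atx[i] += phi[i] * u
            aty[i] += phi[i] * v
            for j in range(3):
                ata[i][j] += phi[i] * phi[j]
    return solve3(ata, atx), solve3(ata, aty)
```

At least three non-collinear correspondences are needed; with more, the fit averages out small positioning errors.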
A second family calls upon the implementation of active contours, according to the procedures described above. The active contour obtained on the current image is propagated from one image to the next and deforms so as to hug the contours of the object of interest on the successive images. Motion constraints can be added during the minimization of the energy functional.
These procedures can furthermore combine active contour procedures with procedures that estimate parameters from optical flow or from a motion model, such as translation, affine transformation, perspective, bilinear deformation or the like, with the aim of making object tracking more robust. In a specific example, the object-following procedure combines an active contour procedure with a region-based analysis of the motion in the image. The motion of the object is detected by a motion-based segmentation algorithm. An active contour model is then used to follow and segment the object. The motion of the region defined inside the active contour is then estimated by a multi-resolution approach based on an affine model. A Kalman filter is used to predict the position of this region and hence to initialize the active contour in the next image.
A third family of procedures calls upon techniques based on label maps, which use image partitioning processes, that is, label maps over the pixels of an image. In a first procedure, a technique combining motion information and spatial organization over the images has been proposed with the aim of following an object. The current image is partitioned by a mathematical morphology procedure, and the resulting image is compensated using motion vectors coarsely estimated by a block matching algorithm. The spatial homogeneity of the regions or markers is then verified. These procedures have the limitations of conventional active contour procedures, in particular slow convergence.
A second procedure is based on the technique of Markov fields. It segments an image into regions that are homogeneous in the motion sense by statistical labeling. The partition is obtained according to a criterion of intensity, color and texture.
A third procedure carries out a spatial segmentation of the image into homogeneous regions, and tracking is carried out by a back-projection procedure. This involves determining the mask of the object of interest on the current image. Each region of the segmented current image is back-projected according to the motion onto the previous segmented image. The back-projected regions belonging to the mask of the object then form the new mask of the object on the current image. These procedures have the drawback of yielding rather inaccurate object contours: holes or artefacts appear because an initial segmentation of the image is used.
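The prediction step of the Kalman filter can be sketched with a constant-velocity model applied independently to each image axis. The motion model, the noise variances `q` and `r` and the class name are assumptions for illustration; the cited procedure may use a different state vector. The predicted position is what would be used to place the initial active contour in the next image.

```python
class ConstantVelocityKalman:
    """Per-axis Kalman filter with a constant-velocity model (sketch).

    State per axis: position p and velocity v, with F = [[1, dt], [0, 1]] and
    H = [1, 0]. q and r are hypothetical process and measurement noise variances.
    """

    def __init__(self, position, q=1e-2, r=1.0):
        self.q, self.r = q, r
        # One (state, covariance) pair per image axis; velocity starts at zero.
        self.axes = [([float(p), 0.0], [[r, 0.0], [0.0, 1.0]]) for p in position]

    def predict(self, dt=1.0):
        """Propagate each axis: p <- p + v*dt and P <- F P F^T + q*I."""
        out = []
        for k, ((p, v), P) in enumerate(self.axes):
            p = p + v * dt
            p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
            p01 = P[0][1] + dt * P[1][1]
            p10 = P[1][0] + dt * P[1][1]
            p11 = P[1][1] + self.q
            self.axes[k] = ([p, v], [[p00, p01], [p10, p11]])
            out.append(p)
        return tuple(out)

    def update(self, measured):
        """Correct each axis with a measured position (e.g. the tracked region's centroid)."""
        for k, (z, ((p, v), P)) in enumerate(zip(measured, self.axes)):
            s = P[0][0] + self.r                  # innovation variance
            k0, k1 = P[0][0] / s, P[1][0] / s     # Kalman gain
            innovation = z - p
            state = [p + k0 * innovation, v + k1 * innovation]
            cov = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                   [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
            self.axes[k] = (state, cov)
```

A typical loop calls `predict()` before each new image to initialize the contour, then `update()` with the position measured once the contour has converged.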
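Coarse motion estimation by block matching, as used above for motion compensation, can be sketched with an exhaustive search minimizing the sum of absolute differences (SAD). The block size, the search radius, the SAD cost and the name `block_match` are assumptions; the returned displacement points from the block in the current image back to its best match in the previous image, so the motion vector is its negation.

```python
def block_match(prev, curr, top, left, size, radius):
    """Exhaustive block matching with the sum of absolute differences (sketch).

    prev, curr: 2-D lists of grey levels with the same shape.
    Returns (dy, dx) such that the size x size block of curr at (top, left)
    best matches the block of prev at (top + dy, left + dx).
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue  # candidate block would leave the previous image
            cost = sum(
                abs(curr[top + i][left + j] - prev[y0 + i][x0 + j])
                for i in range(size) for j in range(size)
            )
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

Running this for every block of the current image yields the coarse motion field with which the partitioned image is compensated.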
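Statistical labeling on a Markov field can be illustrated with Iterated Conditional Modes (ICM) on a Potts model, a standard optimizer chosen here for brevity; the cited procedure may use another one. In this sketch the data term uses grey level only (not color, texture or motion), and the class means, the weight `beta` and the name `icm_segment` are hypothetical.

```python
def icm_segment(image, means, beta=1.0, iterations=5):
    """Label pixels by Iterated Conditional Modes on a Potts Markov field (sketch).

    image: 2-D list of grey levels; means: one mean grey level per class.
    Local energy of label l at a pixel (assumption):
        (I - means[l])**2 + beta * (number of 4-neighbours labeled != l)
    """
    h, w = len(image), len(image[0])
    classes = range(len(means))
    # Initialisation: maximum-likelihood label (nearest class mean).
    labels = [[min(classes, key=lambda l: (image[y][x] - means[l]) ** 2)
               for x in range(w)] for y in range(h)]
    for _ in range(iterations):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]
                # Pick the label minimizing data term plus Potts smoothness penalty.
                best = min(classes, key=lambda l: (image[y][x] - means[l]) ** 2
                           + beta * sum(n != l for n in neighbours))
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break  # a local minimum of the energy has been reached
    return labels
```

With a large `beta`, an isolated pixel whose grey level is closer to another class is relabeled to agree with its neighbours; with `beta = 0` the result is simply the nearest-mean classification.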
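The back-projection step can be sketched on label maps as follows. Each region of the current segmentation is assumed to move by a single translation vector, and a region is kept when a majority of its back-projected pixels fall inside the previous object mask; this majority rule and the name `back_project_mask` are assumptions, since the source only states that the back-projected regions belonging to the mask are kept.

```python
def back_project_mask(curr_labels, region_motion, prev_mask):
    """New object mask by back-projecting the regions of the current image (sketch).

    curr_labels: 2-D list of region ids from the segmented current image.
    region_motion: region id -> (dy, dx) motion from the previous to the current image.
    prev_mask: 2-D list of booleans, the object mask on the previous image.
    """
    h, w = len(curr_labels), len(curr_labels[0])
    inside, total = {}, {}
    for y in range(h):
        for x in range(w):
            r = curr_labels[y][x]
            dy, dx = region_motion.get(r, (0, 0))
            py, px = y - dy, x - dx  # where this pixel was in the previous image
            total[r] = total.get(r, 0) + 1
            if 0 <= py < h and 0 <= px < w and prev_mask[py][px]:
                inside[r] = inside.get(r, 0) + 1
    # Hypothetical majority rule: keep regions mostly inside the previous mask.
    kept = {r for r in total if 2 * inside.get(r, 0) > total[r]}
    return [[curr_labels[y][x] in kept for x in range(w)] for y in range(h)]
```

Because whole regions are kept or rejected, any error in the initial segmentation carries over to the new mask, which is consistent with the holes and artefacts noted above.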