1. Field of the Invention
The invention relates to a method for the detection of the relative depth of objects in an image from a pair of images.
The general context of the invention is the estimation of the relative depths of objects seen in images coming from different shots whether these images are taken successively in time or from different angles.
This method is part of the framework of various applications requiring a semantic analysis of image sequences. In particular, in the field of multimedia, the growing quantity of audiovisual data is creating the need for new functions such as the interactivity and the integration of objects of different kinds. The new MPEG-4 standard makes it possible to show a scene as a composition of objects but it does not specify the way in which the scene is analyzed.
2. Description of the Prior Art
The current techniques relevant to the framework of this invention study the zones of occlusion that appear in a sequence of images. An exemplary occlusion situation, on which these techniques rely, is described in the next paragraph.
FIG. 1 shows an exemplary situation giving rise to the occlusion. A sensor, herein represented by an eye, sees two objects A and B: one object B that moves leftward covers a second object A that moves rightward.
When the motion, namely the shifting of the objects with respect to each other, is observed, it becomes clear that, in a certain number of situations, one object passes in front of another.
The relative depth of one object with respect to another is the ordering of the objects along the line of sight that goes from the observer's eye through the objects of the scene.
Along this axis, there are objects that are in different planes.
In fact, it is not sought here to assess the depth itself but to know which object is in front of another object of the scene. This information is necessary for a certain number of applications, especially for encoding when it is sought to carry out image prediction. This information makes it possible for example to reconstruct the background of an image.
Indeed, with the relative depth being known, it is possible to define the background of an image and, as the case may be, a) neutralize this background or b) make it fuzzy or c) replace it by another or d) compress the information with very few bits and concentrate the essential part of the information on the part that is in front.
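As an illustration of point c) above, once the relative depth is known the front object defines a mask and every other pixel may be treated as background. The following sketch (an assumption for illustration; the function name and the toy images are not part of the invention) shows background replacement with numpy:

```python
import numpy as np

def replace_background(image, foreground_mask, new_background):
    """Composite the front (occluding) object over a new background.

    Given the relative depth, the pixels of the front object form a
    boolean mask; all remaining pixels belong to the background and
    may be neutralized, blurred, replaced, or coded with few bits.
    """
    out = new_background.copy()
    out[foreground_mask] = image[foreground_mask]
    return out

# Toy example: a 4x4 grey image whose centre 2x2 block is "in front".
image = np.full((4, 4), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True          # front object
background = np.zeros((4, 4), dtype=np.uint8)  # replacement background

result = replace_background(image, mask, background)
```

The same mask would drive the other uses listed above, e.g. concentrating the bit budget on the masked region during compression.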
The detection of the relative depth between objects is therefore aimed at providing a better understanding of the observed scene.
When the manner in which the objects move is observed, and when it is seen that they are behind other objects that do not move or that have a motion proper to them, it is thus possible to define the organization of the scene without introducing any semantic knowledge, i.e. without being capable of recognizing the type of object that is in the scene.
It is known simply that the scene is a set of components that are homogeneous in color and in texture, namely homogeneous zones that are related to one another because they have the same motion. The homogeneous zones are assembled into entities that have motions proper to themselves.
By observing the motion boundaries between the different entities, it can be deduced therefrom that the entity E1 is locally in front of the entity E2 which is itself in front of the entity E3.
By integrating these information elements in time over successive images, it is possible to obtain a relative depth structure of the scene.
To study the relative depth of the regions, it is therefore necessary to detect their motion boundaries. In the prior art, these boundaries are obtained by means of a motion segmentation.
It may be recalled that image segmentation is a known technique consisting of the conversion of a set of pixels into a mosaic image where each particle of the mosaic is homogeneous in color, in texture (namely luminance) or in motion, or in a combination of several of these criteria. In the case of motion segmentation, each particle of the mosaic is homogeneous in motion.
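A motion segmentation of this kind can be sketched minimally as follows (an illustrative assumption, not the segmentation used by the prior art: real methods also enforce spatial connectivity and robustness to noise). Pixels whose displacement vectors coincide after quantization are assigned to the same particle of the mosaic:

```python
import numpy as np

def segment_by_motion(flow, tolerance=0.5):
    """Partition pixels into zones of homogeneous motion.

    `flow` is an (H, W, 2) array of per-pixel displacement vectors.
    Each pixel's vector is quantized with the given tolerance, and
    every distinct quantized vector yields one label of the mosaic.
    """
    h, w, _ = flow.shape
    quantized = np.round(flow / tolerance).astype(int).reshape(-1, 2)
    # Unique quantized motions become the segment labels.
    _, labels = np.unique(quantized, axis=0, return_inverse=True)
    return labels.reshape(h, w)

# Two objects: the left half moves rightward (+1, 0),
# the right half moves leftward (-1, 0), as in FIG. 1.
flow = np.zeros((4, 6, 2))
flow[:, :3, 0] = 1.0
flow[:, 3:, 0] = -1.0
labels = segment_by_motion(flow)
```

The resulting label image is the "mosaic" referred to above: two particles, one per independently moving object.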
Now, to study the shifting of a motion boundary, it is necessary to take account of three images of the scene by way of input information.
Indeed, the existing techniques seek to detect the motion boundaries and then compare the motion of these boundaries with the motion of the adjacent regions in order to reach a conclusion. Now, to estimate the motion, it is necessary to analyze two successive images and, to estimate the motion of the boundary, it is necessary to have two successive positions of the boundary, giving three images to be analyzed.
This technique is given in detail here below with reference to FIGS. 2A and 2B.
By analyzing two consecutive images I1, I2 of a sequence, it is possible to estimate the motion of the scene. This motion may be used to segment the scene into objects A, B whose motions are independent. FIG. 2A shows the motion of the two objects A, B as well as the segmentation.
This motion segmentation does not contain sufficient information to deduce the relative depth of the two objects. The analysis of the motion of a second pair of images I2 and I3 gives the missing information: the two successive segmentations enable the estimation of the motion of the contour (or boundary) between the two objects.
The comparison of the motion of the contour (boundary) with the motion of the texture (luminance) of the two sides enables the relative depth to be deduced: the region that has the same motion as the contour corresponds to the occluding object. In this example, the two successive segmentations of the motion, shown in FIGS. 2A and 2B, indicate that the contour moves leftward. Since the motion is identical to the motion of the right-hand region, it is concluded therefrom that the object to the right is occluding the object to the left.
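The decision rule just described can be condensed into a toy sketch (an illustration under simplifying assumptions: motions are reduced to scalar horizontal velocities, and the function name is hypothetical):

```python
def occluding_side(contour_motion, left_region_motion, right_region_motion):
    """Decide which of two adjacent regions occludes the other.

    The region whose motion matches the motion of the shared
    boundary is the occluding (front) object: a boundary belongs
    to, and therefore moves with, the object that owns it.
    """
    if abs(contour_motion - left_region_motion) < abs(contour_motion - right_region_motion):
        return "left"
    return "right"

# Example from the text: the contour and the right-hand region both
# move leftward (-1) while the left region moves rightward (+1);
# the right-hand object is therefore in front.
front = occluding_side(-1.0, 1.0, -1.0)  # -> "right"
```

With the three-image prior-art scheme, `contour_motion` would itself be estimated from the two successive segmentations of FIGS. 2A and 2B.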
In the literature, there are different approaches that make use of this fact. Thompson, Mutch and Berzins (ref. D14 hereinafter) use the pairing of characteristic points to obtain a sparse velocity field that explains the motion between two images. Then they detect discontinuities in this velocity field. The analysis of two velocity fields (computed from two pairs of images) enables them to deduce the relative depth.
A second approach is described by Darrell and Fleet (ref. D12 hereinafter). This approach segments the scene into planes with a coherent motion using exclusively the motion information. The evolution of these planes makes it possible to determine the motion of the contours, which in turn enables the estimation of the relative depth.
Reference may also be made to the prior art constituted by the documents D1-D18 cited here below for the techniques described and commonly used for image processing:
D1: S. Beucher. Segmentation d'Images et Morphologie Mathematique (Image Segmentation and Mathematical Morphology). PhD thesis, E.N.S. des Mines de Paris, 1990.
D2: J. Barron, D. Fleet and S. Beauchemin. Performance of Optical Flow Techniques. International Journal of Computer Vision, 12(1): 43-77, 1994.
D3: K. M. Mutch and W. B. Thompson. Analysis of Accretion and Deletion at Boundaries in Dynamic Scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7: 133-138, 1985.
D4: E. Decenciere Ferrandiere, C. de Fouquet and F. Meyer. Applications of Kriging to Image Sequence Coding. Accepted for publication in Signal Processing: Image Communication, 1997.
D5: F. Hampel, E. Ronchetti, P. Rousseeuw and W. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley, 1986.
D6: P. Huber. Robust Statistics. John Wiley, New York, 1981.
D7: Peter Meer, Doron Mintz, Dong Yoon Kim and Azriel Rosenfeld. Robust Regression Methods for Computer Vision: A Review. International Journal of Computer Vision, 6(1): 59-70, April 1991.
D8: Nikhil R. Pal and Sankar K. Pal. A Review on Image Segmentation Techniques. Pattern Recognition, 26(9): 1277-1294, 1993.
D9: J. Y. A. Wang and E. H. Adelson. Representing Moving Images with Layers. IEEE Transactions on Image Processing, Special Issue: Image Sequence Compression, 3(5): 625-638, September 1994.
D10: G. Wolberg. Digital Image Warping. IEEE Computer Press, 1990.
D11: J. Cichosz and F. Meyer. Morphological Multiscale Image Segmentation. In Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS'97), pages 161-166, Louvain-la-Neuve (Belgium), June 1997.
D12: Trevor Darrell and David Fleet. Second-Order Method for Occlusion Relationships in Motion Layers. Technical Report 314, MIT Media Lab Vismod, 1995.
D13: B. K. P. Horn and B. G. Schunck. Determining Optical Flow. Artificial Intelligence, 17: 185-203, 1981.
D14: W. B. Thompson, K. M. Mutch and V. A. Berzins. Dynamic Occlusion Analysis in Optical Flow Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7: 374-383, 1985.
D15: Zhengyou Zhang. Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting. Technical Report 2676, Institut National de Recherche en Informatique et en Automatique, Sophia-Antipolis Cedex, France, October 1995.
D16: P. Chauvet. Aide-memoire de geostatistique lineaire (Handbook of Linear Geostatistics). Ecole des Mines de Paris, 1993.
D17: Michael J. Black and Allan D. Jepson. Estimating Optical Flow in Segmented Images Using Variable-Order Parametric Models With Local Deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10): 972-986, October 1996.
D18: L. Bergen and F. Meyer. Segmentation du mouvement des objets dans une scene (Segmentation of the Motion of the Objects in a Scene). In Coresa 97, 1997.
The drawbacks of the techniques presented in the above paragraph (documents D12, D14) are the following:
These techniques are based entirely on motion, leading to an imprecise localization of the motion boundaries.
These techniques use three images (two motion segmentations) to determine the motion of the contours; the lack of precision in the localization of the boundaries propagates into the estimation of the motion of the contours and therefore into the detection of the depth. Furthermore, this leads to an additional delay in analysis.
Furthermore, the field of application of these techniques is restricted to cases where the motion is relatively great.