1. Field of the Invention
The present invention relates to a method and apparatus employing stereo vision methodology for determining depth information between an object and an image.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
2. Description of the Prior Art
Stereo vision is widely used in computer vision for determining the distance between an object and an image plane. In stereo vision, the images seen by two eyes are slightly different due to binocular parallax. An object point produces associated image points in the two images seen by the eyes. When the two images are overlaid, there is a finite distance between the two pixels corresponding to the same object point. The distance between these two pixels is called the disparity. The disparity varies over the visual field and is inversely proportional to the distance (depth) between the object point and the observer, i.e., the image plane. It is known that the distance between the image plane and the object point can be determined from the disparity.
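By way of illustration only, the inverse relation between disparity and depth may be sketched as follows. The sketch assumes a rectified pair of parallel pinhole cameras, for which depth equals focal length times baseline divided by disparity. The function name, the focal length and the baseline are hypothetical values chosen for the example and do not appear in the disclosure.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth for a rectified, parallel-axis stereo pair: Z = f * B / d.

    Assumption (not from the disclosure): pinhole cameras with identical
    focal length, separated by a known baseline along the image x axis.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700-pixel focal length, 0.12 m baseline.
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.12

# Doubling the disparity halves the computed depth.
depths = {d: depth_from_disparity(d, FOCAL_LENGTH_PX, BASELINE_M) for d in (7, 14, 28)}
```

Under these assumptions, a point observed with twice the disparity lies at half the depth.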
A significant problem in determining that distance is the accurate determination of the disparity. The difficulty in finding a pair of pixels in two images corresponding to the same object point is referred to as the correspondence problem in stereo vision. The process of finding such pairs of points for determining disparity is referred to as stereo matching.
Most stereo vision algorithms in the literature recover depth information utilizing two or three images, i.e., binocular or trinocular stereo, respectively. The algorithms compute depth information between an image plane and an object point in a noise-insensitive manner. Image resolution causes problems in computing the depth values accurately because the measured disparity corresponding to the image of an object point is quantized to an integer number of pixels. Quantization can be viewed as adding an error to the measure of disparity. Accurate disparity measurement is crucial for computing depth information accurately. Various methodologies addressing this problem use what are referred to as relaxation methods to impose smoothness constraints on disparity fields and optimization methods to improve the accuracy of disparity measurements.
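The effect of disparity quantization on depth may be illustrated by the following minimal sketch. It assumes the same rectified pinhole model, in which depth equals focal length times baseline divided by disparity. The focal length, baseline and function names are hypothetical and serve only the example.

```python
def depth(disparity_px, focal_length_px=700.0, baseline_m=0.12):
    """Depth under the assumed rectified pinhole model: Z = f * B / d."""
    return focal_length_px * baseline_m / disparity_px


def quantization_depth_error(true_disparity_px, focal_length_px=700.0, baseline_m=0.12):
    """Absolute depth error caused by rounding the disparity to a whole pixel."""
    quantized = round(true_disparity_px)
    if quantized <= 0:
        raise ValueError("quantized disparity must be positive")
    return abs(depth(quantized, focal_length_px, baseline_m)
               - depth(true_disparity_px, focal_length_px, baseline_m))


# The same 0.4-pixel rounding error costs far more depth accuracy for a
# distant point (small disparity) than for a nearby one (large disparity).
error_far_point = quantization_depth_error(3.4)
error_near_point = quantization_depth_error(30.4)
```

In this sketch the depth error grows rapidly as the disparity becomes small, which is consistent with the need for accurate disparity measurement noted above.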
From knowledge of the relative locations of two camera positions, the disparity vector between two pixels that correspond to images of an object point at the two camera positions is constrained to lie along a line referred to as the epipolar line. This restriction is referred to as the epipolar line constraint.
The prior art methods control camera position and orientation to recover depth information. Some of these methodologies compute image or optical flows from a sequence of images and use the estimated flow fields to compute the depth. Other methodologies control both position and orientation of a camera so that an image point has zero flow in the different images, i.e., a fixed point. A fixed point is referred to as a point of fixation, whose depth can be computed easily. Information about a scene in different images can be computed from fixed points and flow fields. It is also possible to perform fixation repeatedly to recover depth information for a given scene. Stereo vision is a special case of the flow-based methodologies, in which the disparity vectors are the image flows.
The most obvious approach to stereo matching uses correlation, described in more detail by Rafael C. Gonzalez and Paul Wintz, Digital Image Processing, 2nd Ed., Addison-Wesley, New York, 1987. This approach computes the correlation between the intensities of points in two image patches. The image patch that has the highest cross correlation with the given patch is chosen as the patch corresponding to the given one.
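A minimal sketch of correlation-based matching along a single scanline is given below for illustration. It assumes rectified images, so that the search is one-dimensional, and it uses normalized cross correlation as the correlation measure. The function names and the sample intensities are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross correlation of two equal-length intensity lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a uniform patch carries no correlation information
    return sum(x * y for x, y in zip(da, db)) / denom


def match_patch(left_row, right_row, x, half_width, max_disparity):
    """Find the disparity whose right-image patch best correlates with the
    left-image patch centered at x. Assumed convention: a point at column x
    in the left image appears at column x - d in the right image."""
    patch = left_row[x - half_width: x + half_width + 1]
    best_d, best_score = 0, -2.0
    for d in range(max_disparity + 1):
        start = x - d - half_width
        if start < 0:
            break  # candidate patch would fall outside the right image
        candidate = right_row[start: start + 2 * half_width + 1]
        score = ncc(patch, candidate)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Hypothetical scanlines: the left row is the right row shifted by 3 pixels.
right_row = [10, 12, 50, 90, 30, 15, 70, 20, 11, 13, 40, 60, 25, 10, 10, 10]
left_row = [10, 10, 10] + right_row[:-3]
disparity, score = match_patch(left_row, right_row, x=8, half_width=2, max_disparity=6)
```

In this sketch a patch of uniform intensity yields a correlation of zero for every candidate, so no candidate can be preferred over another.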
This approach suffers from some of the drawbacks of traditional stereo vision described in an article by S.T. Barnard, Stochastic stereo matching over scale, International Journal of Computer Vision, 3:17-32, 1989.
The drawbacks include:
1. The size of the patches affects the likelihood of a false match. The given patch must be large enough to differentiate this patch from others.
2. The patch must be small compared to the variation in the disparity map. If the patch is too large the correlation will be insensitive to significant abrupt changes in the image (such as those pixels corresponding to edges of the object). This problem has motivated the use of coarse-to-fine stereo matching.
3. The correlation is not a reliable matching mechanism if the viewed area consists of uniform or slowly varying intensity.
A second approach is to match information-rich points, called features, which are usually discrete. The stereo matching process establishes the correspondence between two discrete sets of features. This approach has the following drawbacks (see the Barnard article noted above):
1. Some false matches are likely to happen at areas containing many features.
2. The error of the feature location in image coordinates must be smaller than the error of its disparity to locate 3D features accurately.
3. This approach provides sparse matching only.
A third approach decomposes each image into a pyramid of patches. Coarse stereo matching is performed on large patches first, as discussed in an article by Umesh R. Dhond and J.K. Aggarwal, "Structure from Stereo--A Review", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 6, November/December, 1989, pp. 1489-1510. The coarse disparity is used to guide the match of smaller patches across images. This approach has some computational advantages and can ameliorate a false-target problem. The scale space must be well chosen such that the coarse disparity is a good bound on the fine disparity. Automatically selecting such a scale space is difficult.
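The coarse-to-fine strategy may be sketched on a single scanline as follows. The sketch assumes a two-level pyramid built by averaging adjacent pixels, a sum-of-absolute-differences matching cost, and a refinement window of one pixel on either side of the upscaled coarse disparity. These choices, the function names and the sample data are hypothetical and are not taken from the cited review.

```python
def downsample(row):
    """Halve the resolution by averaging adjacent pixel pairs."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]


def sad(left_row, right_row, x, d, half_width):
    """Sum of absolute differences between the left patch centered at x
    and the right patch centered at x - d."""
    return sum(abs(left_row[x + k] - right_row[x - d + k])
               for k in range(-half_width, half_width + 1))


def best_disparity(left_row, right_row, x, half_width, candidates):
    """Lowest-cost disparity among candidates that keep the patch in bounds."""
    valid = [d for d in candidates
             if d >= 0
             and x - d - half_width >= 0
             and x - d + half_width < len(right_row)]
    return min(valid, key=lambda d: sad(left_row, right_row, x, d, half_width))


def coarse_to_fine(left_row, right_row, x, half_width, max_disparity):
    """Search the full range at half resolution, then refine at full
    resolution only within one pixel of the upscaled coarse result."""
    coarse_left, coarse_right = downsample(left_row), downsample(right_row)
    coarse_d = best_disparity(coarse_left, coarse_right, x // 2, half_width,
                              range(max_disparity // 2 + 1))
    guess = 2 * coarse_d
    return best_disparity(left_row, right_row, x, half_width,
                          range(guess - 1, guess + 2))


# Hypothetical scanlines: the left row is the right row shifted by 4 pixels.
right_row = [10, 14, 60, 90, 35, 20, 75, 25, 12, 18, 44, 66,
             28, 9, 52, 80, 31, 16, 70, 22, 11, 40, 58, 13]
left_row = [10, 10, 10, 10] + right_row[:-4]
fine_disparity = coarse_to_fine(left_row, right_row, x=14, half_width=1, max_disparity=8)
```

In this sketch the fine search examines three candidates instead of nine. If the coarse estimate were wrong by more than the refinement window, the fine search could not recover, which illustrates why the coarse disparity must bound the fine disparity.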
Another approach is to map the stereo matching problem to an optimization problem, as discussed in articles by T. Poggio, V. Torre, and C. Koch, Computational Vision and Regularization Theory, Nature, 317: 314-319, 1985 and A. Witkin, D. Terzopoulos, and M. Kass, Signal matching through scale space, International Journal of Computer Vision, 1:133-144, 1987. This approach represents the mismatch between the pixels of each pair and the variation of the disparity field by a potential energy. The stereo matching is solved by finding corresponding pixels such that the total potential of the disparity field is a minimum. Such methods are usually computationally expensive. Good initial values need to be given if the optimization problem is non-linear. The potential functions used in this approach frequently have parameters relevant to the type of surface. However, the goal is to determine the characterization of the object surface when no knowledge about the surface is available. The definition of a potential function cannot be given if there is no prior knowledge. This approach therefore poses problems in computer vision. Another problem with this approach is due to the nature of the sensory device, typically a CCD camera, which provides discrete data. The model used, in contrast, is usually expressed in a continuous form. The discrete approximation of the continuous form introduces errors in computing the solution. The methods mentioned above are solutions having trade-offs between efficiency and the quality of matching.
They are all vulnerable in certain conditions. There are two main causes of the problems: (1) Too few constraints are available to guide the stereo matching; and (2) images are taken in different imaging geometries and many events could confuse the matching.
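For illustration of the optimization formulation discussed above, the following sketch minimizes a discrete energy on one scanline. It is not the regularization method of the cited articles. It assumes an energy equal to the sum of per-pixel mismatch costs plus a weighted sum of absolute disparity differences between neighboring pixels, and it finds the minimum by dynamic programming. The function name, the cost table and the weights are hypothetical.

```python
def scanline_energy_min(data_cost, smooth_weight):
    """Return the disparity per pixel that minimizes
    sum_i data_cost[i][d_i] + smooth_weight * sum_i |d_i - d_(i-1)|.

    data_cost[i][d] is the assumed mismatch of pixel i at disparity d.
    """
    levels = len(data_cost[0])
    best = [list(data_cost[0])]  # best[i][d]: lowest energy ending at (i, d)
    back = []                    # back[i-1][d]: best predecessor disparity
    for i in range(1, len(data_cost)):
        row, ptr = [], []
        for d in range(levels):
            prev = min(range(levels),
                       key=lambda p: best[-1][p] + smooth_weight * abs(d - p))
            row.append(data_cost[i][d] + best[-1][prev]
                       + smooth_weight * abs(d - prev))
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace the minimum-energy path back from the last pixel.
    d = min(range(levels), key=lambda k: best[-1][k])
    path = [d]
    for ptr in reversed(back):
        d = ptr[d]
        path.append(d)
    path.reverse()
    return path


# Hypothetical costs for 4 pixels and 3 disparity levels. The third pixel
# is noisy and slightly prefers disparity 0, while its neighbors prefer 1.
costs = [[5, 0, 5], [5, 0, 5], [0, 1, 5], [5, 0, 5]]
unsmoothed = scanline_energy_min(costs, smooth_weight=0)
smoothed = scanline_energy_min(costs, smooth_weight=2)
```

Without the smoothness term the noisy pixel takes its own preferred disparity; with it, the pixel follows its neighbors. The result depends on the chosen weight, which reflects the dependence on prior knowledge of the surface noted above.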