As indicated above, in stereoscopic computer vision there is often a need to match the left and right images of a scene, usually in the presence of noise or other disturbances that introduce uncertainty and complicate making the match. As a consequence, a need has developed for algorithms that facilitate making correct correspondences. Correct correspondences are important for reliable depth estimation. Measuring depth or range via stereoscopic vision instead of by laser, radar or acoustic time-of-flight has several advantages. First, stereo ranging is passive, i.e., it does not need to broadcast a signal to illuminate the objects of interest. This is important for military surveillance. Second, time-of-flight devices measure the depth or range to a single point, so these sensors must be spatially scanned to determine the depth of more than one point. In contrast, stereo vision allows the depth of almost all points in the image to be determined simultaneously, thereby providing the potential for a much faster range imaging sensor. Possible applications include the machine inspection of three-dimensional surfaces, e.g., automobile body parts, to ensure that the surfaces are within specified tolerances. One can expect to use stereoscopic vision for various other functions.
FIG. 1 shows the general geometric structure of all stereo vision systems. Such a system consists of two cameras, whose optical axes need not be parallel although they often are. Each camera forms an image of the world based on perspective projection. A point A, visible to both cameras, forms an image in each camera's image plane, denoted A.sub.l and A.sub.r respectively. If A.sub.l and A.sub.r are known, together with the relative position of one camera with respect to the other, then simple triangulation allows the distance to the point A to be inferred. Determining the corresponding image points A.sub.l and A.sub.r is probably the most difficult task associated with stereo imaging. Fortunately, geometric constraints restrict matches to lie along epipolar lines. For each point A in the scene, there exists an epipolar plane that passes through said point and the line joining the centers of the two camera lenses. This plane intersects the image planes of the two cameras to form two corresponding epipolar lines, one in each image. In fact, all the scene points in an epipolar plane are projected onto the two epipolar lines. The two-dimensional matching problem is thus reduced to determining a set of one-dimensional matches, which significantly reduces the complexity of the problem. Determining the location of the epipolar lines requires careful calibration of the camera system, but is straightforward. The key problem addressed by this invention is to determine the set of correspondences between the left and right epipolar lines, i.e., for each point i in the left line, its corresponding point j in the right line must be determined. The offset between i and j, (i-j), is called the disparity at point i. The epipolar lines are therefore assumed to be given and, for convenience, are assumed to be parallel to the scanlines of the two cameras.
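As a minimal sketch of the triangulation step, consider only the simplest case mentioned above: parallel optical axes, with the epipolar lines along the scanlines. Under that assumption, and assuming a pinhole camera model, the depth Z of a point follows from its disparity d = i - j as Z = f*B/d, where f is the focal length in pixels and B is the distance between the centers of the two lenses. The function name, parameter names, and numbers below are illustrative assumptions and do not come from the original description.

```python
def depth_from_disparity(i, j, focal_length_px, baseline):
    """Depth of a scene point from matched scanline positions i (left) and j (right).

    Assumes parallel optical axes and a pinhole model (an illustrative
    simplification). The result is in the same units as `baseline`.
    """
    disparity = i - j  # disparity at point i, as defined in the text
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline / disparity


# Hypothetical numbers: 800-pixel focal length, 0.1 m between lens centers.
print(depth_from_disparity(120, 100, focal_length_px=800.0, baseline=0.1))
```

Because depth is inversely proportional to disparity in this configuration, a wrong match of even one pixel changes the estimated depth most for distant points, which is one reason correct correspondences matter.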
In the past, stereoscopic vision has depended on relatively complex algorithms to determine the correspondence between two stereoscopic images necessary for an accurate measure of the depth or distance of the object. Previous stereo algorithms can be characterized by (1) the primitive features that are matched, (2) the local cost of matching two primitive features, and (3) the global cost and associated constraints. Previous matching primitives have almost exclusively been either edge or correlation based. Edge-based stereo suffers from (1) the need to extract edges, which can be a difficult process and is the subject of much current research, and (2) the sparseness of the resulting depth map. Only the depth at the edges is known, but edge points usually represent only a very small fraction of all the points in the image. Correlation-based techniques examine the correlation between the intensities within regions or windows of varying sizes. While feature extraction is eliminated, it is replaced by the more difficult need to adaptively change the size of the correlation window depending on whether or not the window is over a disparity discontinuity. This is also the subject of active research. Stereo algorithms include a local "cost" function that measures the degree of similarity between two features, and a global cost, the best set of matches being the one with the lowest global cost. Generally the local cost function consists of a weighted squared error term together with a regularization term that is based on the differences between neighboring disparities or disparity gradients. This term implicitly assumes that surfaces are smooth, which is not the case at depth discontinuities, and these discontinuities are generally the most important features of any depth map. The effect of this regularization term is to smooth or blur depth discontinuities. The regularization term has been considered necessary to constrain matches to nearby points.
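The kind of cost described above for previous algorithms, a squared error term plus a regularization term on the differences between neighboring disparities, can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of any particular prior algorithm: the function name, the representation of matches as (i, j) index pairs, the unweighted squared error, and the single `smoothness_weight` parameter are all hypothetical.

```python
def regularized_cost(left, right, matches, smoothness_weight=1.0):
    """Squared-error matching cost plus a smoothness (regularization) penalty.

    left, right: intensity values along corresponding epipolar lines.
    matches: list of (i, j) pairs, ordered by i, pairing left[i] with right[j].
    smoothness_weight: hypothetical weight on the regularization term.
    """
    # Data term: squared intensity difference for each matched pair.
    data_term = sum((left[i] - right[j]) ** 2 for i, j in matches)

    # Regularization term: squared difference between neighboring disparities.
    disparities = [i - j for i, j in matches]
    smooth_term = sum(
        (d_next - d) ** 2 for d, d_next in zip(disparities, disparities[1:])
    )
    return data_term + smoothness_weight * smooth_term


# Hypothetical scanlines; the disparity drops from 2 to 1 partway along the line.
left = [9.0, 9.0, 1.0, 3.0, 4.0, 5.0]
right = [1.0, 2.0, 7.0, 4.0, 5.0]
matches = [(2, 0), (3, 1), (4, 3), (5, 4)]
print(regularized_cost(left, right, matches, smoothness_weight=2.0))
```

In this sketch every change in disparity between neighboring matches adds to the cost, so a minimizer is pushed toward constant disparity. That is the smoothing of depth discontinuities noted above.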
In addition, two other constraints are common. The first is uniqueness, i.e., a point in the left image can match no more than a single point in the right image. The second constraint is ordering, i.e., if point z.sub.l,i matches point z.sub.r,j then the point z.sub.l,i+1 must match a point z.sub.r,j+k where k>0, i.e., the order of features in the left and right images is assumed preserved. This assumption is generally true for solid objects viewed from a distance, but can be violated, for example, by thin objects such as poles that are close to the observer.
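The two constraints above can be expressed as simple checks on a candidate set of matches. The sketch below assumes matches are given as (i, j) index pairs along a pair of epipolar lines. The function names are hypothetical, and the uniqueness check is applied symmetrically (no right point matched twice either), which is an assumption that goes slightly beyond the one-directional statement above.

```python
def satisfies_uniqueness(matches):
    """True if no left index and no right index appears in more than one match."""
    lefts = [i for i, _ in matches]
    rights = [j for _, j in matches]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)


def satisfies_ordering(matches):
    """True if right indices strictly increase when matches are sorted by left index."""
    ordered = sorted(matches)
    return all(j_next > j for (_, j), (_, j_next) in zip(ordered, ordered[1:]))


# Order preserved: each later left point matches a later right point.
print(satisfies_ordering([(0, 0), (1, 2), (2, 3)]))

# Order reversed, as might happen with a thin pole close to the observer.
print(satisfies_ordering([(0, 2), (1, 1)]))
```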
The present invention seeks to increase the number of correct correspondences found between the two images by providing an improved algorithm for making the matches. This should result in systems with improved stereoscopic vision. It also facilitates determining the distance of a feature by stereoscopic computer vision.
The basic principles of stereo correspondence and definitions of much of the terminology used in this application are included in a paper entitled "Computational Issues in Solving the Stereo Correspondence Problem" by J. P. Frisby and S. B. Pollard at pages 331-357 in a book entitled "Computational Models of Visual Processing" edited by M. S. Landy and J. A. Movshon, and published by the MIT Press, Cambridge, Mass. (1991).