Stereo correspondence is a process that captures two or more images of a scene, estimates a three-dimensional (3D) model of the scene by precisely finding matching pixels between the images, and converts the two-dimensional (2D) positions of the matching pixels into 3D depths. In a simple imaging configuration (e.g., two eyes or two cameras looking straight forward), the disparity between the two eyes or the two cameras is inversely proportional to the distance between the eyes or cameras and an observed object (i.e., the stereo depth of the observed object in the captured image). Therefore, a disparity map is usually used to describe the stereo depths of pixels in a captured image.
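The inverse relation between disparity and depth can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo pair, for which depth is commonly written as Z = f * B / d; the function name, the focal length in pixels, and the baseline in metres are illustrative assumptions and do not come from the text above.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return the depth in metres for a disparity given in pixels.

    Assumes a rectified pinhole stereo pair, where Z = f * B / d.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700-pixel focal length, 0.1 m baseline.
near = depth_from_disparity(35.0, 700.0, 0.1)  # large disparity, close object
far = depth_from_disparity(7.0, 700.0, 0.1)    # small disparity, distant object
```

In this sketch, dividing the disparity by five multiplies the computed depth by five, which is the inverse proportionality described above.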
In traditional stereo correspondence algorithms, one of the two images captured by the two eyes or two cameras is usually taken as a reference image and the other as an object image, and a disparity map of the object image with respect to the reference image is output.
Although many stereo correspondence algorithms exist, they usually comprise the following steps: a matching cost computation step, a cost aggregation step, a disparity computation step, and a disparity optimization step, wherein:
The matching cost computation step computes the pixel value differences between the reference image and the object image for every disparity value between a minimum disparity value (dmin) and a maximum disparity value (dmax). All the disparity values between “dmin” and “dmax”, together with the pixel value differences corresponding to those disparity values, form an initial disparity space image (DSI). Traditional matching cost computation methods comprise: (1) computing squared intensity differences (SDs) or absolute intensity differences (ADs) (both methods are sensitive to noise); and (2) non-parametric methods such as the rank transform and the census transform (these methods are less sensitive to noise, but their computation time is long).
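The construction of an initial DSI from AD or SD costs can be sketched as follows. This is an illustration only: it assumes rectified grayscale images stored as lists of rows, and it assumes that a reference pixel at column x is compared with the object-image pixel at column x - d. The function name `build_dsi` and the use of an infinite cost for positions without a counterpart are hypothetical choices, not taken from the text.

```python
def build_dsi(reference, object_image, d_min, d_max, squared=False):
    """Return dsi[d - d_min][y][x], the matching cost of pixel (x, y) at disparity d.

    Uses absolute differences (AD) by default, squared differences (SD) if requested.
    """
    height, width = len(reference), len(reference[0])
    dsi = []
    for d in range(d_min, d_max + 1):
        plane = []
        for y in range(height):
            row = []
            for x in range(width):
                xo = x - d  # assumed convention: reference x matches object x - d
                if 0 <= xo < width:
                    diff = reference[y][x] - object_image[y][xo]
                    row.append(diff * diff if squared else abs(diff))
                else:
                    row.append(float("inf"))  # no valid counterpart at this disparity
            plane.append(row)
        dsi.append(plane)
    return dsi


# One-row toy images: the object image is the reference shifted by one pixel.
reference = [[10, 20, 30, 40, 50]]
object_image = [[20, 30, 40, 50, 60]]
dsi = build_dsi(reference, object_image, d_min=0, d_max=2)
```

Each entry of `dsi` is one matching cost plane. In this toy case the plane for d = 1 is zero wherever a counterpart exists, while the planes for d = 0 and d = 2 are not.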
The cost aggregation step acquires a more reliable matching cost by summing the matching costs within a support window on the matching cost plane corresponding to each disparity value. The most commonly used cost aggregation method sums the matching costs within a fixed window on a matching cost plane. However, this method is deficient in several respects: (1) it ignores discontinuities in the stereo depths of pixels in an image; and (2) it does not handle blurred regions in the image. Therefore, an ideal cost aggregation method should use a support window that contains as many points as possible corresponding to the same disparity value on a matching cost plane. To this end, support windows such as a movable window, multiple windows, and a variable window have been proposed. However, none of these windows yields a satisfactory result, and their efficiency is not high.
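Fixed-window aggregation on a single matching cost plane can be sketched as below. The sketch assumes a square window of side 2 * radius + 1 that is simply clipped at the image border; the function name and the border handling are illustrative assumptions.

```python
def aggregate_fixed_window(cost_plane, radius):
    """Sum the costs inside a square window centred on every pixel of one plane."""
    height, width = len(cost_plane), len(cost_plane[0])
    aggregated = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            # The window is clipped at the border (an assumed policy).
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += cost_plane[yy][xx]
            aggregated[y][x] = total
    return aggregated


# A single high cost in the centre leaks into every neighbouring window.
plane = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 1, 1],
]
aggregated = aggregate_fixed_window(plane, radius=1)
```

Because the window has the same shape everywhere, it mixes costs from both sides of a depth discontinuity, which is the first defect noted above.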
The disparity computation step acquires a disparity map of an image based on the initial DSI. Generally, for each pixel of an image captured of a scene, the disparity value corresponding to the minimum aggregated matching cost associated with that pixel is selected as the disparity value of the pixel.
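This selection rule is often called winner-take-all. A minimal sketch follows, assuming the aggregated costs are stored as one plane per disparity value, starting at `d_min`; the function name and data layout are illustrative assumptions.

```python
def winner_take_all(dsi, d_min):
    """Pick, for every pixel, the disparity whose aggregated cost is smallest.

    dsi[k][y][x] holds the cost of pixel (x, y) at disparity d_min + k.
    Ties are resolved in favour of the smallest disparity.
    """
    height, width = len(dsi[0]), len(dsi[0][0])
    disparity = [[d_min] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            costs = [plane[y][x] for plane in dsi]
            best = min(range(len(costs)), key=costs.__getitem__)
            disparity[y][x] = d_min + best
    return disparity


# Toy cost volume: one row, three pixels, disparities 2, 3 and 4.
toy_dsi = [
    [[5, 1, 7]],  # d = 2
    [[2, 4, 7]],  # d = 3
    [[9, 3, 0]],  # d = 4
]
disparity_map = winner_take_all(toy_dsi, d_min=2)
```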
The disparity optimization step performs post-processing on the acquired disparity values, and further comprises a sub-pixel optimization sub-step, an occlusion detection sub-step, and an occlusion filling sub-step. Traditional occlusion filling methods comprise: (1) selecting, as the disparity value of an occluded pixel, the minimum disparity value of the un-occluded pixels that are spatially closest to the occluded pixel in the same pixel line (this method produces stripe-like artifacts); and (2) smoothing occluded regions with a bilateral filter (this method is comparatively slow).
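The first filling method can be sketched for a single pixel line as follows. The sketch rests on one possible reading of the description: for each occluded pixel, the nearest un-occluded pixel on the left and the nearest on the right are found, and the smaller of their two disparity values is used. The function name and the boolean occlusion mask are illustrative assumptions.

```python
def fill_occlusions_scanline(disparity_row, occluded_row):
    """Fill occluded pixels of one line from their nearest un-occluded neighbours."""
    width = len(disparity_row)
    filled = list(disparity_row)
    for x in range(width):
        if not occluded_row[x]:
            continue
        candidates = []
        for xl in range(x - 1, -1, -1):  # nearest un-occluded pixel to the left
            if not occluded_row[xl]:
                candidates.append(disparity_row[xl])
                break
        for xr in range(x + 1, width):   # nearest un-occluded pixel to the right
            if not occluded_row[xr]:
                candidates.append(disparity_row[xr])
                break
        if candidates:
            filled[x] = min(candidates)  # keep the smaller disparity
    return filled


row = [8, 8, 0, 0, 3, 3]
occluded = [False, False, True, True, False, False]
filled_row = fill_occlusions_scanline(row, occluded)
```

Since each line is filled independently of the lines above and below it, neighbouring lines can receive different values, which is consistent with the stripe-like artifacts mentioned above.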