This disclosure relates generally to the field of image processing. More particularly, but not by way of limitation, it relates to a technique for improving disparity estimation operations by incorporating color-refinement estimation therein.
The process of estimating the depth of a scene from two cameras is commonly referred to as stereoscopic vision and, when more than two cameras are used, multi-view stereo. In practice, many multi-camera systems use disparity as a proxy for depth. (As used herein, disparity is taken to mean the difference in the projected location of a scene point in one image compared to that same point in another image captured by a different camera.) With a geometrically calibrated camera system, disparity can be mapped onto scene depth. The fundamental task for such multi-camera, vision-based depth estimation systems is then to find matches, or correspondences, of points between images from two or more cameras. Using geometric calibration, the correspondence of a point in a reference image (A) can be shown to lie along a certain line, curve, or path (e.g., an epipolar line) in another image (B).
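For a rectified two-camera pair modeled as ideal pinhole cameras (an assumption; the text above does not fix a camera model), the mapping from disparity to depth reduces to the familiar relation Z = f·B/d, where f is the focal length in pixels, B is the baseline between the camera centers, and d is the disparity in pixels. A minimal Python sketch follows; the function name and parameter names are illustrative only.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified pinhole stereo pair (assumed model).

    disparity_px: offset of the same scene point between the two images, in pixels
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centers, in meters
    Returns depth in the same unit as the baseline (meters here).
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values
        # are not physically meaningful under this model.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Example: f = 1000 px, B = 0.1 m
print(disparity_to_depth(10.0, 1000.0, 0.1))  # 10.0 (meters)
print(disparity_to_depth(50.0, 1000.0, 0.1))  # 2.0
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error costs much more depth accuracy for distant points than for near ones, which is one reason finding the correct correspondence matters.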
Image noise, differences in the precise color calibration of each camera, and other factors can typically lead to multiple possible matches, and to incorrect matches, when only single points (i.e., pixels) are considered. For this reason, many known matching techniques use image patches or neighborhoods to compare the region around a point in image A with the region around a candidate point in image B. Comparing a whole patch rather than a single sampled pixel value can mitigate noise, but not color biases from one image to another, such as those present between almost any two different sensors.
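Patch-based matching can be illustrated on a single scanline of a rectified pair, where the search path reduces to a horizontal line. The sketch below uses the sum of absolute differences (SAD) as the patch cost; SAD, the window size, the sample data, and the function names are illustrative choices, not taken from this disclosure. It also shows that a uniform color bias added to image B is not averaged away by the patch: it simply inflates the cost at the true disparity.

```python
def sad(left, right, x_left, x_right, half):
    """Sum of absolute differences between the (2*half+1)-pixel windows
    centered at left[x_left] and right[x_right]."""
    return sum(abs(left[x_left + k] - right[x_right + k]) for k in range(-half, half + 1))


def match_scanline(left, right, x, half=2, max_disp=8):
    """Search for the disparity d in [0, max_disp] whose window around
    right[x - d] best matches the window around left[x].

    Assumes rectified images of equal width, so a point at column x in the
    left image appears at column x - d in the right image.
    Returns (best_disparity, best_cost).
    """
    if x - half < 0 or x + half >= len(left):
        raise ValueError("reference window falls outside the left scanline")
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # candidate window would run off the right image
        cost = sad(left, right, x, xr, half)
        if cost < best_cost:  # ties keep the smaller disparity
            best_d, best_cost = d, cost
    return best_d, best_cost


left = [10, 40, 15, 80, 25, 60, 5, 90, 30, 70, 20, 50, 35, 85, 45, 65]
true_d = 3
right = left[true_d:] + [0] * true_d  # right[i] = left[i + 3]: a pure 3-pixel shift
print(match_scanline(left, right, x=10, half=2, max_disp=6))  # (3, 0)

# Same image B with a uniform +20 color bias, as between two different sensors.
right_biased = [v + 20 for v in right]
print(match_scanline(left, right_biased, x=10, half=2, max_disp=6))  # (3, 100)
```

In this toy case the bias leaves the minimum in place, but the margin over the next-best candidate shrinks from 90 to 20, so weaker texture or a larger bias could let a wrong candidate win.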
Methods such as Normalized Cross-Correlation (NCC) or the Census transform can obtain better matches of image features when there are color or lighting changes between the images. While these approaches provide improved matching, they do so at the cost of filtering out and discarding some of the original images' intrinsic information: namely, areas of limited texture that nonetheless contain a slow gradient (e.g., a slow change in color or intensity). For example, a transition from light to dark across a large flat wall in a scene will be transformed by these methods so as to contain little matching information except at the area's edges. Gradually changing image areas likewise cannot normally be matched with either pixel-wise or patch-based matching.
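The trade-off described above can be seen on one-dimensional patches. The sketch below (plain Python; the function names and sample values are hypothetical) implements zero-mean NCC and a simple 1-D Census signature. Both are unchanged when a patch undergoes a gain and offset change, but on a slow linear ramp every window produces the same Census signature and an NCC of 1 against every other window, so neither measure can localize a match within the ramp.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches.

    Returns a value in [-1, 1], or None when either patch is constant
    (the denominator vanishes and NCC is undefined).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return None if den == 0 else num / den


def census(patch):
    """1-D Census signature: one bit per neighbor, 1 if darker than the center."""
    c = len(patch) // 2
    return tuple(1 if v < patch[c] else 0 for i, v in enumerate(patch) if i != c)


# Textured patch and a gain/offset-altered copy (simulating a color bias).
a = [30, 70, 40, 50, 35]
b = [2 * v + 30 for v in a]
print(round(ncc(a, b), 6))     # 1.0: unaffected by gain and offset
print(census(a) == census(b))  # True: pixel ordering is preserved

# Slow linear ramp, e.g. light-to-dark across a flat wall.
ramp = [100 + 0.5 * i for i in range(20)]
windows = [ramp[i - 2:i + 3] for i in range(2, 18)]
print({census(w) for w in windows})  # {(1, 1, 0, 0)}: identical everywhere
print(all(abs(ncc(windows[0], w) - 1.0) < 1e-12 for w in windows))  # True: no unique match
```

The gradient itself, which a depth estimator could in principle exploit, is exactly what both normalizations remove: NCC subtracts the mean and divides out the contrast, and Census keeps only the ordering of pixel values.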