1. Field of the Invention
The present invention relates to stereo matching.
2. Description of the Related Art
Stereo matching (or stereo correspondence) is one of the most actively researched topics in computer vision. Although other representations are available, most stereo matching methods produce a single-valued disparity function d(x, y) with respect to a reference image, which may be one of the input images or a view in between some of the images. This representation naturally introduces the concept of a disparity space (x, y, d). In computer vision, disparity is often treated as synonymous with inverse depth, since the two are equivalent up to a simple trigonometric relationship. If the (x, y) coordinates of the disparity space are taken to coincide with the pixel coordinates of a reference image chosen from the input data set, the correspondence between a pixel (x, y) in reference image r and a pixel (x′, y′) in matching image m is given by
x′ = x + d(x, y), y′ = y,  (1)
where d(x, y) is the disparity at pixel (x, y).
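As an illustration only, equation (1) can be sketched in a few lines of Python. The function name matching_pixel and the row-major nested-list layout of the disparity map are assumptions made for this sketch and are not taken from any particular stereo implementation.

```python
def matching_pixel(x, y, disparity):
    """Return (x', y') in the matching image for reference pixel (x, y).

    Implements equation (1): x' = x + d(x, y), y' = y.
    `disparity` is assumed to be a list of rows, indexed as disparity[y][x].
    """
    d = disparity[y][x]
    return x + d, y


# Hypothetical 2x3 disparity map used only for demonstration.
example_disparity = [
    [0, 1, 2],
    [3, 0, 0],
]
example_match = matching_pixel(1, 0, example_disparity)
```

Because y′ = y, the search for a match is confined to a single scanline, which is why the images are assumed to be rectified.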
Once the disparity space has been specified, the concept of a disparity space image (DSI) can be introduced. In general, a DSI is any image or function defined over a continuous or discrete version of disparity space (x, y, d). In practice, the DSI usually represents the confidence or log likelihood (i.e., cost) of a particular match implied by d(x, y). The goal of a stereo correspondence algorithm is then to produce a single-valued function d(x, y) in disparity space that best describes the shape of the surfaces in the scene. This can be viewed as finding a surface embedded in the disparity space image that has some optimality property, such as lowest cost and best (piecewise) smoothness. FIG. 1 shows an example of a slice through a typical DSI.
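A discrete DSI can be sketched as a three-dimensional array of matching costs. The following minimal example assumes grayscale images stored as nested lists, uses the absolute intensity difference as the cost, and follows the sign convention x′ = x + d of equation (1). The name build_dsi and the invalid_cost placeholder are hypothetical choices for this sketch.

```python
def build_dsi(ref, match, max_disp, invalid_cost=255):
    """Build a discrete DSI with dsi[y][x][d] = |ref(x, y) - match(x + d, y)|.

    Entries whose matching pixel falls outside the image keep `invalid_cost`.
    """
    height, width = len(ref), len(ref[0])
    dsi = [[[invalid_cost] * (max_disp + 1) for _ in range(width)]
           for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for d in range(max_disp + 1):
                xm = x + d  # matching column, per equation (1)
                if xm < width:
                    dsi[y][x][d] = abs(ref[y][x] - match[y][xm])
    return dsi


# Hypothetical one-row images: the matching row is the reference row
# shifted right by one pixel, so the true disparity is 1.
ref_row = [[10, 20, 30, 40]]
match_row = [[0, 10, 20, 30]]
dsi = build_dsi(ref_row, match_row, max_disp=2)
```

Fixing y and viewing dsi[y] as a two-dimensional (x, d) array gives one slice of the kind of DSI described above.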
Stereo algorithms generally perform the following four steps: (step 1) matching cost computation; (step 2) cost (support) aggregation; (step 3) disparity computation/optimization; and (step 4) disparity refinement. The actual sequence of steps taken depends on the specific algorithm. Some local algorithms, however, combine steps 1 and 2 and use a matching cost that is based on a support region, e.g., normalized cross-correlation and the rank transform. On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function combining data (step 1) and smoothness terms. The main distinction between these global algorithms is the minimization procedure used, e.g., simulated annealing, probabilistic (mean-field) diffusion, or graph cuts.
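Steps 1 to 3 of a simple local algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular published method: the cost is the absolute intensity difference, aggregation is a square box window clipped at the image borders, and disparity computation is winner-take-all. All function names are hypothetical, and step 4 (refinement) is omitted.

```python
def absolute_difference_cost(ref, match, max_disp, invalid_cost=255):
    """Step 1: per-pixel matching cost for each disparity (x' = x + d)."""
    height, width = len(ref), len(ref[0])
    dsi = [[[invalid_cost] * (max_disp + 1) for _ in range(width)]
           for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for d in range(max_disp + 1):
                if x + d < width:
                    dsi[y][x][d] = abs(ref[y][x] - match[y][x + d])
    return dsi


def aggregate_box(dsi, radius):
    """Step 2: sum costs over a (2*radius+1)^2 window at constant disparity."""
    height, width, num_disp = len(dsi), len(dsi[0]), len(dsi[0][0])
    out = [[[0] * num_disp for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for d in range(num_disp):
                total = 0
                for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                        total += dsi[yy][xx][d]
                out[y][x][d] = total
    return out


def winner_take_all(dsi):
    """Step 3: pick, per pixel, the disparity with the lowest aggregated cost."""
    return [[min(range(len(costs)), key=costs.__getitem__) for costs in row]
            for row in dsi]


def local_stereo(ref, match, max_disp, radius=1):
    """Run steps 1-3 in sequence and return the disparity map."""
    cost = absolute_difference_cost(ref, match, max_disp)
    return winner_take_all(aggregate_box(cost, radius))


# Hypothetical test images: each matching row is the reference row
# shifted right by two pixels, so interior pixels have disparity 2.
row = [5, 90, 30, 200, 60, 120, 10, 170]
ref_img = [row[:] for _ in range(3)]
match_img = [[0, 0] + row[:-2] for _ in range(3)]
disparity_map = local_stereo(ref_img, match_img, max_disp=3)
```

A global algorithm would replace aggregate_box and winner_take_all with the minimization of a single cost function over the whole disparity map.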
In between these two broad classes are certain iterative algorithms that do not explicitly state a global function to be minimized, but whose behavior closely mimics that of iterative optimization algorithms. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels.
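The way a coarser pyramid level can constrain the search at a finer level may be sketched as follows. The sketch assumes a pyramid with a downsampling factor of 2 between levels, so that a coarse disparity is doubled when carried to the finer level. The function name fine_search_range and the margin parameter are hypothetical.

```python
def fine_search_range(coarse_disparity, x, y, max_disp, margin=1):
    """Return the (lowest, highest) disparity to test at fine-level pixel (x, y).

    Assumes the coarse level has half the resolution of the fine level, so
    fine pixel (x, y) maps to coarse pixel (x // 2, y // 2) and a coarse
    disparity d corresponds to 2 * d at the fine level.
    """
    d_coarse = coarse_disparity[y // 2][x // 2]
    center = 2 * d_coarse
    lowest = max(0, center - margin)
    highest = min(max_disp, center + margin)
    return lowest, highest


# Hypothetical single-pixel coarse level with disparity 3.
coarse_level = [[3]]
search_range = fine_search_range(coarse_level, x=1, y=1, max_disp=16)
```

With a margin of 1, only three disparities are tested per fine-level pixel instead of the full range of 0 to max_disp.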
The vast majority of research in stereo matching has focused on improving the accuracy of the resulting disparity map. In contrast, reducing the processing time for real-time or near real-time stereo matching applications was a relatively less popular research topic until recently. However, many important applications require decent stereo matching accuracy while also meeting real-time requirements. Such applications include Augmented Reality (AR), New View Synthesis (NVS) (e.g., for gaze correction in video-conferencing or Free Viewpoint Video), and robot vision for navigation or unmanned car driving.
Processing time reduction in stereo matching is mostly achieved by hardware optimization or acceleration. For example, Wang et al. (“High-quality real-time stereo using adaptive cost aggregation and dynamic programming”, 3DPVT 2006) discussed, among other things, a GPU acceleration method for their algorithm, which was proposed as an add-on to existing basic Dynamic Programming (DP), and thereby tried to meet real-time requirements. On the other hand, “Real-time stereo by using dynamic programming” (CVPR 2004) proposed a coarse-to-fine approach and MMX-based assembler optimization, and also proposed reducing the size of the DP matrix by first applying DP on every n-th scanline and then finding the possible disparity range for applying DP on the remaining in-between scanlines. This is a rare case of an algorithm-level consideration for processing time reduction (i.e., reducing the amount of computation), although its applicability is limited to DP-based stereo algorithms.
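The idea of restricting the disparity range on in-between scanlines can be illustrated with a simple sketch. The rule used here is an assumption made for illustration and is not taken from the cited paper: for each column, the range is bounded by the disparities already found on the two neighboring key scanlines, widened by a margin. The function name in_between_ranges is hypothetical.

```python
def in_between_ranges(key_above, key_below, max_disp, margin=1):
    """Per-column (lowest, highest) disparity range for in-between scanlines.

    `key_above` and `key_below` are the disparities already computed on the
    two nearest key scanlines (every n-th row). The bounding rule is an
    illustrative assumption, not the rule of any specific published method.
    """
    ranges = []
    for d_above, d_below in zip(key_above, key_below):
        lowest = max(0, min(d_above, d_below) - margin)
        highest = min(max_disp, max(d_above, d_below) + margin)
        ranges.append((lowest, highest))
    return ranges


# Hypothetical disparities on two key scanlines of a 3-column image.
ranges = in_between_ranges([2, 5, 0], [3, 5, 8], max_disp=8)
```

Where the key scanlines agree, the range stays narrow and the DP matrix for the in-between rows shrinks accordingly. Where they disagree strongly, as in the last column, the range falls back toward the full search.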
It is, however, highly desirable to provide an algorithm-level measure for reducing processing time that is compatible with any hardware-level implementation for processing time reduction, since such a measure may help make any type of stereo matching implementation more suitable for real-time applications (or at least increase its processing speed).