With the rise of new parallel computing platforms, such as GPUs, as presented in NVIDIA: CUDA compute unified device architecture, prog. guide, version 1.1, 2007, and various accelerated processing units (APUs), real-time high-quality stereo imaging has become increasingly feasible. A GPU comprises a number of threaded streaming multiprocessors (SMs), each of which, in turn, comprises a number of streaming processors (SPs), with example architectures presented in David Kirk and Wen-Mei W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Elsevier, 2010.
The human visual system is highly hierarchical, and visual recognition is performed in layers, first by recognizing the most basic features of an image, and then by recognizing higher-level combinations of those features. This process continues until the brain recognizes an adequately high-level representation of the visual input. FIG. 1 is a diagram illustrating possible different levels in the visual hierarchy as set forth in M. Marszalek and C. Schmid, “Semantic hierarchies for visual object recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR '07, MN, 2007, pp. 1-7. As is shown in FIG. 1, the most basic features 110 are first recognized by an individual. Thereafter, higher-level patterns and clusters 120 are recognized from these features 110. Moving up the hierarchy, shapes and segments 130 are recognized from groups of the patterns and clusters 120. Finally, after many layers of aggregation, one or more complex objects 140 may be recognized by the individual.
There are many different approaches to stereo imaging. In accordance with the present invention, segment-based approaches, which may also be referred to as surface stereo, will be mainly utilized, because segment-based approaches best resemble the human visual system. Such algorithms are ones in which the 3D field-of-view is treated as a set of smooth, slowly varying surfaces as set forth in Michael Bleyer, Carsten Rother, and Pushmeet Kohli, “Surface Stereo with Soft Segmentation,” in Computer Vision and Pattern Recognition, 2010. Segment-based approaches have emerged in recent years as an alternative to many region-based and pixel-based approaches and have outperformed almost every other algorithm in accuracy on the Middlebury dataset. The Middlebury set is widely considered the reference dataset and metric for stereo/disparity computation algorithms as set forth in (2010) Middlebury Stereo Vision Page. [Online]. http://vision.middlebury.edu/stereo/.
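By way of illustration only, the notion of treating a segment as a smooth, slowly varying surface may be sketched as fitting a planar disparity model d = a*x + b*y + c to the disparity samples that fall within one segment. The following is a minimal sketch under stated assumptions and is not the method of the present invention or of the cited references: the function names `det3` and `fit_segment_plane`, the planar model, and the plain least-squares fit are all assumptions introduced for this example.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_segment_plane(samples):
    """Least-squares fit of d = a*x + b*y + c to one segment.

    `samples` is an iterable of (x, y, d) tuples: pixel coordinates and an
    initial disparity estimate for each pixel assigned to the segment.
    The planar model is an assumption made for this illustration.
    """
    sxx = sxy = sx = syy = sy = n = 0.0
    sxd = syd = sd = 0.0
    for x, y, d in samples:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        n += 1.0
        sxd += x * d
        syd += y * d
        sd += d

    # Normal equations (A^T A) p = A^T d for the unknowns p = (a, b, c).
    normal = [[sxx, sxy, sx],
              [sxy, syy, sy],
              [sx, sy, n]]
    rhs = [sxd, syd, sd]

    det = det3(normal)
    if abs(det) < 1e-12:
        raise ValueError("segment is degenerate (for example, collinear pixels)")

    # Solve the 3x3 system with Cramer's rule.
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in normal]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / det)
    return tuple(coeffs)
```

Calling `fit_segment_plane` on the pixels of one segment returns the coefficients (a, b, c), from which a disparity may be evaluated at any pixel of that segment. A plain least-squares fit is sensitive to outliers in the initial disparity estimates, so a practical system would likely substitute a more robust estimator.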
There are many reasons why such methods today represent the dominant approaches in stereo imaging, see Andreas Klaus, Mario Sormann, and Konrad Karner, “Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure,” in Proceedings of ICPR 2006, 2006, pp. 15-18. Segment-based approaches address semi-occlusions very well. They are also more robust to local changes. Other pixel-based and region-based approaches blur edges, causing ambiguity between background and foreground regions, as well as potentially removing smaller objects, as noted in Ines Ernst and Heiko Hirschmuller, “Mutual Information based Semi-Global Stereo Matching on the GPU,” in Lecture Notes in Computer Science, vol. 5358, 2008, pp. 228-239. A cross-based local approach as set forth in Jiangbo Lu, Ke Zhang, Gauthier Lafruit, and Francky Catthoor, “REAL-TIME STEREO MATCHING: A CROSS-BASED LOCAL APPROACH,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009 represents an implementation of such approaches on the GPU, but is still impractical because it exhibits weaknesses at regions of high texture and regions with abrupt changes in color/intensity. However, many segment-based approaches are themselves tedious and inaccurate, and require a significant amount of computation, even on the GPU.
Therefore, it would be beneficial to provide an improved segment-based approach that overcomes the drawbacks of the prior art.