Field of the Invention
This invention relates to visual point matching between a pair of images taken from different viewpoints, and more particularly to a technique of multi-scale correspondence point matching using a constellation of image chips.
Description of the Related Art
The problem of establishing correspondences between a pair of images taken from different viewpoints is central to many computer vision applications, such as stereo vision, 3D reconstruction, image database retrieval, object recognition, and autonomous navigation. Visual point matching for arbitrary image pairs can be very challenging because of the significant changes the scene can undergo between the two views and the complexity caused by 3D structure: a change of viewing angle can shift the reflection and hue of a surface as perceived by the camera; a change of view can cause geometric distortion in the shape of objects in the images (e.g., foreshortening due to 3D projection); a change of view can also result in objects appearing at different scales or being occluded. Issues such as object motion and changes in lighting conditions further complicate the task.
Visual point matching techniques have been investigated for decades. Earlier techniques focused on matching points in images taken by calibrated stereo camera pairs. More recently, there has been growing interest in techniques for matching points between images that are taken with different (possibly unknown) cameras, possibly at different times, and from arbitrary viewpoints. Correspondence methods in the published literature generally fall into two types. Feature-based methods attempt to extract a small number of salient local features to establish matches (W. Förstner, “A feature based correspondence algorithm for image matching,” International Archives of Photogrammetry and Remote Sensing, vol. 26, no. 3, pp. 150-166, 1986; C. Harris, “Geometry from visual motion,” in Active Vision, Cambridge, Mass. USA, MIT Press, 1993, pp. 263-284). Direct methods attempt to use all of the pixels to iteratively align images (B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981; J. R. Bergen, P. Anandan, K. J. Hanna and H. Rajesh, “Hierarchical model-based motion estimation,” in Computer Vision—ECCV'92, 1992). The Middlebury stereo vision benchmark and the more than 150 related publications provide an assessment of the state of the art (D. Scharstein and R. Szeliski, “Stereo—Middlebury Computer Vision,” http://vision.middlebury.edu/stereo/, 20 Oct. 2014).
In M. Brown, R. Szeliski and S. Winder, “Multi-image matching using multi-scale oriented patches,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, the authors proposed a correspondence technique based on matching multi-scale Harris corner points. Harris corner points are detected over multi-resolution pyramids of the input images, and an 8×8 patch is defined at each Harris corner point. Matching is done over feature descriptors of the patches. This approach relies on specific feature points (Harris corner points): it creates a feature descriptor by sampling a local 8×8 patch of pixels around each interest point and applying a Haar wavelet transform to form a 64-dimensional vector. It then uses a nearest-neighbor search to find the best matches.
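As an illustration only, the descriptor pipeline summarized above (8×8 patch, Haar wavelet transform, 64-dimensional vector, nearest-neighbor search) might be sketched in Python as follows. This is a minimal sketch and not the cited authors' implementation: the function names are hypothetical, the separable row-then-column Haar decomposition is an assumed variant, and a brute-force search stands in for whatever indexing structure a practical system would use.

```python
from math import sqrt


def haar_1d(values):
    """Full orthonormal 1D Haar transform of a power-of-two-length list."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        # Pairwise scaled averages (low-pass) and differences (high-pass).
        avg = [(out[2 * i] + out[2 * i + 1]) / sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def haar_descriptor(patch):
    """Transform an 8x8 patch (list of 8 rows) into a 64-element vector.

    Assumption: rows are transformed first, then columns.
    """
    rows = [haar_1d(row) for row in patch]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    coeffs = [list(row) for row in zip(*cols)]  # transpose back to row-major
    return [value for row in coeffs for value in row]


def nearest_neighbor(query, candidates):
    """Index of the candidate descriptor closest to query (squared L2)."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(query, candidates[k]))
    return min(range(len(candidates)), key=dist)
```

A caller would compute `haar_descriptor` for every patch in both images and call `nearest_neighbor` once per query descriptor; any normalization of the patch before the transform is left out here because the text above does not describe it.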
In T. Li, G. Mona K., L. Kyungmoo, A. L. Wallace, H. K. Young and A. D. Michael, “Robust multiscale stereo matching from fundus images with radiometric differences,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2245-2258, 2011, the authors developed a feature-point-based, multi-scale stereo matching technique. The approach generates scale spaces of the input image pair with variable-scale Gaussian kernels and solves the dense point correspondence problem by evaluating the continuous behavior of the feature points in the scale space. The approach uses the predicted scale-space drift behavior of “SIFT”-like feature points to regularize the search for the best match. In addition, the approach in (Li, Mona K., Kyungmoo, Wallace, Young, & Michael, 2011) propagates the search from coarse to fine scales in the scale space.
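To make the notion of a scale space built with variable-scale Gaussian kernels concrete, a minimal Python sketch is given below. It shows only the construction of the blurred levels. The feature-point tracking, drift prediction, and coarse-to-fine search of the cited work are not reproduced. The function names, the truncation of each kernel at about three standard deviations, and the clamping of coordinates at the image border are all assumptions made for illustration.

```python
from math import exp


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (assumption)."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(line, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(line) - 1
    return [
        sum(w * line[min(max(i + k - radius, 0), last)]
            for k, w in enumerate(kernel))
        for i in range(len(line))
    ]


def blur_image(image, sigma):
    """Separable Gaussian blur of an image stored as a list of rows."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


def scale_space(image, sigmas):
    """One blurred copy of the image per sigma, in the order given."""
    return [blur_image(image, sigma) for sigma in sigmas]
```

Passing the sigmas from largest to smallest yields levels ordered from coarse to fine, which is the order in which a coarse-to-fine search would visit them.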
In J. Kim, C. Liu, F. Sha and K. Grauman, “Deformable spatial pyramid matching for fast dense correspondences,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, the authors developed a deformable spatial pyramid (DSP) graph-based matching technique for the correspondence problem. The approach performs matching over multi-resolution pyramids of the input images. It uses “cells” (groups of pixels) as the elements of each pyramid layer and defines a graph model over the cells in the pyramid. In addition, the approach establishes correspondence over special feature points (Harris corner points) between the images via a graph search method.
In C. Barnes, E. Shechtman, D. B. Goldman and A. Finkelstein, “The generalized patchmatch correspondence algorithm,” in Computer Vision—ECCV, 2010, the authors developed a multi-scale searching scheme that matches rectangular patches of two images for the correspondence problem. The approach compares an unscaled patch in one image with patches at a range of rotations and scales in the other image and finds the best match.
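The comparison of an unscaled patch against rotated and scaled patches can be pictured with the short Python sketch below. It is an illustration under stated assumptions and not the cited algorithm: it exhaustively enumerates a small set of scales and angles at one fixed location, whereas the cited work describes its own multi-scale searching scheme. The function names, nearest-neighbor sampling with border clamping, and the sum of squared differences as the patch distance are hypothetical choices.

```python
from math import cos, sin


def sample_patch(image, cx, cy, size, scale=1.0, angle=0.0):
    """Sample a size x size patch centred at (cx, cy).

    The sampling grid is scaled and rotated about the centre; pixels are
    read with nearest-neighbor rounding and clamped to the image bounds.
    """
    height, width = len(image), len(image[0])
    half = (size - 1) / 2.0
    c, s = cos(angle), sin(angle)
    patch = []
    for r in range(size):
        row = []
        for q in range(size):
            dx, dy = (q - half) * scale, (r - half) * scale
            x = cx + c * dx - s * dy
            y = cy + s * dx + c * dy
            xi = min(max(int(round(x)), 0), width - 1)
            yi = min(max(int(round(y)), 0), height - 1)
            row.append(image[yi][xi])
        patch.append(row)
    return patch


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_transform(img_a, img_b, centre_a, centre_b, size, scales, angles):
    """Return (distance, scale, angle) of the best-matching patch in img_b."""
    ref = sample_patch(img_a, centre_a[0], centre_a[1], size)
    candidates = [
        (ssd(ref, sample_patch(img_b, centre_b[0], centre_b[1], size, s, t)), s, t)
        for s in scales
        for t in angles
    ]
    return min(candidates)
```

Here `centre_a` and `centre_b` are (x, y) pixel coordinates. Extending the search to candidate locations in the second image would add one more loop over `centre_b`.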