1. Technical Field
The present invention relates to the fields of image processing, feature recognition within images, image analysis, and computer vision. More specifically, the present invention pertains to a method and apparatus for generating three-dimensional models from still imagery or video streams from uncalibrated views.
2. Discussion
The problem of generating three-dimensional (3D) shapes using a set of two-dimensional (2D) images has been addressed to date by several different techniques. A summary of the more noteworthy of these techniques is presented here. The first several techniques provide background information regarding the registration of 3D image portions to generate 3D shapes, and the subsequent techniques discuss the problem of generating 3D shapes from a set of 2D images.
One of the best-known methods for registration is the iterative closest point (ICP) algorithm of Besl and McKay, which converges monotonically to the nearest local minimum of a mean-square distance metric. The ICP algorithm is used for registering 3D shapes by considering the full six degrees of freedom in a set of motion parameters. It has been extended with Levenberg-Marquardt non-linear optimization and robust estimation techniques to minimize the registration error.
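The ICP iteration can be sketched as follows. This is a minimal illustrative implementation in Python/NumPy, not the algorithm as published by Besl and McKay: the function name, the brute-force nearest-neighbor search, and the fixed iteration count are assumptions made for the example.

```python
import numpy as np

def icp(P, Q, iters=20):
    """Minimal ICP sketch: alternately (1) pair each point of P with its
    closest point in Q and (2) solve in closed form (via SVD) for the
    rigid motion (R, t) minimizing the mean-square distance between the
    pairs.  The mean-square error is non-increasing, so the iteration
    converges monotonically to the nearest local minimum."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        Pt = P @ R.T + t
        # (1) closest-point correspondences (brute-force search)
        idx = np.argmin(((Pt[:, None] - Q[None, :]) ** 2).sum(-1), axis=1)
        M = Q[idx]
        # (2) least-squares rigid motion between P and its matches M
        cP, cM = P.mean(0), M.mean(0)
        U, _, Vt = np.linalg.svd((P - cP).T @ (M - cM))
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = cM - R @ cP
    return R, t
```

As the text notes, convergence is only to the *nearest* local minimum: the sketch recovers the true motion only when the clouds start in approximately the right position.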
Another well-known method for registering 3D shapes is the work of Vemuri and Aggarwal, in which range and intensity data are used to reconstruct complete 3D models from partial ones. Registering range data for the purpose of building surface models of three-dimensional objects was also the focus of the work by Blais and Levine entitled “Registering multiview range data to create 3D computer objects,” cited below. Matching image tokens across triplets, rather than pairs, of images has also been considered. In “3D model acquisition from extended image sequences,” cited below, the authors developed a robust estimator for the trifocal tensor based on corresponding tokens across an image triplet, which was then used to recover a 3D structure. Reconstructing a 3D structure from stereo image pairs in an uncalibrated video sequence has also been considered. Most of these algorithms work well only given good initial conditions (e.g., for 3D model alignment, the partial models must first be brought into approximate position). The problem of automatic “crude” registration (in order to obtain good initial conditions) was addressed in “Invariant-based registration of surface patches,” cited below, where the authors used bitangent curve pairs, which can be found and matched efficiently.
In the above methods, geometric properties are used to align 3D shapes. Another important area of interest for registration schemes is 2D image matching, which can be used for applications such as image mosaicing, retrieval from a database, and medical imaging. 2D matching methods generally rely on extracting features or interest points. In “Comparing and evaluating interest points,” cited below, the authors demonstrate that interest points are stable under different geometric transformations and define their quality in terms of repeatability rate and information content. One of the most widely used schemes for tracking feature points is the KLT tracker, which combines feature selection and tracking across a sequence of images by minimizing the sum of squared intensity differences over windows in two frames. A probabilistic technique for feature matching in a multi-resolution Bayesian framework was developed and used in uncalibrated image mosaicing. A further approach involves the use of Zernike orthogonal polynomials to compute the relative rigid transformations between images; it allows the recovery of rotational and scaling parameters without the need for extensive correlation and search algorithms. Although these techniques are somewhat effective, precise registration algorithms are required for applications such as medical imaging. A mutual information criterion, optimized using the simulated annealing technique, has been used to provide the precision necessary for aligning images of the retina.
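The window-based minimization underlying the KLT tracker can be illustrated with a single linearized translation step. This is a sketch only: the function name, the window size, and the use of `np.gradient` for the image derivatives are assumptions for the example, and a real tracker iterates this step over an image pyramid.

```python
import numpy as np

def klt_translation(I, J, x, y, w=7):
    """Estimate the translation (dx, dy) of a (2w+1)x(2w+1) window
    centered at (x, y) between frames I and J by minimizing the sum of
    squared intensity differences, linearized about zero motion."""
    win = np.s_[y - w:y + w + 1, x - w:x + w + 1]
    Ix = np.gradient(I, axis=1)[win]   # spatial gradients of frame I
    Iy = np.gradient(I, axis=0)[win]
    It = (J - I)[win]                  # temporal difference
    # Normal equations G d = e of the linearized SSD objective.
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    e = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(G, e)       # (dx, dy)
```

The 2x2 matrix `G` is the same structure tensor used for feature *selection*: windows where it is well conditioned are the “good features to track.”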
Most of the state of the art techniques developed to date, as in the case of all the methods above, cannot stitch together two distinct 3D models of a scene or an object unless the 3D models are approximately registered before 3D model alignment is attempted. To approximately register the 3D models, most of the prior art relies on a user manually picking several “points in common” between the two 3D models to be stitched together, by clicking on corresponding points in both 3D models with a computer mouse. The prior art then uses these manually registered “points in common” to crudely align the models, proceeds to match features extracted from the 3D models, and uses the new matching features to morph one 3D model into the other, refining the initial crude alignment and finally stitching the 3D models together.
In an attempt to avoid the manual registration of “points in common” between the two 3D models to be stitched together, various probabilistic schemes have also been used for registration problems. One of the most well-known techniques is the work of Viola and Wells for aligning 2D and 3D objects by maximizing mutual information. The technique is robust with respect to the surface properties of objects and illumination changes. A stochastic optimization procedure was proposed for maximizing the mutual information. A probabilistic technique for matching the spatial arrangement of features using shape statistics was also proposed in “A probabilistic approach to object recognition using local photometry and global geometry,” cited below. Most of these techniques in image registration work for rigid objects. However, constraints using intensity and shape usually break down for non-rigid objects, such as human faces.
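A mutual information alignment score of the kind used by Viola and Wells can be computed from the joint intensity histogram of two images; an optimizer then searches for the pose that maximizes it. The sketch below shows only the score itself (the function name and the bin count are assumptions for the example), not the stochastic optimization procedure.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information I(A;B) between the intensities of two equally
    sized images, estimated from their joint histogram (in nats).
    Higher values indicate a statistically stronger dependence between
    the two images, i.e. a better alignment."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginals
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                              # avoid log(0) terms
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

Because the score depends only on the statistical dependence of the two intensity distributions, it is robust to surface properties and illumination changes, as noted above.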
Therefore, most of the state of the art techniques developed to date cannot stitch together two distinct 3D models of a scene or an object without having the 3D models approximately registered before attempting to perform 3D model alignment (i.e., good initial conditions). Furthermore, the methods attempting to solve the problem of automatic “image registration” break down when aligning 3D models generated for non-rigid objects such as human faces. Thus, artisans are faced with a choice between 3D stitching algorithms that work for non-rigid objects but require manual “image registration” of “points in common” between the models, and 3D stitching algorithms that work only for rigid objects but do not require the manual “image registration” of the models.
In addition, the problem of automatic “image registration” becomes considerably harder when the 3D models are extracted from uncalibrated image capturing devices, where the user has no information concerning the location of an uncalibrated image capturing device with respect to the object of interest, or with respect to the location of any of the other uncalibrated image capturing devices generating the other 3D models to be stitched together.
A need exists in the art for a technique that does not require the manual “image registration” of “points in common” between the models prior to stitching them together. Instead, it would be desirable to automatically establish a global correspondence between two 3D models (or two 2D models) by minimizing the probability of error of a match between the entire constellation of features extracted from the models, thus taking into account the global spatial configuration of the features for each of the models. Furthermore, it would be desirable for the technique to work with both rigid and non-rigid types of objects as well as for complex scenes containing both rigid and non-rigid objects captured from multiple uncalibrated image capturing device locations.
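The difference between local feature matching and the desired global constellation matching can be illustrated with a toy example. The sketch below is purely illustrative and is not the probabilistic scheme described above: it matches two small 2D feature sets by brute force, choosing the one-to-one assignment that best preserves the pairwise distances within each constellation, so the entire spatial configuration is matched at once rather than feature by feature.

```python
import itertools
import numpy as np

def global_match(A, B):
    """Toy "global" matching of two feature constellations A and B
    (n x 2 arrays of feature positions).  Returns the permutation p
    such that feature i of A is matched to feature p[i] of B, chosen
    to minimize the discrepancy between the two sets of *pairwise*
    distances.  Brute force, so only suitable for small n."""
    DA = np.linalg.norm(A[:, None] - A[None, :], axis=-1)
    DB = np.linalg.norm(B[:, None] - B[None, :], axis=-1)
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(len(B))):
        p = list(perm)
        # Structural (global) discrepancy of this assignment.
        cost = np.sum((DA - DB[np.ix_(p, p)]) ** 2)
        if cost < best_cost:
            best, best_cost = perm, cost
    return best
```

Because pairwise distances are invariant to rigid motion, such a structural criterion needs no prior alignment of the two constellations; a practical scheme must of course replace the exhaustive search with an efficient optimization.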
While most of the state of the art techniques developed to date employ a local matching strategy that only establishes correspondence between individual features within a local region of a model, it would be more desirable to employ a “global” matching strategy that emphasizes the “structural description” of a scene or an object within the model. In addition, it would be desirable to use the object's prior shape information to generate a robust matching scheme that supports the detection of missing features and occlusions between views.
Thus, there is a great need in the art for a system for generating three-dimensional models from still imagery or video streams from uncalibrated views captured from an uncalibrated image capturing device location, where the system stitches together the three-dimensional models viewed from a subset of the uncalibrated image capturing device locations without the need for manual “image registration” of “points in common” between the models, and wherein the system works for both rigid and non-rigid objects.
The following references are presented for further background information:
[1] P. Beardsley, P. Torr, and A. Zisserman, “3D model acquisition from extended image sequences,” in European Conference on Computer Vision, Cambridge, UK, 1996, pp. 683-695.
[2] R. Koch, M. Pollefeys, and L. Van Gool, “Multi viewpoint stereo from uncalibrated sequences,” in European Conference on Computer Vision, Freiburg, Germany, 1998, pp. 55-71.
[3] B. C. Vemuri and J. K. Aggarwal, “3D model construction from multiple views using range and intensity data,” in IEEE Computer Vision and Pattern Recognition, Miami Beach, 1986, pp. 435-437.
[4] J. Vanden Wyngaerd, L. Van Gool, R. Koch, and M. Proesmans, “Invariant-based registration of surface patches,” in ICCV99, 1999, pp. 301-306.
[5] P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” PAMI, vol. 14, no. 2, pp. 239-256, February 1992.
[6] P. A. Viola and W. M. Wells, III, “Alignment by maximization of mutual information,” IJCV, vol. 24, no. 2, pp. 137-154, September 1997.
[7] M. C. Burl, M. Weber, and P. Perona, “A probabilistic approach to object recognition using local photometry and global geometry,” in ECCV98, 1998.
[8] R. Sinkhorn, “A relationship between arbitrary positive matrices and doubly stochastic matrices,” Annals Math. Statist., vol. 35, pp. 876-879, 1964.
[9] M. D. Srinath, P. K. Rajasekaran, and R. Viswanathan, Introduction to Statistical Signal Processing with Applications, Prentice Hall, 1996.
[10] R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, 1973.
[11] A. W. Fitzgibbon, “Robust registration of 2D and 3D point sets,” in British Machine Vision Conference, 2001, pp. 662-670.
[12] G. Blais and M. D. Levine, “Registering multiview range data to create 3D computer objects,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 17, pp. 820-824, August 1995.
[13] C. Schmid, R. Mohr, and C. Bauckhage, “Comparing and evaluating interest points,” in International Conference on Computer Vision, 1998, pp. 230-235.
[14] C. Tomasi and J. Shi, “Good features to track,” in IEEE Computer Vision and Pattern Recognition, 1994, pp. 593-600.
[15] T. J. Cham and R. Cipolla, “A statistical framework for long-range feature matching in uncalibrated image mosaicing,” in IEEE Computer Vision and Pattern Recognition, 1998, pp. 442-447.
[16] F. Badra, A. Qumsieh, and G. Dudek, “Robust mosaicing using Zernike moments,” PRAI, vol. 13, no. 5, p. 685, August 1999.
[17] N. Ritter, R. Owens, J. Cooper, R. H. Eikelboom, and P. P. Van Saarloos, “Registration of stereo and temporal images of the retina,” IEEE Trans. on Medical Imaging, vol. 18, no. 5, pp. 404-418, May 1999.