Three-dimensional models of physical scenes are required for a wide range of applications. These applications include virtual reality walk-throughs, architectural modeling, and computer graphics special effects. Such models have typically been generated manually through tedious and time-consuming processes. Because generating such models manually is difficult and expensive, a vast research effort has been underway to investigate image-based schemes for constructing the 3-D models. Image-based schemes have traditionally involved inferring the 3-D geometry of the physical scene from a plurality of 2-D photographs. One such approach is that of Kang, S. B. and Szeliski, R. (“3-D Scene Data Recovery Using Omnidirectional Multibaseline Stereo,” Int. Journal of Comp. Vision, 25(2), pp. 167–183, 1997). In this approach, a series of 2-D panoramic images is generated, and these 2-D panoramic images are used in a stereo vision sense to extract 3-D scene data. The extracted 3-D scene data is then integrated, and the panoramic images are texture-mapped onto the 3-D model.
The drawback of traditional image-based schemes for 3-D modeling is that they typically yield sparse 3-D scene data. This forces the user to make somewhat arbitrary assumptions about the 3-D structure of the scene prior to the texture-mapping step. For this reason, recent research has turned to range imaging systems to provide dense 3-D scene data for reconstruction. Such systems are capable of automatically sensing the distance to objects in a scene as well as the intensity of incident light. Both range and intensity information are typically captured discretely across a two-dimensional array of image pixels.
An example of such a system is found in U.S. Pat. No. 4,935,616 (and further described in the Sandia Lab News, vol. 46, No. 19, Sep. 16, 1994), which describes a scannerless range imaging system using either an amplitude-modulated high-power laser diode or an array of amplitude-modulated light emitting diodes (LEDs) to completely illuminate a target scene. A version of such a scannerless range imaging system that is capable of yielding color intensity images in addition to the 3-D range images is described in commonly assigned, copending U.S. patent application Ser. No. 09/572,522, entitled “Method and Apparatus for a Color Scannerless Range Imaging System” and filed May 17, 2000 in the names of L. A. Ray and L. R. Gabello. The scannerless range imaging system will hereafter be referred to as an “SRI camera”.
D. F. Huber describes a method (in “Automatic 3-D Modeling Using Range Images Obtained from Unknown Viewpoints,” Proc. of the Third International Conference on 3-D Digital Imaging and Modeling (3DIM), May 28–Jun. 1, 2001) requiring no manual intervention for 3-D reconstruction using a plurality of range images. Huber's algorithm for 3-D modeling generates a 3-D model from a series of range images, assuming nothing is known about the relative views of the object. It can be broken down into three phases: (1) determining which views contain overlaps, (2) determining the transformation between overlapping views, and (3) determining the global position of all views. Huber's method does not assume that the overlapping views are known; therefore, it does not require any prior information to be supplied by the user.
The first two steps of Huber's algorithm use a previous algorithm described in a Ph.D. Thesis by A. E. Johnson, entitled “Spin-Images: A Representation for 3-D Surface Matching,” Carnegie Mellon University, 1997. Johnson presents a system that is capable of automatically registering and integrating overlapping range images to form a complete 3-D model of an object or scene. This system is fully automatic and does not require any a priori knowledge of the relative positions of the individual range images. Johnson's algorithm begins by converting each range image to a surface mesh. This is accomplished by triangulating adjoining range values that are within a difference threshold. Range differences that exceed this threshold are assumed to indicate surface discontinuities.
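The mesh-construction step described above can be illustrated with a minimal sketch. It is not Johnson's implementation: the function name range_image_to_mesh, the grid-of-lists input, the use of None for missing samples, the choice of diagonal for splitting each 2x2 cell, and the max_diff parameter are all assumptions made for illustration. The sketch only shows the idea of connecting adjoining range samples into triangles unless their range difference exceeds a threshold.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]                  # (row, column) index into the range image
Triangle = Tuple[Pixel, Pixel, Pixel]


def range_image_to_mesh(ranges: List[List[Optional[float]]],
                        max_diff: float) -> List[Triangle]:
    """Triangulate adjoining range samples that differ by at most max_diff.

    Hypothetical helper: each 2x2 block of pixels is split into two
    triangles, and a triangle is kept only if every pair of its vertices
    has valid range values within the difference threshold.
    """
    def connected(a: Pixel, b: Pixel) -> bool:
        ra = ranges[a[0]][a[1]]
        rb = ranges[b[0]][b[1]]
        return ra is not None and rb is not None and abs(ra - rb) <= max_diff

    rows = len(ranges)
    cols = len(ranges[0]) if rows else 0
    triangles: List[Triangle] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left, top_right = (r, c), (r, c + 1)
            bottom_left, bottom_right = (r + 1, c), (r + 1, c + 1)
            # Split the cell along the top-left/bottom-right diagonal.
            for p, q, s in ((top_left, top_right, bottom_right),
                            (top_left, bottom_right, bottom_left)):
                if connected(p, q) and connected(q, s) and connected(p, s):
                    triangles.append((p, q, s))
    return triangles


# A small range image with a depth jump between the second and third columns.
example = [[1.0, 1.1, 5.0],
           [1.0, 1.2, 5.1]]
mesh = range_image_to_mesh(example, max_diff=0.5)
```

In this example only the left-hand cell yields triangles; the cell spanning the depth jump is left unconnected, which is how a surface discontinuity appears in the mesh.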
The next step in Johnson's algorithm (and step (2) of Huber's algorithm) is to determine the transformations that align the surface meshes within a common coordinate system. This is accomplished by identifying correspondences between the overlapping regions of the meshes. Johnson uses a technique based on matching “spin-image” surface representations to automatically identify the approximate location of these correspondence points. The coarse alignment of the surface meshes is then refined using a variation of an Iterative Closest Point algorithm (see Besl, P. and McKay, N., “A Method for Registration of 3-D Shapes,” IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2), pp. 239–256, February 1992).
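The refinement step can be illustrated with a minimal sketch of the basic point-to-point Iterative Closest Point loop. This is neither Johnson's variation nor the full 3-D method of Besl and McKay: it is reduced to 2-D so that the best rigid motion has a simple closed form, it uses brute-force nearest-neighbor search, and it assumes a reasonable coarse alignment is already available. The names best_rigid_2d, apply_rigid_2d, and icp_2d are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def best_rigid_2d(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy        # centered source point
        bx, by = qx - dx, qy - dy        # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_rigid_2d(theta: float, tx: float, ty: float,
                   pts: List[Point]) -> List[Point]:
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def icp_2d(src: List[Point], dst: List[Point], iterations: int = 20) -> List[Point]:
    """Alternate closest-point matching and rigid re-alignment; return moved src."""
    current = list(src)
    for _ in range(iterations):
        # Pair each source point with its closest target point (brute force).
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        theta, tx, ty = best_rigid_2d(current, matched)
        current = apply_rigid_2d(theta, tx, ty, current)
    return current


# Target shape and a slightly rotated and shifted copy standing in for a coarse alignment.
target = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 5.0), (2.0, 7.0)]
coarse = apply_rigid_2d(-0.05, 0.2, -0.1, target)
refined = icp_2d(coarse, target)
```

The sketch depends on the coarse alignment being close enough that most closest-point pairings are correct, which is the role the spin-image matching plays in the approach described above.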
Once the overlapping views and local transformations are estimated, step (3) of Huber's algorithm entails using a series of consistency measures in combination with a model graph to find any inconsistencies in the local transformations. Huber recognizes, however, that there are computational costs in scaling his technique to a large number of views. In particular, step (3) can become prohibitively expensive as the number of input range images grows large.
In certain situations where assumptions can be made about the relative views of a collection of range images, we need not resort to Huber's algorithm for 3-D modeling and reconstruction. For example, if a series of overlapping range images is captured from different views that have a common central nodal point, the images can be merged to form a 3-dimensional panorama (a 360° model of both the 3-D spatial and intensity information visible from that central nodal point). This model is typically derived by utilizing a range camera to capture a sequence of overlapping range images as the camera is rotated around the focal point of the camera lens. The 3-D spatial and intensity information from the sequence of images is merged to form the final 360° 3-D panorama.
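Why a common nodal point simplifies the problem can be shown with a minimal sketch. It assumes an idealized pinhole camera, range values measured along each pixel's ray from the nodal point, a known pan angle for every view, and rotation about a vertical axis through the nodal point. None of these details, nor the names pixel_to_panorama_point and merge_views or the parameters focal_px, cx, and cy, come from the cited applications. Under these assumptions, placing every range sample in one coordinate system requires only a rotation by the pan angle, with no translation to estimate.

```python
import math
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]


def pixel_to_panorama_point(u: float, v: float, rng: float, pan_deg: float,
                            focal_px: float, cx: float, cy: float) -> Point3:
    """Map one range sample to 3-D coordinates centered on the nodal point."""
    # Ray through the pixel in the camera frame (x right, y down, z forward).
    dx, dy, dz = u - cx, v - cy, focal_px
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    x, y, z = rng * dx / norm, rng * dy / norm, rng * dz / norm
    # Rotate about the vertical axis by the pan angle of this view.
    a = math.radians(pan_deg)
    xw = math.cos(a) * x + math.sin(a) * z
    zw = -math.sin(a) * x + math.cos(a) * z
    return (xw, y, zw)


def merge_views(views: List[Tuple[float, List[List[Optional[float]]]]],
                focal_px: float, cx: float, cy: float) -> List[Point3]:
    """Collect the samples of all (pan_deg, range_image) views into one point set."""
    points: List[Point3] = []
    for pan_deg, ranges in views:
        for v, row in enumerate(ranges):
            for u, rng in enumerate(row):
                if rng is not None:      # None marks a pixel with no range value
                    points.append(pixel_to_panorama_point(
                        u, v, rng, pan_deg, focal_px, cx, cy))
    return points


# Two tiny 1x3 range images taken 90 degrees apart (hypothetical values).
views = [(0.0, [[2.0, 2.0, 2.0]]),
         (90.0, [[3.0, None, 3.0]])]
cloud = merge_views(views, focal_px=100.0, cx=1.0, cy=0.0)
```

The sketch merges only the spatial samples; blending the overlapping regions and carrying the intensity values along with each point are left out for brevity.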
An example of such a 3-D panoramic system that yields sparse range images is described in commonly assigned, copending U.S. patent application Ser. No. 09/686,610, entitled “Method for Three Dimensional Spatial Panorama Formation” and filed Oct. 11, 2000 in the names of S. Chen and L. A. Ray. An example of a system that yields dense range images using an SRI camera is described in commonly assigned, copending U.S. patent application Ser. No. 09/803,802, entitled “Three Dimensional Spatial Panorama Formation with Scannerless Range Imaging System” and filed Mar. 12, 2001 in the names of S. Chen and N. D. Cahill.
Three-dimensional panoramas provide a natural means for capturing and representing a model of an environment as seen from a given viewpoint. However, in order to model a complete environment, it is necessary to merge information collected from a variety of spatial locations. If, as described in the prior art, a collection of individual range images captured from arbitrary spatial positions and viewpoint orientations is used to model the complete environment, the cost of determining a global position for each range image can be extremely high, as previously discussed. What is needed is a technique to reduce this computational cost.