In recent years, applications involving three-dimensional (3D) computer models of objects or scenes have become increasingly common. For example, 3D models are commonly used to create computer generated imagery for entertainment applications such as motion pictures and computer games. The computer generated imagery may be viewed in a conventional two-dimensional format, or may alternatively be viewed using stereographic imaging systems. 3D models are also used in many medical imaging applications. For example, 3D models of a human body can be produced from images captured using various types of imaging devices such as CT scanners. The formation of 3D models can also be valuable to provide information useful for image understanding applications. The 3D information can be used to aid in operations such as object recognition, object tracking and image segmentation.
A number of different methods have been developed for building a 3D model of a scene or an object. Some methods for forming 3D models of an object or a scene involve capturing a pair of conventional two-dimensional images from two different viewpoints. Corresponding features in the two captured images can be identified and range information (i.e., depth information) can be determined from the disparity between the positions of the corresponding features. Range values for the remaining points can be estimated by interpolating between the ranges for the determined points. The result can be expressed as a range map, which is a form of 3D model that provides a set of z values for an array of (x,y) positions relative to a particular viewpoint. An algorithm of this type is described in the article “Developing 3D viewing model from 2D stereo pair with its occlusion ratio” by Johari et al. (International Journal of Image Processing, Vol. 4, pp. 251-262, 2010).
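The relationship between disparity and range can be illustrated with a minimal sketch. It assumes an idealized, rectified pair of pinhole cameras with parallel optical axes, which the text above does not specify; the function name and its parameters (focal length in pixels, baseline in meters) are hypothetical and are not taken from the cited article.

```python
def depth_from_disparity(x_left_px, x_right_px, focal_px, baseline_m):
    """Estimate range (z) for one pair of corresponding features.

    Assumes a rectified stereo pair: the feature appears at column x_left_px
    in the left image and x_right_px in the right image, on the same row.
    """
    disparity_px = x_left_px - x_right_px
    if disparity_px <= 0:
        # Zero or negative disparity corresponds to a point at infinity or a
        # bad feature match under the assumed geometry.
        raise ValueError("disparity must be positive")
    # Similar triangles: z / baseline = focal length / disparity.
    return focal_px * baseline_m / disparity_px


# Example: a feature that shifts 20 pixels between the two views.
z_m = depth_from_disparity(420.0, 400.0, focal_px=800.0, baseline_m=0.1)
```

Because range is inversely proportional to disparity under these assumptions, nearby points produce large disparities and distant points produce small ones, so range precision tends to fall off with distance.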
Other methods for building a 3D model of a scene or an object involve projecting a structured lighting pattern (e.g., a line, a grid or a periodic pattern) onto the surface of an object from a first direction, and then capturing an image of the object from a different direction. For example, see the articles “Model and algorithms for point cloud construction using digital projection patterns” by Peng et al. (ASME Journal of Computing and Information Science in Engineering, Vol. 7, pp. 372-381, 2007) and “Real-time 3D shape measurement with digital stripe projection by Texas Instruments micromirror devices (DMD)” by Frankowski et al. (Proc. SPIE, Vol. 3958, pp. 90-106, 2000). With such approaches, range information can be inferred from distortions in the pattern of the structured lighting due to parallax effects. Typically these methods capture one or more images of an object from a particular viewpoint. Consequently, the resulting 3D model will be incomplete because no information is available regarding the back side of any objects in the captured images. Other variations involve projecting a single vertical line onto an object and then rotating the object through a range of angles to construct a 3D model of the object one stripe at a time. While this method can provide a complete 3D model for the object, it has the disadvantage that the object must be of a size and shape that can be conveniently placed on a rotation stage.
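The way parallax encodes range in a structured lighting arrangement can be sketched with simple triangulation. The geometry below is an assumption made for illustration (a camera and a projector separated by a known baseline, with parallel axes, and a single projected ray); it is not the model used in the cited articles, and the function name is hypothetical.

```python
import math


def structured_light_depth(baseline_m, camera_angle_rad, projector_angle_rad):
    """Depth of an illuminated surface point by triangulation.

    Assumed geometry: camera at the origin and projector at (baseline_m, 0, 0),
    both looking along +z. camera_angle_rad is the angle of the camera ray from
    the camera axis, positive toward the projector. projector_angle_rad is the
    angle of the projected ray from the projector axis, positive toward the
    camera.
    """
    # Camera ray: x = z * tan(camera angle).
    # Projector ray: x = baseline - z * tan(projector angle).
    denom = math.tan(camera_angle_rad) + math.tan(projector_angle_rad)
    if denom <= 0:
        raise ValueError("rays do not intersect in front of the devices")
    return baseline_m / denom


# Example: 0.2 m baseline, rays converging at 10 and 15 degrees.
depth_m = structured_light_depth(0.2, math.radians(10.0), math.radians(15.0))
```

In this sketch, a change in surface depth shifts where the projected ray lands in the camera image, which changes the camera angle; that shift is the pattern distortion from which range is inferred.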
Another method for forming 3D models is known as structure from motion. This method involves capturing a video sequence of a scene from a moving viewpoint. For example, see the article “Shape and motion from image streams under orthography: a factorization method” by Tomasi et al. (Int. J. of Computer Vision, Vol. 9, pp. 137-154, 1992). With structure from motion methods, the 3D positions of image features are determined by analyzing a set of image feature trajectories which track feature position as a function of time. The article “Structure from Motion without Correspondence” by Dellaert et al. (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2000) teaches a method for extending the structure from motion approach so that the 3D positions can be determined without the need to identify corresponding features in the sequence of images. Structure from motion methods generally do not provide a high quality 3D model because the set of corresponding features that can be identified is typically quite sparse.
Another method for forming 3D models of objects involves the use of “time of flight cameras.” Time of flight cameras infer range information based on the time it takes for emitted light to travel to an object and be reflected back to the camera. One such method is described by Gokturk et al. in the article “A time-of-flight depth sensor—system description, issues, and solutions” (Proc. Computer Vision and Pattern Recognition Workshop, 2004). Range information determined using these methods is generally low in resolution (e.g., 128×128 pixels).
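The basic time of flight relationship can be sketched as follows. The sketch assumes that the round-trip travel time of the light is measured directly; practical sensors may instead estimate it indirectly (for example, from a phase shift), and the function name is hypothetical rather than drawn from the cited article.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s):
    """Range to an object given the round-trip travel time of the light."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The light covers the camera-to-object distance twice, so halve the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# Example: a 20 nanosecond round trip corresponds to roughly 3 meters.
range_m = range_from_round_trip(20e-9)
```

The example shows why timing precision matters: under this assumption, a timing error of one nanosecond corresponds to a range error of about 15 centimeters.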
Most techniques for generating 3D models from 2D images produce incomplete 3D models because no information is available regarding the back side of any objects in the captured images. Additional 2D images can be captured from additional viewpoints to provide information about portions of the objects that may be occluded from a single viewpoint. However, combining the range information determined from the different viewpoints is a difficult problem.
A variety of 3D imaging techniques have been developed for medical imaging applications such as computed tomography (CT). These methods typically determine an image of a slice through a 3D object. A series of slices can then be combined to construct 3D (volumetric) models of the objects. Such methods require complex and expensive equipment and are not practical for consumer applications.
U.S. Pat. No. 7,551,760 to Scharlack et al., entitled “Registration of 3D imaging of 3D objects,” teaches a method to register 3D models of dental structures. The 3D models are formed from two different perspectives using a 3D scanner. The two models are aligned based on the locations of recognition objects having a known geometry (e.g., small spheres having known sizes and positions) that are placed in proximity to the object being scanned.
U.S. Pat. No. 7,801,708 to Unal et al., entitled “Method and apparatus for the rigid and non-rigid registration of 3D shapes,” teaches a method for registering two 3D shapes representing ear impression models. The method works by minimizing a function representing an energy between signed distance functions created from the two ear impression models.
U.S. Patent Application Publication 2009/0232355 to Minear et al., entitled “Registration of 3D point cloud data using eigenanalysis,” teaches a method for registering multiple frames of 3D point cloud data captured from different perspectives. The method includes a coarse registration step based on finding centroids of blob-like objects in the scene. A fine registration step is used to refine the coarse registration by applying an iterative optimization method.
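The general idea of a centroid-based coarse registration can be illustrated with a deliberately simplified sketch. It is not the method of the cited publication: it assumes the two point clouds differ only by a translation, treats each whole cloud as a single blob, and uses hypothetical function names. A fine registration step would then refine this initial estimate.

```python
def centroid(points):
    """Mean (x, y, z) position of a non-empty list of 3D points."""
    n = len(points)
    if n == 0:
        raise ValueError("point cloud is empty")
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def coarse_translation(source, target):
    """Translation that moves the source centroid onto the target centroid."""
    cs, ct = centroid(source), centroid(target)
    return tuple(t - s for s, t in zip(cs, ct))


def translate(points, offset):
    """Apply a translation offset to every point in a cloud."""
    return [tuple(c + o for c, o in zip(p, offset)) for p in points]


# Example: the target is the source cloud shifted by (1, -1, 3).
source = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
target = translate(source, (1.0, -1.0, 3.0))
offset = coarse_translation(source, target)
aligned = translate(source, offset)
```

Centroid alignment alone cannot recover rotation between viewpoints, which is one reason such a step serves only as a starting point for an iterative refinement.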
There remains a need for a simple and robust method for forming 3D models based on two or more images captured from different viewpoints, each image including a two-dimensional image together with a corresponding range map.