Constructing a photorealistic three-dimensional (3D) model from multiple two-dimensional (2D) images of a scene is a fundamental problem in computer vision and image-based modeling. The 3D model can then be used to reconstruct the scene from arbitrary viewpoints.
The emphasis for most prior art computer vision methods has been on automatic reconstruction of the scene with little or no user intervention. Consequently, computer vision methods make a priori assumptions about the geometry and reflectance in the scene. These assumptions are often incorrect.
In contrast, many image-based modeling systems require that a user direct the construction of the 3D model that is used as the scene representation. However, the specification of those 3D models can be difficult, and the tools used to construct the models are often inflexible, i.e., they only deal with a very simple set of shapes, and thus cannot be used for complex scenes.
A fundamental problem in 3D scene reconstruction is assigning correspondences between points in two or more images that are projections of the same point in the scene. Previous work uses pixels or object silhouettes to specify the point correspondences.
The problem of 3D model construction from images has received a tremendous amount of attention in the computer vision literature; for an overview, see Kutulakos et al., “A theory of shape by space carving,” Intl. Journal of Computer Vision Vol. 38:3, pp. 199–218, 2000.
Multi-View Stereo Reconstruction
Multi-view stereo methods construct 3D models by automatically determining pixel correspondences in multiple images. Stereo correspondence methods work well when the distance between different viewpoints, often called the baseline, is small.
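As an illustration only, automatic correspondence along a single scanline can be sketched as a search for the shift that minimizes the sum of squared differences (SSD) between small pixel windows. SSD block matching is one common choice and is not stated in the text above to be the method used by any particular prior art system. The function names, window size, and disparity range below are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_scanline(left, right, x, half_window=1, max_disparity=4):
    """Return the disparity d that best matches left[x] to right[x - d].

    Hypothetical helper: compares a small window around pixel x in the
    left scanline with windows shifted by d pixels in the right scanline.
    """
    ref = left[x - half_window : x + half_window + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        lo = x - d - half_window
        hi = x - d + half_window + 1
        if lo < 0:
            break  # candidate window would fall outside the image
        cost = ssd(ref, right[lo:hi])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# A textured scanline and the same scanline seen from a nearby viewpoint,
# shifted two pixels to the left (a small baseline).
left = [10, 10, 10, 10, 50, 90, 30, 10, 10, 10]
right = left[2:] + [10, 10]

disparity = match_scanline(left, right, x=5)
print(disparity)
```

The search succeeds here because the window around the queried pixel is textured and the shift between the two views is small. In a window of uniform intensity, several shifts yield the same cost, and the match is ambiguous.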
This is especially true for a sequence of frames or video, where tracking can match correspondences between consecutive frames. To deal with large changes in viewpoints, some methods extract a partial 3D shape from a subset of images using multi-baseline stereo techniques.
However, constructing a single 3D model then requires complex reconstruction and merging methods, and there is no guarantee of global consistency between the entire set of images and the merged model. Accurate point correspondences are difficult to determine in regions with homogeneous color and intensity.
View-dependent effects, such as specular highlights or reflections, lead to correspondence mismatches. Obtaining dense correspondence for many image points is especially hard. Differences between images due to occlusions are also difficult to handle. This is a severe problem for general 3D scene reconstruction where such occlusions happen frequently.
Shape-From-Silhouettes Reconstruction
Shape-from-silhouette methods construct the 3D model as an intersection of visual rays from the center of projection of the camera through all points in object silhouettes.
Shape-from-silhouette methods construct a shape known as a visual hull. However, those methods can never recover concavities. The visual hull can be improved by adding depth from multi-view stereo images, subject to some of the drawbacks mentioned above. Shape-from-silhouette methods fail for scenes with major occlusions, and they only work for outside-looking-in camera arrangements. That is, the scene must lie inside the convex hull of the cameras.
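The inability to recover concavities can be seen in a minimal sketch. The sketch assumes, for simplicity, three orthographic views along the coordinate axes of a small voxel grid, rather than the perspective rays through a center of projection described above. The grid size, the object, and the function names are hypothetical.

```python
from itertools import product

N = 4  # hypothetical grid resolution

# Ground-truth object: a solid N x N x N block with a pit carved into
# the centre of its top face (a concavity).
pit = {(1, 1, 3), (1, 2, 3), (2, 1, 3), (2, 2, 3)}
solid = {v for v in product(range(N), repeat=3) if v not in pit}


def silhouette(voxels, axis):
    """Orthographic silhouette: drop one coordinate of every occupied voxel."""
    return {v[:axis] + v[axis + 1:] for v in voxels}


def visual_hull(silhouettes, n):
    """Keep every voxel whose projection lies inside all silhouettes."""
    hull = set()
    for v in product(range(n), repeat=3):
        if all(v[:a] + v[a + 1:] in s for a, s in silhouettes.items()):
            hull.add(v)
    return hull


silhouettes = {axis: silhouette(solid, axis) for axis in range(3)}
hull = visual_hull(silhouettes, N)

# Voxels present in the hull but absent from the true object.
phantom = hull - solid
print(sorted(phantom))
```

In this sketch the pit does not alter any of the three silhouettes, so the intersection fills it in, and the set `phantom` contains exactly the pit voxels.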
Therefore, it is desired to construct a 3D model with concavities, and to handle arbitrary scene occlusions and camera placements.
Photometric Reconstruction
Photometric constraints can be used to construct a 3D model that is demonstrably better than the visual hull. Voxel coloring can be used to gradually carve out voxels from a 3D volume that are not color-consistent with any image pixels to which the voxels project.
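The color-consistency test at the core of voxel carving can be sketched as follows. The sketch assumes that the colors of the pixels to which each voxel projects have already been gathered, and it omits the visibility ordering that a full voxel coloring method requires. The per-channel standard deviation criterion, the threshold value, and all names are hypothetical.

```python
def is_color_consistent(samples, threshold=10.0):
    """True if the RGB samples agree within `threshold` (std dev per channel)."""
    n = len(samples)
    for channel in range(3):
        values = [s[channel] for s in samples]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        if variance ** 0.5 > threshold:
            return False
    return True


def carve(observations, threshold=10.0):
    """Return the voxels that survive carving, mapped to their mean colour."""
    kept = {}
    for voxel, samples in observations.items():
        if is_color_consistent(samples, threshold):
            n = len(samples)
            kept[voxel] = tuple(sum(s[c] for s in samples) / n for c in range(3))
    return kept


# Hypothetical observations: the colour seen at each voxel's projection
# in three images.
observations = {
    (0, 0, 0): [(200, 30, 30), (205, 28, 33), (198, 32, 29)],  # on the surface
    (0, 0, 1): [(200, 30, 30), (20, 180, 40), (90, 90, 200)],  # empty space
}

kept = carve(observations)
print(sorted(kept))
```

A voxel on a diffuse surface projects to nearly the same color in every image and is kept. A voxel in empty space projects to unrelated colors and is carved away.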
Voxel coloring has also been used with arbitrary camera placements, graphics hardware acceleration, and multiple color hypotheses for each voxel. However, voxel coloring does not work well for large scenes or objects with big differences in scale. Constructing a 3D model for interactive viewing requires a lengthy process that may introduce inconsistencies in the representation of the shape.
Fundamentally, all photometric approaches rely on a locally computable analytic model of reflectance. This assumption fails for global illumination effects such as shadows, transparency, or inter-reflections.
Because simultaneous recovery of surface shape, reflectance, and illumination is difficult, prior art photometric construction methods assume that surfaces are diffusely reflective, i.e., Lambertian, or nearly so. Moreover, photometric methods do not work for objects with a homogeneous surface color.
Therefore, it is desired to provide a user-assisted model construction method that overcomes the limitations of the prior art, and that can construct models of scenes containing highly specular, transparent, and uniformly colored objects.
Image-Based Reconstruction
Image-based modeling methods typically split the task of 3D model construction between the user and the computer. Some methods construct a model from a single image using user-defined billboards or user tools to paint depth images, edit the model, and change the illumination in images.
Despite the utility and versatility of those methods, specifying depth images for multiple images requires a substantial, if not impractical, amount of user input. More importantly, there is no guarantee that the resulting 3D model is globally consistent with all the input images.
It is a considerable advantage to use geometric constraints when the scene is known to have a certain structure. Some methods depend on user-guided placement of polyhedral primitives to reconstruct architectural scenes. Other methods exploit geometric characteristics of scenes with planes and parallel lines. However, those methods are generally not applicable to constructing 3D models of arbitrary scenes. In addition, those methods rely on lengthy optimization procedures.
Therefore, it is desired to use image-based modeling where a user assigns region correspondences so that a globally consistent model can be constructed without making any assumptions about the geometry or reflectance in the scene.