We consider the problem of acquiring photo-realistic 3D models of real environments from widely distributed viewpoints. This problem has sparked recent interest in the computer vision community [Kanade et al., 1995, Moezzi et al., 1996, Beardsley et al., 1996, Leymarie et al., 1996] as a result of new applications in telepresence, virtual walkthroughs, automatic 3D model construction, and other problems that require realistic textured object models.
We use the term photorealism to refer to 3D reconstructions of real scenes whose reprojections contain sufficient color and texture information to accurately reproduce images of the scene from a wide range of target viewpoints. To ensure accurate reprojections, the input images should be representative, i.e., distributed throughout the target range of viewpoints. Accordingly, we propose two criteria that a photorealistic reconstruction technique should satisfy:
Photo Integrity: The reprojected model should accurately reproduce the color and texture of the input images.
Broad Coverage: The input images should be widely distributed throughout the environment, enabling a wide coverage of scene surfaces.
Instead of using existing stereo and structure-from-motion methods to solve this problem, we choose to approach it from first principles. We are motivated by the fact that current reconstruction techniques were not designed with these objectives in mind and, as we will argue, do not fully meet these requirements. Driven by the belief that photo integrity has more to do with color than shape, we formulate a color reconstruction problem, in which the goal is an assignment of colors (radiances) to points in an (unknown) approximately Lambertian scene. It is shown that certain points have an invariant coloring, constant across all possible interpretations of the scene, consistent with the input images. This leads to a volumetric voxel coloring method that labels the invariant scene voxels based on their projected correlation with the input images. By traversing the voxels in a special order it is possible to fully account for occlusions, a major advantage of this scene-based approach. The result is a complete 3D scene reconstruction, built to maximize photo integrity.
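To make the idea of labeling a voxel by its projected correlation with the input images concrete, the following is a minimal sketch of one possible consistency test. It is an illustration under stated assumptions, not the formulation derived in Section 2: the function name color_consistency, the use of a per-channel standard deviation, and the threshold value are all hypothetical choices made for this example.

```python
from statistics import pstdev


def color_consistency(samples, threshold=10.0):
    """Check whether the pixel colors a voxel projects to agree.

    samples: list of (r, g, b) tuples, one per image in which the voxel is visible.
    threshold: hypothetical bound on the per-channel standard deviation.
    Returns (consistent, mean_color); mean_color is None when there are no samples.
    """
    if not samples:
        return False, None
    channels = list(zip(*samples))  # regroup as (all reds, all greens, all blues)
    mean_color = tuple(sum(c) / len(c) for c in channels)
    spread = max(pstdev(c) for c in channels)  # the worst channel decides
    return spread <= threshold, mean_color


# Three views that see nearly the same red, and two views that disagree.
agree = color_consistency([(200, 10, 10), (202, 12, 9), (198, 11, 11)])
clash = color_consistency([(200, 10, 10), (10, 200, 10)])
```

Under the approximately Lambertian assumption, a point on a scene surface should look similar in every view that sees it, so a small spread is taken here as evidence that the voxel can be assigned the mean color.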
The photorealistic scene reconstruction problem, as presently formulated, raises a number of unique challenges that push the limits of existing techniques. First, the reconstructions must be dense and sufficiently accurate to reproduce the original images. This requirement poses a problem for feature-based reconstruction methods, which produce relatively sparse reconstructions. Although sparse reconstructions can be augmented by fitting surfaces (e.g., [Beardsley et al., 1996]), the triangulation techniques currently used cannot easily cope with discontinuities and, more importantly, are not image driven. Consequently, surfaces derived from sparse reconstructions may only agree with the input images at points where image features were detected.
Contour-based methods (e.g., [Cipolla and Blake, 1992, Szeliski, 1993, Seales and Faugeras, 1995]) are attractive in their ability to cope with changes in visibility, but do not produce sufficiently accurate depth maps due to problems with concavities and lack of parallax information. A purely contour-based reconstruction can be texture-mapped, as in [Moezzi et al., 1996], but not in a way that ensures projected consistency with all of the input images, due to the aforementioned problems. In addition, contour-based methods require occluding contours to be isolated, a difficult segmentation problem that voxel coloring avoids.
The second objective, broad coverage, requires that the input views be scattered over a wide area and therefore exhibit large-scale changes in visibility (i.e., occlusions, changing field of view). While some stereo methods can cope with limited occlusions, visibility changes of much greater magnitude appear to be beyond the state of the art. In addition, the views may be far apart, making the correspondence problem extremely difficult. Existing stereo-based approaches to this problem [Kanade et al., 1995] match nearby images two or three at a time to ameliorate visibility problems. This approach, however, does not fully integrate the image information and introduces new complications, such as how to merge the partial reconstructions.
The voxel coloring algorithm presented here works by discretizing scene space into a set of voxels that are traversed and colored in a special order. In this respect, the method is similar to Collins' Space-Sweep approach [Collins, 1996], which performs an analogous scene traversal. However, the Space-Sweep algorithm does not provide a solution to the occlusion problem, a primary contribution of this paper. Katayama et al. [Katayama et al., 1995] described a related method in which images are matched by detecting lines through slices of an epipolar volume, noting that occlusions could be explained by labeling lines in order of increasing slope. This ordering is consistent with our results, following from the derivations in Section 2. However, their algorithm used a reference image, thereby ignoring points that are occluded in the reference image but visible elsewhere. Also, their line detection strategy requires that the views all lie on a straight line, a significant limitation. An image-space visibility ordering was described by McMillan and Bishop [McMillan and Bishop, 1995]. Related algorithms [McMillan and Bishop, 1995, Kang and Szeliski, 1996] avoid field-of-view problems by matching 360 degree panoramic views directly. Panoramic reconstructions can also be achieved using our approach, but without the need to first build panoramic images (see FIGS. 1(b) and 4).
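The role of the traversal order can be illustrated with a toy two-dimensional sweep. This is only a sketch under simplifying assumptions, not the algorithm of Section 3: the function name sweep_color, the sheared orthographic cameras (view k sees voxel (x, d) at pixel x + shears[k] * d), the grayscale images, the max-minus-min agreement test, and the min_views parameter are all hypothetical. Layers are visited from nearest to farthest, and pixels explained by an accepted voxel are marked so that voxels behind it no longer use them.

```python
def sweep_color(images, shears, width, depth, threshold=0, min_views=1):
    """Color a (x, layer) voxel grid by a near-to-far sweep.

    images[k][p] is the grayscale value of pixel p in view k.
    Layer d = 0 is assumed nearest to every camera.
    Returns a dict mapping (x, d) to the assigned gray value.
    """
    # explained[k][p] becomes True once an accepted voxel accounts for that pixel.
    explained = [[False] * len(img) for img in images]
    colored = {}
    for d in range(depth):              # near-to-far layers
        for x in range(width):
            hits = []                   # (view, pixel, value) for unoccluded views
            for k, img in enumerate(images):
                p = x + shears[k] * d
                if 0 <= p < len(img) and not explained[k][p]:
                    hits.append((k, p, img[p]))
            if len(hits) < min_views:
                continue                # voxel is hidden or outside every view
            values = [v for _, _, v in hits]
            if max(values) - min(values) <= threshold:
                colored[(x, d)] = sum(values) / len(values)
                for k, p, _ in hits:    # pixels behind this voxel are now occluded
                    explained[k][p] = True
    return colored


# Scene: a foreground voxel (0, 0) of value 9 in front of a background
# layer d = 1 with values 1..5; the two images below are its renderings.
view0 = [9, 2, 3, 4, 5]        # shear 0: background voxel (0, 1) is hidden
view1 = [9, 1, 2, 3, 4, 5]     # shear 1: every background voxel is visible
result = sweep_color([view0, view1], shears=[0, 1], width=5, depth=2)
```

In this example the background voxel (0, 1) is hidden behind (0, 0) in the first view. Because that pixel has already been marked when layer 1 is visited, only the second view contributes and the voxel receives the value 1. Without the marking, the values 9 and 1 would be compared and the voxel rejected.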
The remainder of the paper is organized as follows. Section 2 formulates and solves the voxel coloring problem, and describes its relationship to shape reconstruction. Section 3 presents an efficient algorithm for computing the voxel coloring from a set of images. Section 4 describes some experiments on real and synthetic image sequences that demonstrate how the method performs.