This invention relates to methods and apparatus for volumetric scene reconstruction from a set of input images of the scene and, more particularly, to methods and apparatus for reconstructing a high quality 3D model of the scene that is consistent with the input images.
Currently, there is a great deal of interest in image-based rendering techniques. These methods draw from the fields of computer graphics, computer vision, image processing and photogrammetry. The goal of these methods is to compute new views from two or more images of a scene, be they natural or synthetic. Several images of a scene are acquired from different camera viewpoints. The image data is used to create a three-dimensional (3D) computer model of the scene. The 3D model can then be used to compute one or more images of the scene from viewpoints that are different from the camera viewpoints. The new image is synthesized by projecting the 3D model to the desired viewpoint. These techniques may be referred to as "new view synthesis." A number of new view synthesis techniques have been disclosed in the prior art.
View morphing and light fields are solutions to the new view synthesis problem that do not create a 3D model as an intermediate step. View morphing is one of the simplest solutions to the problem. Given two images of a scene, it uses interpolation to create a new image intermediate in viewpoint between the input images. Because view morphing uses no 3D information about the scene, it cannot in general render images that are strictly correct, although the results often look convincing. Most obviously, the algorithm has limited means to correctly render objects in the scene that occlude one another.
Lumigraph and light field techniques use a sampling of the light radiated in every direction from every point on the surface of a volume. In theory, such a collection of data can produce nearly perfect new views. In practice, however, the large amount of input data required to synthesize high quality images is impractical to capture and store. Nevertheless, these methods have one advantage over nearly all competing approaches: they treat specular reflections correctly.
Stereo techniques find points in two or more input images that correspond to the same point in the scene. They then use knowledge of the camera locations and triangulation to determine the depth of the scene point. Unfortunately, stereo is difficult to apply to images taken from arbitrary viewpoints. If the input viewpoints are far apart, then corresponding image points are hard to find automatically. On the other hand, if the viewpoints are close together, then small measurement errors result in large errors in the calculated depths. Furthermore, stereo naturally produces a 2D depth map, and integrating many such maps into a true 3D model is a challenging problem.
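The sensitivity of stereo to baseline can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo pair, for which depth is Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels). This model and the function names are illustrative assumptions, not part of the techniques described above.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth of a scene point for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline / disparity_px


def depth_error_for_one_pixel(focal_px, baseline, disparity_px):
    """Change in computed depth when the disparity is mismeasured by one pixel."""
    exact = depth_from_disparity(focal_px, baseline, disparity_px)
    perturbed = depth_from_disparity(focal_px, baseline, disparity_px + 1)
    return abs(exact - perturbed)


# The same point at depth 10 seen with a narrow and a wide baseline.
narrow = depth_error_for_one_pixel(focal_px=1000, baseline=0.1, disparity_px=10)
wide = depth_error_for_one_pixel(focal_px=1000, baseline=1.0, disparity_px=100)
print(narrow, wide)  # the narrow-baseline error is roughly nine times larger
```

Under these assumptions, the same one-pixel measurement error costs far more depth accuracy when the cameras are close together.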
Voxel coloring and space carving exploit a characteristic of Lambertian surfaces: points on such surfaces are "color consistent", i.e., they project onto similar colors in all the images from which they are visible. These methods start with an arbitrary number of calibrated images of the scene and a set of volume elements, or voxels, that is a superset of the scene. Each voxel is projected into all the images from which it is visible. If the voxel projects onto inconsistent colors in several images, it must not be on a surface, and so it is "carved", i.e., declared to be transparent. Otherwise, the voxel is "colored", i.e., declared to be opaque and assigned the color of its projections. These algorithms stop when all the opaque voxels project onto consistent colors in the images. Because the final set of opaque voxels is color consistent, it is a good model of the scene.
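The carve-or-color decision for a single voxel might be sketched as follows. The consistency test used here (per-channel color range below a threshold), the threshold value, and the function name are assumptions chosen for illustration; the text above does not prescribe a particular test.

```python
def carve_or_color(projected_colors, threshold=10):
    """Decide the fate of one voxel from the RGB colors it projects onto.

    Returns None if the voxel should be carved (declared transparent),
    otherwise the mean color to assign to the opaque voxel.
    Assumes at least one projected color is supplied.
    """
    channels = list(zip(*projected_colors))  # regroup as (reds, greens, blues)
    # Assumed test: inconsistent if any channel varies by more than the threshold.
    if any(max(c) - min(c) > threshold for c in channels):
        return None
    return tuple(sum(c) / len(c) for c in channels)


print(carve_or_color([(200, 10, 10), (204, 12, 8)]))    # similar colors: colored
print(carve_or_color([(200, 10, 10), (20, 180, 10)]))   # dissimilar colors: carved
```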
Voxel coloring and space carving differ in the way they determine visibility, the knowledge of which voxels are visible from which pixels in the images. A voxel fails to be visible from an image if it projects outside the image or if it is blocked by other voxels that are currently considered to be opaque. When the opacity of a voxel changes, the visibility of other voxels potentially changes, so an efficient technique is needed to keep the visibility up-to-date.
Voxel coloring places constraints on the camera locations to simplify the visibility computation. It requires the cameras to be placed in such a way that the voxels can be processed, on a single scan, in front-to-back order relative to every camera. Typically, this condition is met by placing all the cameras on one side of the scene and scanning voxels in planes that are successively farther from the cameras. Thus, the transparency of all voxels that might occlude a given voxel is determined before the given voxel is checked for color consistency. Although it simplifies the visibility computation, the restriction on camera locations is a significant limitation. For example, the cameras cannot surround the scene, so some surfaces will not be visible in any image and hence cannot be reconstructed.
Space carving removes the restriction on camera locations. With the cameras placed arbitrarily, no single scan of the voxels, regardless of its order, will enable each voxel's visibility in the final model (and hence its color consistency) to be computed correctly. Algorithms have been designed that evaluate the consistency of voxels multiple times during carving, using changing and incomplete visibility information, and yet yield a color consistent reconstruction at the end. Space carving initially considers all voxels to be opaque and only changes voxels to transparent, never the reverse. Consequently, as some voxels are carved, the remaining opaque voxels can only become more visible from the images. The consistency function is assumed to be monotonic, meaning that for any two sets of pixels S and S′, where S is a subset of S′, if S is inconsistent, then S′ is also inconsistent. Given that the visibility of a voxel only increases as the algorithm runs and the consistency function is monotonic, it follows that carving is conservative, and no voxel will ever be carved if it would be color consistent in the final model.
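The monotonicity requirement can be made concrete with two candidate consistency tests on grayscale pixel values. Both tests and the threshold are assumptions for illustration only. A test based on the range of the values is monotonic, because adding pixels can never shrink the range, whereas a test based on the standard deviation is not.

```python
from statistics import pstdev


def range_inconsistent(pixels, threshold):
    """Assumed test: inconsistent when max - min exceeds the threshold."""
    return max(pixels) - min(pixels) > threshold


def stdev_inconsistent(pixels, threshold):
    """Assumed test: inconsistent when the standard deviation exceeds the threshold."""
    return pstdev(pixels) > threshold


S = [0, 10]                                  # a set of pixel values
S_prime = [0, 10, 5, 5, 5, 5, 5, 5]          # a superset of S

# Range test: S inconsistent implies S' inconsistent (monotonic).
print(range_inconsistent(S, 4), range_inconsistent(S_prime, 4))

# Standard deviation test: S is inconsistent, yet the superset S' passes,
# so this test violates the monotonicity assumption.
print(stdev_inconsistent(S, 4), stdev_inconsistent(S_prime, 4))
```

With a non-monotonic test, a voxel carved on the basis of partial visibility might have been consistent once its full visibility was known, so the conservative-carving argument would no longer hold.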
Space carving scans voxels for color consistency similarly to voxel coloring, evaluating a plane of voxels at a time. It forces the scans to be front-to-back, relative to the cameras, by using only images whose cameras are currently behind the moving plane. Thus, when a voxel is evaluated, the transparency is already known of other voxels that might occlude it from the cameras currently being used. Unlike voxel coloring, space carving uses multiple scans, typically along the positive and negative directions of each of the three axes. Because carving is conservative, the set of opaque voxels is a shrinking superset of the desired color consistent model as the algorithm runs.
While space carving never carves voxels that it should not, it is likely to produce a model that includes some color inconsistent voxels. During scanning, cameras that are ahead of the moving plane are not used for color consistency checking, even when the voxels being checked are visible from those cameras. Hence, the color consistency of a voxel is, in general, never checked over the entire set of images from which it is visible.
Accordingly, there is a need for improved methods and apparatus for reconstructing a 3D model of a scene from calibrated images of the scene.
According to a first aspect of the invention, methods and apparatus are provided for reconstructing a three-dimensional model of a scene from a plurality of images of the scene taken from different viewpoints. The method comprises the steps of a) defining a reconstruction volume comprising a set of voxels that include the scene, b) initializing a surface voxel list (SVL) that includes uncarved voxels on the surface of the reconstruction volume, c) creating item buffers that contain, for each pixel in each of the images, the ID of the closest voxel that projects onto the pixel, d) processing each voxel V in the SVL, e) if any voxels have been carved since the step of creating the item buffers, repeating steps c) and d), and f) if no voxels have been carved since the step of creating the item buffers, saving the SVL. Each voxel V in the SVL is processed by d1) computing the set of pixels vis(V) in all the images from which voxel V is visible, d2) determining the color consistency of vis(V), and d3) if vis(V) is not color consistent, i) carving voxel V, ii) removing voxel V from the SVL, and iii) adding to the SVL all uncarved voxels that are adjacent to voxel V and that are not on the SVL.
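Steps a) through f) of this first aspect can be sketched on a toy problem. Everything specific in the sketch is an assumption made so the example stays small and runnable: a 2D grid of "voxels", four orthographic one-dimensional cameras, grayscale images in which None marks a background pixel, and a consistency test that rejects background pixels and color ranges above a threshold.

```python
from itertools import product

# Hypothetical cameras: each maps a voxel (x, y) to (pixel index, depth).
CAMERAS = {
    "left": lambda x, y: (y, x),
    "right": lambda x, y: (y, -x),
    "bottom": lambda x, y: (x, y),
    "top": lambda x, y: (x, -y),
}


def build_item_buffers(svl):
    """Step c: for each camera and pixel, record the closest voxel on the SVL."""
    buffers = {}
    for cam, proj in CAMERAS.items():
        closest = {}
        for v in svl:
            pixel, depth = proj(*v)
            if pixel not in closest or depth < closest[pixel][0]:
                closest[pixel] = (depth, v)
        buffers[cam] = {p: v for p, (_, v) in closest.items()}
    return buffers


def vis(v, buffers, images):
    """Step d1: colors of all pixels whose item-buffer entry is voxel v."""
    colors = []
    for cam, proj in CAMERAS.items():
        pixel, _ = proj(*v)
        if buffers[cam].get(pixel) == v:
            colors.append(images[cam][pixel])
    return colors


def consistent(colors, threshold=10):
    """Step d2 (assumed test): no background pixel and a small color range."""
    if None in colors:
        return False
    return not colors or max(colors) - min(colors) <= threshold


def reconstruct(size, images):
    grid = set(product(range(size), repeat=2))             # step a
    edge = (0, size - 1)
    svl = {(x, y) for x, y in grid if x in edge or y in edge}  # step b
    carved = set()
    while True:
        buffers = build_item_buffers(svl)                  # step c
        carved_any = False
        for v in list(svl):                                # step d
            if consistent(vis(v, buffers, images)):
                continue
            carved.add(v)                                  # step d3 i
            svl.discard(v)                                 # step d3 ii
            carved_any = True
            x, y = v
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in grid and n not in carved:
                    svl.add(n)                             # step d3 iii
        if not carved_any:                                 # steps e and f
            return svl


# A single object voxel of color 100 at the center of a 3 x 3 grid.
images = {cam: [None, 100, None] for cam in CAMERAS}
print(reconstruct(3, images))  # {(1, 1)}
```

Note that the item buffers are deliberately left stale while the SVL is processed. Each voxel is therefore tested against a subset of its true visibility, and the buffers are rebuilt only once per pass.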
According to another aspect of the invention, methods and apparatus are provided for reconstructing a three-dimensional model of a scene from a plurality of images of the scene taken from different viewpoints. The method comprises the steps of a) defining a reconstruction volume comprising a set of voxels that include the scene, b) initializing a surface voxel list (SVL) that includes uncarved voxels on the surface of the reconstruction volume, c) creating layered depth images (LDIs) that contain, for each pixel in each of the images, a list of all surface voxels that project onto the pixel, sorted according to distance from the voxel to the image's camera, d) copying the SVL to a changed visibility SVL (CVSVL), e) processing voxels in the CVSVL, and f) when the CVSVL is empty, saving the SVL. The processing of voxels in the CVSVL includes, while there are voxels in the CVSVL: e1) selecting a voxel V on the CVSVL and removing voxel V from the CVSVL, e2) computing the set of pixels vis(V) in all the images from which voxel V is visible, e3) determining the color consistency of vis(V), and e4) if vis(V) is not color consistent, i) carving voxel V, ii) removing voxel V from the SVL, iii) deleting voxel V from the LDIs, iv) if voxel V is at the head of LDI(P), adding the next voxel on LDI(P) to the CVSVL, and v) processing each uncarved voxel N that is adjacent to voxel V by adding voxel N to the CVSVL if its visibility has changed and inserting voxel N into the LDIs.
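The LDI-based aspect can be sketched on a toy problem in the same spirit. The 2D grid, the four orthographic one-dimensional cameras, the use of None for background pixels, and the consistency test are all assumptions for illustration. As a further simplification, every newly exposed neighbor is added to the CVSVL without first checking whether its visibility changed, which only causes extra re-evaluations.

```python
import bisect
from itertools import product

# Hypothetical cameras: each maps a voxel (x, y) to (pixel index, depth).
CAMERAS = {
    "left": lambda x, y: (y, x),
    "right": lambda x, y: (y, -x),
    "bottom": lambda x, y: (x, y),
    "top": lambda x, y: (x, -y),
}


def consistent(colors, threshold=10):
    """Assumed test: no background (None) pixel and a small color range."""
    if None in colors:
        return False
    return not colors or max(colors) - min(colors) <= threshold


def reconstruct_ldi(size, images):
    grid = set(product(range(size), repeat=2))                  # step a
    edge = (0, size - 1)
    svl = {(x, y) for x, y in grid if x in edge or y in edge}   # step b
    carved = set()
    # Step c: per camera and pixel, a depth-sorted list of (depth, voxel).
    ldi = {cam: {} for cam in CAMERAS}

    def insert(v):
        for cam, proj in CAMERAS.items():
            pixel, depth = proj(*v)
            bisect.insort(ldi[cam].setdefault(pixel, []), (depth, v))

    for v in svl:
        insert(v)
    cvsvl = set(svl)                                            # step d
    while cvsvl:                                                # step e
        v = cvsvl.pop()                                         # step e1
        # Step e2: v is visible at a pixel when it heads that pixel's list.
        heads = []
        for cam, proj in CAMERAS.items():
            pixel, _ = proj(*v)
            if ldi[cam][pixel][0][1] == v:
                heads.append((cam, pixel))
        if consistent([images[cam][p] for cam, p in heads]):    # step e3
            continue
        carved.add(v)                                           # step e4 i
        svl.discard(v)                                          # step e4 ii
        for cam, proj in CAMERAS.items():                       # step e4 iii
            pixel, depth = proj(*v)
            ldi[cam][pixel].remove((depth, v))
        for cam, p in heads:                                    # step e4 iv
            if ldi[cam][p]:
                cvsvl.add(ldi[cam][p][0][1])
        x, y = v
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):  # step e4 v
            if n in grid and n not in carved and n not in svl:
                svl.add(n)
                insert(n)
                cvsvl.add(n)
    return svl                                                  # step f


# A single object voxel of color 100 at the center of a 3 x 3 grid.
images = {cam: [None, 100, None] for cam in CAMERAS}
print(reconstruct_ldi(3, images))  # {(1, 1)}
```

Because the LDIs are updated as each voxel is carved, visibility is always current, and only voxels whose visibility may have changed are re-evaluated.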