This invention relates generally to methods and apparatus for generating displayable digital models of physical objects, and in particular relates to such methods and apparatus that operate based on object surface scan and image data.
The creation of three-dimensional digital content by scanning real objects has become common practice in graphics applications for which visual quality is paramount, such as animation, e-commerce, and virtual museums. While a significant amount of attention has been devoted to the problem of accurately capturing the geometry of scanned objects, the acquisition of high-quality textures is equally important, but not as widely studied.
Three-dimensional scanners are used increasingly to capture digital models of objects for animation, virtual reality, and e-commerce applications for which the central concerns are efficient representation for interactivity and high visual quality.
Most high-end 3D scanners sample the surface of the target object at a very high resolution. Hence, models created from the scanned data are often over-tessellated and require significant simplification before they can be used for visualization or modeling. Texture data is often acquired together with the geometry; however, a typical system merely captures a collection of images that record the particular lighting conditions at the time of scanning. When these images are stitched together, discontinuity artifacts are usually visible. Moreover, because that lighting is baked into the images, it is difficult to simulate various lighting conditions realistically or to immerse the model in a new environment.
A variety of techniques can be used to capture digital models of physical objects, including CAT scans and structure from motion applied to video sequences. For convenience, the following description is restricted to techniques involving instruments that capture range images (in which each pixel value represents depth) and intensity images (in which each pixel value is proportional to the incident light). A detailed summary of such methods can be found in G. Roth, "Building models from sensor data: an application shared by the computer vision and computer graphics community", In Proc. of the NATO Workshop on the Confluence of Computer Vision and Computer Graphics, 2000.
The basic operations necessary to create a digital model from a series of captured images are illustrated in FIG. 1. After outliers are removed from the range images, they are in the form of individual height-field meshes. Step A is to align these meshes into a single global coordinate system. In high-end systems registration may be performed by accurate tracking. For instance, the scanner may be attached to a coordinate measurement machine that tracks its position and orientation with a high degree of accuracy. In less expensive systems an initial registration is found by scanning on a turntable, manual alignment, or approximate feature matching. The alignment is then refined automatically using techniques such as the Iterative Closest Point (ICP) algorithm of Besl and McKay.
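The automatic refinement step can be sketched as follows. This is a minimal point-to-point ICP variant, assuming NumPy, brute-force nearest-neighbour search, and the closed-form SVD rigid fit; the function names `best_rigid_transform` and `icp` are illustrative, and practical systems typically add spatial search structures, outlier rejection, and point-to-plane error terms.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Closed-form least-squares R, t such that R @ src[i] + t ≈ dst[i] (SVD method)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, iterations=50, tol=1e-10):
    """Point-to-point ICP; returns R, t mapping source onto target."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # Brute-force closest-point search, O(N * M).
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nearest = target[d2.argmin(axis=1)]
        R, t = best_rigid_transform(current, nearest)
        current = current @ R.T + t
        # Accumulate: the new map is R (R_total x + t_total) + t.
        R_total, t_total = R @ R_total, R @ t_total + t
        err = np.sqrt(d2.min(axis=1).mean())  # RMS distance before this update
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total
```

ICP converges only from a reasonable initial registration, which is why the coarse alignment (tracking, turntable, manual, or feature-based) comes first.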
After registration, scans do not form a single surface, but interpenetrate one another, due to acquisition errors primarily along the line-of-sight in each scan. To form a single surface, in step B the overlapping scans must be averaged. In stitching/zippering methods this averaging is performed between pairs of overlapping meshes. In volumetric/occupancy grid methods line-of-sight errors are averaged by letting all scanned points contribute to a function of surface probability defined on a single volume grid. An advantage of volumetric methods is that all scans representing a surface point influence the final result, rather than simply a pair of scans.
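Volumetric averaging can be illustrated along a single line of sight. This minimal sketch assumes a truncated signed-distance formulation in the style of Curless and Levoy, which is one common instance of the volumetric approach and not necessarily the one any particular system uses. Each scan adds a weighted signed distance to every voxel near its measured surface, and the fused surface is the zero crossing of the weighted average. The function names and the truncation width `trunc` are hypothetical.

```python
import numpy as np


def fuse_line_of_sight(depths, weights, grid, trunc=0.5):
    """Weighted average of truncated signed distances along one line of sight."""
    num = np.zeros_like(grid)
    den = np.zeros_like(grid)
    for depth, w in zip(depths, weights):
        sd = depth - grid              # positive in front of this scan's surface
        mask = np.abs(sd) <= trunc     # only voxels near the surface are updated
        num[mask] += w * sd[mask]
        den[mask] += w
    D = np.full_like(grid, np.nan)     # NaN marks voxels no scan has observed
    ok = den > 0
    D[ok] = num[ok] / den[ok]
    return D


def zero_crossing(grid, D):
    """Linearly interpolate the first + to - sign change of D (the fused surface)."""
    for i in range(len(grid) - 1):
        a, b = D[i], D[i + 1]
        if np.isfinite(a) and np.isfinite(b) and a >= 0 > b:
            return grid[i] + (grid[i + 1] - grid[i]) * a / (a - b)
    return None
```

For depths 1.00, 1.10 and 1.04 with weights 1, 1 and 2, the zero crossing falls at the weighted mean depth, 1.045, so every scan observing the point influences the result.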
In step B the scans are integrated into a single mesh. The integration may be performed by zippering/stitching, isosurface extraction from volumes, or interpolating mesh algorithms applied to error-corrected points.
To use a texture map with the integrated mesh, in step C the surface is parameterized with respect to a 2D coordinate system and texture coordinates are interpolated between mesh vertices. A simple parameterization is to treat each triangle separately and to pack all of the individual texture maps into a larger texture image. However, the use of mip-mapping in this case is limited since adjacent pixels in the texture may not correspond to adjacent points on the geometry. Another approach is to locate patches of geometry which are height fields that can be parameterized by projecting the patch onto a plane. Stitching methods use this approach by simply considering sections of the scanned height fields as patches.
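The height-field patch approach can be sketched by projecting a patch onto its best-fit plane. This assumes NumPy, a triangle mesh given as vertex and index arrays, and a plane taken from the principal axes of the patch vertices; the function name is hypothetical. The orientation check is only a simple necessary condition for the projection not to fold over, not a full injectivity test.

```python
import numpy as np


def planar_parameterization(vertices, triangles):
    """Project a patch onto its best-fit plane; return (u, v) scaled into [0, 1]."""
    centroid = vertices.mean(axis=0)
    # Principal axes of the vertex cloud; the smallest one is the plane normal.
    _, _, Vt = np.linalg.svd(vertices - centroid)
    e1, e2, n = Vt[0], Vt[1], Vt[2]
    # Necessary (not sufficient) height-field test: all triangles must face
    # the same side of the plane, otherwise the projection folds over.
    p0, p1, p2 = (vertices[triangles[:, k]] for k in range(3))
    facing = np.cross(p1 - p0, p2 - p0) @ n
    if not (np.all(facing > 0) or np.all(facing < 0)):
        raise ValueError("patch is not a height field over its best-fit plane")
    uv = np.stack([(vertices - centroid) @ e1, (vertices - centroid) @ e2], axis=1)
    uv -= uv.min(axis=0)
    uv /= uv.max()  # one uniform scale, so the aspect ratio is preserved
    return uv
```

Because neighbouring texels in such a patch correspond to neighbouring surface points, mip-mapping remains meaningful within the patch, unlike with per-triangle packing.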
Other methods could be built on tiling methods developed for multiresolution analysis or interactive texture mapping.
Parallel to acquiring the geometry of the model, intensity images are captured to obtain information about the reflectance of the surface. Such images may be recorded with electronic or traditional cameras, or by using polychromatic laser technology. In step D, these images are aligned to the corresponding geometry. In some cases the image acquisition is decoupled from the geometry acquisition; the camera intrinsic and extrinsic parameters for the images are then estimated by manual or automatic feature matching. The advantage of this decoupling is that acquisition modalities that cannot capture surface reflectance can be used for capturing geometry.
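Once the intrinsic matrix K and the extrinsic pose (R, t) of an image are known, aligning it to the geometry amounts to projecting each surface point into the image. The minimal pinhole-camera sketch below assumes NumPy, no lens distortion, and no occlusion test (a real system must also check visibility against the mesh); the function names are illustrative.

```python
import numpy as np


def project_to_image(points_world, K, R, t):
    """Pinhole projection x ~ K (R X + t); returns pixel coordinates and depth."""
    cam = points_world @ R.T + t          # world -> camera coordinates
    depth = cam[:, 2]
    pix_h = cam @ K.T                     # homogeneous pixel coordinates
    pix = pix_h[:, :2] / pix_h[:, 2:3]    # perspective divide
    return pix, depth


def texture_coords(points_world, K, R, t, width, height):
    """Normalize projected pixels to [0, 1]; flag points behind the camera or off-image."""
    pix, depth = project_to_image(points_world, K, R, t)
    uv = pix / np.array([width, height], dtype=float)
    valid = (depth > 0) & np.all((uv >= 0.0) & (uv <= 1.0), axis=1)
    return uv, valid
```

With K = [[800, 0, 320], [0, 800, 240], [0, 0, 1]] and an identity pose, the point (0.1, -0.05, 2) lands at pixel (360, 220).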
In most cases, however, the alignment is performed by calibration. Geometry and intensity are captured simultaneously from scanners with a measured transformation between sensing devices. The resolution of the intensity image may be the same as that of the range image or even higher. When the resolution is the same, texture mapping is unnecessary since a color can be assigned to each vertex. Nevertheless, such a representation is inefficient, and geometric simplification is typically performed before the surface parameterization step.
The main benefit of obtaining intensity and range images simultaneously is that the intensity information can be used in the registration process in step A. Various approaches have been developed to use intensity images in registration. For example, it is known to use color as an additional coordinate in the ICP optimization. This avoids local minima in areas that have no geometric features but do have significant intensity variations. For models with pronounced geometric and intensity features, the method has proven to be very effective. A drawback is having to combine position and color data with different ranges and error characteristics. For subtle feature variations, these differences can cause one type of data to erroneously overwhelm the other.
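The correspondence step of such a color-augmented ICP can be sketched as a nearest-neighbour search in a six-dimensional space (x, y, z, λr, λg, λb). The scale factor λ, here the hypothetical parameter `color_weight`, is exactly where positions and colors with different ranges must be reconciled: too small and color is ignored, too large and it overwhelms geometry. This assumes NumPy and brute-force search.

```python
import numpy as np


def colored_nearest(src_xyz, src_rgb, dst_xyz, dst_rgb, color_weight):
    """Index of the nearest target point in the combined position + scaled color space."""
    a = np.hstack([src_xyz, color_weight * src_rgb])
    b = np.hstack([dst_xyz, color_weight * dst_rgb])
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

On a flat surface with targets at x = 0, 1 and 2, where only the last has the source point's color, a source point at x = 0.9 pairs with x = 1 when `color_weight` is 0, but with the same-colored target at x = 2 when `color_weight` is 10.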
It is also known to use intensity images to avoid the spatial search required by ICP. Intensity and intensity gradient images from approximately aligned scans are transformed into a common camera view. Locations of corresponding points on overlapping scans are inferred based on the difference between intensity values at a given pixel and the gradient at that pixel. This method works well only if the spatial variation of the gradient is small relative to errors in the alignment of the scans.
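The first-order reasoning behind this approach can be shown in one dimension. If a second image is the first shifted by s, so that I2(x) = I1(x - s), a Taylor expansion gives I2(x) ≈ I1(x) - s I1'(x), hence s ≈ (I1(x) - I2(x)) / I1'(x). The sketch below assumes NumPy and a 1-D signal; the function name and the `min_grad` threshold are illustrative. The estimate is accurate only while the gradient is nearly constant over the shift.

```python
import numpy as np


def shift_from_gradient(i1, i2, spacing=1.0, min_grad=1e-3):
    """Per-sample shift estimate s ≈ (I1 - I2) / I1' for I2(x) = I1(x - s)."""
    grad = np.gradient(i1, spacing)
    s = np.full_like(i1, np.nan)
    ok = np.abs(grad) > min_grad   # the shift is unobservable where the image is flat
    s[ok] = (i1[ok] - i2[ok]) / grad[ok]
    return s
```

For a linear ramp the estimate is exact everywhere; for a curved signal individual estimates degrade where the gradient changes quickly, so a robust statistic such as the median is more reliable.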
A non-ICP method is also known that uses intensity images to refine an initial manual alignment. In this approach, pairs of range images are aligned manually by marking three points on overlapping intensity images. The locations of the matching points are refined by searching their immediate neighborhoods with image cross-correlation. A least-squares optimization then determines a general 3D transformation that minimizes the distances between the point pairs. Image registration techniques are also used for image mosaics, in which only rotations or translations are considered.
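The neighbourhood search with image cross-correlation can be sketched as an exhaustive search for the offset that maximizes normalized cross-correlation (NCC) between a template patch around the marked point in one image and candidate windows in the other. This assumes NumPy, grayscale images as 2-D arrays, and integer pixel positions; the function names and the patch and search radii are illustrative. The refined point pairs would then feed a least-squares rigid fit.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def refine_match(img1, img2, p1, p2_guess, half=3, search=5):
    """Move p2_guess within +/- search pixels to maximize NCC with the patch at p1."""
    r, c = p1
    tmpl = img1[r - half:r + half + 1, c - half:c + half + 1]
    best, best_score = tuple(p2_guess), -np.inf
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = p2_guess[0] + dr, p2_guess[1] + dc
            if rr - half < 0 or cc - half < 0:
                continue  # window would start outside the image
            win = img2[rr - half:rr + half + 1, cc - half:cc + half + 1]
            if win.shape != tmpl.shape:
                continue  # window runs past the far image border
            score = ncc(tmpl, win)
            if score > best_score:
                best, best_score = (rr, cc), score
    return best, best_score
```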
After the intensity images are aligned to the geometry, illumination invariant maps are computed to estimate the surface reflectance (step E). At this stage, both the number of scans relative to the number of intensity images and the resolution of the scans relative to that of the images must be considered. For a small number of scans and a large number of intensity images obtained under calibrated lighting conditions, a full Bidirectional Reflectance Distribution Function (BRDF) can be estimated.
If many scans are required to represent an object, and only a few high-resolution intensity images are captured per scan, photometric stereo techniques can be used to estimate Lambertian reflectance. Alternatively, if the range and intensity images have the same resolution, the geometry can be used to compute reflectance from a single image.
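Under the Lambertian model a pixel's intensity is I = ρ (n · l), where ρ is the albedo, n the unit surface normal and l the unit light direction. With three or more images under known, non-coplanar light directions, g = ρ n can be solved per pixel by least squares, giving ρ = |g| and n = g / |g|. When n is instead known from the geometry, a single image gives ρ = I / (n · l). The sketch assumes NumPy, distant point lights, and no shadows or specular highlights; the function names are illustrative.

```python
import numpy as np


def photometric_stereo(intensities, light_dirs):
    """Lambertian photometric stereo.

    intensities: (k, P) pixel values under k >= 3 known distant lights.
    light_dirs:  (k, 3) unit light directions.
    Returns albedo (P,) and unit normals (P, 3).
    """
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, P), g = albedo * n
    albedo = np.linalg.norm(g, axis=0)
    normals = (g / np.where(albedo > 0, albedo, 1.0)).T
    return albedo, normals


def albedo_from_geometry(intensity, normals, light_dir):
    """Single-image albedo when normals come from the geometry: rho = I / (n . l)."""
    shading = normals @ light_dir
    # Points facing away from the light carry no reflectance information.
    return np.where(shading > 1e-6, intensity / np.maximum(shading, 1e-6), np.nan)
```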
In step F the final texture is reconstructed. The illumination invariant maps are projected onto the integrated, parameterized surfaces. The main concerns at this step are that the final texture is as sharp as the best input images, that seams between scans or height-field patches are not visible, and that all available information is fully exploited to maximize the signal-to-noise ratio.
To maintain sharpness, a stitching approach has been proposed that uses a single illumination invariant map at any given surface point. Continuity in sharp features between adjoining maps is maintained by a local texture adjustment at texture boundaries. However, this approach requires high-quality input maps that have no visible noise and no scan-to-scan chromatic differences. Map adjustment techniques such as this, as well as de-ghosting methods for image mosaics, decouple texture from geometric variations. This may cause noticeable artifacts when these variations are correlated (e.g., dents and scratches that reveal underlying material with different reflectance properties).
To avoid jumps in color appearance and to reduce noise, it is known to combine information from multiple overlapping scans. In this case, however, if texture alignment is imperfect then blurring or ghosting artifacts may be generated.
Reference can be had to K. Pulli, "Surface reconstruction and display from range and color data", PhD Thesis, Dept. of Computer Science and Engineering, Univ. of Washington, December 1997.
In general, this approach uses intensity images to pair points from overlapping scans. With the assumption that two scans are already roughly aligned, Pulli's method starts by rendering the second scan, textured with its own intensity image, from the viewpoint of the first image. A planar perspective warping of the first image is then computed to match the rendered image of the second scan. For each corresponding pixel of the two images, under the computed transformation, a pair of points from the two scans is generated. A least-squares optimization is then performed to compute a registration matrix. The process is iterated until a convergence criterion is satisfied. Pulli also discusses an extension for multi-view registration.
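A planar perspective warp is a 3x3 homography H acting on homogeneous pixel coordinates. If corresponding pixel locations are already known, H can be fit with the direct linear transform (DLT), sketched below under the assumption of NumPy and at least four non-degenerate point pairs. This is a stand-in for illustration only: the method described above estimates the warp by matching a rendered image, and production code would also normalize coordinates before the SVD. The function names are hypothetical.

```python
import numpy as np


def homography_dlt(src, dst):
    """Estimate H with dst ~ H src from >= 4 point pairs (direct linear transform)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h1 . p) / (h3 . p) and v = (h2 . p) / (h3 . p), p = (x, y, 1).
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)      # (least-squares) null vector of A
    return H / H[2, 2]            # fix the projective scale


def apply_homography(H, pts):
    """Warp (N, 2) pixel coordinates by H, including the perspective divide."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]
```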
Reference may also be made to K. Pulli et al., "Acquisition and Visualization of Colored 3D Objects", ICPR 1998, for a description of a system for scanning geometry and the surface color. The data is registered and a surface that approximates the data is constructed. It is said that the surface estimate can be fairly coarse, as the appearance of fine detail is recreated by view-dependent texturing of the surface using color images. This process uses three different weights (directional, sampling quality and feathering) when averaging together the colors of compatible rays.
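Three-weight color averaging of this kind can be sketched for a single surface point. The specific weight formulas below (clipped cosines and a linear feathering ramp toward image borders) are assumptions chosen for illustration, not the definitions used by Pulli et al.; the function and parameter names are hypothetical, and NumPy is assumed.

```python
import numpy as np


def blend_colors(colors, view_cos, normal_cos, edge_dist, feather_width=10.0):
    """Weighted average of the colors several cameras see for one surface point.

    colors:     (k, 3) candidate colors
    view_cos:   (k,) cos of angle between each camera ray and the current view ray
    normal_cos: (k,) cos of angle between each camera ray and the surface normal
    edge_dist:  (k,) pixel distance to the border of each image's valid region
    """
    w_dir = np.clip(view_cos, 0.0, 1.0)        # directional: favor cameras near the view
    w_samp = np.clip(normal_cos, 0.0, 1.0)     # sampling quality: penalize grazing views
    w_feather = np.clip(edge_dist / feather_width, 0.0, 1.0)  # ramp to zero at borders
    w = w_dir * w_samp * w_feather
    if w.sum() == 0:
        return None                            # no usable observation
    return (w[:, None] * colors).sum(axis=0) / w.sum()
```

Feathering makes a camera's contribution fade out gradually near its image border, so the switch between images does not produce a visible seam.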
Pulli et al. do not explicitly form texture images associated with the geometry. Instead, they propose a dynamic, view-dependent texturing algorithm that selects a subset of the original images taken from view directions close to the current view, and synthesizes new color images from the model geometry and the input images.
Based on the foregoing, it can be readily appreciated that a need exists for improved methods to construct accurate digital models of multi-scanned objects, in particular digital models that exhibit high-quality texture.
It is a first object and advantage of this invention to provide an improved method and apparatus for constructing accurate digital models that exhibit high-quality surface texture.
It is a further object and advantage of this invention to provide an improved method and apparatus for constructing, from object scan data, an accurate digital model of the object that exhibits high-quality surface texture.
The foregoing and other problems are overcome and the objects of the invention are realized by methods and apparatus in accordance with embodiments of this invention.
Disclosed herein are methods to construct accurate digital models of scanned objects by integrating high-quality texture and normal maps with geometric data. These methods can be used with inexpensive, electronic camera-based systems in which low-resolution range images and high-resolution intensity images are acquired. The resulting models are well-suited for interactive rendering on the latest-generation graphics hardware with support for bump mapping. In general, bump mapping refers to encoding small-scale geometry in an image. The large-scale geometry contains pointers to the bump map image that are used by the computer graphics hardware to display both large- and small-scale geometry.
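One common hardware form of bump mapping is per-pixel "DOT3" normal mapping, in which a normal map stores small-scale surface orientation as RGB and the shading is computed from its dot product with the light direction. The minimal sketch below assumes NumPy, the common [0, 255] to [-1, 1] normal encoding, normals and light expressed in the same coordinate frame, and purely Lambertian shading; it is not the rendering path of any specific hardware, and the function names are hypothetical.

```python
import numpy as np


def decode_normal_map(rgb):
    """Map 8-bit RGB in [0, 255] to unit normals with components in [-1, 1]."""
    n = rgb.astype(float) / 255.0 * 2.0 - 1.0
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def shade(albedo, normal_rgb, light_dir):
    """Per-texel Lambertian shading: albedo * max(n . l, 0)."""
    n = decode_normal_map(normal_rgb)
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    ndotl = np.clip(n @ l, 0.0, None)
    return albedo * ndotl[..., None]
```

Here the albedo map supplies color and the normal map supplies fine geometric detail, so a coarse mesh can still show small-scale relief under changing light.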
The inventive methods provide techniques for processing range, albedo, and surface normal data, for image-based registration of scans, and for reconstructing high-quality textures for the output digital object.
The scanning system used during the execution of the methods described herein was equipped with a high-resolution digital color camera that acquires intensity images under controlled lighting conditions. Detailed normal and albedo maps of the surface are computed based on these images. By comparison, geometry is captured at lower resolution, typically at a resolution that is sufficient to resolve only the major shape features.
The benefits of such a system are twofold. First, it allows for the use of relatively inexpensive hardware by eliminating the need for dense geometric sampling, and by taking advantage of digital color cameras that are quickly gaining in resolution while dropping in price. Second, the generated models are more readily usable in a visualization or modeling environment that exploits the hardware-assisted bump mapping feature increasingly available in commercial-grade 3D accelerators.
In general, the issue of acquiring and reconstructing high-quality texture maps has received less attention than the issue of capturing high-quality geometry. The inventors have built upon existing techniques developed for texture acquisition, reconstruction, and image registration to generate maps of high visual quality for the scanned objects. Particularly because the noise and inaccuracies of a lower-cost scanner are greater than those of high-end, more expensive systems, it is desirable to fully exploit all of the geometric and image information acquired to improve the visual quality of the final representation.
A novel texture reconstruction framework is disclosed that uses illumination-invariant albedo and normal maps derived from calibration-registered range and intensity images. The albedo maps are used in a unique way to refine a geometry-only registration of the individual range images. After the range data is integrated into a single mesh, the resulting object is partitioned into a set of height-field patches. New textures are synthesized by projecting the maps onto the patches and combining the best data available at each point using weights that reflect the level of confidence in the data. The weighted averaging lowers noise present in the images, while the fine registration avoids blurring, ghosting, and loss of fine texture details.