This invention relates generally to methods and apparatus for processing image data and, more particularly, relates to methods and apparatus for computing surface normals from a plurality of sets of photometric data.
The creation of three-dimensional digital content by scanning real objects has become common practice in graphics applications for which visual quality is paramount, such as animation, e-commerce, and virtual museums. A significant amount of attention has been devoted to the problem of accurately capturing the geometry of scanned objects.
Three-dimensional scanners are used increasingly to capture digital models of objects for animation, virtual reality, and e-commerce applications for which the central concerns are efficient representation for interactivity and high visual quality.
Most high-end 3D scanners sample the surface of the target object at a very high resolution. Hence, models created from the scanned data are often over-tessellated and require significant simplification before they can be used for visualization or modeling.
In general, a variety of techniques can be used to capture digital models of physical objects, including CAT scans and structure from motion applied to video sequences. The following description has been restricted for convenience to techniques involving instruments that capture range images (in which each pixel value represents depth) and intensity images (in which each pixel is proportional to the incident light). A detailed summary of such methods can be found in G. Roth, “Building models from sensor data: an application shared by the computer vision and computer graphics community”, In Proc. of the NATO Workshop on the Confluence of Computer Vision and Computer Graphics, 2000.
The basic operations necessary to create a digital model from a series of captured images are as follows. After outliers are removed, the range images are in the form of individual height-field meshes. A first step aligns these meshes into a single global coordinate system. In high-end systems, registration may be performed by accurate tracking. For instance, the scanner may be attached to a coordinate measurement machine that tracks its position and orientation with a high degree of accuracy. In less expensive systems, an initial registration is found by scanning on a turntable, manual alignment, or approximate feature matching. The alignment is then refined automatically using techniques such as the Iterative Closest Point (ICP) algorithm of Besl and McKay.
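The refinement step can be illustrated with a minimal sketch. It is deliberately reduced to two dimensions so that the least-squares rotation has a closed form, and it uses a brute-force nearest-neighbor search; it is an illustration of the ICP idea, not the Besl and McKay formulation itself. The function names (`closest_pairs`, `best_rigid_2d`, `apply_rigid_2d`, `icp_2d`) are hypothetical and chosen only for this example.

```python
import math

def closest_pairs(src, dst):
    """Pair each source point with its nearest destination point (brute force)."""
    return [(p, min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2))
            for p in src]

def best_rigid_2d(pairs):
    """Least-squares rotation angle and translation mapping p onto q for paired 2D points."""
    n = len(pairs)
    pcx = sum(p[0] for p, _ in pairs) / n
    pcy = sum(p[1] for p, _ in pairs) / n
    qcx = sum(q[0] for _, q in pairs) / n
    qcy = sum(q[1] for _, q in pairs) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in pairs:
        ax, ay = px - pcx, py - pcy          # centered source point
        bx, by = qx - qcx, qy - qcy          # centered destination point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)       # angle that maximizes the alignment
    c, s = math.cos(theta), math.sin(theta)
    tx = qcx - (c * pcx - s * pcy)           # translation maps rotated centroid to centroid
    ty = qcy - (s * pcx + c * pcy)
    return theta, tx, ty

def apply_rigid_2d(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(src, dst, iterations=20):
    """Alternate closest-point matching and rigid fitting for a fixed number of iterations."""
    current = list(src)
    for _ in range(iterations):
        theta, tx, ty = best_rigid_2d(closest_pairs(current, dst))
        current = apply_rigid_2d(current, theta, tx, ty)
    return current
```

As with any ICP variant, this sketch only converges to the correct alignment when the initial registration is already close, which is why the coarse alignment described above is needed first.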
After registration, scans do not form a single surface, but interpenetrate one another, due to acquisition errors primarily along the line-of-sight in each scan. To form a single surface, in a next step overlapping scans are averaged. In stitching/zippering methods this averaging is performed between pairs of overlapping meshes. In volumetric/occupancy grid methods line-of-sight errors are averaged by letting all scanned points contribute to a function of surface probability defined on a single volume grid. An advantage of volumetric methods is that all scans representing a surface point influence the final result, rather than simply a pair of scans.
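The volumetric averaging idea can be sketched for a single line of sight. The sketch below substitutes a weighted, truncated signed-distance average for the "function of surface probability" mentioned above; that substitution, the function name `fuse_depths_on_ray`, and all parameters are assumptions made for illustration, not a description of any particular published method.

```python
def fuse_depths_on_ray(depths, weights, voxel_centers, truncation):
    """Fuse several depth measurements along one ray of voxels.

    Each scan contributes a truncated signed distance (negative in front of
    its measured surface, positive behind it). The weighted averages are
    scanned for a sign change, and the zero crossing is linearly interpolated.
    Returns the fused depth, or None if no crossing is found.
    """
    fused = []
    for z in voxel_centers:
        num = den = 0.0
        for d, w in zip(depths, weights):
            sd = max(-truncation, min(truncation, z - d))  # clamp to the truncation band
            num += w * sd
            den += w
        fused.append(num / den)
    for i in range(len(fused) - 1):
        a, b = fused[i], fused[i + 1]
        if a <= 0.0 < b:
            t = -a / (b - a)                               # fraction of the way to the next voxel
            return voxel_centers[i] + t * (voxel_centers[i + 1] - voxel_centers[i])
    return None
```

Because every scan that sees the surface point contributes to the average, the fused depth reflects all of the measurements rather than a single pair of scans.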
In this step the scans are integrated into a single mesh. The integration may be performed by zippering/stitching, isosurface extraction from volumes, or interpolating mesh algorithms applied to error-corrected points.
If a texture map is to be used with the integrated mesh, in a next step the surface is parameterized with respect to a 2D coordinate system and texture coordinates are interpolated between mesh vertices. A simple parameterization is to treat each triangle separately and to pack all of the individual texture maps into a larger texture image. However, the use of mip-mapping in this case is limited since adjacent pixels in the texture may not correspond to adjacent points on the geometry. Another approach is to locate patches of geometry which are height fields that can be parameterized by projecting the patch onto a plane. Stitching methods use this approach by simply considering sections of the scanned height fields as patches. Other methods could be built on tiling methods developed for multiresolution analysis or interactive texture mapping.
Parallel to acquiring the geometry of the model, intensity images are captured to obtain information about the reflectance of the surface. Such images may be recorded with electronic or traditional cameras, or by using polychromatic laser technology. In a next step these images are aligned to the corresponding geometry. In some cases the image acquisition is decoupled from the geometry acquisition, and the camera intrinsic and extrinsic parameters for the images are estimated by manual or automatic feature matching. The advantage of this decoupling is that acquisition modalities that cannot capture surface reflectance can be used for capturing geometry.
In most cases, however, the alignment is performed by calibration. Geometry and intensity are captured simultaneously from scanners with a measured transformation between sensing devices. The resolution of the intensity image may be the same as that of the range image or even higher.
One benefit of obtaining intensity and range images simultaneously is that the intensity information can be used in the registration process in the first step described above. Various approaches have been developed to use intensity images in registration. For example, it is known to use color as an additional coordinate in the ICP optimization. This avoids local minima in the solution in areas that have no geometric features, but have significant variations in the intensity. For models with pronounced geometric and intensity features, the method has proven to be very effective. A drawback is having to combine position and color data with different ranges and error characteristics. For subtle feature variations, these differences can cause one type of data to erroneously overwhelm the other.
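The use of color as an additional coordinate can be sketched as a closest-point search in a joint position/color space. The function name `closest_with_color`, the six-tuple sample layout, and the single scalar `color_weight` are assumptions for illustration; the weight stands in for the problem noted above of balancing two kinds of data with different ranges and error characteristics.

```python
def closest_with_color(query, candidates, color_weight):
    """Return the candidate nearest to `query` in joint position/color space.

    Each sample is a tuple (x, y, z, r, g, b). `color_weight` scales the
    squared color difference relative to the squared geometric distance.
    """
    def dist2(a, b):
        geo = sum((a[i] - b[i]) ** 2 for i in range(3))
        col = sum((a[i] - b[i]) ** 2 for i in range(3, 6))
        return geo + color_weight * col
    return min(candidates, key=lambda c: dist2(query, c))
```

With `color_weight` set to zero the search reduces to the purely geometric closest point. On a featureless plane, a nonzero weight lets a matching color pull the correspondence toward the right sample, while too large a weight lets color overwhelm position.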
It is also known to use intensity images to avoid the spatial search required by ICP. Intensity and intensity gradient images from approximately aligned scans are transformed into a common camera view. Locations of corresponding points on overlapping scans are inferred based on the difference between intensity values at a given pixel and the gradient at that pixel. This method works well only if the spatial variation of the gradient is small relative to errors in the alignment of the scans.
It is also known to employ a non-ICP method for using intensity images to refine an initial manual alignment. In this approach pairs of range images are aligned manually by marking three points on overlapping intensity images. The locations of the matching points are refined by searching their immediate neighborhoods with image cross-correlation. A least-squares optimization follows to determine a general 3D transformation that minimizes the distances between the point pairs.
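The neighborhood search by image cross-correlation can be sketched as follows. This is a minimal illustration that assumes grayscale images stored as lists of rows and uses normalized cross-correlation as the similarity score; the names `ncc`, `patch`, and `refine_match`, and the `half` and `search` parameters, are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized lists of pixel values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0     # flat patches carry no information

def patch(img, r, c, half):
    """Flatten the (2*half+1) square window of `img` centered at (r, c)."""
    return [img[i][j] for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]

def refine_match(img_a, pt_a, img_b, guess_b, half=1, search=2):
    """Search around `guess_b` in img_b for the window that best matches pt_a in img_a."""
    ref = patch(img_a, pt_a[0], pt_a[1], half)
    rows, cols = len(img_b), len(img_b[0])
    best, best_score = guess_b, -2.0
    for r in range(guess_b[0] - search, guess_b[0] + search + 1):
        for c in range(guess_b[1] - search, guess_b[1] + search + 1):
            if half <= r < rows - half and half <= c < cols - half:
                score = ncc(ref, patch(img_b, r, c, half))
                if score > best_score:
                    best, best_score = (r, c), score
    return best, best_score
```

The refined point pairs would then feed the least-squares optimization described above; that solve is not shown here.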
Image registration techniques are also used for image mosaics in which only rotations or translations are considered.
After the intensity images are aligned to the geometry, illumination invariant maps are computed to estimate the surface reflectance. The number of scans versus the number of intensity images, as well as the resolution of the scans compared to the resolution of the images are considered at this stage. For a small number of scans and a large number of intensity images obtained under calibrated lighting conditions, a full Bidirectional Reflectance Distribution Function (BRDF) can be estimated.
If many scans are required to represent an object, and only a few high-resolution intensity images are captured per scan, photometric stereo techniques can be used to estimate Lambertian reflectance. Alternatively, if the range and intensity images have the same resolution, the geometry can be used to compute reflectance from a single image.
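For a single pixel, the photometric stereo estimate under a Lambertian model can be sketched as a 3x3 linear solve. The sketch assumes exactly three distant light sources with known unit directions and equal strength, and no shadows or highlights at the pixel; the function names `det3` and `photometric_stereo` are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def photometric_stereo(light_dirs, intensities):
    """Solve I_k = albedo * dot(l_k, n) for three lights using Cramer's rule.

    Returns (unit normal as a list, albedo).
    """
    d = det3(light_dirs)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    g = []                                   # g = albedo * n
    for col in range(3):
        m = [list(row) for row in light_dirs]
        for k in range(3):
            m[k][col] = intensities[k]       # replace one column with the observations
        g.append(det3(m) / d)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("all intensities are zero; normal is undefined")
    return [v / albedo for v in g], albedo
```

The result depends directly on the assumed light directions and strengths, so any error in those parameters appears as an error in the recovered normal.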
Problems arise when it is desired to employ a low complexity, low cost image capture system. In such a system variations in lighting and positioning result in slight but discernible variations in photometric results from mesh to mesh. This can result in the generation of surface normal maps that are inaccurate.
More particularly, high spatial frequency detail in the object model can be represented by normals maps, which are arrays of data in which each entry is a vector representing surface orientation. Low spatial frequency models of objects can be obtained using inexpensive camera-based systems, while high spatial frequency details in the form of normals maps can be obtained using photometric stereo. However, many maps are required to cover the surface of a typical object, and uncertainties in the physical parameters of the photometric stereo system result in normals maps which are not consistent with one another. As a result, when the normals maps are combined into a single representation of the object, visible artifacts are apparent in the seams between the maps.
Based on the foregoing, it can be readily appreciated that a need exists for improved methods to construct accurate digital models of multi-scanned objects, in particular digital models that exhibit high-quality surface normals, and that eliminate visible artifacts that appear in the object model.
It is a first object and advantage of this invention to provide an improved method and system for constructing accurate digital models of multi-scanned objects, in particular digital models that exhibit high-quality surface normals.
It is a further object and advantage of this invention to provide a method and system for computing, from multiple sets of photometric data, surface normals that are consistent with each other and with an underlying lower resolution mesh.
It is another object and advantage of this invention to provide a method and system wherein surface normals are computed by locally adjusting light source intensities using data from the underlying mesh.
It is one further object and advantage of this invention to provide a hybrid multiview/photometric method and system for scanning objects, wherein a base geometric model is obtained, and wherein a photometric system is used to obtain surface normals at a higher spatial resolution than the underlying base geometric model, where the surface normals are computed such that the results are consistent over the object, and also with the underlying base geometric model.
The foregoing and other problems are overcome and the objects and advantages of the invention are realized by methods and apparatus in accordance with embodiments of this invention.
A method is disclosed for computing, from multiple sets of photometric data, normals that are consistent with each other and with an underlying lower resolution mesh. Normals are computed by locally adjusting light source intensities using data from the underlying mesh.
This invention pertains generally to scanning objects that are large relative to a smallest geometric level of detail to be represented. Three dimensional models of the objects are obtained by combining the results of a number (e.g., hundreds) of individual scans. The inventors have developed a hybrid multiview/photometric method for scanning such objects. A multiview light striping system is used to obtain a base geometric model, and a photometric system is used to obtain surface normals at a higher spatial resolution than the underlying geometric model. This invention provides a technique for computing the normals so that the results are consistent over the object, and also with the underlying base geometry.
The teachings of this invention employ a low spatial resolution numerical representation of an object, such as one in the form of a triangular mesh, and combine information from the low spatial resolution mesh representation with images from the photometric system. The normals computed using the low spatial resolution mesh representation are consistent with the low spatial resolution mesh representation, and are thus consistent with one another. The low resolution representation is used to estimate the distance from the camera used to produce the photometric images, and is also used to adjust the relative light source intensities in the photometric system. Using the low resolution mesh representation, uncertainties in the parameters of the photometric system, specifically distances and light source intensities, are thereby corrected.
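One way the relative light source intensities might be adjusted from the low resolution mesh is sketched below. This is an illustrative assumption, not the procedure defined by this description: it supposes a Lambertian surface, roughly constant albedo over a small patch, and that the patch-average shading agrees with the normal of the base mesh. The function names `relative_light_scales` and `correct_intensities` are hypothetical, and the distance correction is not shown.

```python
def relative_light_scales(mean_intensities, light_dirs, base_normal):
    """Estimate per-light scale factors relative to the first light.

    Assumption: over the patch, mean_intensity_k is proportional to
    scale_k * dot(l_k, base_normal), with the same albedo for every light.
    """
    ratios = []
    for mean_i, l in zip(mean_intensities, light_dirs):
        shading = sum(a * b for a, b in zip(l, base_normal))
        if shading <= 0.0 or mean_i <= 0.0:
            raise ValueError("patch is not usefully lit by this source")
        ratios.append(mean_i / shading)
    return [r / ratios[0] for r in ratios]

def correct_intensities(pixel_intensities, scales):
    """Divide each per-light pixel intensity by that light's estimated scale."""
    return [i / s for i, s in zip(pixel_intensities, scales)]
```

Under these assumptions, the corrected intensities for each pixel can then be passed to a photometric stereo solve, so that normals computed for neighboring patches are tied to the same underlying base geometry.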