In the field of computer graphics and computer vision, there is a need to build accurate three-dimensional (3D) models that can be used in virtual reality walk-through, animation, solid modeling, visualization, multimedia, and object detection and recognition.
Three-dimensional digitizers are frequently used to generate models from real world objects. Considerations of resolution, repeatability, accuracy, reliability, speed, and ease of use, as well as overall system cost, are central to the construction of any digitizing system. Often, the design of a digitizing system involves a series of trade-offs between quality and performance.
Traditional 3D digitizers have focused on geometric quality measures for evaluating system performance. While such measures are objective, they are only indirectly related to the overall goal of a high-quality rendition. In most 3D digitizer systems, the rendering quality of the models is largely a result of range accuracy in combination with the number of images acquired of the object.
Prior art digitizers include contact digitizers, active structured-light range-imaging systems, and passive stereo depth-extraction systems. For a survey, see Besl, “Active Optical Range Imaging Sensors,” Advances in Machine Vision, Springer-Verlag, pp. 1-63, 1989.
Laser triangulation and time-of-flight point digitizers are other popular active digitizing approaches. Laser ranging systems often require a separate position-registration step to align separately acquired scanned range images. Because active digitizers emit light onto the object being digitized, it is difficult to capture both texture and shape information simultaneously. This introduces the problem of registering the range images with textures.
In other systems, multiple narrow-band illuminants, e.g., red, green, and blue lasers, are used to acquire a surface color estimate along lines-of-sight. However, this is not useful for capturing objects in realistic illumination environments.
Passive digitizers can be based on single cameras or stereo cameras. Passive digitizers have the advantage that the source images can be used to acquire both shape and texture, unless the object has insufficient texture.
Image-based rendering systems can also be used; see Nishino, K., Y. Sato, and K. Ikeuchi, “Eigen-Texture Method: Appearance Compression based on 3D Model,” Proc. of Computer Vision and Pattern Recognition, 1:618-624, 1999, and Pulli, K., M. Cohen, T. Duchamp, H. Hoppe, L. Shapiro, and W. Stuetzle, “View-based Rendering: Visualizing Real Objects from Scanned Range and Color Data,” Proceedings of the 8th Eurographics Workshop on Rendering, pp. 23-34, 1997. In these systems, images and geometry are acquired separately with no explicit consistency guarantees.
In image-based vision systems, there are two basic tasks to be performed. The first task is to determine the position of the camera, assuming the intrinsic parameters are known. Methods for calibrating intrinsic parameters are well known. A method for calibrating rigid multi-camera systems is described by Beardsley in U.S. patent application Ser. No. 09/923,884 “Hand-Held 3D Vision System” filed on Aug. 6, 2001, incorporated herein by reference.
The second task of the vision system is to use the images in conjunction with the known camera positions to extract accurate shape information. Typically, the shape of an object is determined from the pixels imaging the object. Thus, it becomes necessary to identify these pixels in the images. This is called object segmentation.
The most successful object segmentation methods make use of a background image in which the object is not present, followed by background subtraction. Typically, pixel intensities in foreground images are subtracted from corresponding pixels in the background image to generate a differential image. The background image can be acquired ahead of time when it is known that there are no foreground objects in the scene. Any pixels with a low intensity value in the differential image are considered to be part of the background, and pixels with higher values are presumed to be part of the object. For a survey of background subtraction methods, see Toyama et al., “Wallflower: Principles and Practice of Background Maintenance,” Proceedings of the International Conference on Computer Vision, pp. 255-261, 1999.
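The background subtraction described above can be illustrated with a minimal sketch. It assumes grayscale images stored as equal-sized lists of rows, takes the absolute difference as the differential image, and applies a fixed threshold. The function name `segment_foreground`, the threshold value of 30, and the sample pixel values are hypothetical choices for illustration and are not taken from any cited method.

```python
def segment_foreground(image, background, threshold=30):
    """Return a binary mask: 1 where a pixel is presumed to be object, 0 otherwise.

    image, background: equal-sized 2D lists of grayscale intensities (0-255).
    threshold: hypothetical tuning parameter (an assumption, not from the text).
    """
    if len(image) != len(background):
        raise ValueError("image and background must have the same height")
    mask = []
    for img_row, bg_row in zip(image, background):
        if len(img_row) != len(bg_row):
            raise ValueError("image and background must have the same width")
        # Differential image: absolute intensity difference per pixel.
        # Low differences are treated as background, higher ones as object.
        mask.append([1 if abs(p - b) > threshold else 0
                     for p, b in zip(img_row, bg_row)])
    return mask


# Illustrative data: a 3x3 background acquired with no object in the scene,
# and a later image in which three pixels are covered by a bright object.
background = [[10, 12, 11],
              [9, 10, 12],
              [11, 10, 10]]
image = [[10, 13, 11],
         [9, 200, 180],
         [12, 10, 190]]

mask = segment_foreground(image, background)
for row in mask:
    print(row)
```

A fixed global threshold of this kind is sensitive to changes in illumination and to camera motion between the two acquisitions, which is one reason such methods are generally used with controlled cameras and backgrounds.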
Typically, prior art segmentation methods make use of controlled cameras with a controlled background. For example, a camera is directed at an object on a turntable with known angular positions and a known background. The background might be a known color, or an active display showing known patterns or colors. These types of systems are cumbersome and expensive to operate.
Therefore, it is desired to perform segmentation with inexpensive handheld cameras in uncontrolled environments.