Three-dimensional (3D) model acquisition, also known as 3D geometry recovery or 3D shape reconstruction, is a challenging problem involving the recovery of the 3D geometry (often represented as polygonal meshes) of objects and scenes from images taken by, for example, image sensors. There has been substantial work on the recovery of depth information or 3D geometry of objects. One technique belongs to a category of approaches known as structured light methods, which recover the 3D geometry of an object by projecting specially designed light patterns onto the surface of the object and estimating the depth (i.e., the distance from a point on the surface to the camera) by analyzing the deformations of the projected patterns.
Among the many techniques for 3D geometry recovery, structured light approaches are highly accurate and can be implemented in real time, making them promising for real-world applications. Nevertheless, current structured light approaches have several limitations that hinder their application to real-world scenarios. For example, to realize accurate acquisition, these techniques require high-power light and high-contrast patterns to be projected onto the object surfaces. This is often unacceptable to human subjects such as actors in a movie scene.
In addition, current structured light techniques usually assume that there is little or no texture on the object surface. Object texture alters the appearance of the projected structured light patterns and can therefore significantly degrade the accuracy of the recovered depth. Furthermore, when the captured images or videos are also used for content creation, such as making movies, the presence of structured light in the images and videos is often undesirable.
The various 3D acquisition techniques can be classified as either active or passive approaches. Passive approaches acquire 3D geometry from images or videos taken under regular lighting conditions. The 3D geometry is computed using geometric or photometric features extracted from the images and videos. Two of the most widely used passive approaches are stereo algorithms and photometric techniques. Stereo algorithms estimate depth based on the disparity of a stereo image pair taken by two cameras with different viewing angles. Disparity is estimated by comparing the pixels in the stereo image pair and computing the difference between the coordinates of corresponding pixels. Photometric techniques estimate the normal of a surface based on the shading effect caused by surface orientation. For real-world images, both methods are inaccurate and computationally expensive. Stereo algorithms fail to estimate the depths of pixels in flat regions (regions without texture), because such pixels cannot be matched unambiguously between the two images. Photometric techniques require the reflectance properties of the object surface materials (i.e., the Bidirectional Reflectance Distribution Function, or BRDF) to be known a priori.
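By way of illustration only, the disparity-based depth estimate described above can be sketched as follows. The sketch assumes a rectified stereo pair (so that corresponding pixels lie on the same scanline), a simple sum-of-absolute-differences matching cost, and the standard pinhole relation depth = focal length x baseline / disparity. The function names, window size, focal length, and baseline are hypothetical and are not taken from any particular system.

```python
def scanline_disparity(left_row, right_row, x, half_window=1, max_disparity=8):
    """Disparity (in pixels) of pixel x on the left scanline.

    Compares a small window around x in the left row against windows
    shifted by d pixels in the right row and keeps the shift with the
    lowest sum of absolute differences (SAD). Ties keep the smallest d.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0.0
        for k in range(-half_window, half_window + 1):
            xl, xr = x + k, x + k - d
            # Windows that fall outside either row are not comparable.
            if not (0 <= xl < len(left_row) and 0 <= xr < len(right_row)):
                cost = float("inf")
                break
            cost += abs(left_row[xl] - right_row[xr])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Pinhole stereo relation: depth = f * B / d (assumed rectified pair)."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px


# A bright feature appears 2 pixels further left in the right image.
left_row = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right_row = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]

d = scanline_disparity(left_row, right_row, x=4)
z = depth_from_disparity(d, focal_length_px=800.0, baseline_m=0.1)
```

In the textured part of the scanline the matching cost has a single clear minimum. At a pixel in the flat part of the row (for example x=7), several shifts yield the same cost, which is the ambiguity that causes stereo algorithms to fail in regions without texture.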
In contrast, active approaches are able to achieve high accuracy and real-time speed. Laser scanning, for example, is a technique that projects laser light onto a surface and estimates the depth based on the time-of-flight principle or on the deformation of the light pattern. The application of laser scanning is very restricted, however.
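As a minimal sketch of the time-of-flight principle mentioned above, the depth can be computed from the measured round-trip time of a light pulse, since the pulse travels to the surface and back. The function name and the example timing value are hypothetical; a practical scanner would also have to account for timing jitter and calibration offsets, which are omitted here.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_depth(round_trip_time_s):
    """Depth in metres from the round-trip travel time of a light pulse.

    The pulse covers the sensor-to-surface distance twice, hence the
    division by two.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# A pulse that returns after about 6.67 nanoseconds corresponds to
# a surface roughly one metre away.
depth_m = tof_depth(6.67e-9)
```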
Another technique, referred to as structured light, is preferred in the art. Structured light techniques project visible or invisible (infrared) patterns onto the surfaces of objects and estimate the depth according to the deformation of the patterns. Structured light approaches have been extensively studied in the computer vision community. State-of-the-art techniques include composite structured light, which uses multiple stripes instead of a single stripe, and color structured light, which uses color to distinguish and identify the stripes. However, to achieve high accuracy of depth estimation, these approaches require high-intensity structured light and/or a smooth, texture-free surface, so that the surface texture does not interfere with the structured light patterns. As such, recovering the 3D geometry of real-world objects using currently available techniques is intrusive and inaccurate.
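The interaction between surface texture and projected light strength can be illustrated with a simplified sketch. The sketch assumes a hypothetical image formation model in which each observed pixel intensity equals the surface albedo multiplied by the sum of the ambient light and the projected stripe light, and in which stripes are detected with a single global threshold. The model, the function name, and all numeric values are assumptions made for illustration only.

```python
def stripe_margin(albedo, pattern, ambient, power):
    """Gap between the darkest lit pixel and the brightest unlit pixel.

    Assumed model: observed = albedo * (ambient + power * pattern),
    where pattern is 1 inside a projected stripe and 0 outside.
    A positive margin means one global threshold separates stripe
    pixels from non-stripe pixels; a negative margin means it cannot.
    """
    observed = [a * (ambient + power * p) for a, p in zip(albedo, pattern)]
    lit = [o for o, p in zip(observed, pattern) if p == 1]
    unlit = [o for o, p in zip(observed, pattern) if p == 0]
    return min(lit) - max(unlit)


pattern = [1, 1, 0, 0, 1, 1, 0, 0]   # two projected stripes
uniform = [0.8] * 8                   # surface without texture
textured = [1.0, 0.2] * 4             # alternating bright and dark texture

# Texture-free surface, low-power projector: stripes are separable.
m_uniform = stripe_margin(uniform, pattern, ambient=1.0, power=2.0)

# Textured surface, same low power: dark lit pixels fall below
# bright unlit pixels, so the stripes cannot be thresholded.
m_textured_low = stripe_margin(textured, pattern, ambient=1.0, power=2.0)

# Textured surface, high power: the stripes become separable again.
m_textured_high = stripe_margin(textured, pattern, ambient=1.0, power=10.0)
```

Under these assumptions, the stripes on the textured surface are recovered only when the projector power is raised well above the ambient level, which is consistent with the requirement for high-intensity light or texture-free surfaces noted above.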