When a scene is filmed, the resulting video sequence contains implicit information on the three-dimensional (3D) geometry of the scene. While this implicit information suffices for human perception, many applications require the exact geometry of the 3D scene. One such category comprises applications that use sophisticated data processing techniques, for instance to generate new views of the scene or to reconstruct the 3D geometry for industrial inspection.
Recovering 3D information has been an active research area for some time. The literature describes a large number of techniques that either capture 3D information directly, for example using a laser range finder, or recover 3D information from one or more two-dimensional (2D) images, as in stereo or structure from motion techniques. In general, 3D acquisition techniques can be classified as active or passive approaches, single-view or multi-view approaches, and geometric or photometric methods.
Passive approaches acquire 3D geometry from images or videos taken under regular lighting conditions. The 3D geometry is computed using geometric or photometric features extracted from the images and videos. Active approaches use special light sources, such as lasers, structured light or infrared light. They compute the geometry from the response of the objects and scenes to the special light projected onto their surfaces.
Single-view approaches recover 3D geometry using one or more images taken from a single camera viewpoint. Examples include shape from shading and depth from defocus.
Multi-view approaches recover 3D geometry from multiple images taken from multiple camera viewpoints, resulting from object motion, or with different light source positions. Stereo matching is an example of multi-view 3D recovery: the pixels in the left image and the right image of a stereo pair are matched to obtain the depth information of the pixels.
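As an illustration of the stereo matching idea, the following is a minimal sketch, not a definitive implementation. It assumes a rectified stereo pair, so that corresponding pixels lie on the same scanline, and it uses a sum-of-absolute-differences (SAD) window cost, which is only one of many possible matching costs. The function names `match_scanline` and `depth_from_disparity`, and the parameters `max_disparity`, `half_window`, `focal_length_px` and `baseline`, are hypothetical names chosen for this example.

```python
def match_scanline(left, right, max_disparity, half_window=1):
    """Per-pixel disparity for one rectified scanline using SAD block matching.

    Assumption: a point at column x in the left image appears at column
    x - d in the right image, with 0 <= d <= max_disparity.
    Border pixels that cannot hold a full window keep disparity 0.
    """
    n = len(left)
    disparities = [0] * n
    for x in range(half_window, n - half_window):
        best_cost, best_d = float("inf"), 0
        for d in range(max_disparity + 1):
            if x - d - half_window < 0:
                break  # the window would fall outside the right image
            cost = sum(
                abs(left[x + k] - right[x - d + k])
                for k in range(-half_window, half_window + 1)
            )
            if cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


def depth_from_disparity(disparity, focal_length_px, baseline):
    """Depth Z = f * B / d for a rectified pair; zero disparity maps to infinity."""
    if disparity == 0:
        return float("inf")
    return focal_length_px * baseline / disparity
```

In this sketch a larger disparity corresponds to a closer point, which is why depth is inversely proportional to the matched offset.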
Geometric methods recover 3D geometry by detecting geometric features such as corners, edges, lines or contours in single or multiple images. The spatial relationship among the extracted corners, edges, lines or contours can be used to infer the 3D coordinates of the pixels in images.
Structure From Motion (SFM) is a technique that attempts to reconstruct the 3D structure of a scene from a sequence of images taken either from a camera moving within the scene or from a static camera observing a moving object. Although many agree that SFM is fundamentally a nonlinear problem, several attempts at representing it linearly have been made that provide mathematical elegance as well as direct solution methods. Nonlinear techniques, on the other hand, require iterative optimization and must contend with local minima; however, they promise good numerical accuracy and flexibility. The advantage of SFM over stereo matching is that only one camera is needed. Feature-based approaches can be made more effective by tracking techniques, which exploit the past history of the features' motion to predict disparities in the next frame.
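The idea of using a feature's motion history to predict where it will appear in the next frame can be sketched as follows. This is a minimal example under a constant-velocity assumption, which is a simplification chosen here for illustration and is not prescribed by the text; practical trackers often use richer motion models. The names `predict_next_position` and `search_window` are hypothetical.

```python
def predict_next_position(track):
    """Predict a feature's next (x, y) position from its track history.

    Assumption: constant velocity between frames, so the next position is
    the last position plus the last observed displacement.
    """
    if not track:
        raise ValueError("track must contain at least one position")
    if len(track) == 1:
        return track[-1]  # no motion history yet: assume the feature stays put
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def search_window(center, radius):
    """Axis-aligned window (x_min, y_min, x_max, y_max) around the prediction."""
    cx, cy = center
    return (cx - radius, cy - radius, cx + radius, cy + radius)
```

Restricting the correspondence search to a small window around the predicted position reduces both the matching cost and the chance of a false match.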
In addition, because of the small spatial and temporal differences between two consecutive frames, the correspondence problem can also be cast as a problem of estimating the apparent motion of the image brightness pattern, called the optical flow. Several algorithms use SFM, most of them based on the reconstruction of 3D geometry from 2D images. Some assume known correspondence values, and others use statistical approaches to reconstruct without correspondence.
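To make the optical flow formulation concrete, the sketch below estimates a single displacement for a one-dimensional brightness signal. It assumes brightness constancy and a small shift u between frames, so that I_x * u + I_t ≈ 0, and it solves for u by least squares over all interior samples, in the spirit of the Lucas-Kanade method. The one-dimensional setting and the function name `estimate_flow_1d` are simplifications introduced here for illustration only.

```python
def estimate_flow_1d(frame0, frame1):
    """Least-squares estimate of a single shift u between two 1D signals.

    Assumption: frame1[x] is approximately frame0[x - u] with |u| small,
    so I_x * u + I_t is approximately 0 at every sample.
    """
    if len(frame0) != len(frame1):
        raise ValueError("frames must have the same length")
    num = 0.0
    den = 0.0
    for x in range(1, len(frame0) - 1):
        ix = (frame0[x + 1] - frame0[x - 1]) / 2.0  # spatial gradient (central difference)
        it = frame1[x] - frame0[x]                  # temporal gradient
        num += ix * it
        den += ix * ix
    if den == 0.0:
        raise ValueError("no spatial gradient: the flow cannot be determined")
    return -num / den
```

The error raised for a zero gradient reflects a known limitation of this formulation: in regions of uniform brightness the apparent motion is not observable.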
The above-described methods have been extensively studied for decades. However, no single technique performs well in all situations, and most past methods focus on 3D reconstruction under laboratory conditions, where reconstruction is relatively easy. In real-world scenes, subjects may be in motion, lighting may be complicated, and the depth range may be large. It is difficult for the above-identified techniques to handle these real-world conditions.