3D images are digital 3D models of real-world scenes that are captured for a variety of purposes, including visualization and information extraction. They are acquired by 3D imagers which are variously referred to as 3D sensors, 3D cameras, 3D scanners, VR cameras, 360° cameras, and depth cameras. They address the need for 3D information in applications used in global sectors including defense, security, entertainment, education, healthcare, infrastructure, manufacturing, and mobile.
A number of methods have been developed to extract 3D information from a scene. Many involve active light sources such as lasers and have limitations such as high power consumption and limited range. An almost ideal method is to use two or more images from inexpensive cameras (devices that form images by sensing a light field using detectors) to generate detailed scene models. The term Multi-View Stereo (MVS) will be used here, although it and its variations are also known by other names such as photogrammetry, Structure-from-Motion (SfM), and Simultaneous Localization and Mapping (SLAM), among others. A number of such methods are presented in the Furukawa reference, “Multi-View Stereo: A Tutorial.” It frames MVS as an image/geometry consistency optimization problem. Robust implementations of photometric consistency and efficient optimization algorithms are found to be critical for successful algorithms.
To increase the robustness of the extraction of scene models from images, improved modeling of the transport of light is needed. This includes the characteristics of light interactions with matter, such as transmission, reflection, refraction, and scattering. The thesis of Jarosz, “Efficient Monte Carlo Methods for Light Transport in Scattering Media” (2008), provides an in-depth analysis of the subject.
In the simplest version of MVS, if the viewpoints and poses of a camera are known for two images, the position of a “landmark” 3D point in a scene can be computed by some form of triangulation, provided the projections of the point (its 2D “feature” points) can be found in the two images. (A feature consists of characteristics of an entity expressed in terms of a description and a pose. Examples of features include a spot, a glint, or a building. The description i) can be used to find instances of the feature at poses in a field (space in which entities can be posed), or ii) can be formed from descriptive characteristics at a pose in a field.) Surfaces are extracted by combining many landmarks. This works as long as the feature points are, indeed, correct projections of fixed landmark points in the scene and are not caused by some viewpoint-dependent artifact (e.g., specular reflections, intersection of edges). The approach can be extended to many images and to situations where the viewpoints and poses of the camera are not known. The process of resolving the landmark locations and camera parameters is called Bundle Adjustment (BA), although there are many variations and other names used for specific uses and situations. This topic is comprehensively discussed by Triggs in his paper “Bundle Adjustment—A Modern Synthesis” (2009). An important subtopic in BA is being able to compute a solution without explicitly generating derivatives analytically, which becomes increasingly difficult computationally as the situation becomes complex. An introduction to this is given by Brent in the book “Algorithms for Minimization Without Derivatives.”
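The two-view triangulation described above can be illustrated with a minimal sketch. It assumes that each 2D feature point has already been converted into a viewing ray (a camera center plus a direction in a common world frame), and it uses the midpoint method, which is only one of several forms of triangulation and is not taken from the cited references. The function name triangulate_midpoint and its parameters are hypothetical.

```python
def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a landmark from two viewing rays (midpoint method).

    c1, c2: camera centers (assumed known, world coordinates).
    d1, d2: ray directions through the matched feature points
            (need not be unit length).
    Returns the point midway between the closest points on the two rays.
    """
    w = _sub(c1, c2)
    a = _dot(d1, d1)
    b = _dot(d1, d2)
    c = _dot(d2, d2)
    d = _dot(d1, w)
    e = _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information.
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Two cameras one unit apart, both observing a landmark at (1, 2, 5).
landmark = triangulate_midpoint(
    [0.0, 0.0, 0.0], [1.0, 2.0, 5.0],
    [1.0, 0.0, 0.0], [0.0, 2.0, 5.0],
)
```

With noisy feature points the two rays generally do not intersect, which is why the sketch returns the midpoint of the shortest segment between them rather than an exact intersection. If a feature point is a viewpoint-dependent artifact rather than the projection of a fixed landmark, the result is meaningless, as noted above.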
While two properties of light, color and intensity, have been used in MVS, they have major limitations when used with everyday scenes. These include an inability to accurately represent surfaces without textures, non-Lambertian objects and transparent objects in the scene. (An object is media that is expected to be collocated. Examples of objects include: a leaf, a twig, a tree, fog, clouds and the earth.) To address these limitations, a third property of light, polarization, has been found to extend scene reconstruction capabilities. The use of polarimetric imaging in MVS is called Shape from Polarization (SfP). The Wolff, U.S. Pat. No. 5,028,138, discloses basic SfP apparatus and methods based on specular reflection. Diffuse reflections, if they exist, are assumed to be unpolarized. The Barbour U.S. Pat. No. 5,890,095 discloses a polarimetric imaging sensor apparatus and a micropolarizer array. The Barbour U.S. Pat. No. 6,810,141 discloses a general method of using an SPI sensor to provide information about objects, including information about 3D geometry. The d'Angelo patent DE102004062461 discloses apparatus and methods for determining geometry based on Shape from Shading (SfS) in combination with SfP. The d'Angelo patent DE102006013318 discloses apparatus and methods for determining geometry based on SfS in combination with SfP and a block matching stereo algorithm to add range data for a sparse set of points. The Morel patent WO 2007057578 discloses an apparatus for SfP of highly reflective objects.
The Koshikawa paper, “A Model-Based Recognition of Glossy Objects Using Their Polarimetrical Properties,” is generally considered to be the first paper disclosing the use of polarization information to determine the shape of dielectric glossy objects. Later, Wolff showed in his paper, “Polarization camera for computer vision with a beam splitter,” the design of a basic polarization camera. The Miyazaki paper, “Determining shapes of transparent objects from two polarization images,” develops the SfP method for transparent or reflective dielectric surfaces. The Atkinson paper, “Shape from Diffuse Polarization,” explains the basic physics of surface scattering and describes equations for determining shape from polarization in the diffuse and specular cases. The Morel paper, “Active Lighting Applied to Shape from Polarization,” describes an SfP system for reflective metal surfaces that makes use of an integrating dome and active lighting. It explains the basic physics of surface scattering and describes equations for determining shape from polarization in the diffuse and specular cases. The d'Angelo thesis, “3D Reconstruction by Integration of Photometric and Geometric Methods,” describes an approach to 3D reconstruction based on sparse point clouds and dense depth maps.
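As an illustration of the polarization measurements that shape-from-polarization methods typically start from, the following sketch computes the linear Stokes parameters, the degree of linear polarization, and the angle of linear polarization for one pixel. It assumes four intensity samples taken through linear polarizers at 0°, 45°, 90°, and 135°, a common layout that is an assumption here and is not stated in the cited patents or papers. The function name linear_polarization is hypothetical.

```python
import math


def linear_polarization(i0, i45, i90, i135):
    """Return (s0, dolp, aolp) for one pixel.

    i0, i45, i90, i135: intensities measured through linear polarizers
    at 0, 45, 90 and 135 degrees (an assumed sampling scheme).
    s0:   total intensity.
    dolp: degree of linear polarization, between 0 and 1 for ideal data.
    aolp: angle of linear polarization in radians.
    """
    s0 = (i0 + i45 + i90 + i135) / 2.0
    s1 = i0 - i90
    s2 = i45 - i135
    if s0 <= 0.0:
        # No light reached the pixel, so polarization is undefined.
        return 0.0, 0.0, 0.0
    dolp = math.hypot(s1, s2) / s0
    aolp = 0.5 * math.atan2(s2, s1)
    return s0, dolp, aolp


# Fully polarized light oriented at 45 degrees.
s0, dolp, aolp = linear_polarization(0.5, 1.0, 0.5, 0.0)
```

In shape-from-polarization approaches, the angle constrains the azimuth of the surface normal and the degree constrains its zenith angle, with the exact relation depending on whether the reflection is treated as specular or diffuse. Those relations are given in the cited papers and are not reproduced here.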
While MVS systems are mostly based on resolving surfaces, improvements have been found by increasing the dimensionality of the modeling using dense methods. Newcombe explains this advancement in his paper “Live Dense Reconstruction with a Single Moving Camera” (2010). Another method is explained by Wurm in the paper “OctoMap: A Probabilistic, Flexible, and Compact 3D Map Representation for Robotic Systems” (2010).
When applying MVS methods to real-world scenes, the computational requirements can quickly become impractical for many applications, especially for mobile and low-power operation. In areas outside MVS, such as medical imaging, where such computational issues have been addressed in the past, the use of octree and quadtree data structures and methods has been found effective. This is especially the case when implemented in modest, specialized processors. This technology is expected to allow a very large number of simple, inexpensive, low-power processors to be applied to computationally difficult situations. The basic octree concepts were introduced by Meagher in the paper “Geometric Modeling Using Octree Encoding” and the thesis “The Octree Encoding Method for Efficient Solid Modeling.” They were later extended for orthographic image generation in U.S. Pat. No. 4,694,404.
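The octree idea referred to above can be sketched as a sparse tree in which a cubic volume is recursively divided into eight child cubes, and children are created only where the scene is occupied. The sketch below is a minimal illustration under that assumption and is not the encoding of the cited Meagher references. The class names OctreeNode and Octree and all method names are hypothetical.

```python
class OctreeNode:
    __slots__ = ("children", "occupied")

    def __init__(self):
        self.children = [None] * 8  # one slot per octant, created lazily
        self.occupied = False


class Octree:
    """Sparse occupancy octree over a cube with a fixed maximum depth."""

    def __init__(self, origin, size, max_depth):
        self.origin = tuple(origin)  # minimum corner of the root cube
        self.size = float(size)      # edge length of the root cube
        self.max_depth = max_depth
        self.root = OctreeNode()

    def _inside(self, p):
        return all(o <= v < o + self.size for o, v in zip(self.origin, p))

    def _descend(self, p, create):
        corner = list(self.origin)
        half = self.size
        node = self.root
        for _ in range(self.max_depth):
            half /= 2.0
            index = 0
            for axis in range(3):
                if p[axis] >= corner[axis] + half:
                    index |= 1 << axis   # upper half along this axis
                    corner[axis] += half
            child = node.children[index]
            if child is None:
                if not create:
                    return None
                child = node.children[index] = OctreeNode()
            node = child
        return node

    def insert(self, p):
        if not self._inside(p):
            raise ValueError("point lies outside the root cube")
        self._descend(p, True).occupied = True

    def is_occupied(self, p):
        if not self._inside(p):
            return False
        node = self._descend(p, False)
        return node is not None and node.occupied

    def node_count(self):
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(c for c in node.children if c is not None)
        return count


# An 8 x 8 x 8 volume resolved to unit voxels (depth 3).
tree = Octree((0.0, 0.0, 0.0), 8.0, 3)
tree.insert((0.5, 0.5, 0.5))
tree.insert((7.5, 7.5, 7.5))
```

In this example the two occupied voxels are represented with seven nodes, whereas a dense grid at the same resolution would hold 512 voxels. That kind of saving in storage and traversal is what makes hierarchical structures attractive for low-power processing.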