An image can be considered a projection from a three-dimensional (3D) scene onto a two-dimensional (2D) plane. Although a 2D image does not provide depth information, if two images of the same scene are available from different vantage points, the position (including the depth) of a 3D point can be found using known techniques.
For example, stereo matching is a process in which two images (a stereo image pair) of a scene, taken from slightly different viewpoints, are matched to find disparities (differences in position) of image elements that depict the same scene element. The disparities provide information about the relative distance of the scene elements from the camera, which allows depths of surfaces of objects in the scene to be determined. A stereo camera that includes, for example, two image capture devices separated from one another by a known distance (which may be referred to as the baseline distance) can be used to capture the stereo image pair.
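The relationship between matching, disparity, and depth can be illustrated with a minimal sketch. It assumes a rectified image pair (so corresponding elements lie on the same row), a simple sum-of-absolute-differences (SAD) window match, and the standard pinhole relation Z = f * B / d. The function names, the window size, and the numeric values are hypothetical and chosen only for illustration; this is one of several known techniques, not a specific implementation.

```python
def match_scanline(left_row, right_row, x, half_window, max_disparity):
    """Find the disparity at column x of the left row by SAD window matching.

    Assumes a rectified pair: the element at column x in the left image
    appears at column x - d in the right image, where d >= 0 is the disparity.
    """
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window around x falls outside the row")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # shifted window would leave the right row
        # Sum of absolute intensity differences over the window.
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d


def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert disparity (pixels) to depth (metres) using Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px


# Hypothetical one-row example: the bright feature is shifted by 2 pixels.
left_row = [0, 0, 0, 0, 10, 50, 10, 0, 0, 0]
right_row = [0, 0, 10, 50, 10, 0, 0, 0, 0, 0]
d = match_scanline(left_row, right_row, x=5, half_window=1, max_disparity=4)
depth = disparity_to_depth(d, focal_length_px=500.0, baseline_m=0.02)
print(d, depth)
```

Because depth is inversely proportional to disparity, nearby scene elements produce large disparities and distant ones produce small disparities.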
Some image capture modules include two grey-scale depth stereo cameras and an RGB camera. This type of module may be used, for example, in a mobile application (e.g., a smart phone) and, thus, the footprint of the module tends to be small. When the module's footprint is small, the baseline distance between the two depth cameras will likewise be small. Small baselines in stereo systems, however, lead to low depth resolution (i.e., low z-resolution). Further, the disparity map derived from the two depth cameras tends to be sparse. Sparse disparity maps can result from scenes with little texture (e.g., a monochromatic wall). Moreover, even if a light projector is used to project texture onto the scene, the resultant disparity map may be sparse if the projected pattern is not very dense.
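The effect of a small baseline on depth resolution can be made concrete with a short sketch. It assumes the pinhole relation Z = f * B / d for a rectified pair, from which a first-order estimate of the depth change per disparity step follows: dZ ≈ Z² * dd / (f * B). The function name, focal length, baselines, and scene distance below are hypothetical values chosen for illustration only.

```python
def depth_step(depth_m, focal_length_px, baseline_m, disparity_step_px=1.0):
    """First-order depth change caused by one disparity step.

    Derived from Z = f * B / d, so |dZ/dd| = Z**2 / (f * B).
    Smaller values mean finer depth (z) resolution.
    """
    if focal_length_px <= 0 or baseline_m <= 0:
        raise ValueError("focal length and baseline must be positive")
    return depth_m ** 2 * disparity_step_px / (focal_length_px * baseline_m)


# Hypothetical comparison at a scene distance of 2 m with f = 500 px.
small_baseline = depth_step(2.0, focal_length_px=500.0, baseline_m=0.02)
large_baseline = depth_step(2.0, focal_length_px=500.0, baseline_m=0.10)
print(small_baseline, large_baseline)
```

Under these assumed values, the 2 cm baseline gives a depth step of roughly 0.4 m per pixel of disparity, while the 10 cm baseline gives roughly 0.08 m. The estimate scales inversely with the baseline, which is why compact modules tend to have coarse z-resolution.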