Stereo vision is the extraction of three-dimensional information from images by comparing views of a scene captured from two different viewpoints. It is one of the most heavily researched areas in computer vision. Traditional stereo algorithms generate depth maps from color, or RGB, images. However, the effectiveness of these algorithms is limited both by the quantity of relevant features in the scene and by assumptions such as the constancy of brightness within the scene. For example, it may be difficult to generate a depth map for solid-color objects, such as a shirt that has no pattern or a single-colored wall, because such surfaces offer few features to match between the two views. Moreover, lighting variations, which undermine brightness assumptions, are common in non-studio conditions such as living rooms.
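The feature-quantity limitation can be illustrated with a minimal sketch of window-based matching along a single scanline. This is a toy illustration under stated assumptions, not the method of any particular stereo system: the helper `sad_costs`, the synthetic scanlines, the window size, and the disparity range are all hypothetical choices made for the example.

```python
import numpy as np

def sad_costs(left, right, x, half_window, max_disparity):
    """Sum-of-absolute-differences cost at column x for each candidate disparity.

    Hypothetical helper: compares a window around left[x] with windows
    around right[x - d] for d = 0..max_disparity.
    """
    patch = left[x - half_window : x + half_window + 1]
    costs = []
    for d in range(max_disparity + 1):
        candidate = right[x - d - half_window : x - d + half_window + 1]
        costs.append(float(np.abs(patch - candidate).sum()))
    return np.array(costs)

true_disparity = 3  # assumed shift between the two views, in pixels

# Textured scanline: a deterministic sequence with no repeated values.
textured_right = ((np.arange(64) * 37) % 101).astype(np.float64)
textured_left = np.roll(textured_right, true_disparity)  # left[x] == right[x - 3]

# Textureless scanline: a single-colored wall looks the same at every shift.
flat_right = np.full(64, 128.0)
flat_left = flat_right.copy()

textured_costs = sad_costs(textured_left, textured_right, x=32, half_window=3, max_disparity=8)
flat_costs = sad_costs(flat_left, flat_right, x=32, half_window=3, max_disparity=8)

print("textured best disparity:", int(np.argmin(textured_costs)))
print("flat cost spread:", flat_costs.max() - flat_costs.min())
```

On the textured scanline the cost has a single minimum at the assumed disparity, whereas on the flat scanline every candidate disparity has the same cost, so the match is ambiguous and no depth can be recovered from matching alone.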
In addition, technologies for active depth sensing have improved depth estimation approaches through the use of structured light to extract geometry from a scene. With existing technology, such as that found in the Kinect™ system from Microsoft® Corporation, a structured infrared (IR) pattern is projected onto the scene and photographed by a single IR camera. Based on deformations of the light pattern, geometric information about the underlying video scene can be determined and used to generate a depth map. However, despite the advantages of structured light technology, multiple structured light modules interfere with one another's projected patterns when they are used to sample the same scene at the same time. In addition, when multiple modules attempt to sample the same scene at the same time, there may be significant problems associated with the temporal synchronization of the various depth maps. Moreover, it may be difficult to calibrate the structured light projectors or lasers correctly.
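The interference problem can be sketched with a toy one-dimensional model. The example below is an illustration under strong assumptions, not a description of how the Kinect or any other system decodes its pattern: the binary dot pattern, the circular shift standing in for pattern deformation, and the helper `mismatch_costs` are all hypothetical.

```python
import numpy as np

# Hypothetical 1-D binary dot pattern known to the decoder (1 = IR dot).
reference = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0])

def mismatch_costs(observed, reference, max_shift):
    """Count dots that disagree with the reference at each candidate shift."""
    return np.array([
        int(np.sum(np.roll(reference, s) != observed))
        for s in range(max_shift + 1)
    ])

true_shift = 2  # assumed pattern displacement caused by scene geometry

# Single module: the camera sees only its own shifted pattern.
observed = np.roll(reference, true_shift)
clean_costs = mismatch_costs(observed, reference, max_shift=6)

# Second module projecting onto the same scene at the same time: its dots
# (shifted differently here) add to what the first camera observes.
second_pattern = np.roll(reference, 5)
observed_with_interference = np.maximum(observed, second_pattern)
interfered_costs = mismatch_costs(observed_with_interference, reference, max_shift=6)

print("clean costs:     ", clean_costs)
print("interfered costs:", interfered_costs)
```

With a single module, the mismatch count is zero at the assumed shift, so the displacement is recovered exactly. Once the second pattern is superimposed, extra dots appear that the reference cannot explain at the true shift, which is the kind of decoding error that overlapping projections introduce.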