Projecting structured light (SL) onto scenes is one of the most reliable techniques for shape measurement and 3D scene reconstruction in computer vision and robotics applications. The correspondence problem present in stereo vision is simplified by projecting known patterns from a projector onto a scene, which is then imaged with a camera. For each pixel in the acquired image, the corresponding projector row or column is obtained by decoding the acquired patterns, followed by a ray-plane intersection to determine a 3D point.
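The final triangulation step can be illustrated with a minimal sketch. It assumes a calibrated setup in which the camera center is the ray origin, the pixel's viewing direction is known, and the decoded projector column has already been converted into a plane n·X + d = 0 in camera coordinates. The function name and the plane representation are illustrative assumptions, not taken from the source.

```python
def ray_plane_intersection(origin, direction, plane_normal, plane_d, eps=1e-12):
    """Intersect the ray X = origin + t * direction with the plane n.X + d = 0.

    Hypothetical helper: origin/direction describe the camera ray through a pixel,
    and (plane_normal, plane_d) describe the light plane of the decoded projector
    column. Returns the 3D point as a tuple, or None if the ray is (nearly)
    parallel to the plane.
    """
    denom = sum(n * v for n, v in zip(plane_normal, direction))
    if abs(denom) < eps:
        return None  # ray parallel to the light plane: no unique intersection
    # Solve n.(origin + t * direction) + d = 0 for t.
    t = -(sum(n * o for n, o in zip(plane_normal, origin)) + plane_d) / denom
    return tuple(o + t * v for o, v in zip(origin, direction))


# Camera at the origin; the decoded column plane is x = 0.5.
point = ray_plane_intersection((0, 0, 0), (1, 0, 2), (1, 0, 0), -0.5)
```

For this ray and plane, the sketch yields the point (0.5, 0.0, 1.0).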
Coded patterns, such as Gray codes, are widely used to obtain high-quality reconstructions of static scenes. An SL method using an N-bit binary Gray code projects N binary patterns onto the scene to uniquely encode 2^N projector columns or rows. From the N acquired images, each pixel is decoded by identifying its corresponding projector column or row.
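The encoding and per-pixel decoding can be sketched in a few lines. The sketch assumes that pattern k displays bit k (most significant first) of each column's Gray code. It also assumes that each observed bit is obtained by comparing the pixel's intensity under a pattern with its intensity under the inverted pattern, which is a common choice but is not specified in the source. All function names are hypothetical.

```python
def gray_encode(index: int) -> int:
    """Binary-reflected Gray code of a column index."""
    return index ^ (index >> 1)


def gray_decode(code: int) -> int:
    """Invert gray_encode by XOR-ing all right shifts of the code."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def pattern_bits(column: int, n_bits: int) -> list:
    """Bit shown at `column` in each of the N projected patterns (MSB first)."""
    g = gray_encode(column)
    return [(g >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]


def binarize(pattern_intensities, inverse_intensities) -> list:
    """Assumed thresholding: bit = 1 where the pattern is brighter than its inverse."""
    return [1 if p > q else 0 for p, q in zip(pattern_intensities, inverse_intensities)]


def decode_pixel(observed_bits) -> int:
    """Assemble the N observed bits into a Gray code and return the column index."""
    code = 0
    for b in observed_bits:
        code = (code << 1) | b
    return gray_decode(code)
```

With N = 4, the 16 columns are uniquely recovered by `decode_pixel(pattern_bits(c, 4))`. Because adjacent Gray codes differ in exactly one bit, a single misclassified bit at a column boundary shifts the decoded index by at most one column.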
Single-shot SL methods project a single pattern that allows per-image reconstruction, and thus can be used for dynamic scenes and deforming objects. However, such methods reduce spatial resolution and perform poorly near depth discontinuities, e.g., thin structures and edges, because decoding each pixel requires a contiguous spatial neighborhood of pixels.
More importantly, prior art single-shot methods project the same pattern at every time instant. Even if the scene is static, or if parts of the scene are moving slowly, prior art methods still reduce the spatial resolution as if the entire scene were dynamic. Thus, conventional single-shot methods are not motion-sensitive.
Furthermore, conventional single-shot methods typically reconstruct depths at sparse feature points such as edges, intensity peaks of color stripes, and 2D grid points, by using complex mechanisms and heuristics for decoding.
Single-Shot Structured Light
Single-shot structured light based on spatial multiplexing can use 1D or 2D patterns. A 1D De Bruijn sequence, which has a window uniqueness property, is projected onto the scene to be reconstructed. The De Bruijn sequence enables unique decoding when a small spatial window of symbols is detected near a pixel. When more than two symbols are required, the De Bruijn sequence can be realized as a color stripe pattern. Examples of 2D patterns include grid patterns, M-arrays, and perfect sub-maps using various geometric shapes and colors.
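The window uniqueness property can be made concrete with a short sketch. It generates a De Bruijn sequence B(k, n) using the standard Lyndon-word recursion and builds a table that maps every length-n window to its stripe position. The assumption here is that each symbol is projected as one stripe (for example, one of k colors). The sequence is treated as linear rather than cyclic by appending its first n - 1 symbols. Function names are hypothetical.

```python
def de_bruijn(k: int, n: int) -> list:
    """De Bruijn sequence B(k, n): every length-n word over k symbols occurs once (cyclically)."""
    a = [0] * k * n
    sequence = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def window_table(seq: list, n: int) -> dict:
    """Map each length-n window to the stripe index where it starts."""
    linear = seq + seq[:n - 1]  # unwrap so every cyclic window appears linearly
    return {tuple(linear[i:i + n]): i for i in range(len(seq))}


def decode_window(table: dict, window) -> int:
    """Stripe index for a detected window of n neighboring symbols, or None."""
    return table.get(tuple(window))
```

For example, `de_bruijn(2, 3)` returns `[0, 0, 0, 1, 0, 1, 1, 1]`. With k = 3 colors and n = 2, the sketch yields 9 stripes, and detecting any pixel's stripe together with its right neighbor's stripe identifies the stripe uniquely. This is why a spatial neighborhood is needed for decoding.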
One technique uses a low-cost 3D sensor for computer vision and human-computer interaction applications. That technique projects an infrared random dot 2D pattern as the single-shot pattern, which is acquired using an infrared camera. The matching is done per image, and depth maps for a dynamic scene can be obtained in real time. However, the depth maps are noisy, especially near depth discontinuities.
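Per-image matching of a random dot pattern can be illustrated with a deliberately simplified 1D block-matching sketch. The source does not describe the sensor's actual matching algorithm, so this is an assumed stand-in. A window around a camera pixel is compared, using the sum of absolute differences (SAD), against shifted windows of a stored reference row of the dot pattern, and the lowest-cost shift (disparity) is returned. Depth would then follow by triangulation from the disparity. All names and parameters are hypothetical.

```python
import random


def make_dot_row(width: int, density: float, seed: int = 0) -> list:
    """Hypothetical 1D slice of a random dot pattern (1 = dot, 0 = background)."""
    rng = random.Random(seed)
    return [1 if rng.random() < density else 0 for _ in range(width)]


def match_disparity(reference, observed, x, half_window, max_disp):
    """Return (disparity, SAD cost) minimizing the SAD between
    observed[x-h .. x+h] and reference[x+d-h .. x+d+h] over d in [0, max_disp].

    Only shifts whose reference window lies fully inside the row are tested.
    On ties, the smallest disparity wins.
    """
    if x - half_window < 0 or x + half_window >= len(observed):
        raise ValueError("window exceeds the observed row")
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        if x + d + half_window >= len(reference):
            break  # remaining shifts would run off the reference row
        cost = sum(abs(observed[x + u] - reference[x + d + u])
                   for u in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

A wider window makes accidental matches of the random pattern less likely, but it spans more pixels. Near a depth discontinuity, such a window mixes two surfaces, which is consistent with the noise near edges noted above.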
All of the above methods project the same pattern for every image, process each image independently, and are not motion-sensitive.
Another method registers the above depth maps and then reconstructs a static 3D scene with higher quality than the raw depth maps.
Spatio-Temporal Decoding
Structured light patterns that are spatio-temporally decodable are known. However, conventional methods require disconnected windows of pixels for decoding, and thus do not reduce the effective size of the spatial neighborhood required for decoding, even when all patterns are used.
The spatial resolution can also be improved by shifting a single-shot color stripe pattern one pixel at a time and analyzing the temporal profile of each pixel across all of the shifted patterns. However, that method is not hierarchical. It also requires the entire scene to remain static while all shifted patterns are projected in order to reduce the spatial neighborhood to a single pixel.
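The idea of decoding from a temporal profile can be sketched as follows. The sketch makes several simplifying assumptions. Stripes are one projector pixel wide. The pattern shifts by one stripe per frame. The pixel observes the same scene point in every frame, which is the static-scene requirement noted above. Under these assumptions, the n symbols a pixel sees over n frames form one window of the De Bruijn sequence, so the pixel can be decoded without any spatial neighbors. The hard-coded sequence is B(3, 2). Function names are hypothetical.

```python
# B(3, 2): every length-2 window over symbols {0, 1, 2} occurs exactly once (cyclically).
DEBRUIJN_3_2 = [0, 0, 1, 0, 2, 1, 1, 2, 2]


def projected_symbol(stripe: int, frame: int, seq: list) -> int:
    """Symbol seen at a fixed stripe position in a given frame,
    assuming the pattern shifts by one stripe per frame."""
    return seq[(stripe + frame) % len(seq)]


def temporal_profile(stripe: int, n_frames: int, seq: list) -> list:
    """Sequence of symbols observed over time by a pixel on `stripe` (static scene)."""
    return [projected_symbol(stripe, t, seq) for t in range(n_frames)]


def decode_temporal(profile: list, seq: list):
    """Find the stripe whose cyclic window matches the observed temporal profile."""
    length = len(seq)
    for i in range(length):
        if all(seq[(i + t) % length] == profile[t] for t in range(len(profile))):
            return i
    return None  # no match, e.g. because the pixel saw motion during the sequence
```

With two frames, `decode_temporal(temporal_profile(i, 2, DEBRUIJN_3_2), DEBRUIJN_3_2)` returns i for every one of the 9 stripes. If the scene moves between frames, the profile mixes symbols from different stripes and decoding fails or returns a wrong stripe.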
Flexible voxels enable a spatio-temporal resolution trade-off for reconstructing a video depending on the motion of each pixel.
Adaptive Window Matching
Spatio-temporal windows have been used in stereo processing to improve matching quality. However, in stereo processing, the window size is typically fixed for every pixel, or regular box-shaped windows are used.
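The conventional fixed, box-shaped spatio-temporal window can be illustrated with a minimal sketch. The sum of squared differences (SSD) is accumulated over a (2h+1) x (2h+1) spatial box in every frame of a short video. The disparity with the lowest total cost is selected. The assumptions are that videos are stored as lists of frames of row lists, and that right-image pixel x corresponds to left-image pixel x + d. Bounds checking is omitted, so callers must keep windows inside the images. Names are hypothetical. The same window is used for every pixel regardless of its motion, which is the limitation described above.

```python
def st_ssd(left, right, x, y, d, half):
    """SSD over a fixed (2*half+1)^2 spatial box, summed over all frames,
    between right[t][y][x] and left[t][y][x + d]."""
    cost = 0
    for frame_l, frame_r in zip(left, right):
        for v in range(-half, half + 1):
            for u in range(-half, half + 1):
                diff = frame_r[y + v][x + u] - frame_l[y + v][x + u + d]
                cost += diff * diff
    return cost


def best_disparity(left, right, x, y, half, max_disp):
    """Disparity in [0, max_disp] minimizing the spatio-temporal SSD (smallest d on ties)."""
    costs = [st_ssd(left, right, x, y, d, half) for d in range(max_disp + 1)]
    return min(range(len(costs)), key=costs.__getitem__)


# Synthetic example: 3 frames, the right view is the left view shifted by 3 pixels.
T, H, W = 3, 5, 20
left = [[[(x + 1) ** 2 + 10 * t + y for x in range(W)] for y in range(H)] for t in range(T)]
right = [[[left[t][y][x + 3] for x in range(W - 3)] for y in range(H)] for t in range(T)]
```

Here `best_disparity(left, right, 6, 2, 1, 5)` recovers the shift of 3. An adaptive approach would instead vary the window's spatial and temporal extent per pixel.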