Solving the correspondence problem is a classic problem in the computer vision and image processing literature. It is central to many three-dimensional (3-D) applications, including stereopsis, 3-D shape recovery, camera calibration, motion estimation, view interpolation, and others. The correspondence problem involves finding a mapping relating points in one coordinate system to those in one or more other coordinate systems (e.g., the mapping between pixels in one image of a given 3-D scene to those in a second image).
The traditional method for solving the correspondence problem uses image information directly. The correspondence mapping is determined for every pixel by examining its neighborhood and exploiting color consistency. In this approach, an objective function (e.g., maximizing correlation or minimizing some error metric) is optimized for a given point in one image to find its corresponding match in all the other images. This passive approach works well for scenes that are distinctly textured and have consistent textures across all the images. It has difficulty when the scene is more uniform in color or when the illumination varies across the images. Typically, this approach reliably produces only a sparse set of correspondences, and it becomes more difficult for an arbitrary number of images. The matching algorithms typically used in this approach are often designed to establish correspondences between only two frames at a time, in which case ½·K·(K−1) dual-frame matches are required for K cameras.
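The passive matching described above can be illustrated with a minimal sketch. It assumes grayscale intensities on a single scanline and uses the sum of squared differences as the error metric; the function names (ssd, best_match, pairwise_matches) and the sample rows are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def ssd(a, b):
    """Sum of squared differences between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(row_a, row_b, x, half=1):
    """Return the index in row_b whose neighborhood best matches the one at x in row_a.

    Assumes half <= x < len(row_a) - half so the window fits inside row_a.
    """
    window = row_a[x - half : x + half + 1]
    candidates = range(half, len(row_b) - half)
    return min(candidates, key=lambda c: ssd(window, row_b[c - half : c + half + 1]))


def pairwise_matches(k):
    """Number of dual-frame matches needed for k cameras, i.e. k*(k-1)/2."""
    return len(list(combinations(range(k), 2)))


# Hypothetical scanlines: row_b shows the same texture shifted right by two pixels.
row_a = [10, 10, 80, 200, 90, 10, 10, 10]
row_b = [10, 10, 10, 10, 80, 200, 90, 10]
match = best_match(row_a, row_b, x=3)
pairs_for_four_cameras = pairwise_matches(4)
```

On a row of uniform intensity every candidate window yields the same error, so the minimum is not unique; this is the ambiguity that makes uniformly colored scenes difficult for the passive approach.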
In contrast, structured light scanning algorithms traditionally use a calibrated camera-projector pair to recover 3-D shape information. In these approaches, a single active column in the projector's coordinate system results in a plane of light. The active plane of light is projected onto the scene of interest and the resulting contour is imaged by the camera. Then, for any image pixel p on the contour, its corresponding 3-D point is found by intersecting the 3-D ray passing through p with the 3-D plane of light.
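The ray-plane intersection mentioned above can be sketched as follows. The sketch assumes a pinhole camera at the origin with a hypothetical focal length and principal point, and a light plane written as normal · X + offset = 0; none of these parameter names or values come from the source.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def pixel_to_ray(u, v, focal, cx, cy):
    """Direction of the ray through pixel (u, v) for an assumed pinhole camera."""
    return ((u - cx) / focal, (v - cy) / focal, 1.0)


def intersect_ray_plane(origin, direction, normal, offset, eps=1e-12):
    """Intersect the ray origin + t*direction (t >= 0) with the plane normal . X + offset = 0.

    Returns the 3-D point, or None if the ray is parallel to the plane
    or the intersection lies behind the ray origin.
    """
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None
    t = -(dot(normal, origin) + offset) / denom
    if t < 0:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical setup: camera at the origin, light plane x = 0.5.
ray = pixel_to_ray(u=420.0, v=280.0, focal=400.0, cx=320.0, cy=240.0)
point = intersect_ray_plane((0.0, 0.0, 0.0), ray, normal=(1.0, 0.0, 0.0), offset=-0.5)
```

In a real system the plane parameters for each projector column and the camera intrinsics would come from the calibration of the camera-projector pair.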
Instead of specifying each plane individually, the planes may be specified in parallel using a set of temporally encoded patterns. Light patterns may be formed simply by considering the bitplanes of the binary representation of the column indices. The spatially varying light patterns form a temporal encoding of the columns in the projector's coordinate system. Every light pattern, consisting of binary vertical stripes of varying widths, is projected in succession. The camera decodes the projected patterns and builds up the appropriate bit sequence at every pixel location. Hence, given a pixel location in the camera, the corresponding column in the projector's coordinate system may be determined, and the corresponding 3-D point then may be calculated. Greater 3-D accuracy may be achieved by considering the transitions between adjacent columns.
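The temporal encoding of projector columns can be sketched as below. This is an illustrative sketch assuming plain binary bitplanes projected most significant bit first and ideal, noise-free decoding at each camera pixel; the function names make_patterns and decode_column are hypothetical.

```python
def make_patterns(num_columns):
    """Build one stripe pattern per bitplane of the column index.

    Each pattern is a list with one 0/1 value per projector column,
    ordered from the most significant bit to the least significant bit.
    """
    num_bits = max(1, (num_columns - 1).bit_length())
    return [
        [(col >> bit) & 1 for col in range(num_columns)]
        for bit in reversed(range(num_bits))
    ]


def decode_column(bits):
    """Recover the projector column from the bit sequence observed at one pixel."""
    col = 0
    for b in bits:
        col = (col << 1) | b
    return col


# Eight projector columns need three patterns.
patterns = make_patterns(8)

# A camera pixel that sees projector column 5 observes bits 1, 0, 1 over time.
observed = [pattern[5] for pattern in patterns]
column = decode_column(observed)
```

With this encoding the number of projected patterns grows with the logarithm of the number of columns, so a projector with 1024 columns needs 10 patterns rather than 1024 individually projected planes.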
The above-mentioned approaches based on temporally encoded patterns work well for arbitrary static scenes with a calibrated setup. There have been many approaches to further improve on the design of the projected light patterns and reduce the total number used. In one approach that requires a labeling process and a particular camera configuration, an uncoded grid pattern with landmark dots is projected. In another approach, a single color pattern is used to handle only neutrally colored scenes. Other approaches use sophisticated binary patterns and matching algorithms to capture dynamic but not very textured scenes.