1. Field of the Invention
This invention generally relates to the field of image processing systems and methods, and more particularly relates to methods of recovering depth information associated with elements in a base image corresponding to multiple reference image views.
2. Description of Related Art
Image processing systems have attempted to process multiple images (views of a scene) to identify common image features across the different views, for example to create three-dimensional (3-D) digital content by analyzing the multiple views of the scene. A main problem has been how to determine depth information for image feature elements, such as pixels, of a base image (view of a scene). When a scene is viewed from multiple cameras, one chosen as the base view and the others as reference views, the depth information of the scene for image features, such as pixels, in the base view's image plane can be recovered based on the correspondence relationship between pixels in the base view and in the reference views.
The term “binocular stereo” refers to the case where two cameras are used, arranged in parallel to one another, as shown in FIG. 1. The distance between the two cameras is often called the baseline. The term “multi-baseline stereo” refers to the use of multiple horizontally or vertically arranged cameras, also parallel to each other, as shown in FIG. 2.
Recent developments in Image-Based Rendering and Modeling have raised interest in multi-baseline stereo in the vision community. Using multiple cameras arranged in multiple baselines provides two advantages over binocular stereo methods: a decrease in matching ambiguity and an increase in reconstruction precision.
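The decrease in matching ambiguity can be illustrated with a small synthetic sketch. It is not taken from any particular prior art system; it assumes a one-dimensional repetitive texture, parallel cameras whose pixel shift is proportional to their baseline, and a sum-of-squared-differences (SSD) matching cost added across cameras. The names `PERIOD`, `TRUE_K`, `BASELINES`, `ssd_curve`, and `near_minima` are hypothetical and chosen only for illustration.

```python
import numpy as np

PERIOD = 8                 # period of the repetitive texture, in pixels (assumed)
TRUE_K = 3                 # true pixel shift per unit baseline (assumed)
BASELINES = [1, 2]         # relative baselines of two reference cameras (assumed)
CANDIDATES = np.arange(8)  # candidate shifts per unit baseline to test

xs = np.arange(64)
base = np.sin(2 * np.pi * xs / PERIOD)
# Assumed geometry: a point at base pixel x appears at x - baseline * TRUE_K
# in the reference row of the camera with that baseline.
refs = [np.roll(base, -b * TRUE_K) for b in BASELINES]

def ssd_curve(base_row, ref_row, x, baseline, candidates, half_window=3):
    """SSD between a window around x and the shifted window, per candidate."""
    window = np.arange(x - half_window, x + half_window + 1)
    return np.array([
        np.sum((base_row[window] - ref_row[window - baseline * k]) ** 2)
        for k in candidates
    ])

def near_minima(costs, candidates, tol=1e-9):
    """Candidates whose cost is within tol of the smallest cost."""
    return [int(k) for k in candidates[costs <= costs.min() + tol]]

curves = [ssd_curve(base, r, 32, b, CANDIDATES) for r, b in zip(refs, BASELINES)]
long_only = near_minima(curves[1], CANDIDATES)            # longer baseline alone
combined = near_minima(curves[0] + curves[1], CANDIDATES)  # costs summed over cameras
print("long baseline alone:", long_only)
print("summed over baselines:", combined)
```

Under these assumptions the longer baseline alone ties at two candidate shifts, because the texture repeats, while the summed cost leaves a single candidate. Real images, non-integer shifts, and occlusions would need interpolation and more careful cost aggregation than this sketch provides.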
The basic problem of stereo, regardless of how many cameras are used or how they are positioned, is to find the depth value of the three-dimensional (3-D) scene point seen at each pixel of a base image, using other images as references. To accomplish this, for each pixel in the base image, its corresponding pixels (projections of the same scene point) in the reference images need to be identified. This correspondence problem can be very difficult to solve and impractical to implement, especially for more than about 2 or 3 cameras. In the case of binocular or multi-baseline stereo, the task of establishing correspondence may be simplified in the sense that the corresponding pixels lie on the same horizontal or vertical scan line as the pixel in the base image. This makes it possible to represent the correspondence with a single scalar, called the disparity.
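As a minimal sketch of the scan-line simplification described above, the following searches along one image row for the disparity that minimizes a sum of absolute differences, then converts it to depth with the standard parallel-camera relation Z = f·B/d. The function names, the window size, the sign convention (a point at column x in the base row appears at column x - d in the reference row), and the focal length and baseline values are all assumptions made for illustration, not part of any described system.

```python
import numpy as np

def scanline_disparity(base_row, ref_row, x, max_disparity, half_window=2):
    """Integer disparity minimizing the SAD cost along one scan line."""
    lo, hi = x - half_window, x + half_window + 1
    patch = base_row[lo:hi]
    best_d, best_cost = 0, np.inf
    for d in range(max_disparity + 1):
        if lo - d < 0:          # shifted window would leave the image
            break
        cost = np.abs(patch - ref_row[lo - d:hi - d]).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def depth_from_disparity(disparity, focal_px, baseline):
    """Depth for parallel cameras: Z = f * B / d (same units as baseline)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline / disparity

# Synthetic example: a random texture shifted by a known disparity.
rng = np.random.default_rng(0)
base_row = rng.random(64)
true_d = 5
ref_row = np.roll(base_row, -true_d)   # ref_row[x - true_d] == base_row[x]

d_hat = scanline_disparity(base_row, ref_row, x=30, max_disparity=10)
depth = depth_from_disparity(d_hat, focal_px=800.0, baseline=0.1)
print(d_hat, depth)
```

The search is one-dimensional only because the cameras are assumed parallel with a common image plane; for arbitrarily placed cameras the corresponding pixel is not confined to the same row.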
However, these binocular or multi-baseline stereo methods typically require a special camera setup to achieve a common image plane, so that the cameras are necessarily coplanar and parallel. These methods unfortunately are also limited in how much of a particular scene they can cover. Further, these methods impose restrictions on camera placement that tend to complicate the overall image capture process, increase the cost of image capture, and generally make these methods impracticable for more complicated setups, e.g., with more than a small number of cameras. Typically, in these methods, either a mechanical device is used to ensure the cameras are collinear, or a mathematical process called rectification is performed to correct the mechanical misalignment. Lastly, the accuracy and reliability of results using these prior art methods tend to be inadequate for serious commercial applications.
Therefore, a need exists to overcome the problems with the prior art as discussed above, and particularly for a method and apparatus that can more successfully recover depth information for elements in a base image that have correspondences across multiple reference images.