Stereoscopic imaging is the process of visually combining at least two images of a scene, taken from slightly different viewpoints, to produce the illusion of three-dimensional depth. This technique relies on the fact that human eyes are spaced some distance apart and do not, therefore, view exactly the same scene. By providing each eye with an image from a different perspective, the technique tricks the viewer's eyes into perceiving depth. Typically, where two distinct perspectives are provided, the component images are referred to as the “left” and “right” images, also known as a reference image and complementary image, respectively. However, those skilled in the art will recognize that more than two viewpoints may be combined to form a stereoscopic image.
In three-dimensional (“3D”) post-production, visual effects (“VFX”) workflows and 3D display applications, an important process is to infer or extract depth information, e.g., a depth map or the distance from an object to the camera, from stereoscopic images consisting of left eye view and right eye view images. Depth map extraction can be used in a variety of film applications, for instance, acquiring the geometry of a scene for film post-production, depth keying, 3D compression and content generation for 3D displays. For example, recently commercialized autostereoscopic 3D displays require an image-plus-depth-map input format (2D+Z), so that the display can generate different 3D views to support multiple viewing angles.
Stereo matching is a widely used approach for depth map extraction that estimates depth from two images taken by cameras at different locations. Stereo matching obtains images of a scene from two or more cameras positioned at different locations and orientations in the scene. These digital images are obtained from each camera at approximately the same time, and points in each of the images that correspond to the same 3D point in space are matched. In general, points from different images are matched by searching a portion of the images and using constraints (such as an epipolar constraint) to correlate a point in one image to a point in another image. Depth values are inferred from the relative distance between two pixels in the images that correspond to the same point in the scene.
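By way of illustration only, the matching and depth inference described above can be sketched as follows. The sketch assumes rectified images, so that the epipolar constraint restricts the search to the same row, and a pinhole camera model in which depth equals focal length times baseline divided by disparity. The function names, the sum-of-absolute-differences cost, the window size and the camera parameters are all hypothetical choices made for this example and are not taken from the description.

```python
def match_pixel(left_row, right_row, x, max_disp, half_win=1):
    """Find the disparity for pixel x of the left row by block matching.

    Assumes rectified images: the match for left_row[x] lies on the same
    row of the right image at x - d for some disparity d >= 0.
    The cost is the sum of absolute differences (SAD) over a small window.
    """
    n = len(left_row)
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d < 0:
            break  # candidate match would fall outside the right image
        cost = 0
        for k in range(-half_win, half_win + 1):
            xl = min(max(x + k, 0), n - 1)      # clamp window to the row
            xr = min(max(x - d + k, 0), n - 1)
            cost += abs(left_row[xl] - right_row[xr])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline):
    """Pinhole-model depth: Z = f * B / d (assumed camera parameters)."""
    return float("inf") if d == 0 else focal_px * baseline / d


# Toy rows: the right row is the left row shifted by two pixels.
left_row = [0, 0, 0, 0, 9, 5, 1, 0, 0, 0]
right_row = [0, 0, 9, 5, 1, 0, 0, 0, 0, 0]
d = match_pixel(left_row, right_row, x=5, max_disp=4)
z = depth_from_disparity(d, focal_px=800.0, baseline=0.5)
```

In this toy case the pixel at position 5 of the left row is matched two pixels to the left in the right row, and a larger disparity would correspond to a point closer to the cameras.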
A variety of methods have been developed for accurate depth estimation, for instance, dynamic programming, belief propagation and simple block matching. More accurate methods are usually more computationally expensive, and some are too slow to be useful in practical applications. Scanline algorithms or functions (e.g., scanline dynamic programming or scanline belief propagation) have been found to be relatively efficient and able to give quite accurate results, compared to simple pixel/block matching (too inaccurate) and two-dimensional (“2D”) belief propagation (too slow). Therefore, scanline algorithms or functions could become practical solutions for depth estimation problems. However, their main drawback is that they often yield horizontal stripe artifacts (see FIG. 5B, where stripe artifacts are encircled). Unlike more expensive algorithms such as 2D belief propagation, scanline algorithms perform optimization only one scanline at a time; consequently, smoothness constraints are not imposed along the vertical direction.
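As a minimal sketch of the scanline approach discussed above, the following example applies dynamic programming to one row at a time. It is an illustration under stated assumptions rather than any particular published method: the absolute-difference data cost, the linear smoothness penalty and the names `scanline_dp`, `disparity_map` and `smooth` are hypothetical. Because each row is optimized independently, nothing in the cost ties the disparities of one row to those of the row above or below, which is the source of the horizontal stripe artifacts.

```python
def scanline_dp(left_row, right_row, max_disp, smooth=1.0):
    """Estimate disparities for one scanline by dynamic programming.

    Minimizes the sum over x of |L[x] - R[x - d_x]| plus
    smooth * |d_x - d_(x-1)| along this row only.
    """
    n = len(left_row)
    num_d = max_disp + 1
    inf = float("inf")

    def data_cost(x, d):
        # Disparities that would index outside the right row are forbidden.
        return abs(left_row[x] - right_row[x - d]) if x - d >= 0 else inf

    cost = [data_cost(0, d) for d in range(num_d)]
    back = []  # back[x - 1][d] = best previous disparity for (x, d)
    for x in range(1, n):
        new_cost, ptr = [], []
        for d in range(num_d):
            p = min(range(num_d), key=lambda q: cost[q] + smooth * abs(d - q))
            new_cost.append(data_cost(x, d) + cost[p] + smooth * abs(d - p))
            ptr.append(p)
        cost = new_cost
        back.append(ptr)

    # Backtrack from the cheapest final state to recover the path.
    d = min(range(num_d), key=lambda q: cost[q])
    disp = [d]
    for ptr in reversed(back):
        d = ptr[d]
        disp.append(d)
    disp.reverse()
    return disp


def disparity_map(left, right, max_disp, smooth=1.0):
    """Process each row independently; no vertical smoothness is imposed."""
    return [scanline_dp(l, r, max_disp, smooth) for l, r in zip(left, right)]


left_row = [0, 0, 0, 0, 9, 5, 1, 0, 0, 0]
right_row = [0, 0, 9, 5, 1, 0, 0, 0, 0, 0]
disp = scanline_dp(left_row, right_row, max_disp=3)
```

The cost of this sketch grows with the row length times the square of the number of disparity levels, per row, which is why scanline methods are comparatively fast. Adding a term that couples adjacent rows would address the stripes but would no longer be a pure scanline optimization.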
Therefore, a need exists for fast and efficient depth information extraction techniques that minimize discontinuity or stripe artifacts.