The viewing experience of visual displays and communication systems can be enhanced by incorporating multiview and stereoscopic (3D) information that heightens the perceived depth and the virtual presence of objects depicted in the visual scene. Given this desirable feature, and with the maturation of digital video technologies, there has been a strong impetus to find efficient and commercially viable methods of creating, recording, transmitting, and displaying multiview and stereoscopic images and sequences. The fundamental problem of working with multiview and stereoscopic images is that multiple images are required, as opposed to a single stream of monoscopic images for standard displays. This means that multiple cameras are required during capture and that storage as well as transmission requirements are greatly increased.
In a technique called depth image based rendering (DIBR), images with new camera viewpoints are generated using information from an original source image and its corresponding depth map. These new images can then be used for 3D or multiview imaging devices. One example is the process disclosed by Zitnick et al. in U.S. Pat. No. 7,015,926 for generating a two-layer, 3D representation of a digitized image from the image and its pixel disparity map.
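The core DIBR warping step can be sketched as follows. This is a minimal illustration, not the process of the cited patent: it assumes a grayscale image, a depth map normalized to [0, 1], and a purely horizontal parallax shift, with the function name and `max_disparity` parameter being illustrative choices. Pixels that receive no value are left marked as holes.

```python
import numpy as np

def dibr_render(image, depth, max_disparity=16):
    """Render a virtual view by shifting each pixel horizontally in
    proportion to its depth value (a simplified DIBR forward warp).

    image: (H, W) grayscale array; depth: (H, W) array in [0, 1],
    where 1 is nearest. Unfilled pixels are left as -1 ("holes")."""
    h, w = image.shape
    out = np.full((h, w), -1.0)
    for y in range(h):
        for x in range(w):
            # Nearer pixels (larger depth value) shift farther.
            shift = int(round(depth[y, x] * max_disparity))
            xv = x - shift
            if 0 <= xv < w:
                out[y, xv] = image[y, x]
    return out
```

With a constant-zero depth map the virtual view equals the source image; with nonzero depth, holes (value -1) appear where background is newly exposed, which is precisely the rendering problem discussed further below.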
The DIBR technique is useful for stereoscopic systems because one set of source images and their corresponding depth maps can be coded more efficiently than two streams of natural images (that are required for a stereoscopic display), thereby reducing the bandwidth required for storage and transmission. For more details on this approach, see:
    K. T. Kim, M. Siegel, & J. Y. Son, “Synthesis of a high-resolution 3D stereoscopic image pair from a high-resolution monoscopic image and a low-resolution depth map,” Proceedings of the SPIE: Stereoscopic Displays and Applications IX, Vol. 3295A, pp. 76-86, San Jose, Calif., USA, 1998; and
    J. Flack, P. Harman, & S. Fox, “Low bandwidth stereoscopic image encoding and transmission,” Proceedings of the SPIE: Stereoscopic Displays and Virtual Reality Systems X, Vol. 5006, pp. 206-214, Santa Clara, Calif., USA, January 2003.
Furthermore, based on information from the depth maps, DIBR permits the creation of not only one novel image but also a set of images as if they were captured with a camera from a range of viewpoints. This feature is particularly suited for multiview stereoscopic displays where several views are required.
A major problem with conventional DIBR is the difficulty of generating depth maps with adequate accuracy without extensive manual input and adjustment, or without high computational cost. An example is the method disclosed by Redert et al. in U.S. Patent Application 2006/0056679 for creating a pixel-dense full depth map of a 3-D scene, using both depth values and derivatives of depth values. A further problem with such dense depth maps arises in motion picture applications, where the depth map is too dense to allow adequately fast frame-to-frame processing.
There are software methods to generate depth maps from pairs of stereoscopic images, as described in:
    D. Scharstein & R. A. Szeliski, “Taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, Vol. 47(1-3), pp. 7-42, 2002; and
    L. Zhang, D. Wang, & A. Vincent, “Reliability measurement of disparity estimates for intermediate view reconstruction,” Proceedings of the International Conference on Image Processing (ICIP'02), Vol. 3, pp. 837-840, Rochester, N.Y., USA, September 2002.
However, the resulting depth maps are likely to contain undesirable blocky artifacts, depth instabilities, and inaccuracies, because finding matching features in a pair of stereoscopic images is inherently difficult. Moreover, these software methods usually assume that the cameras used to capture the stereoscopic images are parallel.
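The kind of correspondence search these methods perform can be illustrated with a naive sum-of-absolute-differences (SAD) block matcher, assuming rectified images from parallel cameras; its fixed-size blocks are one source of the blocky artifacts noted above. This is a sketch, not any cited algorithm, and the function name and parameters are illustrative:

```python
import numpy as np

def block_match_disparity(left, right, max_disp=8, block=3):
    """Estimate a disparity map from a rectified stereo pair by
    SAD block matching. Assumes parallel (rectified) cameras, so
    correspondences lie on the same scanline."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            # Search candidate shifts along the scanline.
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Disparity is then converted to depth (inversely proportional, given camera baseline and focal length); ambiguous or textureless regions are where the instabilities described above arise.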
To ensure reasonable accuracy of the depth maps would typically require (a) an appreciable amount of human intervention and steady input, (b) extensive computation, and/or (c) specialized hardware with restrictive image capture conditions. For example, Harman et al. describe in U.S. Pat. Nos. 7,035,451 and 7,054,478 two respective methods for producing, from an image, a depth map for use in the conversion of 2D images into 3D images. These methods involve intensive human intervention, either to select areas within key frames and tag them with an arbitrary depth, or to apply image pixel repositioning and depth contouring effects.
Two approaches have been attempted for extracting depth from the level of sharpness: “depth from focus” and “depth from defocus.” In “depth from focus,” depth information in the visual scene is obtained from only a single image by modeling the effect that a camera's focal parameters have on the image, as described in:
    J. Ens & P. Lawrence, “An investigation of methods for determining depth from focus,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, pp. 97-108, 1993.
In “depth from defocus,” depth information is obtained based on the blur information contained in two or more images that have been captured with different camera focal or aperture settings from the same camera viewpoint (i.e., location), as described in:
    Y. Xiong & S. Shafer, “Depth from focusing and defocusing,” Proceedings of the International Conference of Computer Vision and Pattern Recognition, pp. 68-73, 1993.
In both cases, camera parameters are required to help convert the blur to the depth dimension.
Others have attempted to generate depth maps from blur without knowledge of camera parameters, by assuming a general monotonic relationship between blur and distance and arbitrarily setting the minimum and maximum ranges of depth, as described in:
    S. A. Valencia & R. M. R. Dagnino, “Synthesizing stereo 3D views from focus cues in monoscopic 2D images,” Proceedings of the SPIE: Stereoscopic Displays and Virtual Reality Systems X, Vol. 5006, pp. 377-388, Santa Clara, Calif., USA, January 2003.
However, the main problem with these attempts is that depth within object boundaries remains difficult to determine; the described methods attempt to fill in these regions, which tends to be inaccurate as well as computationally complex and intensive.
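A hypothetical sketch of such a blur-based mapping is given below, assuming (as these attempts do) a monotonic blur-distance relationship and an arbitrarily chosen depth range [z_min, z_max]; local sharpness is approximated here by window-averaged gradient magnitude, and all names and parameters are illustrative:

```python
import numpy as np

def blur_to_depth(image, z_min=0.0, z_max=1.0, window=5):
    """Assign depth from local sharpness under an assumed monotonic
    blur-distance relationship: blurrier (less sharp) regions get
    larger depth, with the range arbitrarily fixed to [z_min, z_max]."""
    # Local sharpness proxy: magnitude of intensity differences.
    gy, gx = np.gradient(image)
    sharp = np.abs(gx) + np.abs(gy)
    # Average sharpness over a small window for a regional measure.
    pad = window // 2
    padded = np.pad(sharp, pad, mode='edge')
    h, w = sharp.shape
    smooth = np.empty_like(sharp)
    for i in range(h):
        for j in range(w):
            smooth[i, j] = padded[i:i + window, j:j + window].mean()
    # Normalize to [0, 1]; sharpest regions map to the nearest depth.
    s = (smooth - smooth.min()) / (np.ptp(smooth) + 1e-9)
    return z_max - s * (z_max - z_min)
```

Note the limitation the text identifies: inside a uniformly textured (or textureless) object, the sharpness measure is nearly constant, so depth within object boundaries cannot be resolved by this mapping alone.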
Another major problem with DIBR concerns the rendering of newly exposed regions that occur at the edges of objects, where background previously hidden from view becomes visible and no information is available in the depth map on how to properly fill in these exposed regions, or “holes,” in the rendered images. Although imperfect, a common method is to fill these regions with the weighted average of the luminance and chrominance values of neighboring pixels. However, this solution often leads to visible distortions or annoying artifacts at the edges of objects. In general, there is a consensus in the prior art against smoothing to reduce such distortions, especially smoothing across object boundaries with sharp depth transitions, as this has been presumed to reduce the perceived depth between an object and its background. See, for example:
    J. Yin & J. R. Cooperstock, “Improving depth maps by nonlinear diffusion,” Short Communication Papers of the 12th International Conference on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, Vol. 12, pp. 305-311, Feb. 2-6, 2004.
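The common hole-filling strategy described above can be sketched as follows, simplified here to an unweighted average of the nearest valid horizontal neighbors of each hole (the weighting scheme and hole marker value are illustrative assumptions). It also illustrates why artifacts arise: foreground and background values are blended indiscriminately across object edges.

```python
import numpy as np

def fill_holes(image, hole_value=-1.0):
    """Fill disoccluded 'holes' (pixels equal to hole_value) with the
    average of the nearest valid neighbors on the same scanline."""
    out = image.copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            if out[y, x] == hole_value:
                vals = []
                # Nearest valid neighbor to the left.
                for xx in range(x - 1, -1, -1):
                    if image[y, xx] != hole_value:
                        vals.append(image[y, xx])
                        break
                # Nearest valid neighbor to the right.
                for xx in range(x + 1, w):
                    if image[y, xx] != hole_value:
                        vals.append(image[y, xx])
                        break
                if vals:
                    out[y, x] = sum(vals) / len(vals)
    return out
```

When the hole lies at a sharp depth edge, this average mixes object and background colors, producing the visible edge artifacts described above.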
Contrary to this consensus, we have provided empirical evidence of an ameliorative effect on image quality of a rather simple ‘uniform’ smoothing of depth maps, including smoothing across object boundaries, as given in our report:
    G. Alain, “Stereo vision, the illusion of depth,” Co-op term report, April 2003.
This was subsequently confirmed by Fehn, who suggested in the following two publications the use of 2D uniform Gaussian smoothing of depth maps at object boundaries:
    C. Fehn, “A 3D-TV approach using depth-image-based rendering (DIBR),” Proceedings of Visualization, Imaging, and Image Processing (VIIP'03), pp. 482-487, Benalmadena, Spain, September 2003; and
    C. Fehn, “Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV,” Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI, Vol. 5291, pp. 93-104, CA, USA, January 2004.
More recently, however, we found that uniform smoothing of depth maps causes undesirable geometrical distortion in the newly exposed regions, as further described below.
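A sketch of uniform 2D Gaussian smoothing of a depth map, of the general kind discussed above, implemented as a separable filter with edge replication; the sigma value and kernel radius are illustrative choices, not parameters from the cited publications:

```python
import numpy as np

def gaussian_smooth_depth(depth, sigma=3.0):
    """Apply uniform 2D Gaussian smoothing to a depth map, including
    across object boundaries (no edge-stopping term)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()  # normalized 1D Gaussian kernel
    # Pad with edge replication so the borders are defined, then
    # filter rows and columns separately (separable 2D Gaussian).
    pad = radius
    padded = np.pad(depth, pad, mode='edge')
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, g, mode='same'), 1, padded)
    cols = np.apply_along_axis(
        lambda c: np.convolve(c, g, mode='same'), 0, rows)
    return cols[pad:-pad, pad:-pad]
```

Because the kernel is applied uniformly everywhere, sharp depth transitions at object boundaries are rounded off, which is the source of the geometrical distortion in newly exposed regions noted at the end of the preceding paragraph.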
Another limitation of conventional DIBR methods, in general, is likely to occur when they are applied to motion pictures entailing a sequence of image frames. Sharp frame-to-frame transitions in depth within a conventional depth map often result in misalignment of a given edge's depth between frames, thereby producing jerkiness when the frames are viewed as a video sequence.
Based on the above-described shortcomings in the prior art, there is clearly a need for an affordably simple method of deriving sparse depth maps from a single 2D source image, without requiring knowledge of camera parameters, for the purpose of creating with DIBR higher-quality virtual 3D images having negligible distortions and annoying artifacts, and minimized frame-to-frame jerkiness in motion pictures, particularly at object boundaries.