Stereoscopic or three-dimensional (3D) television (3D-TV) is expected to be the next step in the advancement of television. Stereoscopic images displayed on a 3D-TV are expected to increase visual impact and heighten the sense of presence for viewers. 3D-TV displays may also provide multiple stereoscopic views, offering motion parallax as well as stereoscopic information.
Successful adoption of 3D-TV by the general public will depend not only on technological advances in stereoscopic and multi-view 3D displays, but also on the availability of a wide variety of program content in 3D. One way to alleviate the likely lack of program material in the early stages of 3D-TV rollout is to convert two-dimensional (2D) still and video images into 3D images, which would also enable content providers to reuse their vast libraries of existing program material for 3D-TV.
In order to generate a 3D impression on a multi-view display device, images from different viewpoints have to be presented. This requires multiple input views consisting of either camera-captured images or images rendered from some 3D or depth information. This depth information can be recorded directly, generated from multi-view camera systems, or generated from conventional 2D video material. In a technique called depth image based rendering (DIBR), images with new camera viewpoints are generated using information from an original monoscopic source image and its corresponding depth map, which contains depth values for each pixel or group of pixels of the monoscopic source image. These new images can then be used for 3D or multi-view imaging devices. The depth map can be viewed as a gray-scale image in which each pixel is assigned a depth value representing its distance to the viewer, either relative or absolute. Alternatively, the depth value of a pixel may be understood as the distance, from a reference plane, of the point of the three-dimensional scene represented by that pixel; the reference plane may, for example, coincide with the plane of the image during image capture or display. It is usually assumed that the higher the gray value (lighter gray) associated with a pixel, the nearer it is to the viewer.
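To illustrate the gray-scale convention just described, the following Python sketch converts 8-bit depth values (255 = nearest) into horizontal pixel shifts for a rendered view. The linear mapping and the parameter names `max_disparity` and `zero_parallax` are assumptions for illustration only; practical DIBR systems may derive other, often nonlinear, depth-to-disparity relations from the camera and display geometry.

```python
def depth_to_disparity(depth, max_disparity=8.0, zero_parallax=128):
    """Map an 8-bit depth value (255 = nearest) to a horizontal shift in pixels.

    Pixels at the hypothetical zero_parallax level get no shift (they appear on
    the screen plane); lighter (nearer) pixels get positive shifts and darker
    (farther) pixels negative ones. The direction in which a positive shift is
    applied depends on which eye's view is being rendered.
    """
    if not 0 <= depth <= 255:
        raise ValueError("depth must be in the 8-bit range 0..255")
    return max_disparity * (depth - zero_parallax) / 255.0


def disparity_map(depth_map, max_disparity=8.0, zero_parallax=128):
    """Apply depth_to_disparity to every pixel of a depth map given as a list of rows."""
    return [[depth_to_disparity(d, max_disparity, zero_parallax) for d in row]
            for row in depth_map]
```

Choosing `zero_parallax` between 0 and 255 places some of the scene in front of the screen and some behind it; setting it to 0 would push the whole scene in front of the screen plane.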
A depth map makes it possible to obtain from the starting image a second image that, together with the starting image, constitutes a stereoscopic pair providing a three-dimensional vision of the scene. Examples of the DIBR technique are disclosed in the following articles: K. T. Kim, M. Siegel, & J. Y. Son, “Synthesis of a high-resolution 3D stereoscopic image pair from a high-resolution monoscopic image and a low-resolution depth map,” Proceedings of the SPIE: Stereoscopic Displays and Applications IX, Vol. 3295A, pp. 76-86, San Jose, Calif., USA, 1998; J. Flack, P. Harman, & S. Fox, “Low bandwidth stereoscopic image encoding and transmission,” Proceedings of the SPIE: Stereoscopic Displays and Virtual Reality Systems X, Vol. 5006, pp. 206-214, Santa Clara, Calif., USA, January 2003; and L. Zhang & W. J. Tam, “Stereoscopic image generation based on depth images for 3D TV,” IEEE Transactions on Broadcasting, Vol. 51, pp. 191-199, 2005.
Advantageously, DIBR uses the information in the depth maps to create a set of images as if they had been captured by a camera from a range of viewpoints. This feature is particularly well suited to multi-view stereoscopic displays, which require several views.
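The Python sketch below shows the basic idea of generating several views from one image and its depth map: each pixel is shifted horizontally in proportion to its depth value and to a signed camera offset, nearer pixels overwrite farther ones where they collide, and the disocclusion holes left behind are filled from neighbouring pixels. The linear shift model (depth 0 stays on the screen plane), the evenly spaced offsets, and the left-neighbour hole filling are simplifying assumptions, not the specific rendering methods of the cited articles.

```python
def render_view(image, depth, alpha, max_disparity=8.0):
    """Forward-warp an image (list of rows) to a virtual camera position.

    alpha is a signed camera offset: 0 reproduces the source view, and each
    pixel moves by round(alpha * max_disparity * depth / 255) pixels, so
    depth-0 (farthest) pixels stay in place.
    """
    height, width = len(image), len(image[0])
    out = [[None] * width for _ in range(height)]
    zbuf = [[-1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = depth[y][x]
            xt = x + round(alpha * max_disparity * d / 255.0)
            # Keep the nearest (highest-depth) pixel when several land on xt.
            if 0 <= xt < width and d > zbuf[y][xt]:
                out[y][xt] = image[y][x]
                zbuf[y][xt] = d
        row = out[y]
        # Fill disocclusion holes with the nearest filled pixel to the left
        # (a crude stand-in for background extrapolation).
        for x in range(1, width):
            if row[x] is None:
                row[x] = row[x - 1]
        # Holes at the start of the row have no left neighbour; fill from the right.
        for x in range(width - 2, -1, -1):
            if row[x] is None:
                row[x] = row[x + 1]
        # If every pixel was shifted out of frame, fall back to the source row.
        if row[0] is None:
            out[y] = list(image[y])
    return out


def render_views(image, depth, num_views=4, max_disparity=8.0):
    """Render num_views views at evenly spaced offsets centred on the source camera."""
    if num_views < 2:
        raise ValueError("num_views must be at least 2")
    offsets = [i / (num_views - 1) - 0.5 for i in range(num_views)]
    return [render_view(image, depth, a, max_disparity) for a in offsets]
```

With `num_views=2` the offsets are -0.5 and +0.5, which yields a stereoscopic pair; larger values produce the set of views a multi-view display needs from a single source image and depth map.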
One problem with conventional DIBR is that accurate depth maps are expensive or cumbersome to acquire, either directly or from a 2D image. For example, a “true” depth map can be generated using a commercial depth camera such as the ZCam™ available from 3DV Systems, Israel, which measures the distance to objects in a scene using a pulsed infrared (IR) light source and an IR sensor that senses the light reflected from the surface of each object. Depth maps can also be obtained by projecting a structured light pattern onto the scene and recovering the depths of the various objects by analyzing the distortions of the pattern. Disadvantageously, these methods require highly specialized hardware and/or cumbersome recording procedures, and they impose restrictive scene lighting and a limited scene depth.
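Pulsed-light ranging of the kind mentioned above rests on the time-of-flight relation d = c·Δt/2, where Δt is the round-trip time of the light pulse. The Python sketch below applies that relation and then quantizes the result to the 8-bit, lighter-is-nearer convention described earlier. The function names and the `near`/`far` clipping range are hypothetical, and the internal measurement scheme of any particular commercial camera is not described here and may differ (for example, it may infer travel time indirectly rather than timing each pulse).

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second (exact SI value)


def tof_distance(round_trip_seconds):
    """Distance to a reflecting surface from the round-trip time of a light pulse.

    The pulse travels to the object and back, hence the division by two.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def distance_to_gray(distance, near, far):
    """Quantize a metric distance to an 8-bit depth value (255 = nearest).

    Distances outside [near, far] are clipped to that range first.
    """
    if not near < far:
        raise ValueError("near must be smaller than far")
    clipped = min(max(distance, near), far)
    return round(255 * (far - clipped) / (far - near))
```

For instance, a 20 ns round trip corresponds to roughly 3 m, which `distance_to_gray` then places near the middle of a 1 m to 5 m working range.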
Although many algorithms for generating a depth map from a 2D image exist in the art, they are typically computationally complex and often require manual or semi-automatic processing. For example, a typical step in the 2D-to-3D conversion process may be to generate depth maps by examining selected key frames in a video sequence and manually marking regions that are foreground, mid-ground, and background. Specially designed computer software may then be used to track these regions across consecutive frames and to allocate depth values according to the markings. This type of approach requires trained technicians, and the task can be quite laborious and time-consuming for a full-length movie. Examples of prior art methods of depth map generation that involve intensive human intervention are disclosed in U.S. Pat. Nos. 7,035,451 and 7,054,478 issued to Harman et al.
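The depth-allocation step of such a semi-automatic workflow can be illustrated with a small Python sketch: a technician's per-pixel layer labels for a key frame are turned into a depth map, and per-layer depths set at two key frames are linearly interpolated for the frames in between. The layer depth values, the label strings, and the linear interpolation are hypothetical choices, not the methods of the cited patents; the region tracking itself, which is the laborious part, is not shown.

```python
# Hypothetical 8-bit depth per marked layer (255 = nearest to the viewer).
LAYER_DEPTH = {"foreground": 224, "mid-ground": 128, "background": 32}


def depth_from_markings(label_map, layer_depth=None, unmarked=0):
    """Convert a per-pixel layer label map (list of rows) into a depth map."""
    layer_depth = LAYER_DEPTH if layer_depth is None else layer_depth
    return [[layer_depth.get(label, unmarked) for label in row] for row in label_map]


def layer_depths_at(frame, key_a, key_b):
    """Linearly interpolate per-layer depths between two marked key frames.

    key_a and key_b are (frame_index, {layer: depth}) pairs with distinct
    frame indices and the same set of layers.
    """
    (fa, da), (fb, db) = key_a, key_b
    t = (frame - fa) / (fb - fa)
    return {layer: round(da[layer] + t * (db[layer] - da[layer])) for layer in da}
```

For a tracked in-between frame, the dictionary returned by `layer_depths_at` can be passed as `layer_depth` to `depth_from_markings`, so an object marked as moving toward the camera grows lighter from frame to frame.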
Another group of approaches to depth map generation relies on extracting depth from the level of sharpness, or blur, in different image areas. These approaches are based on the realization that there is a relationship between the depth of an object, i.e., its distance from the camera, and the amount of blur of that object in the image, and that the depth information in a visual scene may be obtained by modeling the effect that a camera's focal parameters have on the image. Attempts have also been made to generate depth maps from blur without knowledge of the camera parameters by assuming a general monotonic relationship between blur and distance. However, extracting depth from blur may be difficult and/or unreliable, because blur in images can also arise from other factors, such as lens aberration, atmospheric interference, fuzzy objects, and motion. In addition, objects farther from the camera than its focal plane and objects closer to the camera than its focal plane can exhibit substantially the same degree of blur, so blur alone does not reveal on which side of the focal plane an object lies. Although methods that overcome some of these problems and yield more accurate and precise depth values have been disclosed in the art, they typically require more than one exposure to obtain two or more images. A further disadvantage of this approach is that it provides no simple way to determine depth values for regions that lack edge or texture information, where no blur can therefore be detected.
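A minimal Python sketch of the monotonic blur-to-distance assumption follows: local sharpness is estimated as the mean absolute Laplacian in a small window, and sharper regions are mapped to lighter (nearer) depth values. This presumes the camera was focused on the nearest objects, which is exactly the kind of assumption the paragraph above cautions about; the function names and the `radius` parameter are hypothetical. A textureless region yields zero sharpness and therefore no usable depth, which is the limitation noted in the paragraph's last sentence.

```python
def local_sharpness(gray, radius=1):
    """Mean absolute Laplacian over a (2*radius+1)^2 window; larger means sharper."""
    h, w = len(gray), len(gray[0])

    def px(img, y, x):
        # Replicate border pixels for neighbours outside the image.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    lap = [[abs(4 * px(gray, y, x) - px(gray, y - 1, x) - px(gray, y + 1, x)
                - px(gray, y, x - 1) - px(gray, y, x + 1))
            for x in range(w)] for y in range(h)]
    size = (2 * radius + 1) ** 2
    return [[sum(px(lap, y + dy, x + dx)
                 for dy in range(-radius, radius + 1)
                 for dx in range(-radius, radius + 1)) / size
             for x in range(w)] for y in range(h)]


def depth_from_blur(gray, radius=1):
    """Map sharpness monotonically to 8-bit depth: sharpest = 255 = nearest.

    Assumes focus on the nearest objects, so that blur grows with distance.
    """
    sharp = local_sharpness(gray, radius)
    peak = max(max(row) for row in sharp)
    if peak == 0:
        # No edges or texture anywhere: blur gives no depth cue at all.
        return [[0] * len(row) for row in sharp]
    return [[round(255 * v / peak) for v in row] for row in sharp]
```

In a row containing both a hard step and a gradual ramp, the step scores highest and is declared nearest, while the interior of every flat region is assigned 0 regardless of its true distance.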
U.S. Patent Application Publication No. 2007/0024614, which is assigned to the assignee of the current application, discloses the use of sparse depth maps for DIBR applications. These sparse depth maps, also referred to as “surrogate” depth maps, can be obtained by edge analysis of the monoscopic image followed by asymmetrical smoothing, and they contain depth information concentrated mainly at edges and object boundaries in the 2D images. Although these surrogate depth maps can have large regions with missing and/or incorrect depth values, stereoscopic images rendered using them were judged by groups of viewers to provide enhanced depth perception relative to the original monoscopic image. It was speculated that the visual system combines the depth information available at the boundary regions with pictorial depth cues to fill in the missing areas. One drawback of this approach is that it can introduce geometric distortions in images with vertical lines or edges. The lack of depth information within object boundaries might also lower the perceived depth quality ratings.
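The edge-analysis-plus-smoothing pipeline described above might be sketched as follows in Python. The gradient-magnitude edge detector, the Gaussian kernels, and the reading of “asymmetrical smoothing” as stronger vertical than horizontal smoothing (`sigma_v > sigma_h`) are assumptions for illustration; the cited application's actual edge analysis and filter parameters are not reproduced here.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian taps covering about +/-3 sigma (sigma must be > 0)."""
    radius = max(1, math.ceil(3 * sigma))
    taps = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth_rows(img, kernel):
    """Convolve every row with a 1D kernel, replicating border pixels."""
    r, w = len(kernel) // 2, len(img[0])
    return [[sum(k * row[min(max(x + j - r, 0), w - 1)] for j, k in enumerate(kernel))
             for x in range(w)] for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def surrogate_depth(gray, sigma_h=1.0, sigma_v=3.0):
    """Edge-based sparse depth map followed by asymmetric Gaussian smoothing."""
    h, w = len(gray), len(gray[0])

    def px(y, x):
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Central-difference gradient magnitude as a simple edge-strength measure.
    edges = [[math.hypot(px(y, x + 1) - px(y, x - 1), px(y + 1, x) - px(y - 1, x))
              for x in range(w)] for y in range(h)]
    # Separable smoothing with a wider vertical than horizontal kernel.
    smoothed = smooth_rows(edges, gaussian_kernel(sigma_h))
    smoothed = transpose(smooth_rows(transpose(smoothed), gaussian_kernel(sigma_v)))
    peak = max(max(row) for row in smoothed)
    if peak == 0:
        return [[0] * w for _ in range(h)]
    # Strong, smoothed edges become light (near) values; flat interiors stay dark.
    return [[round(255 * v / peak) for v in row] for row in smoothed]
```

Because depth values appear only near edges, object interiors remain dark (far) in the output, which mirrors the regions of missing depth described above.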
Accordingly, there is a need for methods and systems for generating depth maps from monoscopic images that provide accurate object segregation, are capable of resolving depth information within object boundaries, and are computationally simple, requiring only a small amount of processing.
An object of the present invention is to overcome at least some shortcomings of the prior art by providing a relatively simple and computationally inexpensive method and apparatus for generating a depth map from a 2D image using color information contained in said 2D image.
Another object of the present invention is to provide a relatively simple and computationally inexpensive method and apparatus for rendering stereoscopic and multi-view video and still images from 2D video and still images using color information contained in said 2D images.