Embodiments here relate generally to the field of 2D to 3D video and image conversion performed either in real time or offline. More particularly, the embodiments relate to a method and apparatus for enhancing and/or exaggerating depth and negative parallax and adjusting the zero-parallax plane, also referred to as the screen plane, for 3D-image rendering on different 3D display technologies and formats.
With the rising sale of 3D-enabled TVs and personal devices in the consumer segment, the need to release new and old movies in 3D is increasing. In the commercial application space, the use of large screen electronic billboards which can display attention grabbing 3D-images for advertising or informational purposes has increased. Because of the increasing demand for creating 3D-content, the demand for automatically or semi-automatically convert existing 2D-contents to 3D contents increases. Enhancing the 3D-experience of the consumers and viewers can produce further growth of 3D entertainment and advertisement market. A demand exists for tools and services to generate stunning 3D-image effects.
Traditionally, converting 2D videos to 3D for professional application starts with generating a depth map of the image for each video frame using a very labor intensive manual process of roto-scoping, where objects in each frame are manually and painstakingly traced by the artist and depth information for each object is painted by hand. For consumer applications such as built-in automated 2D to 3D function in 3D-TV or game consoles, the converted 3D-image suffers from extremely poor depth and pop-out effects. Moreover, there is no automated control to modify the zero-parallax plane position and artificially exaggerate pop-out or depth of selective objects for enhanced special-effects.
Numerous research publications exist on methods of automatically generating depth map from a mono-ocular 2D-image for the purpose of converting the 2D-image to 3D-image. The methods range from very simplistic heuristics to very complicated and compute intensive image analysis. Simple heuristics may be suitable for real time conversion application but provides poor 3D quality. On the other hand, complex mathematical analysis may provide good 3D-image quality but may not be suitable for real time application and hardware implementation.
A greyscale image represents the depth map of an image in which each pixel is assigned a value between and including 0 and 255. A value of 255 (100% white level) indicates the pixel is in the front most and a value of 0 represents the pixel is in the back most. The depth value of a pixel is used to calculate the horizontal (x-axis) offset of the pixel for left and right eye view images. In particular, if the calculated offset is w for pixel at position (x,y) in the original image, then this pixel is placed at position (x+w, y) in the left image and (x−w, y) in the right image. If the value of the offset w for a pixel is positive, it creates a negative parallax where the pixel appears to pop out of the screen. Alternatively, if the value of the offset w for a pixel is negative, it creates a positive parallax where the pixel appears to be behind the screen plane. If the offset w is zero, the pixel appears on the screen plane. The larger the offset, the greater the disparity between the left and right eye view and hence larger the depth inside the screen or pop out of the screen. Hence, given a depth map for a 2D, or monocular, image, by selectively manipulating the offsets the pixels for 3D rendering, it is possible to artificially enhance or exaggerate 3D effects in a scene and this transformations can be done in real time or offline.