Stereoscopic 3D display technologies are now becoming widely available to consumers. However, the amount of 3D content available for such devices is still extremely limited because of the high cost of acquiring and processing stereoscopic content. Consumer electronics companies therefore have a strong demand for technology that can automatically convert existing 2D media into stereoscopic 3D, in real time or near real time, within consumer playback devices such as TVs, Blu-ray players or set-top boxes.
There is a significant body of academic research focused on the extraction of 3D information from one or more 2D images in a video sequence. This process is useful in various fields of endeavor, such as autonomous robotic navigation, and is a central concept in the field of computer vision. However, the requirements for applications such as robotic navigation and computer vision are different from the requirements for applications in 3D entertainment devices. In computer vision the emphasis is on extracting physically accurate distance measurements, whereas in applications for 3D entertainment the emphasis is on extracting depth information that provides a visually attractive 3D model that is consistent with the 2D perspective cues in the image. The present invention falls into the latter category.
The fundamental problem facing 2D to 3D conversion techniques is that the problem is ill-posed: given the information in a 2D image, there are multiple possible 3D configurations that could have produced it. In particular, automated 2D to 3D conversion algorithms operate on the basis of image characteristics such as color, position, shape, focus, shading and motion, to name a few. They do not perceive “objects” within a scene as a human viewer does. A number of techniques have been devised to extract 3D information from 2D images, and a review of these techniques can be found in “Converting 2D to 3D: A survey”, research report XP-002469656, TU Delft, by Qingqing Wei, December 2005. The analysis of motion and perspective provides the most promising techniques for analyzing video images. This has led to the development of “structure-from-motion” techniques, which are best summarized in the reference “Multiple View Geometry in computer vision” by Richard Hartley and Andrew Zisserman, Cambridge University Press, 2000.
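The ill-posed nature of the problem can be illustrated with a minimal sketch, assuming an idealized pinhole camera (a standard textbook model, not one specified in this document). The function name `project` and the sample coordinates are hypothetical and chosen only for illustration. Two different 3D points that lie on the same ray through the camera center map to the same 2D image position, so a single image cannot distinguish between them.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z) onto the image plane of an
    idealized pinhole camera (an assumption made for this sketch)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Hypothetical sample points: 'far' is 'near' scaled by 4 along the same ray.
near = (1.0, 0.5, 2.0)
far = tuple(4.0 * c for c in near)

# Both points land on the same image position, so depth is not recoverable
# from this single projection alone.
print(project(near), project(far))
```

Additional cues such as motion, focus or perspective are needed to choose between the candidate depths.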
One method of dealing with the ill-posed nature of 2D to 3D conversion is to combine depth information from multiple different analysis techniques. These depth fusion techniques may, for example, seek to combine depth-from-focus with motion segmentation. There is some prior art in this area. In particular, “Learning Depth from single monocular images”, A. Saxena et al., 2005, MIT Press, describes a method of using a probabilistic model to combine depth information from neighboring patches. Another approach is described in “Depth-Map Generation by Image Classification” by S. Battiato et al., Proceedings of SPIE, Vol. 5302, 95, 2004. This approach uses the classification of the image (e.g. indoor/outdoor) to determine which source of depth information to select. Finally, a method of prioritizing depth fusion by weighting motion results with methods such as depth from geometry is described in “Priority depth fusion for the 2D to 3D conversion system”, Yun-Lin Chang et al., Proc. SPIE, Vol. 6805, 680513, 2008.
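The general idea of depth fusion can be sketched as a weighted average of per-pixel depth estimates. The following is a generic illustration only and does not reproduce the method of any of the references cited above. The function names `normalize` and `fuse_depth`, the sample depth maps and the weights are all hypothetical, and rescaling each source to the range 0 to 1 before averaging is an assumption made for this sketch.

```python
def normalize(depth):
    """Rescale a depth map (a list of rows) to the range [0, 1]."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no relative depth information.
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / (hi - lo) for v in row] for row in depth]


def fuse_depth(depth_maps, weights):
    """Combine equally sized depth maps by a per-pixel weighted average."""
    if not depth_maps or len(depth_maps) != len(weights):
        raise ValueError("need one weight per depth map")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normed = [normalize(d) for d in depth_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [
        [sum(w * d[r][c] for w, d in zip(weights, normed)) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2x2 depth estimates from two different analysis techniques.
depth_from_focus = [[0.0, 2.0], [4.0, 8.0]]
depth_from_motion = [[10.0, 10.0], [20.0, 30.0]]

# Hypothetical weights that trust the motion estimate three times as much.
fused = fuse_depth([depth_from_focus, depth_from_motion], [1.0, 3.0])
print(fused)
```

Fixed global weights are the simplest choice. A scheme that selects or prioritizes sources per image or per region would replace the constant weights with values derived from the image content.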
This prior art suffers from several problems. For example, the method of combining multiple sources of depth does not provide a consistent depth range. In general, most if not all 2D to 3D conversion processes do not result in a good stereoscopic effect. The present invention seeks to provide an improved or enhanced depth map which is particularly suitable for use in the entertainment industry, although other industries and applications may also benefit.