Rapidly emerging 3D technologies, in the form of 3D cinemas, 3D home entertainment devices, and 3D portable electronics, have created increasing demand for 3D content. One popular way of creating 3D content is to convert the large existing body of 2D media into 3D. The conversion of image data from 2D to 3D, a fast way to obtain 3D content from existing 2D content, has been extensively studied. One conversion method first generates a depth map and then creates left-eye and right-eye images from the depth map. This depth-map-based 3D rendering method is useful for multi-view stereoscopic systems and is also well suited for efficient transmission and storage.
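As an illustration of the depth-map-based rendering described above, the following is a minimal sketch and not a definitive implementation. It assumes a grayscale image stored as rows of integers, a depth map of the same shape with values in [0, 1] where 1 is nearest, and a simple rule in which nearer pixels are shifted further horizontally. The function name `render_stereo_pair`, the parameter `max_shift`, and the hole-filling rule are hypothetical choices made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]        # grayscale image: rows of pixel intensities
DepthMap = List[List[float]]   # same shape as the image; assumed 0.0 = far, 1.0 = near


def render_stereo_pair(image: Image, depth: DepthMap,
                       max_shift: int = 1) -> Tuple[Image, Image]:
    """Create left-eye and right-eye views by shifting pixels horizontally.

    Assumption for this sketch: a pixel's horizontal shift is proportional
    to its depth value, so near pixels move more than far pixels.
    """
    width = len(image[0])
    # Start from copies of the input so that positions no pixel lands on
    # (holes) simply keep their original value.
    left = [row[:] for row in image]
    right = [row[:] for row in image]
    for y, row in enumerate(image):
        # Visit pixels from far to near so nearer pixels overwrite farther ones.
        order = sorted(range(width), key=lambda x: depth[y][x])
        for x in order:
            shift = int(depth[y][x] * max_shift + 0.5)  # round to nearest pixel
            if 0 <= x + shift < width:
                left[y][x + shift] = row[x]    # left view: near pixels move right
            if 0 <= x - shift < width:
                right[y][x - shift] = row[x]   # right view: near pixels move left
    return left, right


# One-row example: only the second pixel is marked as near.
left_view, right_view = render_stereo_pair([[10, 20, 30, 40]],
                                           [[0.0, 1.0, 0.0, 0.0]])
```

In this sketch only the image and its depth map are needed to produce both views, which is consistent with the observation above that the depth-map representation is compact to store and transmit.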
In converting 2D images into 3D images, most conventional technologies apply the same method to all images, regardless of the type of content in the images. The lack of a customized method in these technologies may either create unsatisfactory 3D effects for certain content or significantly increase the computational complexity required.
To apply customized methods to different types of scenes, a classification-based algorithm has been proposed that seeks to improve on conventional 2D-to-3D image conversion technologies. This algorithm classifies each image into one of several categories and uses a different method to generate the depth map for each category. In this algorithm, known as “hard classification,” each image is assigned a fixed class label, each class possesses a unique property, and the depth map is generated using the method that is suitable only for that class.
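The hard classification scheme described above can be sketched as a classifier followed by a per-class dispatch. This is an illustrative sketch under stated assumptions, not the algorithm of any particular prior system: the class names ("landscape", "closeup"), the brightness-based classifier, and both depth rules are hypothetical placeholders chosen only to keep the example small.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]      # grayscale image, intensities in [0, 1], at least two rows
DepthMap = List[List[float]]   # same shape as the image; assumed 0.0 = far, 1.0 = near


def depth_landscape(img: Image) -> DepthMap:
    # Hypothetical rule for this class: lower rows are nearer to the viewer.
    h = len(img)
    return [[(y + 1) / h for _ in row] for y, row in enumerate(img)]


def depth_closeup(img: Image) -> DepthMap:
    # Hypothetical rule for this class: brighter pixels are nearer.
    return [row[:] for row in img]


def _mean(rows: Image) -> float:
    return sum(sum(r) for r in rows) / sum(len(r) for r in rows)


def classify(img: Image) -> str:
    # Hypothetical classifier: a bright top half (sky) suggests a landscape.
    h = len(img)
    return "landscape" if _mean(img[: h // 2]) > _mean(img[h // 2:]) else "closeup"


# Each class label maps to exactly one depth-generation method.
DEPTH_METHODS: Dict[str, Callable[[Image], DepthMap]] = {
    "landscape": depth_landscape,
    "closeup": depth_closeup,
}


def hard_classification_depth(img: Image) -> Tuple[str, DepthMap]:
    label = classify(img)                  # a single fixed label per image
    return label, DEPTH_METHODS[label](img)


label, depth = hard_classification_depth([[0.9, 0.9], [0.1, 0.1]])
```

Because exactly one label is chosen and exactly one method runs, the resulting depth map depends entirely on the classifier's decision for that image.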
However, the hard classification method may lead to several problems. First, some images cannot be strictly classified as belonging to a single class, and the depth map generated for these images according to the property of a single class may therefore not be optimal. Second, the non-optimal depth map generated for a misclassified image may lead to 3D image distortion. Lastly, misclassification of images may lead to temporal flickering of depth maps when individual frames of a video sequence are converted, which may result in visually unpleasant 3D perception.