I. Field
The present disclosure relates generally to stereoscopic video technology, and more specifically to techniques for complexity-adaptive 2D-to-3D image and video conversion.
II. Background
The development of stereoscopic video technology enables the three-dimensional (3D) perception of realistic scenes via the binocular disparity between left and right views. This mimics the human visual system, which obtains two separate views through the left and right eyes. Stereoscopic video technology is based on an assumption that a significant portion of the human brain is solely dedicated to the processing of binocular information. In other words, these stereo systems take advantage of the capability of the brain to measure the disparity between views and to gauge the relative distances of the objects in the scene from the observer.
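By way of illustration only, the relationship between binocular disparity and distance described above can be sketched with a simple parallel-camera pinhole model, in which disparity is inversely proportional to depth. The function name `disparity_pixels` and the default values for `baseline_m` and `focal_px` are assumptions chosen for this example and do not come from the present disclosure.

```python
def disparity_pixels(depth_m, baseline_m=0.065, focal_px=1000.0):
    """Horizontal disparity, in pixels, of a point at distance depth_m.

    Assumes two parallel pinhole cameras separated by baseline_m (an
    illustrative value roughly matching human eye spacing) with a focal
    length of focal_px pixels (also an illustrative value).
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Disparity is inversely proportional to depth: d = f * B / Z.
    return focal_px * baseline_m / depth_m


if __name__ == "__main__":
    # Nearer objects produce larger disparities than farther ones.
    for depth in (1.0, 2.0, 10.0):
        print(depth, disparity_pixels(depth))
```

Under this model, an object twice as far away yields half the disparity, which is the cue a viewer uses to gauge relative distance.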
Currently, considerable effort is being devoted to developing approaches for capturing, processing, compressing, transmitting, and displaying stereo video. Other efforts are directed at standardizing these technologies. However, most currently deployed multimedia devices are implemented within a monoscopic infrastructure, and most videos created and sold in the market are two-dimensional (2D) movies. Therefore, 2D-to-3D video conversion techniques are expected to expand the 3D media consumer market.
Recently, there have been some attempts to convert images from 2D to 3D. In one approach, a real-time method computes the depth of a number of separated areas of the 2D images from their contrast, sharpness, and chrominance, and from the motion information. Thereafter, conversion is conducted based on the obtained depth information. In another approach, a facial-feature-based parametric depth map generation scheme converts 2D head-and-shoulder images to 3D. Similarly, in a still further approach, both kinematics and 3D human walking motion models are used as sources of prior knowledge to estimate the 3D gait in monocular image sequences.
In another attempt to convert images from 2D to 3D, the depth maps are extracted based on a mixed set of automatic and manual technologies, where manual processing is invoked when the automatic data correlation analysis fails. In a still further attempt, an unsupervised method for depth-map generation was proposed. However, some steps in the approach, for example the image classification in preprocessing, are not trivial and may be very complicated to implement. Accordingly, implementation would not be practical. In a still further attempt, a real-time 2D-to-3D image conversion algorithm uses motion detection and region segmentation; however, artifacts are unavoidable due to the inaccuracy of object segmentation and object depth estimation. Segmented objects are used to avoid the object segmentation artifacts.
In a still further approach to converting images from 2D to 3D, camera motion analysis is conducted on the motion vector data of video object planes (VOPs), and the objects are horizontally shifted differently according to the camera motion type. In a still further approach, typical structure from motion (SfM) methods, for example extended Kalman filters, are extended to object-level processing. In a still further approach, a new on-line ICA mixture model is used for image segmentation, and the system then applies depth estimation and a pixel shifting algorithm to generate a 3D-effect image.
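By way of illustration only, the general idea of generating a second view by shifting pixels horizontally according to estimated depth can be sketched as follows. This is a minimal sketch under stated assumptions and is not the algorithm of any of the approaches mentioned above: the function name `synthesize_row`, the normalized depth convention (1.0 is nearest), the leftward shift direction, the `max_shift` parameter, and the neighbor-copy hole filling are all hypothetical choices made for this example.

```python
def synthesize_row(row, depth_row, max_shift=2):
    """Synthesize one row of a second view from one row of a 2D image.

    row       : pixel values for the row (must not contain None)
    depth_row : per-pixel depth in [0, 1], where 1.0 is nearest (assumed)
    max_shift : shift, in pixels, applied to the nearest pixels (assumed)
    """
    width = len(row)
    out = [None] * width          # None marks a hole not yet written
    nearest = [-1.0] * width      # depth of the pixel written at each target

    for x in range(width):
        # Nearer pixels are shifted farther to the left.
        shift = int(round(depth_row[x] * max_shift))
        target = x - shift
        # When two pixels land on the same target, keep the nearer one.
        if 0 <= target < width and depth_row[x] > nearest[target]:
            out[target] = row[x]
            nearest[target] = depth_row[x]

    # Fill holes by copying the closest written neighbor: first from the
    # left, then from the right for any holes at the start of the row.
    for x in range(1, width):
        if out[x] is None:
            out[x] = out[x - 1]
    for x in range(width - 2, -1, -1):
        if out[x] is None:
            out[x] = out[x + 1]
    return out


if __name__ == "__main__":
    row = [10, 20, 30, 40, 50]
    depth = [0.0, 0.0, 1.0, 0.0, 0.0]  # only the middle pixel is near
    print(synthesize_row(row, depth, max_shift=2))
```

In this sketch, a flat depth map reproduces the input row unchanged, while a near pixel is displaced and occludes the background pixel at its new position. The hole left behind is filled from a neighboring pixel, which is one simple source of the kind of artifacts that depth-based view synthesis can introduce.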
As can be readily seen, the existing approaches for 2D-to-3D image and/or video conversion are not complexity-adaptive. Furthermore, the known approaches for 2D-to-3D image and/or video conversion are not generic for both real-time and offline 2D-to-3D video conversion, nor are they used to enhance the 3D effect of previously recorded 2D movies.
There is therefore a need in the art for techniques for complexity-adaptive 2D-to-3D image and video conversion.