Field of the Invention
One or more embodiments of the invention are related to the field of image processing. More particularly, but not by way of limitation, one or more embodiments of the invention enable a method of converting 2D video to 3D video using machine learning. Embodiments of the invention train a machine learning system to perform one or more 2D to 3D conversion steps. The machine learning system is trained on a training set that includes 2D to 3D conversion examples; it derives generalized 2D to 3D transformation functions from this training set. Embodiments of the invention may also obtain 3D models of objects, such as characters, and process the frames of a video to locate and orient these models in the frames. Depth maps and stereoscopic 3D video may then be generated from the 3D models. Embodiments of the invention may convert 3D scanner data to a set of rotated planes associated with masked areas of the data, thus forming a 3D object model. This enables the planes to be manipulated independently or as part of a group, and eliminates many problems associated with importing external 3D scanner data, including by minimizing the errors that frequently exist in such data.
Description of the Related Art
Two-dimensional images contain no depth information and hence appear the same to an observer's left and right eyes. Two-dimensional images include paper photographs and images displayed on a standard computer monitor. Two-dimensional images may include shading and lighting that give the observer a sense of depth for portions of the image; however, this is not considered a three-dimensional view of the image. Three-dimensional images, on the other hand, include image information that differs for each eye of the observer. Three-dimensional images may be displayed in an encoded format and projected onto a two-dimensional display. This enables three-dimensional or stereoscopic viewing, for example with anaglyph glasses or polarized glasses. Other displays may provide different information based on the viewer's orientation with respect to the display, e.g., autostereoscopic displays that do not require special glasses for viewing three-dimensional images on a flat two-dimensional display. An example of such a display is a lenticular display. Alternatively, two images that are shown alternately to the left and right eyes may be viewed with shutter glasses. Regardless of the type of technology involved, conversion of two-dimensional images to stereoscopic images requires the addition of depth information to the two-dimensional input image.
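By way of illustration only, the encoding of a left-eye image and a right-eye image into a single image for viewing with anaglyph glasses may be sketched as follows. This sketch assumes a red-cyan scheme, in which the red channel is taken from the left-eye image and the green and blue channels are taken from the right-eye image; the function name `encode_anaglyph` and the representation of an image as nested lists of (red, green, blue) tuples are hypothetical and are not part of any described system.

```python
from typing import List, Tuple

# Assumed pixel representation: (red, green, blue), each in the range 0-255.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def encode_anaglyph(left: Image, right: Image) -> Image:
    """Combine left-eye and right-eye images into one red-cyan anaglyph image.

    The red channel comes from the left-eye image; the green and blue
    channels come from the right-eye image (an assumed convention).
    """
    same_height = len(left) == len(right)
    same_widths = all(len(lrow) == len(rrow) for lrow, rrow in zip(left, right))
    if not (same_height and same_widths):
        raise ValueError("left and right images must have the same dimensions")

    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


if __name__ == "__main__":
    left_image = [[(200, 10, 20)]]
    right_image = [[(30, 40, 50)]]
    print(encode_anaglyph(left_image, right_image))
```

Other channel assignments are possible; the red-cyan assignment is used here only because it keeps the example short.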
Current solutions for conversion of two-dimensional images to stereoscopic images generally require large amounts of manual labor for highly accurate results. These manual masking systems generally operate by accepting manually created masks that define areas or regions in the image that have different depths and that generally represent different human-observable objects. Depth information is then accepted by the system as input, for example from artists; nearer objects are then shifted relatively farther horizontally to produce left and right eye viewpoints or images, or Red/Blue anaglyph single image encodings, either of which may be utilized for stereoscopic viewing. Shifting objects in the foreground may expose hidden or background information. If the missing image data is not shown in any other image in the scene, then pixels cannot be borrowed from areas in other images that contain the missing information, and the “gap” must be filled with some type of image data to cover the artifact. Various gap-filling algorithms, also known as occlusion filling algorithms, exist to compensate for the missing information, with varying success. Generally, the depth artist gains visual clues from the image and applies depth to masks using artistic input.
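By way of illustration only, the horizontal shifting and gap filling described above may be sketched for a single row of pixels as follows. This sketch rests on several assumptions that do not come from any described system: depth values lie in the range 0 to 1 with 1 being nearest to the viewer, the shift is linear in depth, a direction of +1 is used for one eye and -1 for the other, and gaps are filled by copying the nearest known neighbor that lies farther from the viewer. The names `shift_row` and `fill_gaps` are hypothetical.

```python
from typing import List, Optional, Tuple

Row = List[Optional[int]]  # None marks a gap exposed by shifting


def shift_row(pixels: List[int], depth: List[float],
              max_shift: int, direction: int) -> Tuple[Row, List[float]]:
    """Shift each pixel horizontally in proportion to its nearness.

    depth[x] is assumed to be in [0, 1], with 1.0 nearest to the viewer.
    direction is +1 or -1 to select one of the two eye viewpoints.
    Returns the shifted row and the depth of each shifted pixel
    (-1.0 where no pixel landed).
    """
    width = len(pixels)
    out: Row = [None] * width
    out_depth = [-1.0] * width
    for x in range(width):
        target = x + direction * round(depth[x] * max_shift)
        # When two pixels land on the same position, the nearer one wins.
        if 0 <= target < width and depth[x] > out_depth[target]:
            out[target] = pixels[x]
            out_depth[target] = depth[x]
    return out, out_depth


def fill_gaps(row: Row, row_depth: List[float]) -> Row:
    """Naive occlusion fill: copy the nearest known neighbor that is farther away.

    Exposed areas are assumed to belong to the background, so of the
    nearest known pixels on the left and right, the one with the smaller
    depth value is copied into the gap.
    """
    width = len(row)
    filled = list(row)
    for x in range(width):
        if row[x] is not None:
            continue
        candidates = []
        for lx in range(x - 1, -1, -1):  # nearest known pixel to the left
            if row[lx] is not None:
                candidates.append((row_depth[lx], row[lx]))
                break
        for rx in range(x + 1, width):  # nearest known pixel to the right
            if row[rx] is not None:
                candidates.append((row_depth[rx], row[rx]))
                break
        if candidates:
            filled[x] = min(candidates, key=lambda c: c[0])[1]
    return filled


if __name__ == "__main__":
    pixels = [10, 10, 99, 99, 10, 10]          # 99 marks a foreground object
    depth = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]     # foreground is nearest
    for direction in (+1, -1):
        shifted, shifted_depth = shift_row(pixels, depth, 1, direction)
        print(direction, shifted, fill_gaps(shifted, shifted_depth))
```

In this example the foreground object moves one pixel in opposite directions for the two viewpoints, and each shift exposes one background position for which the input row holds no data. Copying a neighboring background pixel is only one simple way to cover such a gap.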
The 2D to 3D conversion processes described above require a large amount of manual labor. There are no known systems that automate the conversion process. However, because some organizations have performed hundreds or thousands of 2D to 3D conversions, there is a considerable database of conversion examples. In principle, machine learning techniques can be applied to develop generalized 2D to 3D conversion methods from such a historical database of conversion examples. Machine learning techniques are known in the art, but they have not been applied to 2D to 3D conversion. There are no known systems that apply machine learning techniques to develop 2D to 3D conversion methods using a database of conversion examples.
For at least the limitations described above, there is a need for a method to convert 2D video to 3D video using machine learning.