1. Field of the Invention
One or more embodiments of the invention are related to the field of image processing. More particularly, but not by way of limitation, one or more embodiments of the invention enable a method of converting 2D video to 3D video using 3D object models for one or more objects in the 2D video. Embodiments of the invention obtain 3D models of objects, such as characters, and process the frames of a video to locate and orient these models in the frames. Depth maps and stereoscopic 3D video may then be generated from the 3D models. Embodiments of the invention may convert 3D scanner data to a set of rotated planes associated with masked areas of the data, thus forming a 3D object model. This enables the planes to be manipulated independently or as part of a group, eliminates many problems associated with importing external 3D scanner data, and minimizes errors that frequently exist in external 3D scanner data.
2. Description of the Related Art
Two-dimensional images contain no depth information and hence appear the same to an observer's left and right eye. Two-dimensional images include paper photographs or images displayed on a standard computer monitor. Two-dimensional images may include shading and lighting that provide the observer a sense of depth for portions of the image; however, this is not considered a three-dimensional view of an image. Three-dimensional images, on the other hand, include image information that differs for each eye of the observer. Three-dimensional images may be displayed in an encoded format and projected onto a two-dimensional display. This enables three-dimensional or stereoscopic viewing, for example with anaglyph glasses or polarized glasses. Other displays may provide different information based on the orientation with respect to the display, e.g., autostereoscopic displays that do not require special glasses for viewing three-dimensional images on a flat two-dimensional display. An example of such a display is a lenticular display. Alternatively, two images that are shown alternately to the left and right eyes may be viewed with shutter glasses. Regardless of the type of technology involved, conversion of two-dimensional images to stereoscopic images requires the addition of depth information to the two-dimensional input image.
Current solutions for conversion of two-dimensional images to stereoscopic images fall into two broad categories.
The first category involves systems that convert two-dimensional images into three-dimensional images wherein the two-dimensional images have no associated depth maps or other depth information. Systems in this category may be automated to provide depth information based on colors or areas of the picture, but these systems have had limited success. Other systems in this category require large amounts of manual labor for highly accurate results. These manual masking systems generally operate by accepting manually created masks in order to define areas or regions in the image that have different depths and which generally represent different human observable objects. Depth information is then accepted by the system as input, for example from artists, which results in nearer objects being shifted relatively further horizontally to produce left and right eye viewpoints or images, or Red/Blue anaglyph single image encodings, either of which may be utilized for stereoscopic viewing. By shifting objects in the foreground, hidden or background information may be exposed. If the exposed image data is not shown in any other image in the scene, then pixels cannot be borrowed from areas in other images that contain the missing information, and the “gap” must be filled with some type of image data to cover the artifact. Various algorithms exist for filling gaps, which are also known as occlusion filling algorithms, to minimize the visible effect of the missing information with varying success. Generally, the depth artist gains visual cues from the image and applies depth to masks using artistic input.
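By way of illustration only, the horizontal shifting described above, and the gaps it exposes, may be sketched for a single scanline as follows. This is a minimal sketch under stated assumptions and does not represent any particular conversion system: the function name `shift_row`, the per-pixel `shifts` list, and the sign convention (nearer pixels move right in the left-eye view and left in the right-eye view) are all hypothetical choices made for the example.

```python
def shift_row(row, shifts, eye):
    """Shift one scanline horizontally to synthesize one eye's view.

    row    -- list of pixel values for one scanline
    shifts -- per-pixel horizontal shift in pixels (assumed: larger = nearer)
    eye    -- +1 for the left-eye view, -1 for the right-eye view (assumed convention)
    Returns (new_row, gaps), where gaps lists the columns left without image data.
    """
    width = len(row)
    new_row = [None] * width
    # Shift of the pixel currently written at each column; -1 means empty.
    written_shift = [-1] * width
    for x in range(width):
        target = x + eye * shifts[x]
        # When two pixels land on the same column, the nearer one
        # (larger shift) occludes the farther one.
        if 0 <= target < width and shifts[x] > written_shift[target]:
            new_row[target] = row[x]
            written_shift[target] = shifts[x]
    gaps = [x for x in range(width) if new_row[x] is None]
    return new_row, gaps


# A foreground object "F" (shift 2) in front of a background "b" (shift 0).
row = ["b", "b", "F", "F", "b", "b", "b", "b"]
shifts = [0, 0, 2, 2, 0, 0, 0, 0]
left_row, left_gaps = shift_row(row, shifts, eye=+1)
right_row, right_gaps = shift_row(row, shifts, eye=-1)
```

In this example the columns vacated by the foreground object hold no image data in either view. These are the gaps that an occlusion filling algorithm, or pixels borrowed from another image in the scene, would have to cover.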
The main problems with this first category of conversion are the time required for conversion, owing to the large amount of manual labor, and the expense of the conversion process.
The second category involves systems that convert two-dimensional images that have associated depth maps or other depth information into three-dimensional images. The depth information may be obtained by the system from an external “time of flight” system, where light, for example from a laser, is sent towards the subject and timed to determine the distance after the light reflects back from the subject. The depth information may also be obtained by the system from a “triangulation” system, which determines the angles to a subject, for example from two sensors that are a known distance away from one another. Another apparatus that may obtain depth is a light-field or plenoptic camera having multiple lenses. A recent development has been the three-camera, or trifocal, system that includes a high-resolution camera and two lower-resolution side cameras or “witness cameras” mounted next to the high-resolution camera. A depth map may be calculated from the disparity between the two side camera images and applied to the image obtained from the high-resolution camera to generate stereoscopic images. Any missing information may be filled with image data from the side cameras to minimize artifacts such as missing or hidden information, even if not at the same resolution. Another advantage of the trifocal system is the elimination of heavy and expensive stereo camera systems that have two large, optically identical, and perfectly aligned lenses.
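By way of illustration only, the distance calculations underlying the time of flight, triangulation, and disparity approaches described above may be sketched as follows. The function names and parameters are hypothetical. The triangulation sketch assumes that each angle is measured between the baseline joining the two sensors and the line of sight to the subject, and the disparity sketch assumes an idealized, rectified pinhole camera pair, which real witness cameras only approximate.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight_distance(round_trip_seconds):
    """Distance to the subject from the round-trip travel time of a light pulse."""
    # The light travels to the subject and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def triangulation_distance(baseline_m, angle_a_rad, angle_b_rad):
    """Perpendicular distance from the sensor baseline to the subject.

    Assumes each angle is measured at a sensor, between the baseline
    and that sensor's line of sight to the subject.
    """
    # Law of sines on the sensor-sensor-subject triangle.
    return (baseline_m * math.sin(angle_a_rad) * math.sin(angle_b_rad)
            / math.sin(angle_a_rad + angle_b_rad))


def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point from its disparity between two rectified cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    # Standard pinhole stereo relation: depth = focal length * baseline / disparity.
    return focal_length_px * baseline_m / disparity_px


tof_m = time_of_flight_distance(20e-9)                       # 20 ns round trip
tri_m = triangulation_distance(2.0, math.pi / 4, math.pi / 4)
stereo_m = depth_from_disparity(1000.0, 0.1, 20.0)
```

Because depth is inversely proportional to disparity in this idealized relation, a small error in measured disparity produces a large error in depth for distant points, which is one way such externally generated depth values become noisy.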
However, many problems occur when an externally generated depth map is used as a Z-depth. This includes any depth map created from a disparity map that is generated from a stereoscopic pair of images, for example captured with a two-lens stereo camera or with the witness cameras of the trifocal system. One of the main problems is that depth maps provided by external systems are noisy and may include inaccurate edges, spikes, and spurious values, all of which are common with Z-depths generated from external systems. Another problem is that, since the depth maps correspond to the associated two-dimensional image either on a pixel-by-pixel basis or at least at a fairly high resolution, manipulating depth at this fine granularity is extremely difficult and time consuming. These types of systems are generally directed at automatically converting video or movies for stereoscopic viewing, for example without masking objects and without the labor associated therewith. Artifacts on edges of objects are common in some systems, limiting their overall automation capabilities.
In addition, in many cases it is impractical to obtain externally generated depth information for every frame in a 2D video. Technologies to generate 3D information, such as 3D scanners, are expensive and time-consuming to use. It may be practical in some cases to obtain 3D information for selected objects, such as characters. However, this 3D information is static rather than integrated into each frame of a 2D video. There are no known systems that take static 3D models of objects and propagate them across multiple frames of a 2D video to generate a 3D video. This propagation may also in some cases need to take into account degrees of freedom in an object where parts of the object move relative to one another. There are no known systems that generate rigged 3D models from external depth information, and propagate these models across all frames of a 2D video scene.
For at least the limitations described above, there is a need for a method to convert 2D video to 3D video using 3D object models.