1. Field of the Invention
One or more embodiments of the invention are related to image processing. More particularly, but not by way of limitation, one or more embodiments of the invention enable an external depth map transformation method for conversion of two-dimensional images to stereoscopic images that provides increased artistic and technical flexibility and rapid conversion of movies for stereoscopic viewing. Embodiments of the invention convert a large set of highly granular depth information inherent in a depth map associated with a two-dimensional image to a smaller set of rotated planes associated with masked areas in the image. This enables the planes to be manipulated independently or as part of a group, and eliminates many problems associated with importing external depth maps, while minimizing the errors that frequently exist in such maps.
2. Description of the Related Art
Two-dimensional images contain no depth information and hence appear the same to an observer's left and right eye. Two-dimensional images include paper photographs or images displayed on a standard computer monitor. Two-dimensional images may include shading and lighting that provide the observer a sense of depth for portions of the image; however, this is not considered a three-dimensional view of an image. Three-dimensional images, on the other hand, include image information that differs for each eye of the observer. Three-dimensional images may be displayed in an encoded format and projected onto a two-dimensional display. This enables three-dimensional or stereoscopic viewing, for example with anaglyph glasses or polarized glasses. Other displays may provide different information based on the orientation of the observer with respect to the display, e.g., autostereoscopic displays that do not require special glasses for viewing three-dimensional images on a flat two-dimensional display. An example of such a display is a lenticular display. Alternatively, two images that are shown alternately to the left and right eyes may be viewed with shutter glasses. Regardless of the type of technology involved, conversion of two-dimensional images to stereoscopic images requires the addition of depth information to the two-dimensional input image.
Current solutions for conversion of two-dimensional images to stereoscopic images fall into two broad categories.
The first category involves systems that convert two-dimensional images into three-dimensional images wherein the two-dimensional images have no associated depth maps or other depth information. Systems in this category may be automated to provide depth information based on colors or areas of the picture, but these systems have had limited success. Other systems in this category require large amounts of manual labor for highly accurate results. These manual masking systems generally operate by accepting manually created masks that define areas or regions in the image that have different depths and that generally represent different human observable objects. Depth information is then accepted by the system as input, for example from artists, which results in nearer objects being shifted relatively further horizontally to produce left and right eye viewpoints or images, or Red/Blue anaglyph single image encodings, either of which may be utilized for stereoscopic viewing. Shifting objects in the foreground may expose hidden or background information. If the missing image data does not appear in any other image in the scene, pixels cannot be borrowed from areas of other images that contain the missing information, and the “gap” must be filled with some type of image data to cover the artifact. Various algorithms for filling gaps, also known as occlusion filling algorithms, exist to minimize the missing information, with varying success. Generally, the depth artist gains visual cues from the image and applies depth to masks using artistic input.
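The horizontal shifting described above, and the gaps it exposes, can be illustrated with a minimal sketch. This is not the implementation of any particular conversion system. The function name shift_row, the parameter max_shift, and the convention that a depth value of 1.0 means nearest are all assumptions introduced only for illustration.

```python
def shift_row(row, depth_row, max_shift, direction):
    """Shift the pixels of one scanline horizontally in proportion to nearness.

    row        -- pixel values for one scanline
    depth_row  -- nearness per pixel in [0.0, 1.0]; 1.0 is nearest (assumed convention)
    max_shift  -- shift, in pixels, applied to the nearest pixels (hypothetical parameter)
    direction  -- +1 or -1, chosen per eye viewpoint
    Returns the shifted scanline; positions that receive no pixel are None (gaps).
    """
    width = len(row)
    out = [None] * width
    out_depth = [-1.0] * width  # nearness of the pixel currently written at each position
    for x in range(width):
        shift = int(round(depth_row[x] * max_shift)) * direction
        target = x + shift
        # Nearer pixels win when two source pixels land on the same target.
        if 0 <= target < width and depth_row[x] > out_depth[target]:
            out[target] = row[x]
            out_depth[target] = depth_row[x]
    return out


# One scanline: background 'b' with a two-pixel foreground object 'F'.
row = ['b', 'b', 'F', 'F', 'b', 'b']
depth = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

view_a = shift_row(row, depth, max_shift=1, direction=+1)
view_b = shift_row(row, depth, max_shift=1, direction=-1)
```

In each shifted view the foreground object covers one background pixel and leaves one position empty. The empty position corresponds to the “gap” that an occlusion filling algorithm would have to cover.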
The main problems with this first category of conversion are the time of conversion, which results from the large amount of manual labor, and the expense of the conversion process.
The second category involves systems that convert two-dimensional images that have associated depth maps or other depth information into three-dimensional images. The depth information may be obtained by the system from an external “time of flight” system, where light, for example from a laser, is sent towards the subject and timed to determine the distance after the light reflects back from the subject. The depth information may also be obtained by the system from a “triangulation” system, which determines the angles to a subject, for example from two sensors that are a known distance away from one another. Another apparatus that may obtain depth is a light-field or plenoptic camera having multiple lenses. A recent development has been the three-camera, or trifocal, system that includes a high-resolution camera and two lower-resolution side cameras or “witness cameras” mounted next to the high-resolution camera. A depth map may be calculated from the disparity between the two side camera images and applied to the image obtained from the high-resolution camera to generate stereoscopic images. Any missing information may be filled with image data from the side cameras, even if not at the same resolution, to minimize artifacts such as missing or hidden information. Another advantage of the trifocal system is the elimination of heavy and expensive stereo camera systems that have two large, optically identical and perfectly aligned lenses.
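The distance relationships underlying the time of flight, triangulation and disparity approaches described above may be sketched as follows. These are textbook geometric relations given under stated assumptions, not the method of any particular external system: the time of flight relation assumes the speed of light in vacuum, the triangulation relation assumes each angle is measured between the baseline and the line of sight to the subject, and the disparity relation assumes an idealized, rectified pinhole camera pair. All function and parameter names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def time_of_flight_distance(round_trip_seconds):
    """Distance to the subject; the light travels there and back, hence the division by 2."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def triangulation_distance(baseline_m, angle_a_rad, angle_b_rad):
    """Perpendicular distance from the sensor baseline to the subject.

    angle_a_rad and angle_b_rad are the angles at each sensor between the
    baseline and the line of sight to the subject (assumed convention).
    """
    return (baseline_m * math.sin(angle_a_rad) * math.sin(angle_b_rad)
            / math.sin(angle_a_rad + angle_b_rad))


def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth for a rectified pinhole camera pair: Z = f * B / d."""
    if disparity_px <= 0:
        return math.inf  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px


tof_m = time_of_flight_distance(20e-9)                       # 20 ns round trip
tri_m = triangulation_distance(2.0, math.pi / 4, math.pi / 4)  # two 45 degree angles
disp_m = depth_from_disparity(1000.0, 0.1, 50.0)
```

Applying the disparity relation at every pixel of a pair of side camera images yields a per-pixel depth map of the kind described above.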
However, many problems occur when using an externally generated depth map as a Z-depth. This includes any depth map created from a disparity map that is generated from a stereoscopic pair of images, for example captured with a two-lens stereo camera or with the witness cameras of the trifocal system. One of the main problems is that depth maps provided by external systems are noisy and may include inaccurate edges, spikes and spurious values, all of which are common with Z-depths generated from external systems. Another problem is that since the depth maps correspond to the associated two-dimensional image either on a pixel-by-pixel basis or at least at a fairly high resolution, manipulating depth at this fine granularity is extremely difficult and time consuming. These types of systems are generally directed at automatically converting video or movies for stereoscopic viewing, for example without masking objects and without the labor associated therewith. Artifacts on edges of objects are common in some systems, limiting their overall automation capabilities.
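The spikes and spurious values mentioned above may be illustrated with a short sketch that applies a generic median filter to one row of a depth map. A median filter is a common general-purpose noise reduction technique and is shown here only to make the problem concrete; it is not the transformation method of the embodiments. The function name median_filter_row and the parameter radius are hypothetical.

```python
from statistics import median


def median_filter_row(depth_row, radius=1):
    """Replace each depth value with the median of its neighborhood.

    The window spans radius samples on each side and is clipped at the ends
    of the row, so edge windows are smaller than interior windows.
    """
    n = len(depth_row)
    out = []
    for x in range(n):
        lo = max(0, x - radius)
        hi = min(n, x + radius + 1)
        out.append(median(depth_row[lo:hi]))
    return out


# A flat surface at depth 1.0 with a single spurious spike.
noisy_row = [1.0, 1.0, 9.0, 1.0, 1.0]
smoothed_row = median_filter_row(noisy_row)

# A genuine depth step between two surfaces is left in place.
step_row = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
filtered_step = median_filter_row(step_row)
```

Such a filter removes isolated spikes, but it still leaves one depth value per pixel, so the difficulty of manipulating depth at a fine granularity remains.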
For at least the limitations described above, there is a need for an external depth map transformation method for conversion of two-dimensional images to stereoscopic images.