1. Field of the Invention
One or more embodiments of the invention are related to the field of image analysis and image enhancement and computer graphics processing of two-dimensional images into three-dimensional images. More particularly, but not by way of limitation, one or more embodiments of the invention enable a rapid workflow system and method for image sequence depth enhancement that enables high quality conversion of a large number of two-dimensional images into corresponding stereoscopic image pairs, or other three-dimensional viewing enabled images such as anaglyphs. The conversion proceeds through local modification of images, which eliminates computationally expensive iterative ray tracing of light paths through each pixel in large format left and right eye images each time a minor change is made to depth or to fix artifacts.
2. Description of the Related Art
Known methods for the colorizing of black and white feature films involve the identification of gray scale regions within a picture, followed by the application of a pre-selected color transform or lookup tables for the gray scale within each region defined by a masking operation covering the extent of each selected region, and the subsequent application of said masked regions from one frame to many subsequent frames. The primary difference between U.S. Pat. No. 4,984,072, System And Method For Color Image Enhancement, and U.S. Pat. No. 3,705,762, Method For Converting Black-And-White Films To Color Films, is the manner by which the regions of interest (ROIs) are isolated and masked, how that information is transferred to subsequent frames and how that mask information is modified to conform with changes in the underlying image data. In the U.S. Pat. No. 4,984,072 system, the region is masked by an operator via a one-bit painted overlay, which the operator manipulates frame by frame using a digital paintbrush method to match the movement. In the U.S. Pat. No. 3,705,762 process, each region is outlined or rotoscoped by an operator using vector polygons, which are then adjusted frame by frame by the operator, to create animated masked ROIs. Various masking technologies are generally also utilized in the conversion of 2D movies to 3D movies.
In both systems described above, the color transform lookup tables and selected regions are manually applied to, and modified for, each frame in succession to compensate for changes in the image data that the operator detects visually. All changes and movement of the underlying luminance gray scale are subjectively detected by the operator, and the masks are sequentially corrected manually by the use of an interface device such as a mouse for moving or adjusting mask shapes to compensate for the detected movement. In all cases the underlying gray scale is a passive recipient of the mask containing pre-selected color transforms, with all modifications of the mask depending on operator detection and modification. In these prior inventions the mask information does not contain any information specific to the underlying luminance gray scale, and therefore no automatic position and shape correction of the mask to correspond with image feature displacement and distortion from one frame to another is possible.
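By way of illustration only, the masked lookup-table colorization described above can be sketched as follows. This is a minimal sketch and not the method of either cited patent: the function name colorize_region, the sample lookup table and the 2x2 test frame are all hypothetical, and a one-bit mask is assumed to be stored as rows of 0/1 values. The sketch also shows why the mask is passive: nothing in it depends on the gray scale values it covers, so it cannot follow a feature that moves between frames.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def colorize_region(
    gray: Sequence[Sequence[int]],
    mask: Sequence[Sequence[int]],
    lut: Sequence[RGB],
) -> List[List[RGB]]:
    """Apply a gray-to-color lookup table only where the one-bit mask is set.

    Pixels outside the mask keep their gray value in all three channels.
    """
    out: List[List[RGB]] = []
    for gray_row, mask_row in zip(gray, mask):
        out_row: List[RGB] = []
        for g, m in zip(gray_row, mask_row):
            out_row.append(lut[g] if m else (g, g, g))
        out.append(out_row)
    return out


# Hypothetical warm-tint lookup table: one RGB entry per 8-bit gray level.
lut = [(min(255, g + 20), g, max(0, g - 20)) for g in range(256)]

# Hypothetical 2x2 gray scale frame and one-bit mask (1 = inside the region).
gray = [[10, 200], [128, 255]]
mask = [[1, 0], [1, 1]]

result = colorize_region(gray, mask, lut)
```

Applying the same mask unchanged to a later frame in which the object has moved would tint the wrong pixels, which is the situation the operator corrects by hand in the systems described above.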
Existing systems that are utilized to convert two-dimensional images to three-dimensional images may also require the creation of wire frame models for objects in images that define the 3D shape of the masked objects. The creation of wire frame models is a large undertaking in terms of labor. These systems also do not utilize the underlying luminance gray scale of objects in the images to automatically position and correct the shape of the masks of the objects to correspond with image feature displacement and distortion from one frame to another. Hence, great amounts of labor are required to manually shape and reshape masks for applying depth or Z-dimension data to the objects. Motion objects that move from frame to frame thus require a great deal of human intervention. In addition, there are no known solutions for enhancing two-dimensional images into three-dimensional images that utilize composite backgrounds of multiple images in a frame for spreading depth information to background and masked objects. This includes data from background objects whether or not pre-existing or generated for an occluded area where missing data exists, i.e., where motion objects never uncover the background. In other words, known systems gap fill using algorithms for inserting image data where none exists, which causes artifacts.
Current methods for converting movies that include computer-generated elements or effects from 2D to 3D generally utilize only the final sequence of 2D images that make up the movie. This is the current method used for conversion of all movies from two-dimensional data to left and right image pairs for three-dimensional viewing. There are no known current methods that obtain and make use of metadata associated with the computer-generated elements for a movie to be converted. This is the case since studios that own the older 2D movies may not have retained intermediate data for a movie, i.e., the metadata associated with computer-generated elements, since the amount of data in the past was so large that the studios would only retain the final movie data with rendered computer graphics elements and discard the metadata. For movies having associated metadata that has been retained (i.e., intermediate data associated with the computer-generated elements such as mask, or alpha and/or depth information), use of this metadata would greatly speed the depth conversion process.
In addition, typical methods for converting movies from 2D to 3D in an industrial setting, capable of handling the conversion of hundreds of thousands of frames of a movie with large amounts of labor or computing power, make use of an iterative workflow. The iterative workflow includes masking objects in each frame, adding depth and then rendering the frame into left and right viewpoints forming an anaglyph image or a left and right image pair. If there are errors in the edges of the masked objects, for example, then the typical workflow involves an “iteration”, i.e., sending the frames back to the workgroup responsible for masking the objects (which can be in a country with cheap unskilled labor halfway around the world), after which the masks are sent to the workgroup responsible for rendering the images (again potentially in another country). Rendering is accomplished by ray tracing the path of light through each pixel in the left and right images to simulate the effects of the objects that the light interacts with, for example objects that the light bounces off of or passes through, which is computationally extremely expensive. After ray tracing, the rendered image pair is sent back to the quality assurance group. It is not uncommon in this workflow environment for many iterations of a complicated frame to take place. This is known as “throw it over the fence” workflow, since different workgroups work independently to minimize their current work load and not as a team with overall efficiency in mind. With hundreds of thousands of frames in a movie, the amount of time that it takes to iterate back through frames containing artifacts can become high, causing delays in the overall project. Even if the re-rendering process takes place locally, the amount of time to re-render or ray-trace all of the images of a scene can require significant processing and hence cause delays on the order of at least hours.
Elimination of iterations such as this would provide a huge savings in wall-time, or end-to-end time that a conversion project takes, thereby increasing profits and minimizing the workforce needed to implement the workflow.
Hence there is a need for a rapid workflow system and method for image sequence depth enhancement.