1. Field of the Description
The present description relates, in general, to stereoscopic or three-dimensional (3D) image generation and, more particularly, to systems and methods for producing stereoscopic images or 3D content. The described systems and methods are useful for providing 3D content with enhanced depth rendering and without the discontinuities or other artifacts that compositing can produce for objects or scene elements at differing depths (e.g., such enhancements may include camera settings or parameters that are set differently for foreground characters than for background scene elements).
2. Relevant Background
Computer animation has become a standard component in the digital production process for animated works such as animated films, television animated shows, video games, and works that combine live action with animation. The rapid growth in this type of animation has been made possible by significant advances in the computer graphics (CG) software and hardware utilized by animators to create CG images. Producing computer animation generally involves modeling, rigging, animation, and rendering. First, the characters, elements, and environments used in the computer animation are modeled. Second, through techniques collectively called rigging, the modeled virtual actors and scene elements are attached to the motion skeletons used to animate them. Third, computer animation techniques range from key-frame animation, where start and end positions are specified for all objects in a sequence, to motion capture, where all positions are fed to the objects directly from live actors whose motions are being digitized. Fourth, computer rendering is the process of visually representing the animated models with the aid of a simulated camera.
There is a growing trend toward using 3D projection techniques in theatres and in home entertainment systems, including video games and computer-based displays. To render CG images for 3D projection (e.g., stereoscopic images), a pair of horizontally offset, simulated cameras is used to visually represent the animated models. More specifically, 3D projection techniques deliver right eye and left eye images separately so as to display the same scene from two perspectives, and a viewer thereby perceives three-dimensional object positioning, e.g., certain characters or objects appear nearer than the screen while others appear farther away. Stereoscopy, stereoscopic imaging, and 3D imaging are labels for any technique capable of retaining 3D visual information to produce the illusion of depth in an image. The illusion of depth in a photograph, movie, or other two-dimensional image is created by presenting a slightly different image to each eye. In most animated 3D projection systems, depth perception is achieved by providing the viewer's eyes with two different images, representing two perspectives of the same object, with a minor deviation similar to the perspectives both eyes naturally receive in binocular vision.
The images or image frames used to produce such a 3D output are often called stereoscopic images or a stereoscopic image stream because the 3D effect is due to stereoscopic perception by the viewer. A frame is a single image at a specific point in time, and motion or animation is achieved by showing many frames per second (fps), such as 24 to 30 fps. The frames may include images or content from a live action movie filmed with two cameras or from a rendered animation that is imaged or filmed from two camera locations. Stereoscopic perception results from the presentation of two horizontally offset images or frames, with one or more objects slightly offset, to the viewer's left and right eyes, e.g., a left eye image stream and a right eye image stream of the same object. The amount of offset between the elements of the left and right eye images determines the depth at which the elements are perceived in the resulting stereo image. An object appears to protrude toward the observer and away from the neutral plane or screen when the position or coordinates of the left eye image are crossed with those of the right eye image (e.g., negative parallax). In contrast, an object appears to recede behind the screen when the position or coordinates of the left eye image and the right eye image are not crossed (e.g., a positive parallax results).
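The parallax relationship described above can be made concrete with a short sketch. The function below is purely illustrative (its name, parameters, and the simple converged pinhole-camera model are assumptions, not part of the described systems); it estimates the horizontal parallax, in pixels, of a point at a given scene depth:

```python
def parallax_pixels(depth, interaxial, convergence, focal_px):
    """Approximate horizontal parallax (in pixels) for a point at `depth`,
    using a simple converged pinhole stereo model (illustrative only).

    A negative result corresponds to crossed (negative) parallax, so the
    point appears in front of the screen (theater space); a positive
    result corresponds to uncrossed (positive) parallax behind the
    screen (screen space); zero falls exactly on the screen plane.
    """
    # The offset grows with how far the point lies from the convergence plane.
    return focal_px * interaxial * (1.0 / convergence - 1.0 / depth)
```

For example, with the cameras converged at 5 units, a point at depth 2 yields a negative (crossed) value and reads as protruding toward the viewer, while a point at depth 50 yields a positive value and reads as receding behind the screen.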
With the recent growing surge in the development and sale of 3D projection systems and devices, there is an increased demand for high quality stereoscopic images that provide pleasant viewing experiences. One challenge facing stereographers or 3D animators is how to create an aesthetically appealing image while avoiding the phenomenon of “cardboarding,” which refers to a stereoscopic scene or image that appears to include a series of flat image planes arrayed at varying depths (e.g., similar to a pop-up book). Rendering of left and right eye images is generally performed using linear depth processing with ray casting or ray tracing techniques that involve following a straight line, through a given pixel, connecting objects, light sources, and the simulated stereo cameras. CG images rendered with linear depth variation throughout the scene provide a real-world view, but such rendering can produce cardboarding due to various combinations of the lens focal lengths selected for the cameras and the staging of the scene being imaged. For example, there are generally trade-offs between a viewer's comfort (e.g., limiting parallax to acceptable ranges) and cardboarding problems.
Another problem that arises in the staging and later rendering of a stereoscopic image is wasted space. The storytelling space for a stereographer includes the screen plane (i.e., at zero pixel shift), screen space into or behind the screen, and theater space toward the viewer or audience from the screen plane. The theater space is used by creating crossed or negative parallax, while the screen space is used by creating divergent or positive parallax in the stereoscopic images. The total display space may be measured in pixels and is often limited to less than about 70 pixels in total depth. Wasted space occurs, for example, when a long lens is used for the cameras or when a foreground figure stands well ahead of an object imaged with a normal lens. In these cases, there is often a relatively large amount of depth (e.g., a large percentage of the 70 available pixels) located between a foreground figure and the objects or environment elements behind it. Thus, the objects appear cardboarded due to the limited depth precision remaining for them.
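The wasted-space concern can be quantified with a small sketch. The helper below is hypothetical (the function name and arguments are illustrative; the 70-pixel default reflects the approximate budget mentioned above) and computes what fraction of the total depth budget is consumed by the empty gap between the foreground figure and the background:

```python
def wasted_budget_fraction(fg_far_px, bg_near_px, total_budget_px=70.0):
    """Fraction of a stereo depth budget consumed by empty space between
    the farthest parallax of a foreground figure (fg_far_px) and the
    nearest parallax of the background (bg_near_px), all measured in
    pixels of horizontal shift. Illustrative sketch only.
    """
    # Only a positive gap wastes budget; overlapping ranges waste none.
    gap = max(0.0, bg_near_px - fg_far_px)
    return gap / total_budget_px
```

For instance, a foreground figure whose parallax range ends at -5 pixels in front of a background that begins at +30 pixels leaves half of a 70-pixel budget unused in the gap, so depth precision for the objects themselves is correspondingly reduced.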
Some efforts to eliminate or limit the wasted storytelling space have included multi-rigging, or using multiple camera pairs each assigned to one or several select objects, to give better depth or volume to the CG image. For example, one camera rig or pair may be focused on a foreground figure while another is focused on a background object, and the resulting CG image levels are composited or combined to form the final CG image. The result can be a better-rounded foreground figure (e.g., more depth in the foreground and less cardboarding), flatter background images (e.g., similar to a real-life scene in which farther objects appear to have less volume), and less wasted space.
Complex animation shots, therefore, are often not limited to a single stereo setup or pair of cameras, as using multiple setups allows an animator to assign different stereo depths to different groups of objects. Such differing camera parameters and settings allow greater artistic flexibility and control over the 3D effect. These requirements or desires have been addressed by multi-rigging, which involves separately rendering a modeled scene with different pairs of stereo cameras and then combining or compositing the separately rendered image layers (or the output of the pairs of stereo cameras) to form a 3D image or animation shot.
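The compositing step can be sketched with standard Porter-Duff “over” blending of premultiplied-alpha pixels. This is a generic sketch of layer compositing, not the specific pipeline described here; pixel lists stand in for full image buffers:

```python
def over(src, dst):
    """Porter-Duff 'over': src pixel composited on top of dst pixel.
    Pixels are (r, g, b, a) tuples with premultiplied alpha."""
    src_alpha = src[3]
    return tuple(s + d * (1.0 - src_alpha) for s, d in zip(src, dst))

def composite_layers(layers):
    """Composite separately rendered image layers, ordered back (first)
    to front (last), into a single output. Each layer is a list of
    (r, g, b, a) pixels of equal length. Illustrative sketch only."""
    out = [(0.0, 0.0, 0.0, 0.0)] * len(layers[0])
    for layer in layers:
        out = [over(src, dst) for src, dst in zip(layer, out)]
    return out
```

With fully opaque pixels, the front layer simply replaces the back layer; semi-transparent foreground pixels blend with whatever the background rig rendered behind them.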
Unfortunately, using multiple camera pairs has often proven relatively complex, with compositing being a tedious process. Additionally, multi-rigging is not always a useful solution because it does not produce acceptable results when there is a physical connection between the two objects that are the focus of the camera pairs. If both objects are shown touching the ground, disconnects or unwanted visual artifacts are created during compositing and rendering of the CG image, such as where the ground contacts one or both of the objects. Multi-rig techniques depend upon being able to divide the scene into non-interconnected image levels, since the depth tailoring offered by this technique creates a discrete set of linear depth functions and does not allow for seamless blending between them. In other words, multi-rigging may be limited to shots where there is no interconnecting flooring or base.
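The “discrete set of linear depth functions” can be illustrated with a small sketch. Each rig covers its own depth interval with its own interaxial separation and convergence distance (a simple converged pinhole parallax model; all names and values below are illustrative assumptions), and the parallax mapping jumps at the interval boundary, which is why geometry spanning both intervals, such as a ground plane, shows a visible seam after compositing:

```python
def multirig_parallax(depth, rigs, focal_px=1000.0):
    """Parallax (pixels) of a point under a multi-rig setup. Each rig is
    (z_min, z_max, interaxial, convergence) and covers one depth
    interval; the first matching rig wins. Illustrative sketch only."""
    for z_min, z_max, interaxial, convergence in rigs:
        if z_min <= depth <= z_max:
            return focal_px * interaxial * (1.0 / convergence - 1.0 / depth)
    raise ValueError("no rig covers this depth")

# Hypothetical setup: a foreground rig with wide interaxial and near
# convergence, and a flatter background rig. A ground plane crossing
# depth 10.0 would be mapped by two different linear functions, with an
# abrupt jump in parallax at the boundary.
rigs = [(0.1, 10.0, 0.065, 4.0),
        (10.0, 1000.0, 0.01, 30.0)]
```

Evaluating the mapping just below and just above the boundary produces sharply different parallax values, i.e., the discontinuity that makes interconnected geometry unsuitable for traditional multi-rigging.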
As a result of these issues, multi-rigging has important artistic limitations as it requires objects rendered with different stereo parameters or camera settings to be clearly separable such as with an empty space between them. There are presently no reliable and practical techniques for producing seamless and visually pleasing transitions between stereoscopic settings along the viewing direction. For example, if one simply composites the foreground and background stereo camera outputs of a multi-rig setup by removing the distance or space between these outputs, a visual discontinuity or other visually apparent or rough disconnect is present in the rendered or output stereoscopic image.
Further, in regard to the use of multi-rigging, the main purpose of the technique is not so much to reduce the wasted space as to tailor the stereoscopic representation (i.e., to assign a particular stereo depth to objects located at a given distance from the cameras). However, traditional stereo camera multi-rigging has a significant limitation in that there should be a gap between the portions of the scene rendered by the different rigs. For example, if there is a ground plane, one most likely could not use traditional multi-rigging because the ground plane would lie in both portions, and the product would include artifacts (i.e., discontinuities).