Three-dimensional (3D) graphics rendering is the process of converting 3D models in a scene to a two-dimensional (2D) image consisting of an array of picture elements or "pixels." In real time 3D graphics, the position of the 3D models and the viewing perspective of the scene (the camera or viewpoint) vary with time, and the rendering system has to repeatedly sample the models and compute new output images to animate the objects depicted in the display image. Performed during the rendering process, lighting and shading operations enhance realism by modeling real world visual effects such as shadows, surface shading, and illumination from different types of light sources. Unfortunately, sophisticated shading operations consume additional rendering resources and are difficult to implement in real time graphics systems where new output images need to be generated repeatedly in only fractions of a second.
FIG. 1 is a high level diagram illustrating a conventional frame buffer architecture 20. A conventional graphics pipeline processes the entire scene database to produce each output image. The scene database (represented as the 3D scene 22) includes 3D graphical models, their attributes such as surface colors, translucency and textures, and any shading models applied to graphical models. The quality parameters 24 of geometry level of detail and texture level of detail can be set independently for each object. However, other quality parameters 24 such as the sampling resolutions in time and space are global, with fixed values for the entire scene.
To generate each new output image, the renderer 26 processes the entire scene database to compute an output image comprising an array of pixel values. As it produces pixel values, it places them in a frame buffer 28, which is a large, special purpose memory used to store pixel values for each pixel location in the output image. These pixel values can include a color triplet such as RGB or YUV color, translucency (alpha), and depth (z). The size of the pixel array in the frame buffer is consistent with the resolution of the display device. More concretely, each pixel location in the frame buffer usually corresponds to a screen coordinate of a pixel on the display screen of a display device.
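The frame buffer organization described above can be illustrated with a minimal Python sketch. It is offered only as an illustration and is not taken from any particular system: the names Pixel and FrameBuffer, the row-major layout, and the default values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    # Hypothetical pixel record: an RGB color triplet, translucency, and depth.
    r: float = 0.0
    g: float = 0.0
    b: float = 0.0
    alpha: float = 1.0
    z: float = float("inf")


class FrameBuffer:
    """Illustrative frame buffer: one Pixel per screen coordinate."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Row-major storage is an assumption; real hardware layouts vary.
        self.pixels = [Pixel() for _ in range(width * height)]

    def index(self, x, y):
        # Map a screen coordinate to a location in the pixel array.
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("screen coordinate outside the frame buffer")
        return y * self.width + x

    def write(self, x, y, pixel):
        self.pixels[self.index(x, y)] = pixel

    def read(self, x, y):
        return self.pixels[self.index(x, y)]
```

In this sketch the array size follows directly from the display resolution, so a 640 by 480 display would use FrameBuffer(640, 480).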
In contrast to the conventional frame buffer approach, a fundamentally different approach for generating images is to build parts of an image in separate layers and then composite or superimpose the image layers with each other to construct an output image. Animated cartoons, video games and movie special effects have used a similar approach to construct images. For example, to create animated cartoons, an artist draws a cartoon character in different positions to simulate the character's motion from frame to frame. The drawing of the character can be superimposed on a static background that remains the same for several frames. Some video games use image compositing to superimpose an image or "sprite" onto a static image representing the background of a scene. The movie industry has used image compositing to combine images into a final output image. Porter and Duff have described how to combine images using image operators. See Compositing Digital Images, Thomas Porter and Tom Duff, Siggraph 1984, pp. 253-259.
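One of the image operators described by Porter and Duff is the "over" operator, which places a foreground image on top of a background image. The following Python sketch shows one way it might be applied to superimpose layers. It assumes colors are stored as premultiplied (r, g, b, a) tuples with components between 0 and 1, and the function names over and composite_layers are hypothetical.

```python
def over(src, dst):
    """Porter-Duff 'over': src placed on top of dst.

    Both arguments are premultiplied (r, g, b, a) tuples in [0, 1].
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    k = 1.0 - sa  # fraction of the background that shows through
    return (sr + dr * k, sg + dg * k, sb + db * k, sa + da * k)


def composite_layers(layers):
    """Composite equally sized layers, listed from back to front.

    Each layer is a list of rows, and each row is a list of pixels.
    """
    out = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                out[y][x] = over(pixel, out[y][x])
    return out
```

For example, a half-transparent red sprite pixel (0.5, 0, 0, 0.5) composited over an opaque blue background pixel (0, 0, 1, 1) yields (0.5, 0, 0.5, 1.0).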
The rendering of scene elements to independent layers can also be extended to real time computer graphics. Specifically, parts of an animated 3D graphics scene can be rendered independently at different update rates and composited to compute frames of animation. See co-pending patent application Ser. No. 08/671,412 by Nathan P. Myhrvold, James T. Kajiya, Jerome E. Lengyel, and Russell Schick, entitled Method and System for Generating Images Using Gsprites (filed on Jun. 27, 1996), now issued as U.S. Pat. No. 5,867,166, which is hereby incorporated by reference. This patent application describes how to simulate motion of 3D objects by transforming an initial rendering of an object to a new location on the display screen. This can be thought of as a form of interpolation because it approximates the change in position of an object in between renderings of the object. A general 2D transform such as an affine or perspective warp can be used to approximate more complex 3D motion.
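As an illustration of the affine case, the following Python sketch applies a 2D affine transform to points of a previously rendered image layer, such as its corners. The 2x3 matrix layout and the names apply_affine and warp_points are assumptions made for this sketch and are not drawn from the referenced application.

```python
def apply_affine(m, point):
    """Apply a 2D affine transform to a point.

    m is a hypothetical 2x3 matrix ((a, b, tx), (c, d, ty)), so that
    x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    (a, b, tx), (c, d, ty) = m
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def warp_points(m, points):
    """Warp a list of screen-space points, e.g. the corners of a layer."""
    return [apply_affine(m, p) for p in points]
```

For example, the matrix ((1, 0, 5), (0, 1, -2)) translates the point (10, 10) to (15, 8), moving the rendered layer on the screen without re-rendering the object.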
While 2D image warps reduce rendering overhead, they can introduce noticeable errors in the output image. One way to measure this error is to use characteristic points on an object to compare the distance in screen coordinates between points in a warped image and the same points from the object's model, projected into the view space. The distances between these points are a measure of the geometric error of the warped image. The geometric error provides some information about the fidelity of the warped image layer and can be used to determine when the object should be re-rendered, rather than approximated using an image warp.
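The measurement described above might be sketched in Python as follows. The choice of the maximum point distance as the error value, the threshold test, and the names geometric_error and needs_rerender are assumptions made for illustration; other ways of combining the distances are possible.

```python
import math


def geometric_error(warped_points, projected_points):
    """Largest screen-space distance between corresponding points.

    warped_points are characteristic points as positioned by the image warp;
    projected_points are the same points projected from the object's model.
    Using the maximum distance is an assumption made for this sketch.
    """
    if len(warped_points) != len(projected_points):
        raise ValueError("point lists must have the same length")
    return max(math.dist(w, p) for w, p in zip(warped_points, projected_points))


def needs_rerender(error, threshold):
    """Hypothetical policy: re-render once the error exceeds a threshold."""
    return error > threshold
```

With warped points [(0, 0), (3, 4)] and projected points [(0, 0), (0, 0)], the error is 5.0 pixels, which would trigger a re-rendering under a threshold of 2.0.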
The geometric error does not accurately reflect the fidelity of the warped image in all cases, however. For example, it is possible to have almost no geometric error yet still have noticeable distortion. An image layer representing a rendering of an object can be scaled to simulate the object moving closer to or farther from the viewpoint. The geometric error may be negligible in this case, but the samples may become so large that they become blurry or so small that the rendering system is incapable of filtering them. In addition, it is possible for an object and the viewpoint to remain stationary while the light source moves or changes over time. In this case, the geometric error will be zero yet the image layer will not accurately reflect the change in lighting. It is also possible for some portions of an object's surface to become hidden or for the object to move into the viewing frustum with little or no geometric error. While the geometric error may be small, the changes in visibility can cause significant changes in the fidelity of the output image.