Lighting and shading operations are used in graphics rendering to enhance visual realism of computer-generated animation. Three-dimensional (3D) graphics rendering is the process of converting 3D models in a scene to a two-dimensional (2D) image consisting of an array of picture elements or "pixels." In real time 3D graphics, the position of the 3D models and the viewing perspective of the scene (the camera or viewpoint) vary with time, and the rendering system has to repeatedly sample the models and compute new output images to animate the objects depicted in the display image. Performed during the rendering process, lighting and shading operations enhance realism by modeling real world visual effects such as shadows, surface shading, and illumination from different types of light sources. Unfortunately, sophisticated shading operations consume additional rendering resources and are difficult to implement in real time graphics systems where new output images need to be generated repeatedly in only fractions of a second.
Conventional 3D rendering systems perform lighting and shading based on a shading model specified by the author of the animation. In this context, the term "shading model" generally encompasses a variety of expressions used to represent visual effects such as lighting (sometimes referred to as illumination), shading, shadows, reflections, and texture maps. The shading model tells the rendering system how to modify pixel values, and more specifically, the color values at pixel locations throughout an image, to achieve realistic visual effects such as lighting from multiple light sources, surface shading, and shadows.
While shading models can enhance realism, they can also consume a significant amount of processing resources, especially when evaluated across the entire scene for each new output image in an animation sequence. The computational burden is easier to understand in terms of a specific example. The surfaces of the 3D objects in a scene are typically modeled using a mesh of surface elements called polygons. A typical scene can easily include over ten thousand polygons. The rendering system transforms these polygons to a view space, removes hidden surfaces, and converts polygons into pixel values to compute an output image. For some lighting and shading operations, the rendering system has to make more than one rendering pass through all of the polygons in the entire scene. Now consider a display device with a refresh rate of 60 Hz and a spatial resolution of 1024×1024 pixels. While the rate at which the rendering system computes a new output image or "frame" does not have to be the same as the refresh rate of the display device, it should be around 60 Hz to produce high quality results. In this example, the rendering system has to process over ten thousand polygons, possibly multiple times, to compute pixel values at over one million pixel locations in 1/60 of a second.
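The arithmetic behind this example can be written out as a short sketch. The function name `rendering_budget` and its `passes` parameter are illustrative assumptions; the resolution, frame rate, and polygon count are the ones used in the example above.

```python
def rendering_budget(width, height, frame_rate_hz, polygons, passes=1):
    """Return the workload of a renderer that redraws the whole scene every frame."""
    pixels_per_frame = width * height
    return {
        "pixels_per_frame": pixels_per_frame,
        # Every pass visits every pixel location and every polygon once per frame.
        "pixel_updates_per_second": pixels_per_frame * frame_rate_hz * passes,
        "polygons_per_second": polygons * frame_rate_hz * passes,
        # Time available to finish one complete frame, in milliseconds.
        "frame_time_ms": 1000.0 / frame_rate_hz,
    }


budget = rendering_budget(1024, 1024, 60, 10_000)
print(budget["pixels_per_frame"])          # 1048576
print(budget["pixel_updates_per_second"])  # 62914560
print(budget["polygons_per_second"])       # 600000
```

Even a single pass therefore has to produce roughly 63 million pixel values per second, and each additional rendering pass multiplies that figure.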
As is apparent from this example, it is difficult to perform sophisticated lighting and shading operations within the constraints of a real time system. In view of the rate with which new images need to be computed, rendering resources are severely limited. High-end graphics workstations have the computing power and bandwidth to re-render the entire scene at the same rate and resolution. However, these workstations are quite expensive and still have limitations in the extent to which they can make multiple rendering passes. One significant problem with conventional rendering architectures, even on high-end workstations, is that they render each of the objects in a scene at a fixed rate and spatial resolution. This tends to waste rendering resources because some aspects of the geometric or shading models do not need to be re-rendered at the same resolution and update rate to achieve high quality animation.
FIG. 1 is a high level diagram illustrating a conventional frame buffer architecture 20. A conventional graphics pipeline processes the entire scene database to produce each output image. The scene database (represented as the 3D scene 22) includes 3D graphical models, their attributes such as surface colors, translucency and textures, and any shading models applied to graphical models.
The quality parameters 24 of geometry level of detail and texture level of detail can be set independently for each object. However, other quality parameters 24 such as the sampling resolutions in time and space are global, with fixed values for the entire scene.
To generate each new output image, the renderer 26 processes the entire scene database to compute an output image comprising an array of pixel values. As it produces pixel values, it places them in a frame buffer 28, which is a large, special purpose memory used to store pixel values for each pixel location in the output image. These pixel values can include a color triplet such as RGB or YUV color, translucency (alpha), and depth (z). The size of the pixel array in the frame buffer is consistent with the resolution of the display device. More concretely, each pixel location in the frame buffer usually corresponds to a screen coordinate of a pixel on the display screen of a display device.
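A frame buffer of this kind can be pictured as a flat array with one record per screen pixel. The following is a minimal sketch, not a description of any particular hardware: the class names `Pixel` and `FrameBuffer`, the row-major layout, and the depth test in `write` (smaller z taken to mean nearer to the viewer) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    r: float = 0.0
    g: float = 0.0
    b: float = 0.0
    alpha: float = 1.0       # translucency; 1.0 = opaque (assumed convention)
    z: float = float("inf")  # depth; infinity means nothing has been drawn yet


class FrameBuffer:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        # One record per pixel location, stored row by row.
        self.pixels = [Pixel() for _ in range(width * height)]

    def index(self, x, y):
        """Map a screen coordinate to its position in the flat pixel array."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("screen coordinate outside the frame buffer")
        return y * self.width + x

    def write(self, x, y, rgb, alpha, z):
        """Store a value only if it is nearer than what the pixel already holds."""
        i = self.index(x, y)
        if z < self.pixels[i].z:
            r, g, b = rgb
            self.pixels[i] = Pixel(r, g, b, alpha, z)
            return True
        return False


fb = FrameBuffer(4, 4)
fb.write(1, 2, (1.0, 0.0, 0.0), 1.0, 5.0)  # stored: first value at this pixel
fb.write(1, 2, (0.0, 1.0, 0.0), 1.0, 9.0)  # rejected: farther from the viewer
```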
In frame buffer architectures, shading models can be rendered using multi-pass rendering techniques. Multi-pass rendering is a rendering technique in which the renderer 26 makes multiple passes through the scene database 22, using at least one pass to compute each term in the shading model. With each pass, the renderer computes new pixel values at the pixel locations and then combines the results with results from a previous pass, accumulated in the frame buffer 28.
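The accumulation loop described above can be sketched as follows. This is a simplified illustration that uses a single intensity value per pixel; the function name `multipass_render` and the representation of a pass as a `(render_term, blend)` pair are assumptions, not part of any specific rendering API.

```python
def multipass_render(width, height, passes):
    """Accumulate the terms of a shading model, one rendering pass per term.

    passes is a list of (render_term, blend) pairs:
      render_term(x, y)       -> value of this pass's term at a pixel
      blend(accumulated, new) -> value to leave in the frame buffer
    """
    frame_buffer = [0.0] * (width * height)
    for render_term, blend in passes:
        # Each pass visits every pixel location in the output image.
        for y in range(height):
            for x in range(width):
                i = y * width + x
                frame_buffer[i] = blend(frame_buffer[i], render_term(x, y))
    return frame_buffer


def add(accumulated, new):
    return accumulated + new


# Two hypothetical constant terms, combined additively.
image = multipass_render(2, 2, [
    (lambda x, y: 0.25, add),  # e.g. an ambient term
    (lambda x, y: 0.5, add),   # e.g. a diffuse term
])
print(image)  # [0.75, 0.75, 0.75, 0.75]
```

Note that the inner loops run once per pass, so the per-frame cost grows linearly with the number of terms in the shading model.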
Consider an example of a multi-pass rendering, in which a scene illuminated by a light source has shadows and a reflection. To compute the shadows, the scene is rendered from the perspective of the light source to create a depth map, and then rendered from the perspective of the view point to compute the extent to which each pixel is in shadow. The result of the shadowing passes is an array of shadow attenuation coefficients at pixel locations defining the extent to which each pixel's colors are attenuated due to the scene's shadowing. To compute the fully illuminated scene, the renderer renders the scene from the perspective of the view point, with the objects fully illuminated by the light source. The renderer generates the reflection by rendering the scene separately with a reflected camera or with an environment map to create a texture map. In a later pass, this texture map has to be mapped to the surface of an object or objects such as a mirror, window, or lake in the scene. With traditional architectures, the shadow attenuation coefficients, the rendering of the fully illuminated scene, and the texture map of the reflection can be combined into the frame buffer using pixel blend operations supported by the 3D hardware, as described by Mark Segal, Carl Korobkin, Rolf van Widenfelt, Jim Foran, and Paul Haeberli in Fast Shadows and Lighting Effects Using Texture Mapping, in proceedings of SIGGRAPH '92.
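The shadowing test and the final combination in this example can be sketched per pixel. This is an illustration under several assumptions: each image is a flat list with one intensity value per pixel, the view-space pixels have already been projected into the light's depth map so the lists line up, and the blend used in `combine_passes` (attenuate the lit image, then add a weighted reflection) is one plausible choice rather than the exact operation used by Segal et al. All function and parameter names are hypothetical.

```python
def shadow_attenuation(light_depth_map, light_space_depth, bias=1e-3, shadowed=0.25):
    """Return one attenuation coefficient per pixel.

    A pixel is in shadow when it lies farther from the light than the nearest
    surface recorded in the light's depth map at the same location.
    """
    return [
        shadowed if depth > nearest + bias else 1.0
        for depth, nearest in zip(light_space_depth, light_depth_map)
    ]


def combine_passes(attenuation, lit, reflection, reflectivity):
    """Attenuate the fully lit image, then add the weighted reflection."""
    return [
        a * c + k * r
        for a, c, r, k in zip(attenuation, lit, reflection, reflectivity)
    ]


# Three pixels: unshadowed, shadowed, and unshadowed on a reflective surface.
atten = shadow_attenuation([2.0, 2.0, 5.0], [2.0, 4.0, 5.0])
final = combine_passes(atten, [0.5, 0.5, 0.5], [0.0, 0.0, 1.0], [0.0, 0.0, 0.25])
print(atten)  # [1.0, 0.25, 1.0]
print(final)  # [0.5, 0.125, 0.75]
```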
The problem with this form of multi-pass rendering is that it is performed using a fixed spatial resolution and update rate. Each new output image is computed using multiple rendering passes to create an image at the screen resolution. Specifically, in conventional architectures, each rendering pass computes pixels at the same spatial resolution. This is inefficient because some rendering passes do not need to be rendered at this resolution. In addition, each rendering pass is performed for each new output image. This is also inefficient because some rendering passes do not need to be updated this frequently, based on their relative importance to the quality of the output image.
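The inefficiency can be made concrete with a rough cost estimate. The sketch below counts only pixels computed per second and ignores geometry processing; the function `pass_cost`, its parameters, and the particular reduced settings are hypothetical values chosen for illustration, not figures from any measured system.

```python
def pass_cost(width, height, frame_rate_hz, resolution_scale=1.0, update_every=1):
    """Pixels computed per second by a single rendering pass.

    resolution_scale scales each image axis; update_every=n means the pass is
    re-rendered only on every n-th output frame.
    """
    pixels = int(width * resolution_scale) * int(height * resolution_scale)
    return pixels * frame_rate_hz / update_every


# Conventional approach: three passes, all at screen resolution, every frame.
fixed = 3 * pass_cost(1024, 1024, 60)

# Hypothetical alternative: one pass unchanged, one at half resolution,
# and one at half resolution refreshed every fourth frame.
varied = (
    pass_cost(1024, 1024, 60)
    + pass_cost(1024, 1024, 60, resolution_scale=0.5)
    + pass_cost(1024, 1024, 60, resolution_scale=0.5, update_every=4)
)
print(fixed)   # 188743680.0
print(varied)  # 82575360.0
```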
Others have studied shading expressions extensively and have proposed techniques for factoring shading expressions. However, these techniques suffer from the same disadvantages as the technique of Segal et al., namely, they render scene elements at full resolution and do not use warping to reuse rendered images. Dorsey et al. factor only over the lights. See, Interactive Design of Complex Time Dependent Lighting, Julie Dorsey, Jim Arvo, Donald P. Greenberg, IEEE Computer Graphics and Applications, March 1995, Volume 15, Number 2, pp. 26-36. Guenter et al. factor into pre-computed cached terms. See, Specializing Shaders, Brian Guenter, Todd B. Knoblock, and Erik Ruf, SIGGRAPH 95, pp. 343-350. Meier factors as a post-process. See, Painterly Rendering for Animation, Barbara J. Meier, SIGGRAPH 96, pp. 477-484.
Dorsey et al. factor shading expressions by light source and linearly combine the resulting images in the final display. Guenter et al. cache intermediate results. Meier uses image processing techniques to factor shadow and highlight regions into separate layers, which are then re-rendered using painterly techniques and finally composited. None of these techniques render layers at varying spatial resolution or re-use rendered layers. Moreover, none of these techniques assign rendering resources based on the relative importance of a layer to the quality of the output image.
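The per-light factoring attributed to Dorsey et al. above amounts to a weighted sum of images, one per light source. The sketch below illustrates that idea only; it is not taken from their paper, and the function name `combine_light_images`, the flat-list image representation, and the sample values are assumptions.

```python
def combine_light_images(light_images, intensities):
    """Linearly combine one pre-rendered image per light source.

    light_images[i] is the scene lit only by light i at unit intensity;
    intensities[i] is the weight currently assigned to that light.
    """
    if len(light_images) != len(intensities):
        raise ValueError("need exactly one intensity per light image")
    if not light_images:
        return []
    combined = [0.0] * len(light_images[0])
    for image, weight in zip(light_images, intensities):
        for i, value in enumerate(image):
            combined[i] += weight * value
    return combined


key_light = [1.0, 0.5, 0.0]   # hypothetical three-pixel images
fill_light = [0.0, 0.5, 1.0]
print(combine_light_images([key_light, fill_light], [0.5, 0.25]))
# [0.5, 0.375, 0.25]
```

Changing a light's intensity in this scheme only changes a weight, so the per-light images do not have to be rendered again; each of them is nevertheless still a full-resolution image.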