A major objective in graphics rendering is to produce images that are so realistic that the observer believes the image is real. A fundamental difficulty in achieving total visual realism is the complexity of accurately representing real-world visual effects. A scene can include a wide variety of textures, subtle color gradations, reflections, translucency, shadows, etc. One way to make images more realistic is to determine how objects in a scene cast shadows and then represent these shadows in the rendered image. Shadows enhance the realism of an image because they give a two-dimensional image a three-dimensional feel. Below, we begin with a brief introduction to graphics rendering, and then we highlight some of the problems associated with accurately computing shadows.
In 3-D graphics applications, objects in a scene are represented by 3-D graphical models, which include geometric data used to model the surface and position of the objects, and visual attributes used to model their appearance. There are a number of ways that a geometric model can represent a 3-D object, including polygon meshes, parametric surfaces, or quadratic surfaces. Using a polygon mesh, for example, the surface of an object is modeled with several interconnected polygons. The surface elements, in this case polygons, are referred to as geometric primitives. Visual attributes such as red, green, and blue color data, and possibly other model data are typically stored in a data structure representing the vertices or polygons.
In the rendering process, the geometric primitives corresponding to objects in a scene are processed to generate a display image. In the context of 3-D graphics, the rendering process includes transforming the objects to display device coordinates, and rasterizing the geometric primitives in the models to generate pixel values for the pixel elements of a display image. Potentially visible objects in a particular scene are identified by transforming objects into a common three-dimensional coordinate system and then determining whether the objects overlap a view volume, a three-dimensional space defining the bounds of a scene. The geometric primitives of potentially visible objects are then transformed to display device coordinates, and rasterized into pixel data. Before rasterizing the primitives, it is common to eliminate surfaces that face away from the viewpoint in a process known as "backface culling."
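The backface culling step mentioned above can be illustrated with a minimal sketch. It assumes triangular primitives whose vertices are listed counterclockwise when seen from the front; that winding convention and the names `is_backfacing` and `eye` are illustrative assumptions, not part of the systems described here.

```python
def sub(a, b):
    """Component-wise difference of two 3-D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_backfacing(v0, v1, v2, eye):
    """Return True if the triangle faces away from the viewpoint `eye`.

    Assumes counterclockwise vertex order when viewed from the front,
    so the cross product of the two edges is the outward normal.
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    view_to_triangle = sub(v0, eye)
    # The surface faces away when its normal points in the same general
    # direction as the ray from the viewpoint to the surface.
    return dot(normal, view_to_triangle) >= 0
```

A triangle for which `is_backfacing` returns True would simply be skipped before rasterization.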
Rasterizing generally refers to the process of computing a pixel value for a pixel based on data from the geometric primitives that project onto or "cover" the pixel. Rasterizing is sometimes referred to as "tiling" because of the analogy to tiling a floor. Imagine that the pixels are square elements or tiles, and that a polygon is the floor plan. The rasterizing step includes tiling this floor plan by computing pixel values for the pixels or "tiles" within the polygon.
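The tiling analogy can be sketched as follows for a single two-dimensional triangle. This is only one possible coverage test (an edge-function test at pixel centers, with pixels on an edge counted as covered); the function names and the choice of sampling at pixel centers are assumptions made for illustration.

```python
def edge(a, b, p):
    """Signed area term: positive if p lies to the left of the edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covered_pixels(triangle, width, height):
    """List the (x, y) pixels whose centers fall inside the triangle.

    `triangle` is three (x, y) vertices in display device coordinates.
    Either winding order is accepted; edges are treated as inclusive.
    """
    a, b, c = triangle
    tiles = []
    for y in range(height):
        for x in range(width):
            center = (x + 0.5, y + 0.5)
            w0 = edge(a, b, center)
            w1 = edge(b, c, center)
            w2 = edge(c, a, center)
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                tiles.append((x, y))
    return tiles
```

A full rasterizer would go on to compute a pixel value (color, depth, and so on) for each covered "tile"; the sketch stops at determining coverage.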
As part of the rendering process, hidden surface removal is performed on the potentially visible objects in a scene. Objects are referred to as potentially visible because they reside in or overlap the view volume. However, some of the objects or parts of objects in the view volume will not be represented in the rendered image because they are blocked by other objects. Hidden surface removal refers to the process of determining which objects or portions of objects are visible in the scene and which are not. During this process, the graphics system determines which objects or portions are visible from the viewpoint.
One approach to hidden surface removal is referred to as the Z-buffer algorithm. In this approach, a "Z-buffer" is used to perform hidden surface removal on pixel data generated as primitives are rasterized. The letter "z" refers to a depth value and originates from the common practice of expressing distance from the viewpoint using the Z axis in a three-dimensional coordinate system. The Z-buffer is used to store the pixel closest to the viewpoint for each pixel location in an image. As a primitive is rasterized, pixel data including a depth value is generated. The depth of a newly generated pixel is compared with the depth of the pixel stored in the Z-buffer for the same pixel location. If the newly generated pixel is farther from the viewpoint than the stored pixel, it is rejected. If not, it replaces the pixel stored in the Z-buffer. This process continues until an entire frame of pixels is generated.
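The depth comparison described above can be sketched in a few lines. The representation of rasterizer output as `(x, y, depth, color)` tuples and the function name `zbuffer_render` are assumptions chosen for illustration; an actual graphics system would typically perform this test in hardware.

```python
import math

def zbuffer_render(width, height, fragments):
    """Resolve visibility for a stream of rasterized pixels.

    `fragments` is an iterable of (x, y, depth, color) tuples, where a
    larger depth means farther from the viewpoint.
    Returns the final depth buffer and color buffer as lists of rows.
    """
    # Every location starts out "infinitely far away" and empty.
    depth_buffer = [[math.inf] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        if depth > depth_buffer[y][x]:
            continue  # farther than the stored pixel: rejected
        depth_buffer[y][x] = depth  # otherwise it replaces the stored pixel
        color_buffer[y][x] = color
    return depth_buffer, color_buffer
```

Because each pixel location is resolved independently, the primitives can be rasterized in any order.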
Just as objects can occlude other objects from the perspective of the viewpoint, some objects can occlude other objects from the perspective of a light source. In this case, objects closer to the light source can cast a shadow on other objects in the scene. Shadowing refers to the process of determining which objects are shadowed and representing shadows in a rendered image.
FIG. 31 is a simple example illustrating the concept of shadowing. In shadowing an object 1230, the graphics system determines which surfaces of the object are visible from the perspective of a light source 1234. Surfaces of an object 1230 that are visible from a light source 1234 are illuminated 1236 (i.e. not in shadow), while surfaces that are not visible from the light source are shadowed 1238.
One approach to shadowing is to use the Z-buffer to calculate shadows, as set forth by W. T. Reeves, D. Salesin, and R. L. Cook in "Rendering Antialiased Shadows with Depth Maps", SIGGRAPH '87 Proceedings, July 1987, 21(4), pp. 283-291, and by L. Williams in "Casting Curved Shadows on Curved Surfaces", Computer Graphics, August 1978, 12(3), pp. 270-274.
Reeves, Salesin and Cook use the Z-buffer to compute shadows in three rendering passes. In the first pass, the geometric primitives are rendered from the point of view of the light source 1234 to produce a depth map. In this pass, the Z-buffer is used to compute a Z-value for each element in the depth map representing the object closest to the light source at each element. In the second pass, the same primitives are rendered from the viewpoint 1232. In the third pass, each time a pixel is generated, it is transformed back into light source coordinates and compared against the value retrieved from the depth map. If the Z-value of the pixel in light coordinates ($z_P$) is less than the Z-value from the depth map ($z_L$), then the point is illuminated; otherwise, it is in shadow.
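The first-pass depth map and the third-pass comparison can be sketched as follows. The sketch assumes that pixels have already been transformed into light source coordinates, giving an integer depth-map location `(u, v)` and a light-space depth `z_p`; these names and the list-of-rows depth map are illustrative assumptions rather than the cited authors' implementation.

```python
import math

def build_depth_map(width, height, light_samples):
    """First pass: keep the depth of the closest surface at each element.

    `light_samples` is an iterable of (u, v, z) tuples produced by
    rendering the primitives from the light source.
    """
    depth_map = [[math.inf] * width for _ in range(height)]
    for u, v, z in light_samples:
        if z < depth_map[v][u]:
            depth_map[v][u] = z
    return depth_map

def is_illuminated(depth_map, u, v, z_p):
    """Third pass: compare a pixel's light-space depth with the depth map.

    The point is illuminated only if z_p is strictly less than the
    stored value z_l; otherwise it is treated as being in shadow.
    """
    z_l = depth_map[v][u]
    return z_p < z_l
```

Note that under this strict comparison a point lying exactly on the closest surface is classified as shadowed, which is one way to see why the comparison is sensitive to small depth errors.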
This Z-buffer algorithm can cause self-shadowing artifacts because only a single Z-value is stored for each element in the depth map. Assume for example that a Z-value in the depth map represents the distance of a single point on a curved object. In effect, the shadow depth map improperly represents the curved surface because it assumes that a region around the point has a constant Z-value. Actually, pixels around this single point have slightly higher or lower Z-values. Thus, when these pixels are transformed into light space coordinates and compared with the single depth value in the depth map, they may be erroneously determined to be shadowed.
A solution to this problem is to add a small value, called the bias, to each Z-value stored in the depth map. The bias should be large enough to prevent self-shadowing artifacts, yet small enough that it does not remove legitimate shadows. Shadows are improperly removed if the bias is too big because the value in the depth map is pushed behind the second closest object to the light source. Thus, the bias must be chosen carefully.
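The trade-off in choosing the bias can be made concrete with a small numerical sketch. The sloped surface, the single depth-map sample taken at the center of an element, and all of the numbers below are hypothetical values chosen only to illustrate the effect.

```python
def is_illuminated_biased(z_p, z_l, bias):
    """Shadow test with a bias added to the stored depth-map value."""
    return z_p < z_l + bias

def sloped_surface_depth(x):
    """Hypothetical lit surface whose light-space depth grows with x."""
    return 10.0 + 0.5 * x

# One depth-map element spans x in [0, 1) but stores a single sample,
# taken at the element's center.
stored_z = sloped_surface_depth(0.5)

# A pixel on the same surface, near the far side of the element.
same_surface_z = sloped_surface_depth(0.9)

# A different surface that really is behind the first one.
receiver_z = 11.0

# No bias: the surface shadows itself.
self_shadowed = not is_illuminated_biased(same_surface_z, stored_z, 0.0)

# Moderate bias: self-shadowing is gone and the receiver stays shadowed.
fixed = is_illuminated_biased(same_surface_z, stored_z, 0.25)
receiver_shadowed = not is_illuminated_biased(receiver_z, stored_z, 0.25)

# Excessive bias: the stored value is pushed behind the receiver,
# so the legitimate shadow is removed.
shadow_lost = is_illuminated_biased(receiver_z, stored_z, 1.0)
```

In this example a bias of 0.25 happens to work, but the acceptable range depends on the slope of the surface and the spacing between objects, neither of which is known in advance.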
A few hardware/software systems that implement the three-pass Z-buffer scheme use a global constant for the bias. See W. T. Reeves, D. Salesin, and R. L. Cook, "Rendering Antialiased Shadows with Depth Maps", SIGGRAPH '87 Proceedings, July 1987, 21(4), pp. 283-291; S. Upstill, The RenderMan Companion, Addison-Wesley Publishing Company, Reading, Massachusetts, 1989; and M. Segal, C. Korobkin, R. Van Widenfelt, J. Foran, and P. Haeberli, "Fast Shadows and Lighting Effects using Texture Mapping," Computer Graphics (SIGGRAPH '92 Proceedings), July 1992, 26(2), pp. 249-252.
Using a global constant for the bias is problematic for real-time applications because it is not possible to predict an accurate global bias that will apply for each frame of an animation. Even within a single frame, a global bias may be insufficient to eliminate all self-shadowing artifacts without incorrectly eliminating desired shadows. The problem is particularly acute for low-resolution depth maps as shown by A. Woo in "The Shadow Depth Map Revisited," in Graphics Gems, edited by D. Kirk, Academic Press, Boston, Mass., 1992, pp. 338-342.
To combat the self-shadowing artifacts, Woo suggests that the depth value should be determined by averaging Z-values from the two closest objects to the light source. If only one object projects onto an element of the depth map, Woo suggests that the Z-value should be set to a large number to guarantee that the object surface is always illuminated. Woo does not describe how to implement such an approach.
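Since no implementation is given for this averaging approach, the following is only a hypothetical sketch of the per-element rule as summarized above. The input format (a list of `(object_id, z)` samples projecting onto one depth-map element), the function name, and the use of infinity as the "large number" are all assumptions.

```python
import math

# Assumed stand-in for the "large number" that guarantees illumination.
LARGE_Z = math.inf

def midpoint_depth(samples):
    """Compute the depth-map value for one element.

    `samples` is a list of (object_id, z) pairs for every surface sample
    that projects onto the element. The result is the average of the
    nearest depths of the two objects closest to the light source, or
    LARGE_Z when fewer than two objects project onto the element.
    """
    nearest_per_object = {}
    for object_id, z in samples:
        if z < nearest_per_object.get(object_id, math.inf):
            nearest_per_object[object_id] = z
    depths = sorted(nearest_per_object.values())
    if len(depths) < 2:
        return LARGE_Z
    return (depths[0] + depths[1]) / 2.0
```

With the stored value placed midway between the two closest objects, the closest object compares as illuminated and the second closest as shadowed, without requiring a separately chosen bias.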
Another problem that can arise in Z-buffer schemes including Woo's scheme is caused by aliasing in the light depth map. Aliasing in the light depth map causes light to "leak" through occluding geometry near object silhouettes with respect to the light source (also called the terminator region). This problem also occurs for very small objects that do not cover any pixel centers. An example will help illustrate the problem. Consider a sphere which is entirely blocked from a light source by another object. Because the sphere is only coarsely represented in the shadow depth map, it is very likely that a portion of the sphere's surface at the terminator region will not be represented in the shadow depth map. In Woo's approach, the shadow depth map would store a very large Z-value for this location because it cannot represent the sphere's terminator region. As a result, the sphere would be illuminated improperly at its edge.
Additional shadowing artifacts can result if only a single sample in the shadow depth map is used to determine whether a pixel should be shadowed. For instance, if the Z-value for a pixel transformed into light space coordinates is compared only against a single Z-value in the depth map, aliasing will occur in the transition between shadowed and unshadowed regions.
A simple box filter can be used to make the transition between full illumination and full shadowing smoother. See W. T. Reeves, D. Salesin, and R. L. Cook, "Rendering Antialiased Shadows with Depth Maps", Computer Graphics (SIGGRAPH '87 Proceedings), July 1987, B. A. Barsky, Ed., 21(4), pp. 283-291. In this article, the authors suggest that a box filter can be used to determine how much of a pixel neighborhood surrounding a pixel is in shadow. The box filter is computed by determining how many discrete elements in a neighborhood around a pixel are in shadow using neighboring samples in the depth map. The number of elements in shadow is summed, and this sum is used to determine how to shadow the pixel.
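The box filter described above can be sketched as a count of shadowed elements in a square neighborhood of the depth map. The neighborhood radius, the clamping of coordinates at the map border, and the function name are assumptions made for illustration and are not taken from the cited article.

```python
def shadow_fraction(depth_map, u, v, z_p, radius=1):
    """Fraction of depth-map elements around (u, v) that shadow the pixel.

    `depth_map` is a list of rows of light-space depths, and `z_p` is the
    pixel's depth in light source coordinates. Each neighboring element is
    tested separately and the results are weighted equally (a box filter).
    Returns a value from 0.0 (fully lit) to 1.0 (fully shadowed).
    """
    height = len(depth_map)
    width = len(depth_map[0])
    in_shadow = 0
    total = 0
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            # Clamp to the map border so edge pixels still get a full box.
            uu = min(max(u + du, 0), width - 1)
            vv = min(max(v + dv, 0), height - 1)
            total += 1
            if not (z_p < depth_map[vv][uu]):
                in_shadow += 1
    return in_shadow / total
```

With a radius of 1 the result can take only ten distinct values (0/9 through 9/9), which illustrates why the shadow transition produced by a box filter changes in discrete steps.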
Using a box filter for the shadow modulation filter results in a piecewise constant reconstruction. While this is an improvement over making a single comparison to a sample in the depth map to determine how to shadow a pixel, it does not produce high-quality images, especially with low-resolution depth maps.