As is known, the art and science of three-dimensional (“3-D”) computer graphics concerns the generation, or rendering, of two-dimensional (“2-D”) images of 3-D objects for display or presentation onto a display device or monitor, such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD). The object may be a simple geometry primitive such as a point, a line segment, a triangle, or a polygon. More complex objects can be rendered onto a display device by representing the objects with a series of connected planar polygons, such as, for example, by representing the objects as a series of connected planar triangles. All geometry primitives may eventually be described in terms of one vertex or a set of vertices, where each vertex is a coordinate (x, y, z) that defines a point, for example, the endpoint of a line segment or a corner of a polygon.
To generate a data set for display as a 2-D projection representative of a 3-D primitive onto a computer monitor or other display device, the vertices of the primitive are processed through a series of operations, or processing stages in a graphics-rendering pipeline. A generic pipeline is merely a series of cascading processing units, or stages, wherein the output from a prior stage serves as the input for a subsequent stage. In the context of a graphics processor, these stages include, for example, per vertex operations, primitive assembly operations, pixel operations, texture assembly operations, rasterization operations, and fragment operations.
In a typical graphics display system, an image database (e.g., a command list) may store a description of the objects in the scene. The objects are described with a number of small polygons, which cover the surface of the object in the same manner that a number of small tiles can cover a wall or other surface. Each polygon is described as a list of vertex coordinates (X, Y, Z in “Model” coordinates) and some specification of material surface properties (e.g., color, texture, shininess), as well as possibly the normal vectors to the surface at each vertex. For three-dimensional objects with complex curved surfaces, the polygons in general must be triangles or quadrilaterals, and the latter can always be decomposed into pairs of triangles.
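The decomposition of a quadrilateral into a pair of triangles can be sketched as follows. This is a minimal illustration only: the function name `split_quad` is hypothetical, and the sketch assumes a convex quadrilateral whose four vertices are listed in order around its perimeter, so that the v0-v2 diagonal lies inside the polygon.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (X, Y, Z) in model coordinates
Triangle = Tuple[Vertex, Vertex, Vertex]


def split_quad(quad: List[Vertex]) -> List[Triangle]:
    """Split a quadrilateral (v0, v1, v2, v3) into two triangles.

    Assumption: the quadrilateral is convex and its vertices are given in
    perimeter order, so cutting along the v0-v2 diagonal is valid.
    """
    if len(quad) != 4:
        raise ValueError("expected exactly four vertices")
    v0, v1, v2, v3 = quad
    # Both triangles share the v0-v2 diagonal and keep the original winding.
    return [(v0, v1, v2), (v0, v2, v3)]
```

A unit square in the Z = 0 plane, for example, yields two triangles that share the edge from (0, 0, 0) to (1, 1, 0).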
A transformation engine transforms the object coordinates in response to the angle of viewing selected by a user from user input. In addition, the user may specify the field of view, the size of the image to be produced, and the back end of the viewing volume so as to include or eliminate background as desired.
Once this viewing area has been selected, clipping logic eliminates the polygons (i.e., triangles) that are outside the viewing area and “clips” the polygons that are partly inside and partly outside the viewing area. Each clipped polygon corresponds to the portion of the original polygon inside the viewing area, with new edge(s) corresponding to the edge(s) of the viewing area. The polygon vertices are then transmitted to the next stage in coordinates corresponding to the viewing screen (in X, Y coordinates) with an associated depth for each vertex (the Z coordinate). In a typical system, the lighting model is next applied, taking into account the light sources. The polygons with their color values are then transmitted to a rasterizer.
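The clipping step can be illustrated for a single boundary of the viewing area. The text above does not name a clipping algorithm; the sketch below uses one step of the well-known Sutherland-Hodgman approach as an assumed stand-in, working in 2-D for brevity. The function name `clip_to_right_edge` and the parameter `x_max` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def clip_to_right_edge(polygon: List[Point], x_max: float) -> List[Point]:
    """Clip a polygon against the half-plane x <= x_max.

    This is one Sutherland-Hodgman step; a full clipper would repeat it
    for every boundary of the viewing area.
    """
    clipped: List[Point] = []
    for i, current in enumerate(polygon):
        previous = polygon[i - 1]  # wraps to the last vertex when i == 0
        cur_in = current[0] <= x_max
        prev_in = previous[0] <= x_max
        if cur_in != prev_in:
            # The edge crosses the boundary, so emit the intersection point.
            # The x-coordinates differ here, so the division is safe.
            t = (x_max - previous[0]) / (current[0] - previous[0])
            clipped.append((x_max, previous[1] + t * (current[1] - previous[1])))
        if cur_in:
            clipped.append(current)
    return clipped
```

A polygon entirely outside the boundary produces an empty list (it is eliminated), a polygon entirely inside is returned unchanged, and a straddling polygon gains a new edge lying on the boundary.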
For each polygon, the rasterizer determines which pixel positions the polygon covers and attempts to write the associated color values and depth (Z value) into the frame buffer. The rasterizer compares the depth value (Z) of the polygon being processed with the depth value of a pixel that may already be written into the frame buffer. If the depth value of the new polygon pixel is smaller, indicating that it is in front of the polygon already written into the frame buffer, then its value will replace the value in the frame buffer because the new polygon will obscure the polygon previously processed and written into the frame buffer. This process is repeated until all of the polygons have been rasterized. At that point, a video controller displays the contents of the frame buffer on a display, one scan line at a time, in raster order.
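The depth comparison described above can be sketched as follows. The names `make_buffers` and `draw_fragments` are hypothetical, and the sketch assumes that the coverage step has already produced a list of per-pixel fragments (x, y, z, color); it is an illustration, not a description of any particular rasterizer.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Fragment = Tuple[int, int, float, Color]  # (x, y, z, color)


def make_buffers(width: int, height: int):
    """Create a depth buffer cleared to 'infinitely far' and a black color buffer."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[(0, 0, 0)] * width for _ in range(height)]
    return depth, color


def draw_fragments(fragments: List[Fragment], depth, color) -> None:
    """Write each fragment only if it is nearer than the stored depth."""
    for x, y, z, rgb in fragments:
        if z < depth[y][x]:  # smaller z means closer to the viewer
            depth[y][x] = z
            color[y][x] = rgb
```

Because the test is applied per pixel, the final image does not depend on the order in which polygons are submitted, provided no two fragments at a pixel share the same depth.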
With this general background provided, reference is now made to FIG. 1, which shows a functional flow diagram of certain components within a graphics pipeline in a computer graphics system. It will be appreciated that components within graphics pipelines may vary from system to system, and may also be illustrated in a variety of ways. As is known, a host computer 10 (or a graphics API running on a host computer) may generate a command list 12, which comprises a series of graphics commands and data for rendering an “environment” on a graphics display. Components within the graphics pipeline may operate on the data and commands within the command list 12 to render a screen in a graphics display.
In this regard, a parser 14 may retrieve data from the command list 12 and “parse” through the data to interpret commands and pass data defining graphics primitives along (or into) the graphics pipeline. In this regard, graphics primitives may be defined by location data (e.g., x, y, z, and w coordinates) as well as lighting and texture information. All of this information, for each primitive, may be retrieved by the parser 14 from the command list 12, and passed to a vertex shader 16. As is known, the vertex shader 16 may perform various transformations on the graphics data received from the command list. In this regard, the data may be transformed from World coordinates into Model View coordinates, into Projection coordinates, and ultimately into Screen coordinates. The functional processing performed by the vertex shader 16 is known and need not be described further herein. Thereafter, the graphics data may be passed onto rasterizer 18, which operates as summarized above.
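The chain of coordinate transformations can be sketched with homogeneous (x, y, z, w) vertices and 4x4 matrices. The following is a minimal sketch under stated assumptions: the names `mat_vec` and `to_screen` are hypothetical, normalized device coordinates are assumed to span [-1, 1], and the screen Y axis is assumed to point downward. An actual vertex shader may use different conventions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 4x4, row-major


def mat_vec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_screen(vertex: Sequence[float], model_view: Matrix, projection: Matrix,
              width: int, height: int) -> Tuple[float, float, float]:
    """Transform one (x, y, z, w) vertex into screen X, Y plus a depth value."""
    view = mat_vec(model_view, vertex)   # into Model View coordinates
    clip = mat_vec(projection, view)     # into Projection coordinates
    w = clip[3]
    # Perspective divide (assumed convention: result lies in [-1, 1]).
    ndc_x, ndc_y, ndc_z = clip[0] / w, clip[1] / w, clip[2] / w
    # Viewport mapping (assumed convention: Y grows downward on screen).
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return screen_x, screen_y, ndc_z
```

With identity matrices, a vertex at the origin maps to the center of the screen, which is a convenient sanity check before real matrices are substituted.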
Thereafter, a z-test 20 is performed on each pixel within the primitive being operated upon. As is known, this z-test is performed by comparing a current z-value (i.e., a z-value for a given pixel of the current primitive) with a stored z-value for the corresponding pixel location. The stored z-value provides the depth value for a previously rendered primitive for a given pixel location. If the current z-value indicates a depth that is closer to the viewer's eye than the stored z-value, then the current z-value will replace the stored z-value and the current graphic information (i.e., color) will replace the color information in the corresponding frame buffer pixel location (as determined by the pixel shader 22). If the current z-value is not closer to the current viewpoint than the stored z-value, then neither the frame buffer nor z-buffer contents need to be replaced, as a previously rendered pixel will be deemed to be in front of the current pixel.
Again, for pixels within primitives that are rendered and determined to be closer to the viewpoint than previously-stored pixels, information relating to the primitive is passed on to the pixel shader 22 which determines color information for each of the pixels within the primitive that are determined to be closer to the current viewpoint. Color information includes whether or not pixels are within a shadow. As known in the prior art, one method for determining shadowed regions in a scene is through the use of shadow volumes.
Reference is now made to FIG. 2, which illustrates the shadow volume approach of generating a shadow effect in a computer graphics system. The shadow volume 34, as is known, defines the space in the shadow of a particular occluder 32 for a particular light source 30. Each polygon facing a light source 30 is an occluder 32 and therefore generates a shadow volume 34. A pixel 38 that falls within a shadow volume is rendered as being located in a shadow. The shadow volume method determines whether a pixel 38, 39 falls within a shadow volume 34 by counting the number of times the ray 35 between the pixel 38, 39 and the viewer 36 enters 33 and exits 37 shadow volumes 34. If the number of times a ray enters 33 shadow volumes 34 is the same as the number of times the ray exits 37 shadow volumes 34, then the pixel 38, 39 is not in a shadow. For example, the ray 35 from the viewer 36 to pixel A 38 has one entry 33 into the shadow volume 34 and no exits 37 from the shadow volume 34. Thus, pixel A 38 is in a shadow. Similarly, since the ray 31 from the viewer 36 to pixel B 39 enters 33 the shadow volume 34 one time and exits 37 the shadow volume 34 one time, pixel B 39 is not in a shadow.
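The entry and exit counting can be sketched in one dimension along the viewing ray. In this illustration, each shadow volume is reduced to a pair of distances from the viewer (where the ray enters and where it exits), and the pixel is a distance along the same ray. The function name `in_shadow_by_ray_count` and the distances used are hypothetical, and the sketch assumes the viewer is located outside every shadow volume.

```python
from typing import List, Tuple

# (enter_distance, exit_distance) of one shadow volume along the viewing ray
VolumeSpan = Tuple[float, float]


def in_shadow_by_ray_count(pixel_distance: float, volumes: List[VolumeSpan]) -> bool:
    """Return True if entries and exits before the pixel are unbalanced.

    Assumption: the viewer is outside all shadow volumes, so the counts
    start from zero.
    """
    entries = sum(1 for enter, _ in volumes if enter < pixel_distance)
    exits = sum(1 for _, leave in volumes if leave < pixel_distance)
    return entries != exits
```

With a single shadow volume spanning distances 2.0 to 6.0, a pixel at distance 4.0 (like pixel A) has one entry and no exit and is in shadow, while a pixel at distance 8.0 (like pixel B) has one entry and one exit and is not.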
Since the ray tracing technique is very time consuming, especially with multiple occluders and multiple light sources, the stencil shadow volume method simplifies the operation by performing a simple in/out counting method using a stencil buffer, sometimes referred to as a stencil buffer level2 or SL2. The stencil buffer, SL2, stores and processes data for each pixel to perform a variety of functions including the stencil shadow volume method. Whether the pixel is in the shadow is determined by performing a z-test on the front-facing and back-facing polygons of shadow volumes relative to either the viewer or a maximum depth plane. For example, in one implementation of the stencil shadow volume approach, the stencil buffer value would be incremented if the front-facing polygon passes the z-test and the stencil buffer value would be decremented if the back-facing polygon passes the z-test. Thus, if the final stencil value is zero, the pixel is not in a shadow.
Referring now to FIG. 3, the stencil shadow volume method begins by clearing the stencil buffer 40 and rendering the scene with diffuse colors 42. This rendering provides data for the color buffer and the depth buffer 43, also referred to as the z-buffer. The z-buffer and color buffer updates are turned off 44 except for the stencil value that may reside in the z-buffer. For each light, the shadow volume is generated for each occluder and the front-facing polygons of the shadow volume are rendered 46. The stencil buffer value is incremented 47 for each pixel on which a front-facing polygon is drawn. The same operation is performed with the back-facing polygons 48, except the stencil buffer value is decremented 49 for each pixel on which a back-facing polygon is drawn. The pass where the stencil value is incremented and decremented is referred to as the stencil shadow volume pass. Objects in the shadow will be those having a non-zero stencil value 50 and are rendered accordingly. Objects not in the shadow will have a stencil value 50 of zero and are rendered with specular color 52. The pass where the pixels outside a shadow are rendered with specular color is referred to as the specular color pass. Referring back to FIG. 1, once color information is computed by the pixel shader 22, the information is stored within the frame buffer 24.
Referring back to FIG. 2, for example, the stencil buffer value for pixel A 38 is incremented one time for the front-facing shadow volume polygon that would be rendered at the entry 33 and not decremented because there are no back-facing shadow volume polygons for pixel A 38. The non-zero value remaining in the stencil buffer for pixel A 38 indicates that pixel A 38 is in a shadow. Similarly, the stencil buffer value for pixel B 39 is incremented one time for the front-facing shadow volume polygon that would be rendered at the entry 33 and decremented one time for the back-facing shadow volume polygon that would be rendered at the exit 37. Since the stencil buffer value is zero, pixel B 39 is not in a shadow and would be rendered with specular color. Although the example in FIG. 2 has a single occluder and a single light source, the stencil shadow volume approach works for multiple shadows created by multiple occluders and multiple light sources.
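The stencil shadow volume pass can be sketched at buffer level as follows. This is a minimal sketch, not the method of any specific hardware: the names `stencil_shadow_pass` and `lit_mask` are hypothetical, the shadow volume polygons are assumed to be already rasterized into (x, y, z) fragments, and the stencil values are plain integers, so the wrap or clamp behavior of a real eight-bit stencil buffer is ignored.

```python
from typing import List, Tuple

ShadowFragment = Tuple[int, int, float]  # (x, y, z) of a shadow volume polygon


def stencil_shadow_pass(depth: List[List[float]],
                        front_fragments: List[ShadowFragment],
                        back_fragments: List[ShadowFragment]) -> List[List[int]]:
    """Count shadow volume faces that pass the z-test at each pixel.

    The depth buffer is only read here, never written, which mirrors
    turning off z-buffer and color buffer updates for this pass.
    """
    height, width = len(depth), len(depth[0])
    stencil = [[0] * width for _ in range(height)]  # cleared stencil buffer
    for x, y, z in front_fragments:
        if z < depth[y][x]:        # front face is in front of the scene
            stencil[y][x] += 1
    for x, y, z in back_fragments:
        if z < depth[y][x]:        # back face is in front of the scene
            stencil[y][x] -= 1
    return stencil


def lit_mask(stencil: List[List[int]]) -> List[List[bool]]:
    """True where the stencil value is zero, i.e. the pixel is not in shadow."""
    return [[value == 0 for value in row] for row in stencil]
```

For a one-row scene with pixel A at depth 4.0 and pixel B at depth 8.0, and a shadow volume whose front face is at depth 2.0 and back face at depth 6.0 over both pixels, the stencil values come out as 1 for pixel A and 0 for pixel B, so only pixel B would receive specular color.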
Reference is now made to FIG. 4, which illustrates a common implementation of a compressed z-data processing unit, sometimes referred to as ZL1. As is known, system performance is improved through the use of ZL1, which processes the z-data for a block or tile of multiple pixels. For pixels within a tile in which the z-data exceeds the range of the compression format associated with ZL1, the z-data must be processed at the pixel level in a pixel z-data processing unit, sometimes referred to as ZL2.
The ZL1 and ZL2 terminology generally stands for Z Buffer Level1 and Z Buffer Level2. There are several names for this type of algorithm, including Hyper Z and Hierarchy Z Buffer. The two levels of Z Buffers allow the storage of higher level depth information for a larger processing unit, such as a tile, and the storage of depth information for the smallest granularity, such as an individual pixel in a screen. One advantage of ZL1 is to reduce the computing complexity of depth data in the rendering pipeline.
A tile generator 60 generates tile data for the tile of pixels, eight-by-eight for example, and sends a request to a cache 64, called the ZL1 cache. The tile data is sent to ZL1 62, which in turn communicates with the ZL1 cache 64. For the pixels having z-data that cannot be processed in ZL1 62, the z-data is processed in the pixel z-data processing unit 66, ZL2, in coordination with a ZL2 cache 68. In this configuration ZL1 62 can reject up to sixty-four pixels in one cycle and the non-rejected pixels are marked as accepted or retested to reduce the ZL2 66 memory traffic.
Although ZL1 62 reduces the memory read traffic for ZL2 66, the current solution cannot perform the stencil operation very efficiently. In this configuration, when the stencil operation is performed, ZL1 62 just marks all pixels as retest to ensure that the stencil operation will not leak. The rejected pixels will also have a stencil operation requiring access to ZL2 66. Thus during the stencil operation, ZL1 62 will be essentially by-passed resulting in significant memory traffic.
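The tile-level decision and the stencil bypass can be sketched as follows. The text does not specify the ZL1 compression format, so this sketch assumes, purely for illustration, that ZL1 keeps a minimum and maximum stored depth per tile and compares them with the depth range of the incoming primitive over that tile. The names `TileState` and `classify_tile` and the `stencil_enabled` flag are hypothetical.

```python
from enum import Enum


class TileState(Enum):
    ACCEPT = "accept"   # every pixel in the tile passes the z-test
    REJECT = "reject"   # every pixel in the tile fails the z-test
    RETEST = "retest"   # undecided, so pixels go to ZL2 for a per-pixel test


def classify_tile(new_zmin: float, new_zmax: float,
                  stored_zmin: float, stored_zmax: float,
                  stencil_enabled: bool = False) -> TileState:
    """Classify a tile for a 'closer wins' (less-than) z-test.

    Assumption: per-tile min/max depth ranges stand in for the
    unspecified ZL1 compression format.
    """
    if stencil_enabled:
        # With the stencil operation active, every pixel is sent to ZL2.
        return TileState.RETEST
    if new_zmax < stored_zmin:
        return TileState.ACCEPT   # primitive is in front of everything stored
    if new_zmin >= stored_zmax:
        return TileState.REJECT   # primitive is behind everything stored
    return TileState.RETEST
```

With the stencil flag set, the function never returns ACCEPT or REJECT, which illustrates why the tile-level unit gives no saving in memory traffic during the stencil operation.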
This is especially true when a ZL1 tile (subtile) is accepted or rejected after a z-compare function. Since the stencil operation will happen even if the subtile passes the z-test, ZL1 62 has to change the subtile from the ACCEPT state to the RETEST state and pass it down to ZL2 66. Currently ZL2 66 and the stencil buffer, SL2, may be combined such that the format of the ZL2/SL2 processing unit is thirty-two bits, having a twenty-four-bit z-value and eight bits of stencil value. In the ACCEPT/REJECT states, the entire thirty-two-bit z/stencil value has to be read just to use the eight-bit stencil value. This results in significant inefficiencies in terms of memory bandwidth. Although one solution would be to use a separate stencil buffer and z-buffer, this scheme would result in very small memory requests. For example, for eight pixels, the memory request for the eight-bit stencil values would be only sixty-four bits, resulting in a great waste of memory bandwidth.
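The bit counts in the preceding paragraph can be checked with a short calculation. The function names below are hypothetical; the twenty-four-bit and eight-bit field widths are taken from the text.

```python
Z_BITS = 24        # z-value width in the combined ZL2/SL2 format
STENCIL_BITS = 8   # stencil width in the combined ZL2/SL2 format


def combined_read_bits(pixels: int) -> int:
    """Bits read when z and stencil share one 32-bit word per pixel."""
    return pixels * (Z_BITS + STENCIL_BITS)


def stencil_only_bits(pixels: int) -> int:
    """Bits read from a separate stencil buffer for the same pixels."""
    return pixels * STENCIL_BITS


def useful_fraction(pixels: int) -> float:
    """Share of a combined read that is actually stencil data."""
    return stencil_only_bits(pixels) / combined_read_bits(pixels)
```

For eight pixels, the combined format reads 256 bits of which only 64 bits (one quarter) are stencil data, while a separate stencil buffer would issue a request of just 64 bits, matching the figure given above.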
Although the foregoing has only briefly summarized the operation of the various processing components and techniques for generating shadows, persons skilled in the art recognize that processing the graphics data is quite intense. Consequently, it is desired to improve processing efficiency wherever possible.