A video graphics circuit generates pixel information for objects to be displayed on a computer screen, monitor or television. The source for the object may be a television broadcast, a cable television transmission, a satellite transmission, a computer generated program, a web-based image generator, or any other suitable image generation source. For computer screens, video graphics circuits partition each of the objects to be displayed into primitives. Each primitive is stored as a plurality of vertices and the corresponding display parameters for each vertex. Moreover, video graphics circuits also group a plurality of pixels into tiles, wherein a tile may be a specific number of pixels, for example, an 8×8 matrix of pixels.
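By way of illustration only, the grouping of pixels into tiles may be sketched as follows. The sketch assumes the 8×8 tile of the example above; the names `TILE_SIZE`, `tile_of` and `pixels_in_tile` are hypothetical and do not correspond to any element of the figures.

```python
TILE_SIZE = 8  # assumed tile edge length, matching the 8x8 example in the text


def tile_of(x, y, tile_size=TILE_SIZE):
    """Return the (column, row) index of the tile that contains pixel (x, y)."""
    return (x // tile_size, y // tile_size)


def pixels_in_tile(tx, ty, tile_size=TILE_SIZE):
    """List the pixel coordinates covered by tile (tx, ty), row by row."""
    return [
        (tx * tile_size + dx, ty * tile_size + dy)
        for dy in range(tile_size)
        for dx in range(tile_size)
    ]
```

For instance, under these assumptions pixels (0, 0) and (7, 7) fall in the same tile, while pixel (8, 7) falls in the neighboring tile to the right.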
For both the primitives and the tiles, each rendering element contains corresponding display parameters. Corresponding display parameters include, but are not limited to, color parameters, display or pixel locations and texture parameters. For each display parameter, a video graphics circuit calculates slopes and the associated parameter values for each part within the primitive, based on the slopes and corresponding display parameters of the vertices.
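As a minimal sketch of such a slope-based calculation, and assuming simple linear interpolation of a single display parameter along one edge between two vertices, the computation may look as follows. The function names and the choice of scanline steps are illustrative assumptions, not a description of any particular circuit.

```python
def edge_slope(y0, v0, y1, v1):
    """Change of a display parameter per scanline between two vertices."""
    return (v1 - v0) / (y1 - y0)


def interpolate_along_edge(y0, v0, y1, v1):
    """Parameter value at every scanline from y0 to y1 inclusive (y1 > y0)."""
    slope = edge_slope(y0, v0, y1, v1)
    return [v0 + slope * (y - y0) for y in range(y0, y1 + 1)]
```

With a parameter value of 0.0 at scanline 0 and 1.0 at scanline 4, the slope is 0.25 per scanline and the intermediate values follow from the vertex values alone.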
When more than one object is to be displayed on a visual output, the objects may overlap, and the graphics processing may include unnecessary steps due to pixel information being calculated for an occluded object. When all of the pixel information for each primitive has been calculated, a comparison is performed to determine which object is in the foreground. For the object that is in the background with respect to another object, the pixel information for the occluded portion of that object is discarded. As the calculation of such pixel information is unnecessary, it adversely affects the efficiency of the video graphics system. If only a small portion of an object is overlapped, the number of unnecessary pixel information calculations is minimal, and there is therefore a minimal adverse effect on the efficiency of the video graphics circuit. If, however, the object has a substantially overlapping portion, then the number of unnecessary calculations increases and the efficiency of the video graphics circuit is adversely affected. This may be compounded where several objects have overlapping portions and only one object will be visible in the foreground.
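The cost of calculating pixel information before the foreground comparison may be illustrated with the following sketch. It assumes that a larger z value is closer to the viewpoint and that each fragment is a hypothetical `(x, y, z, color)` tuple; the function name is likewise an assumption made for the example.

```python
def render_without_early_rejection(fragments):
    """Compute pixel information for every fragment, then resolve visibility.

    fragments: iterable of (x, y, z, color); larger z is assumed to be closer.
    Returns (visible colors keyed by (x, y), number of fragments computed).
    """
    depth = {}
    color = {}
    computed = 0
    for x, y, z, c in fragments:
        computed += 1  # pixel information is calculated before any comparison
        if (x, y) not in depth or z > depth[(x, y)]:
            depth[(x, y)] = z
            color[(x, y)] = c
    return color, computed
```

The difference between the number of fragments computed and the number of visible pixels is the unnecessary work; it grows with the extent of the overlap.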
Another inefficiency arises when a stencil buffer is used during the rendering of an output image. One use of the stencil buffer is to perform a first-pass render that sets a stencil bit based on the ‘projection’ of a shadow, whereupon all pixels having a location within the stencil are potentially occluded. A second pass then renders the actual objects. Pixels that fall within the shadow are not visible and therefore may be unnecessarily rendered. A common x,y coordinate alone does not settle visibility; it must further be determined whether the pixel is visible in the z plane. Therefore, since the stencil may block out all pixels at the same x,y address, for example those hidden by the shadow, it is inefficient to render pixels that are not visible due to the shadow and/or depth occlusion. A second algorithm used to render shadows, called ‘Shadow Volumes’, uses the stencil buffer instead to maintain a ‘count’ as the polygons that compose the boundaries of a shadow are rendered. If a pixel is in back of a shadow boundary, its count is incremented. If a pixel is in front of a shadow boundary, its count is decremented. After all the shadow boundaries are rendered, only pixels whose stencil value (count) is 0 are truly in shadow. A final render pass is then done that ‘lights’ those pixels that are not in shadow. On this final pass, it is inefficient to process those pixels whose stencil value is 0, as they will ultimately not be written.
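The counting rule described above may be sketched as follows for the simple case of a single shadow volume bounded by one front and one back boundary along the line of sight. The sketch assumes that a larger z value is closer to the viewpoint; the function names and the representation of boundaries as plain depth values are hypothetical simplifications.

```python
def stencil_count(pixel_z, boundary_zs):
    """Apply the counting rule from the text to one pixel.

    +1 for each shadow boundary the pixel lies behind,
    -1 for each shadow boundary the pixel lies in front of.
    Larger z is assumed to be closer to the viewpoint.
    """
    count = 0
    for boundary_z in boundary_zs:
        if pixel_z < boundary_z:  # pixel is in back of this boundary
            count += 1
        else:  # pixel is in front of this boundary
            count -= 1
    return count


def pixels_to_light(pixel_zs, boundary_zs):
    """Indices of pixels the final pass must write (stencil count not 0)."""
    return [
        i for i, z in enumerate(pixel_zs)
        if stencil_count(z, boundary_zs) != 0
    ]
```

With boundaries at z = 0.6 and z = 0.4, a pixel at z = 0.5 lies between them, receives a count of 0, and is skipped by the final pass, whereas pixels at z = 0.8 and z = 0.2 receive non-zero counts and are lit.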
To overcome these inefficiencies, some video graphics circuits perform a hierarchical z buffering technique. This operation is performed by comparing multiple pixels having the same x, y location, wherein the z value of a pixel is compared to a stored z value that represents the outermost visible pixel, assuming that larger z values represent positions closer to the viewpoint. If the pixel to be rendered has a z value that is greater than the stored z value, the pixel may be rendered, as it may be visible. In that case, the stored z value is also updated to the z value of the rendered pixel, as any other pixel at the same location having a smaller z value will be occluded by the rendered pixel.
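A minimal sketch of this per-location comparison and update, under the stated assumption that larger z values are closer to the viewpoint, is given below. The dictionary-based buffer and the function name are illustrative assumptions only.

```python
def depth_test_and_update(z_buffer, x, y, z):
    """Greater-than depth test for one pixel, updating the stored z on a pass.

    z_buffer maps (x, y) to the stored z of the outermost visible pixel.
    Returns True when the pixel may be visible and should be rendered.
    """
    stored = z_buffer.get((x, y), float("-inf"))  # empty location: always pass
    if z > stored:
        z_buffer[(x, y)] = z  # later pixels with smaller z will be occluded
        return True
    return False
```

A pixel arriving with a smaller z value than the stored value at its location is rejected and leaves the stored value unchanged.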
Due to the amount of processing required to determine potential occlusion prior to rendering, hierarchical z determinations may be made on a tile having multiple pixels. Previous hierarchical z algorithms store a minimum z value per tile. Therefore, it can be determined if a pixel will fail a greater-than depth test, but it cannot be determined if a pixel will fail or pass a less-than or equal-to depth comparison. Moreover, a tile having only a minimum z value does not account for the stencil test. Therefore, the hierarchical z determination must be turned off for many operations, resulting in an inefficient graphics processing system.
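The tile-level rejection for a greater-than depth test may be sketched as follows, assuming that the value stored per tile is the minimum of the z values already stored for the pixels of that tile. The function names are hypothetical.

```python
def tile_min_z(stored_tile_depths):
    """Minimum stored z over all pixels of a tile."""
    return min(stored_tile_depths)


def fails_greater_than_test(incoming_z_max, stored_tile_min):
    """True when every incoming pixel in the tile must fail 'z > stored z'.

    If the largest incoming z does not exceed the smallest stored z,
    no incoming pixel can exceed the stored z at its own location.
    """
    return incoming_z_max <= stored_tile_min
```

When the function returns False, nothing is concluded: individual pixels may still pass or fail, and the per-pixel test remains necessary.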
FIG. 1 illustrates a prior art graphics rendering system 100 having a scan converter 102 that includes a setup engine 104, a coarse walker 106 with a tile cache 108, and a detail walker 112. The processing system 100 further includes a tile hierarchical z engine 116 coupled to the coarse walker 106, a pixel shader 114 coupled to the scan converter 102, and a memory 120, such as a first-in-first-out (FIFO) buffer.
The pixel shader 114 and the memory 120 are coupled to a depth and stencil test engine 122, which is coupled to a depth cache 124. The depth and stencil test engine 122 is also coupled to a color blend 126, which is coupled to a color cache 128, wherein the depth cache 124 and the color cache 128 are coupled to a frame buffer memory bus 130.
In accordance with prior art rendering techniques, the scan converter 102 receives a plurality of graphics information 140 and generates a plurality of pixels 142 provided to the coarse walker 106. The coarse walker 106 generates a tile, such as a matrix of pixels, and provides the tile 144 to the tile hierarchical z engine 116. A tile with depth value 148 is provided to the quad hierarchical z engine 118, wherein the engine 118 utilizes depth information to determine a quad depth value. The tile hierarchical z (HiZ) engine 116 compares a range of depth values associated with the tile to a depth value stored for that tile, typically representing the most extreme depth value in the tile. The engine 116 then sends a mask 154 of pixels that are guaranteed to fail the HiZ test to the detail walker 112. The detail walker 112 combines the mask 154 with tile information 156 to produce a list of pixels and associated information 160, which is processed by the pixel shader 114, and a corresponding list of pixels and associated information 120 that bypasses the pixel shader 114 and is stored in the FIFO memory 120.
The pixel shader 114 operates in accordance with known pixel shading technology and provides shaded pixel information 162 to the depth and stencil test engine 122, wherein the engine 122 also receives corresponding information 160 from the buffer 120. Thereupon, the prior art depth and stencil test performed by the engine 122 compares a z value per tile to determine only if a pixel will fail a greater-than depth test. The depth cache 124 stores z and stencil values for each pixel being rendered. Once the depth and stencil tests have been performed on a tile, the tile may then proceed to other processing elements, such as the color blend 126 and the color cache 128, such that the tiles of pixels which are to be rendered are provided to the render backend 136 across the frame buffer memory bus 130 to the memory 132 using known data transfer means.
One problem with existing hierarchical depth tests is that the minimum value of a tile is compared to the z range of the entire primitive. The z range of a large primitive may be much larger than the z range of the primitive within a particular tile. Inefficiencies therefore arise that depend on the size of the primitives relative to the number of pixels within a tile. Furthermore, tiles may be too large to allow an accurate minimum z value, and the number of bits used to store the minimum z value may be insufficient. These limitations may result in pixels passing the hierarchical z test and being shaded, only to be culled later by the depth and stencil tests.
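The first of these limitations may be illustrated with the following sketch, which assumes that depth varies over the primitive as a plane z = a·x + b·y + c and, for simplicity, that the primitive covers a rectangular screen region. The plane coefficients, the region extents and the function names are assumptions chosen for the example, as is the greater-than rejection rule with a stored tile minimum.

```python
def plane_z_range(a, b, c, x0, y0, x1, y1):
    """(min z, max z) of the plane z = a*x + b*y + c over a rectangle.

    A plane reaches its extremes over a rectangle at the corners.
    """
    corners = [a * x + b * y + c for x in (x0, x1) for y in (y0, y1)]
    return min(corners), max(corners)


def can_cull(incoming_z_max, stored_tile_min):
    """Greater-than test: cull when no incoming z can exceed the stored z."""
    return incoming_z_max <= stored_tile_min


# Assumed example values: a primitive 65 pixels wide, one 8x8 tile at its left.
A, B, C = 1 / 128, 0.0, 0.125
STORED_TILE_MIN = 0.5

_, prim_z_max = plane_z_range(A, B, C, 0, 0, 64, 7)  # whole primitive
_, tile_z_max = plane_z_range(A, B, C, 0, 0, 7, 7)   # primitive inside tile

culled_using_primitive_range = can_cull(prim_z_max, STORED_TILE_MIN)
culled_using_tile_range = can_cull(tile_z_max, STORED_TILE_MIN)
```

Under these assumed values, the z range of the whole primitive reaches beyond the stored tile minimum, so the tile is not culled, even though the portion of the primitive inside the tile lies entirely behind the stored minimum and could have been rejected.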
As such, there exists a need for a method and apparatus for a rendering system which combines hierarchical stencil buffering and a more effective means of hierarchical z buffering with a plurality of pixels disposed in tiles and primitives.