The present invention relates to computer graphics, and more particularly to performing depth clear operations in the context of a computer graphics pipeline.
Prior Art FIG. 1A is a block diagram of a digital processing system embodying the method and apparatus, in accordance with one embodiment. With reference to Prior Art FIG. 1A, a computer graphics system is provided that may be implemented using a computer 100. The computer 100 includes one or more processors, such as processor 101, which is connected to a communication bus 102. The bus 102 can be implemented with one or more integrated circuits, and perform some logic functions; for example, a typical personal computer includes chips known as north bridge and south bridge chips. The computer 100 also includes a main memory 104. Control logic (software) and data are stored in the main memory 104 which may take the form of random access memory (RAM). The computer also includes a hardware graphics pipeline 106 and a display 108, i.e. a computer monitor.
The computer 100 may also include a secondary storage 110. The secondary storage 110 includes, for example, a hard disk drive and/or a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, etc. The removable storage drive reads from and/or writes to a removable storage unit in a well known manner. Computer programs, or computer control logic algorithms, are stored in the main memory 104 and/or the secondary storage 110. Such computer programs, when executed, enable the computer 100 to perform various functions. Memory 104 and storage 110 are thus examples of computer-readable media.
In one embodiment, the techniques to be set forth are performed by the hardware graphics pipeline 106 which may take the form of hardware. Such hardware implementation may include a microcontroller or any other type of custom or application specific integrated circuit (ASIC). In yet another embodiment, the method of the present invention may be carried out in part on the processor 101 by way of a computer program stored in the main memory 104 and/or the secondary storage 110 of the computer 100. One exemplary architecture for the hardware graphics pipeline 106 will be set forth during reference to FIG. 1B.
Prior Art FIG. 1B illustrates a more detailed diagram showing the internal structure of one exemplary embodiment of the hardware graphics pipeline 106 of FIG. 1A. As shown, a geometry stage 153 is provided which transforms primitives into a screen-aligned coordinate system. Other computations may be performed by the geometry stage 153 such as lighting to determine the visual properties (e.g., color, surface normal, texture coordinates) of each vertex describing the primitives. The transformed vertices form the input for a rasterizer 154. The rasterizer 154 computes a fragment for each pixel covered by each of the primitives. A coverage mask stored with the fragment indicates which portions of the pixel the fragment covers.
With continuing reference to FIG. 1B, after optional multi-sampling, individual samples are sent to a raster-processor (ROP) 155 as if they were regular fragments. The raster-processor 155 performs various operations on the fragments, including z/stencil testing and color or alpha blending. This may require the raster-processor 155 to read a frame buffer memory 156 in order to retrieve the destination z-value or the destination color. To this end, the final pixel color and z-value are written back to the frame buffer memory 156.
When all primitives in the scene have been rendered in this manner, the contents of the frame buffer memory 156 are scanned out by a video refresh unit 157 and sent to the display 108.
Prior Art FIG. 1C illustrates an architecture for performing stencil and z-value functions in the context of the ROP 155 of Prior Art FIG. 1B. As shown, a stencil value function module 180 and a z-value function module 182 are provided for performing various operations involving stencil values and z-values, respectively.
Associated with the stencil value function module 180 is a stencil state register 184 for storing information relating to pertinent stencil functions, stencil operations, the stencil reference value(s), etc. In use, the stencil value function module 180 is adapted to receive a stencil value from the frame buffer memory 156 and the information from the stencil state register 184 for conditionally enabling a stencil value write to the frame buffer memory 156.
On the other hand, the z-value function module 182 is capable of receiving a z-value from the frame buffer memory 156 and a z-value associated with a particular pixel from the rasterizer 154. With these inputs, the z-value function module 182 is adapted to conditionally enable a z-value write to the frame buffer memory 156. In use, the output of the stencil value function module 180 and the z-value function module 182 may be combined with an AND function 186 for conditionally enabling a depth and color value write to the frame buffer memory 156.
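By way of illustration only, the combination of the two conditional enables by the AND function 186 may be sketched as follows. The function names, the equality-based stencil function, and the less-than z function are assumptions chosen for the example; the modules 180 and 182 may implement other functions.

```python
def stencil_test(stored_stencil: int, ref: int, mask: int = 0xFF) -> bool:
    # Hypothetical stencil function: pass when the masked values are equal.
    return (stored_stencil & mask) == (ref & mask)


def z_test(incoming_z: float, stored_z: float) -> bool:
    # Hypothetical z function: pass when the incoming fragment is nearer.
    return incoming_z < stored_z


def write_enable(stored_stencil: int, ref: int,
                 incoming_z: float, stored_z: float) -> bool:
    # Models the AND function 186: the depth and color write is enabled
    # only when both the stencil test and the z test pass.
    return stencil_test(stored_stencil, ref) and z_test(incoming_z, stored_z)
```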
One operation carried out by the foregoing architecture is "z-value buffering," whereby the z-values of the pixels are checked to ensure that the nearest object to the viewer is the one which is visible. To do this, each attempt to write to a pixel during rendering is checked against a stored frame buffer depth value in the existing data for that pixel, and the new data is written only if its depth value is less. In addition to these operations, there is a significant performance overhead associated with clearing the z-value buffer to infinity for each new frame. The impact of this operation can be substantial.
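The z-value buffering just described, together with the per-frame clear whose cost is at issue, may be sketched in software as follows. This is a minimal model under assumed names (`ZBuffer`, `write`, `clear`) and is not a description of any particular hardware.

```python
import math


class ZBuffer:
    def __init__(self, width: int, height: int):
        self.width = width
        self.depth = [math.inf] * (width * height)
        self.color = [None] * (width * height)

    def clear(self) -> None:
        # The costly step: every stored depth value is rewritten to infinity.
        for i in range(len(self.depth)):
            self.depth[i] = math.inf

    def write(self, x: int, y: int, z: float, color) -> bool:
        # New data is written only if its depth value is less than the
        # depth value already stored for that pixel.
        i = y * self.width + x
        if z < self.depth[i]:
            self.depth[i] = z
            self.color[i] = color
            return True
        return False
```

In this model `clear` touches every pixel, which is the overhead that the techniques discussed next attempt to avoid.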
There are various techniques of utilizing the foregoing architecture and other various frameworks in order to reduce the number of depth clear operations in the hardware graphics pipeline 106.
In one prior art system, a depth range of [0,1] is split into two pieces, [0,0.5] and [0.5,1]. First, a frame is rendered into [0,0.5] in the normal fashion, but with the maximum z-value scaled to 0.5 instead of 1.0. Then, for the next frame, the comparison performed by the z-value function module 182 is reversed in order to render into [0.5,1]. This provides the same results (in most cases) as if one had cleared the z-value buffer to a maximum value. While this technique gets rid of all z-value clears and requires no hardware support, 1 bit of z-value precision is lost and the technique does not work for all applications (i.e. it works only for those that touch every pixel of the z-value buffer every frame).
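The alternating half-range technique may be illustrated by the following sketch. The class name, the single initial clear to 1.0, and the particular mapping used for the reversed frames (1.0 minus half the z-value) are assumptions made for the example.

```python
class HalfRangeDepth:
    def __init__(self, num_pixels: int):
        # Assumed: the buffer is cleared once at start-up and never again.
        self.depth = [1.0] * num_pixels
        self.reversed = False

    def next_frame(self) -> None:
        # Instead of clearing, flip the range and the comparison direction.
        self.reversed = not self.reversed

    def write(self, i: int, z: float) -> bool:
        if self.reversed:
            mapped = 1.0 - 0.5 * z            # render into [0.5, 1]
            passed = mapped > self.depth[i]   # reversed comparison
        else:
            mapped = 0.5 * z                  # render into [0, 0.5]
            passed = mapped < self.depth[i]   # normal comparison
        if passed:
            self.depth[i] = mapped
        return passed
```

Values left over from the previous frame always lie in the other half of the range, so the first write to a pixel in a new frame passes. A pixel that is not written in some frame retains a value from two frames earlier, which is why the technique depends on every pixel being touched every frame.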
In a variant to the foregoing system, a depth range of [0,1] is split into N ranges: [0,1/N], [1/N,2/N], . . . , [(N−1)/N,1], and a frame is first rendered into the farthest range. Every time an application clears the z-value buffer to the maximum value, rendering moves down one range. A clear operation may be performed when one needs to wrap. While this technique reduces the number of clears by a factor of N, works for all applications, and requires no hardware support, it unfortunately loses log2(N) bits of z-value precision, which becomes prohibitively large as N grows. Further, the present technique does not work exactly right if an EQUAL or NOTEQUAL comparison function is used.
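The N-range variant may be sketched as follows, under assumed names (`BandedDepth`, `band`, `real_clears`) and assuming a less-than comparison. The counter `real_clears` is included only to show how often a true clear is needed.

```python
class BandedDepth:
    def __init__(self, num_pixels: int, n: int):
        self.n = n
        self.depth = [1.0] * num_pixels
        self.band = n - 1        # start in the farthest range
        self.real_clears = 0

    def clear(self) -> None:
        if self.band > 0:
            # An application clear merely moves down one range.
            self.band -= 1
        else:
            # Wrap: no ranges remain, so a true clear is performed.
            self.depth = [1.0] * len(self.depth)
            self.band = self.n - 1
            self.real_clears += 1

    def write(self, i: int, z: float) -> bool:
        # Map z in [0,1] into the current range [band/N, (band+1)/N].
        mapped = (self.band + z) / self.n
        if mapped < self.depth[i]:
            self.depth[i] = mapped
            return True
        return False
```

With N ranges, only one call to `clear` in every N rewrites the buffer, at the cost of compressing each frame into 1/N of the representable depth range.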
Still yet another prior art solution for reducing depth clears involves tag clears. Such method keeps a buffer on-chip that indicates what pixels in the z-value buffer have been cleared. One can use 1 bit per pixel if desired, but one can also get by with 1 bit per tile (where a tile can be whatever size desired). Having a bit set for a tile would mean, "all z-values in this tile equal 1.0" in a simple implementation. The present technique offers numerous advantages such as the fact that clears are almost free, it works for all applications, it works for color clears in some implementations, there is no loss in z-value precision, and reading the z-value of a cleared pixel is almost free in terms of resource usage. Unfortunately, however, die area for synchronous random access memory is needed, requiring a moderate amount of design effort and silicon area increase.
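A simple tag clear implementation with 1 bit per tile may be sketched as follows. The class name, the default tile size of 4 pixels by 4 pixels, and the policy of writing 1.0 to the whole tile upon the first write after a clear are assumptions made for the example.

```python
class TaggedDepth:
    def __init__(self, width: int, height: int, tile: int = 4):
        self.width, self.height, self.tile = width, height, tile
        self.tiles_x = (width + tile - 1) // tile
        tiles_y = (height + tile - 1) // tile
        self.depth = [1.0] * (width * height)
        # One tag bit per tile; True means "all z-values in this tile equal 1.0".
        self.cleared = [True] * (self.tiles_x * tiles_y)

    def _tile(self, x: int, y: int) -> int:
        return (y // self.tile) * self.tiles_x + (x // self.tile)

    def clear(self) -> None:
        # Only the tag bits are set; the z-value memory is left untouched.
        self.cleared = [True] * len(self.cleared)

    def read(self, x: int, y: int) -> float:
        # A tagged tile is read as 1.0 without consulting z-value memory.
        if self.cleared[self._tile(x, y)]:
            return 1.0
        return self.depth[y * self.width + x]

    def write(self, x: int, y: int, z: float) -> bool:
        t = self._tile(x, y)
        if self.cleared[t]:
            # First write into a tagged tile: store 1.0 for the tile's
            # pixels (assumed policy), then reset the tag bit.
            x0 = (x // self.tile) * self.tile
            y0 = (y // self.tile) * self.tile
            for ty in range(y0, min(y0 + self.tile, self.height)):
                for tx in range(x0, min(x0 + self.tile, self.width)):
                    self.depth[ty * self.width + tx] = 1.0
            self.cleared[t] = False
        i = y * self.width + x
        if z < self.depth[i]:
            self.depth[i] = z
            return True
        return False
```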
With z-value compression, clears may run fast because the buffer is compressed. Using compression techniques such as this can make a system run faster and can be combined with the foregoing techniques. Unfortunately, tag random access memory (RAM) is expensive and very complicated in design, costing many gates in chip design.
A system and method are provided for reducing the number of depth clear operations in a hardware graphics pipeline. Initially, a frame count is stored into a frame buffer associated with the hardware graphics pipeline. The stored frame count is associated with a pixel. A depth clear operation is then performed based at least in part on the frame count utilizing the hardware graphics pipeline.
In one embodiment, the frame count may be stored in a stencil state register associated with a stencil value function module. The frame count may also be stored in a frame count register. Further, the frame count register may be separate from the stencil state register associated with the stencil value function module.
In another embodiment, a pixel frame count may be stored in a stencil value in the frame buffer. Still yet, the pixel frame count may be stored in a stencil value for each pixel written into a surface. Moreover, the storage of the stencil value may be conditional upon a mode bit.
In still another embodiment, a z-value to be cleared by the depth clear operation may be stored in a clear register. Further, either the z-value of the clear register or a z-value of a frame buffer may be selectively inputted to a z-value function module for conditionally executing the depth clear operation. Further, whether the z-value of the clear register or the z-value of the frame buffer is inputted to the z-value function module may be controlled based on a comparison involving the frame count and the pixel frame count represented by the stencil value received from the frame buffer.
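The selection between the z-value of the clear register and the z-value of the frame buffer may be modeled in software as follows. This is a sketch only, resting on several assumptions not stated above: the comparison is taken to be an inequality test between the frame count and the pixel frame count, the stencil value is taken to be 8 bits wide, a stencil value of 0 is reserved to mean "never written," and a true clear is performed when the frame count wraps. All names are hypothetical.

```python
class FrameCountDepth:
    MAX_COUNT = 255  # assumed 8-bit stencil value; 0 is reserved (assumption)

    def __init__(self, num_pixels: int, clear_z: float = 1.0):
        self.clear_z = clear_z           # models the clear register
        self.frame_count = 1             # models the frame count register
        self.z = [clear_z] * num_pixels  # z-values in the frame buffer
        self.stencil = [0] * num_pixels  # pixel frame counts in the frame buffer
        self.real_clears = 0

    def clear(self) -> None:
        if self.frame_count < self.MAX_COUNT:
            # A depth clear only advances the frame count; no pixel is touched.
            self.frame_count += 1
        else:
            # Assumed wrap handling: perform a true clear and restart the count.
            self.z = [self.clear_z] * len(self.z)
            self.stencil = [0] * len(self.stencil)
            self.frame_count = 1
            self.real_clears += 1

    def effective_z(self, i: int) -> float:
        # Comparator: is the pixel frame count stale relative to the frame count?
        stale = self.stencil[i] != self.frame_count
        # Multiplexer: select the clear register or the frame buffer z-value.
        return self.clear_z if stale else self.z[i]

    def write(self, i: int, z: float) -> bool:
        if z < self.effective_z(i):
            self.z[i] = z
            # Each pixel written is stamped with the current frame count.
            self.stencil[i] = self.frame_count
            return True
        return False
```

In this model, a pixel whose stored count differs from the current frame count is treated as though it held the clear value, so the stale z-value left in the frame buffer is never compared against.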
In still yet another embodiment, the frame count may be conditionally written to a frame buffer utilizing a stencil value function module in the hardware graphics pipeline based on a mode bit. The storing and the executing may be performed for a plurality of portions of a surface on a "region-by-region" basis.
An associated system and method are provided for reducing the number of depth clear operations in a hardware graphics pipeline. Initially, it is determined whether a hardware graphics pipeline is operating in a first mode of operation or a second mode of operation. If the hardware graphics pipeline is operating in the first mode of operation, a frame count may be written to a frame buffer associated with the hardware graphics pipeline. On the other hand, if the hardware graphics pipeline is operating in the second mode of operation, a conventional stencil value is written to the frame buffer associated with the hardware graphics pipeline.
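The two modes of operation may be illustrated by the following short sketch. The function name, the parameter names, and the 8-bit width of the stencil value are assumptions made for the example.

```python
def stencil_write_value(frame_count_mode: bool, frame_count: int,
                        conventional_stencil: int) -> int:
    # First mode: the frame count is written in place of the stencil value.
    # Second mode: the conventional stencil value is written unchanged.
    # The result is masked to an assumed 8-bit stencil width.
    selected = frame_count if frame_count_mode else conventional_stencil
    return selected & 0xFF
```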
An associated system is provided for reducing the number of depth clear operations in a hardware graphics pipeline. Included is a stencil value function module for writing a frame count in a frame buffer of a hardware graphics pipeline. Associated therewith is a z-value function module coupled to the stencil value function module for executing a depth clear operation based at least in part on the frame count utilizing the hardware graphics pipeline.
In one embodiment, a z-value to be cleared by the depth clear operation may be stored in a clear register. A multiplexer may be coupled to the clear register for selectively inputting either the z-value of the clear register or a z-value of the frame buffer to the z-value function module for conditionally executing the depth clear operation. As an option, the multiplexer may be controlled by a comparator for selectively inputting either the z-value of the clear register or the z-value of the frame buffer to the z-value function module.
In still another embodiment, the comparator controls the multiplexer based on the frame count and a pixel frame count represented by a stencil value received from the frame buffer. Moreover, the frame count may be conditionally stored utilizing the stencil value function module associated with the hardware graphics pipeline based on a mode bit.
These and other advantages of the present invention will become apparent upon reading the following detailed description and studying the various figures of the drawings.