1. Field of the Invention
The present invention generally relates to the field of computer graphics systems. More particularly, the present invention relates to rasterization and fill rate optimization within computer graphics systems.
2. Description of the Related Art
Modern graphics systems have been rapidly increasing their performance as the result of ever higher clock speeds and improved levels of integration. Smaller feature sizes on integrated circuits and higher clock frequencies have led to significant increases in both the number of triangles that may be rendered per frame and the number of frames that may be rendered per second.
However, new applications such as three-dimensional (3D) modeling, virtual reality, and 3D computer games continue to demand even greater performance from graphics systems. Thus, system designers have continued to improve performance throughout the entire graphics system pipeline to try and meet the performance needs of these new applications.
FIG. 1 illustrates one example of a generic graphics system, but numerous variations are possible and contemplated. As shown in the figure, the system is a pipeline in which graphics data is initially read from a computer system's main memory into the graphics system. The graphics data may include geometric primitives such as polygons, NURBS (Non-Uniform Rational B-Splines), sub-division surfaces, voxels (volume elements) and other types of data. The various types of data are typically converted into triangles (e.g., three vertices having at least position and color information). Then, transform and lighting calculation units 50 receive and process the triangles. Transform calculations typically include changing a triangle's coordinate axis, while lighting calculations typically determine what effect, if any, lighting has on the color of the triangle's vertices. The transformed and lit triangles are then conveyed to a clip test/back face culling unit 52 that determines which triangles are outside the current parameters for visibility (e.g., triangles that are off screen). These triangles are typically discarded to prevent additional system resources from being spent on non-visible triangles.
Next, the triangles that pass the clip test and back-face culling are translated into screen space 54. The screen space triangles are then forwarded to the set-up and draw processor 56 for rasterization. Rasterization typically refers to the process of generating actual pixels by interpolation from the vertices. In some cases samples are generated by the rasterization process instead of pixels. A pixel typically has a one-to-one correlation with the hardware pixels present in a display device, while samples are typically more numerous than the hardware elements and need not have any direct correlation to the display device. Regardless of whether pixels or samples are used, once drawn they are stored into a frame buffer 58.
Next, the pixels are read from frame buffer 58 and converted into an analog video signal by digital-to-analog converters 60. If samples are used, the samples are read out of frame buffer 58 and filtered to generate pixels, which are stored and later conveyed to digital-to-analog converters 60. The video signal from converters 60 is conveyed to a display device 62 such as a computer monitor, LCD display, or projector.
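By way of illustration only, the filtering of samples into a pixel may be sketched as follows. The text above does not specify a filter, so the simple box (averaging) filter, the function name `box_filter`, and the representation of a sample as an (r, g, b) tuple are all assumptions made for this example.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]  # hypothetical (r, g, b) sample representation


def box_filter(samples: Sequence[Color]) -> Color:
    """Average the color samples belonging to one pixel into a single pixel color.

    A box filter is only one possible choice; it is assumed here for simplicity.
    """
    count = len(samples)
    if count == 0:
        raise ValueError("at least one sample is required")
    # Average each color channel independently.
    return tuple(sum(s[c] for s in samples) / count for c in range(3))
```

For example, `box_filter([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])` averages one red sample and one blue sample into a single pixel color.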
As noted above, many applications place great demands on graphics systems. In some graphics systems, the rasterization algorithm is configured to calculate groups of multiple pixels/samples, called "tiles", per clock cycle. Unfortunately, this can lead to less than ideal datapath utilization due to an effect called fragmentation. Fragmentation occurs when a portion of the rasterization hardware is assigned to areas outside of the geometry currently being rasterized. For example, a rasterization algorithm that calculates tiles of two horizontally adjacent pixels per cycle may experience fragmentation when the geometry being rasterized has an odd width in pixels. The last cycle of rasterization on an odd width will have only one pixel to calculate. The adjacent pixel, being outside of the current geometry, will not be rendered. This causes an inefficiency, as subsequent hardware in the pipeline will be unused for this tile's missing or disabled pixel. Thus, a system and method capable of improving fill rate performance with respect to fragmentation is desired.
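The cost of fragmentation in the two-pixel example may be estimated with the following minimal sketch. The function name `tile_utilization` is hypothetical, and the sketch assumes that the span being rasterized begins on a tile boundary, so that only the final tile can be partially filled.

```python
def tile_utilization(span_width: int, tile_width: int = 2) -> float:
    """Return the fraction of pixel slots that do useful work on one span.

    Assumes the span starts on a tile boundary (an assumption of this sketch).
    """
    if span_width <= 0 or tile_width <= 0:
        raise ValueError("widths must be positive")
    # Number of cycles needed, one tile per cycle (ceiling division).
    cycles = -(-span_width // tile_width)
    # Useful pixels divided by the pixel slots the hardware provided.
    return span_width / (cycles * tile_width)
```

Under these assumptions, a span five pixels wide occupies three cycles of a two-pixel tile, so five of the six available pixel slots are used, while a span four pixels wide uses every slot.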
The problems set forth above may at least in part be solved or reduced in some embodiments by a system and method that are configured to select tiles of sample bins, wherein the tiles are two-dimensional arrays of bins of samples. Advantageously, by selecting one sample from each sample bin in the tile of bins per cycle, improved utilization of the rasterization and rendering pipeline may potentially be achieved in some implementations.
In one embodiment, the method for rendering graphics data may include receiving a geometric primitive and selecting an N×M tile of sample bins at least partially intersecting the geometric primitive. N and M are positive integers, and at least one of N and M is greater than one. Next, one sample is selected from each sample bin in the N×M tile of bins for a first cycle. The selected samples are then forwarded for rendering. The rendered samples may be stored and then filtered into pixels. The pixels may be stored until they are output for display on a display device. Additional sets of samples may be selected from the tile in subsequent cycles until all samples in the tile have been selected and rendered.
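The per-cycle selection described above may be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the name `sample_selection_cycles`, the (bin_x, bin_y, sample_index) tuple, and the choice of taking the same sample index from every bin in a given cycle are all hypothetical.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[int, int, int]  # hypothetical (bin_x, bin_y, sample_index)


def sample_selection_cycles(
    n: int, m: int, samples_per_bin: int
) -> Iterator[List[Sample]]:
    """Yield one list of selected samples per cycle for an N x M tile of bins.

    Each cycle selects exactly one sample from every bin in the tile. The
    ordering (sample index k in cycle k) is an assumption of this sketch.
    """
    if n < 1 or m < 1 or max(n, m) < 2:
        raise ValueError("N and M must be positive, with at least one > 1")
    for k in range(samples_per_bin):
        yield [(bx, by, k) for by in range(m) for bx in range(n)]
```

With a 2×2 tile and 16 samples per bin, this sketch produces 16 cycles of four samples each, so all 64 samples in the tile are selected exactly once.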
In some embodiments, the method may also include determining whether each of the selected samples is inside the particular geometric primitive, and tagging the samples as being either inside or outside the particular geometric primitive. Furthermore, in some embodiments the method may include storing the selected samples to a FIFO (first in first out) memory. The stored samples may then be read from the FIFO memory and rendered. Once rendered, the samples may be filtered to form pixels which are displayable to form an image (e.g., on a display device such as a computer monitor). While each implementation may vary, in some embodiments N may be set to equal 2 and M may be set to equal 1. Similarly, in other embodiments N may be set to equal 2 or 4, and M may be set to equal 2 or 4. Depending on the implementation, the samples may include color, depth, and transparency (i.e., alpha) information.
In another embodiment, the method for rendering may include receiving a set of vertices, and selecting a tile of sample bins that overlaps an edge joining at least two of the vertices. Next, one sample may be selected from each sample bin in the selected tile of bins. Each selected sample may advantageously be from a different memory bank to prevent blocking of memory resources in the rendering pipeline. Next, the selected samples may be rendered (e.g., to form pixels) in order to form an image that is displayable on a display device. The selecting and rendering may be repeated a number of times until all of the samples in the selected tile of bins have been rendered. For each selection cycle, however, the samples may be constrained such that they correspond to different memory banks. In one embodiment, each selection cycle may correspond to one clock cycle. In other embodiments, multiple clock cycles may be utilized for each selection and/or rendering cycle.
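One way the different-memory-bank constraint might be satisfied is sketched below. The text does not state how samples map to banks, so the mapping `bank_of` (sample index modulo the number of banks), the rotated selection order, and the function name `conflict_free_schedule` are assumptions introduced purely for illustration.

```python
from typing import List, Tuple


def bank_of(sample_index: int, num_banks: int) -> int:
    """Hypothetical bank mapping: samples are interleaved across banks by index."""
    return sample_index % num_banks


def conflict_free_schedule(
    num_bins: int, samples_per_bin: int, num_banks: int
) -> List[List[Tuple[int, int]]]:
    """Return, per cycle, a list of (bin, sample_index) picks in distinct banks.

    In each cycle, bin b selects sample (cycle + b) mod samples_per_bin, so the
    bins in a tile are offset from one another and land in different banks.
    """
    if num_bins > num_banks:
        raise ValueError("need at least one bank per bin in the tile")
    if samples_per_bin % num_banks != 0:
        raise ValueError("samples_per_bin must be a multiple of num_banks")
    schedule = []
    for cycle in range(samples_per_bin):
        picks = [(b, (cycle + b) % samples_per_bin) for b in range(num_bins)]
        schedule.append(picks)
    return schedule
```

Under the assumed mapping, a tile of four bins with 16 samples per bin and four banks yields 16 selection cycles, each touching four distinct banks, and every sample of every bin is selected exactly once.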
In some embodiments, the selected samples may be examined to determine whether or not they are in a geometric primitive (e.g., a triangle) formed by a set of vertices. The samples may be tagged to indicate whether they are inside or outside the primitive. Advantageously, the tagged samples may be stored to a FIFO memory that is configured to collapse or compact out samples that are invalid or empty (e.g., samples that are tagged as being outside the primitive). As noted above, in some embodiments the tile may be a two-dimensional array of bins, with each bin storing two or more samples. For example, in one embodiment the tiles may each comprise a 2×2, 4×4, or 5×5 array of bins, with each bin storing 16 samples. Note, in some embodiments having high pixel resolutions, pixel bins (i.e., bins of pixels) and tiles of pixel bins may be used in lieu of sample bins in the embodiments described above.
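The tagging and compaction described above may be illustrated by the following sketch. The edge-function inside test, the treatment of samples lying exactly on an edge as inside, the use of a software `deque` to stand in for the FIFO memory, and the names `edge`, `is_inside`, `tag_samples`, and `push_compacted` are all assumptions of this example rather than details taken from the text.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area test: the sign tells which side of edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def is_inside(tri: Sequence[Point], p: Point) -> bool:
    """True if p is inside the triangle (either winding; on-edge counts as inside)."""
    w0 = edge(tri[0], tri[1], p)
    w1 = edge(tri[1], tri[2], p)
    w2 = edge(tri[2], tri[0], p)
    all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
    all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
    return all_non_negative or all_non_positive


def tag_samples(tri: Sequence[Point], samples: Sequence[Point]) -> List[Tuple[Point, bool]]:
    """Pair each sample position with an inside/outside tag."""
    return [(p, is_inside(tri, p)) for p in samples]


def push_compacted(fifo: Deque[Point], tagged: Sequence[Tuple[Point, bool]]) -> None:
    """Append only samples tagged inside, so outside samples occupy no FIFO slot."""
    for p, inside in tagged:
        if inside:
            fifo.append(p)
```

For a triangle with vertices (0, 0), (4, 0), and (0, 4), the samples (1, 1) and (0.5, 2) would be tagged inside and queued, while (5, 5) and (3, 3) would be tagged outside and compacted out before reaching the renderer.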
A graphics system for rendering graphics data is also contemplated. In one embodiment, the graphics system may comprise a memory configured to store graphics data including one or more geometric primitives (e.g., triangles, polygons, or other 2D shapes or 3D volumes). The graphics system may include set-up logic that is configured to select a tile of bins that at least partially intersects the geometric primitive. The tile of bins may be a two-dimensional array of bins, and each bin may correspond to a number of samples. The graphics system may also include a number of sample evaluation units configured to evaluate the selected samples from the set-up logic. The sample evaluation units may be configured to determine whether the selected samples from the set-up logic are within the geometric primitive, and thus worthy of rendering. The graphics system may also include, in some embodiments, a number of FIFO memories in a frame buffer. The sample evaluation units may be connected to the frame buffer and the FIFO memories. The frame buffer may include a number of memory banks, with each FIFO memory corresponding to one sample evaluation unit and one memory bank. The sample evaluation unit may also include a rendering unit configured to render selected samples that fall within the geometric primitive. Advantageously, the FIFO memories may be configured to shift out or collapse out samples that are outside the geometric primitive, thereby preventing the sample evaluation units from wasting resources or clock cycles on samples that are not going to be rendered.