1. Field of the Invention
The present invention relates to the field of data processing in a computer graphics system. More specifically, the present invention pertains to an anti-aliasing buffer architecture for processing fragments in a computer graphics system.
2. Background Art
Computer graphics generally consists of instructions implemented via a graphics system executed on a computer system. The instructions are used to specify the calculations and operations needed to produce rendered images that have a three-dimensional appearance.
The computer graphics system can be envisioned, in part, as a pipeline through which pixel data pass. The data are used to define the image to be produced and displayed. At various points along the pipeline, various calculations and operations are specified by the graphics designer, and the data are modified accordingly.
In the initial stages of the pipeline, the desired image is composed using geometric shapes such as lines and polygons, referred to in the art as geometric "primitives." The derivation of the vertices for an image and the manipulation of the vertices to provide animation entail performing numerous geometric calculations in order to project the three-dimensional world being designed to a position in the two-dimensional world (the "view plane") of the display screen.
Primitives are then decomposed into "fragments," and these fragments are assigned attributes such as color, perspective, and texture. In order to enhance the quality of the image, effects such as lighting, fog, and shading are added, and anti-aliasing and blending functions are used to give the image a smoother and more realistic appearance. In the final image generation stage, the fragments and their associated attributes are stored in the frame buffer as pixels. The pixel values can be later read from the frame buffer, and can be used to display images on the computer screen.
The entire process, from projecting the primitives onto the view plane through formation of the output image, is known as rendering. The specific process of decomposing individual primitives and determining per-pixel or per-fragment values from those geometric primitives is known as rasterization.
With reference now to Prior Art FIG. 1, process 130 exemplifies one embodiment of a graphics design process implemented using a graphics program on a computer system. Process 130 operates on vertex (or geometric) data 131. The blocks within process 130 consist of display list 133, evaluators 134, per-vertex operations and primitive assembly 135, rasterization 138, per-fragment operations 139, and frame buffer 140.
Vertex data 131 are loaded from the computer system's memory and saved in display list 133; however, in some graphics programs, a display list is not used and, instead, the vertex data are processed immediately. When display list 133 is executed, evaluators 134 derive the coordinates, or vertices, that are used to describe points, lines, polygons, and the like (e.g., primitives). All geometric primitives are eventually described by collections of vertices.
With reference still to Prior Art FIG. 1, in per-vertex operations and primitive assembly 135, vertex data 131 are converted into primitives that are assembled to represent the surfaces to be graphically displayed. Some vertex data (for example, spatial coordinates) are transformed, typically using matrix multiplication, to project the spatial coordinates from a position in the three-dimensional world to a position on the display screen.
In addition, advanced features are also performed in per-vertex operations and primitive assembly 135. Texturing coordinates may be generated and transformed. Lighting calculations are performed using the transformed vertex, the surface normal, material properties, and other lighting information to produce a color value. Perspective division, which is used to make distant objects appear smaller than closer objects in the display, also occurs in per-vertex operations and primitive assembly 135.
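By way of illustration only, the matrix transformation and perspective division described above can be sketched as follows. The function names, the choice of a view plane at distance d, and the particular projection matrix are assumptions made for this example and are not taken from FIG. 1.

```python
# Illustrative sketch only: names and the projection matrix are hypothetical.

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def project(point, d=1.0):
    """Project a 3D point (x, y, z) onto a view plane at distance d.

    The matrix copies z/d into the homogeneous coordinate w, and the
    perspective division by w makes distant points land closer to the center.
    """
    m = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0 / d, 0.0],
    ]
    x, y, z, w = mat_vec(m, [point[0], point[1], point[2], 1.0])
    return (x / w, y / w)  # perspective division

near = project((2.0, 2.0, 2.0))  # closer point
far = project((2.0, 2.0, 4.0))   # same x and y, twice as far away
```

In this sketch the farther point projects to coordinates half as large as the nearer one, which is the effect that makes distant objects appear smaller.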
Rasterization 138 is the conversion of vertex data into "fragments." Each fragment corresponds to a single element (e.g., a "pixel") in the graphics display, and typically includes data defining color, shading, and texture. Per-fragment operations 139 consist of additional operations that may be enabled to enhance the detail of the fragments, such as blending, dithering and other like operations. After completion of these operations, the processing of the fragment is complete and it is written as a pixel to frame buffer 140.
Part of the process of anti-aliasing is performed during rasterization 138. Anti-aliasing is a technique for correcting the problem of aliasing, which can cause the edges of an object to appear jagged when the object is rendered. For example, a polygon may only partially cover a number of pixels; that is, the edge of the polygon may pass through a number of adjacent pixels. If these pixels are approximated as being fully covered by the polygon (and are colored the color of the polygon), the edge of the polygon would likely appear jagged when it is rendered. In addition, the fragments that correspond to each screen pixel must be kept in a sorted order according to their distance from the view plane.
In general, there are two common approaches for implementing this sorted ordering: the "Z-buffer" approach and the "A-buffer" approach. In the Z-buffer approach, pixel data including a depth value (e.g., a z-dimension indicating distance from a view plane) are stored for every pixel location in a display image. As geometric primitives are rasterized, the depth values for newly generated pixel data are compared to depth values for pixel data in the Z-buffer. If the newly generated pixel data are closer to the view plane (e.g., a smaller value of z), then these data are written over the current pixel data in the Z-buffer. If the newly generated pixel data have a larger value of z, then the new data are disregarded.
The Z-buffer approach can result in aliasing because it does not adequately address partially covered pixels. The A-buffer approach improves on the Z-buffer approach by addressing anti-aliasing for partially covered pixels.
In the A-buffer approach, polygons are clipped into fragments at the boundaries of a pixel. For example, consider a square-shaped pixel partially covered by a portion of a polygon. A pixel fragment would be generated to represent the portion of the polygon covering the pixel. A bit mask representing the edges of the polygon is used to describe how the polygon partially covers the pixel. When there are multiple polygons contributing to the color of a pixel, multiple pixel fragments are generated, and the colors of the fragments are then resolved within the A-buffer to compute a final color for the pixel. The multiple fragments corresponding to a pixel location are stored in memory as a "fragment stack."
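The Z-buffer comparison described above can be sketched as follows. This is a minimal illustration; the buffer layout, the function name zbuffer_write, and the sample depth values are assumptions made for the example.

```python
# Illustrative sketch only: buffer layout and names are hypothetical.

WIDTH, HEIGHT = 4, 4
FAR = float("inf")  # initial depth: farther than any real fragment

zbuf = [[FAR] * WIDTH for _ in range(HEIGHT)]
colorbuf = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]

def zbuffer_write(zbuf, colorbuf, x, y, z, color):
    """Keep new pixel data only if it is closer to the view plane (smaller z)."""
    if z < zbuf[y][x]:
        zbuf[y][x] = z
        colorbuf[y][x] = color
        return True
    return False  # larger z: the new data are disregarded

first = zbuffer_write(zbuf, colorbuf, 1, 2, 0.8, (255, 0, 0))   # empty location: written
closer = zbuffer_write(zbuf, colorbuf, 1, 2, 0.3, (0, 255, 0))  # smaller z: overwrites
farther = zbuffer_write(zbuf, colorbuf, 1, 2, 0.5, (0, 0, 255)) # larger z: disregarded
```

Note that only one color survives per pixel location, which is why this approach by itself has no way to represent a pixel that is only partially covered.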
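One possible way to resolve a fragment stack into a final pixel color is sketched below. It assumes opaque fragments, a 16-bit coverage mask with one bit per subpixel sample, and simple area weighting; these choices, and the name resolve_pixel, are assumptions for illustration rather than the resolution method of any particular A-buffer.

```python
# Illustrative sketch only: opaque fragments, 16 subpixel samples, area weighting.

MASK_BITS = 16
FULL_MASK = (1 << MASK_BITS) - 1

def resolve_pixel(fragments, background=(0, 0, 0)):
    """Resolve (z, mask, (r, g, b)) fragments into one color; smaller z is closer."""
    covered = 0
    total = [0, 0, 0]
    for _z, mask, color in sorted(fragments, key=lambda f: f[0]):
        visible = mask & ~covered & FULL_MASK   # samples not hidden by nearer fragments
        samples = bin(visible).count("1")
        for i in range(3):
            total[i] += color[i] * samples
        covered |= mask
    uncovered = MASK_BITS - bin(covered & FULL_MASK).count("1")
    for i in range(3):
        total[i] += background[i] * uncovered   # background shows through the rest
    return tuple(t // MASK_BITS for t in total)

stack = [
    (0.7, 0xFFFF, (0, 0, 255)),  # blue polygon covering the whole pixel, farther away
    (0.2, 0x00FF, (255, 0, 0)),  # red polygon covering half the pixel, closer
]
pixel = resolve_pixel(stack)
```

Here the red fragment contributes its eight covered samples and the blue fragment contributes the remaining eight, so the resolved color is an even blend of the two.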
Prior Art FIG. 2 is a data flow diagram showing one embodiment of an anti-aliasing buffer (A-buffer) architecture 200 used in a computer graphics system. In A-buffer architecture 200, the fragment data flow in a loop from fragment memory 210 to fragment manager 220, through fragment evaluation pipeline 230a or 230b, then back to fragment memory 210. The process may also be performed using parallel paths, each path with distinct fragment memory, fragment manager, evaluation pipeline(s), and result router.
A new fragment 202 for a particular pixel location causes fragment stack 212 for that pixel location to be read from fragment memory 210. Fragment manager 220 feeds fragment stack 212 to one of the fragment evaluation pipelines 230a or 230b (it is appreciated that more than two pipelines can be used in A-buffer architecture 200).
After evaluation in the pipeline, the final pixel color for the pixel location is determined, and pixel data 246 (comprising the pixel color and pixel location) are sent to frame buffer 140 via evaluation result router 240. Also, evaluation result router 240 writes the processed fragment stack 235 back to fragment memory 210, so that fragment stack 235 can be accessed the next time a new fragment is received for the pixel location. Thus, in the prior art, the data that are output from pipelines 230a and 230b are written to fragment memory 210. Smaller memory caches are sometimes used in conjunction with fragment memory 210 to speed up access to repetitively used fragment stacks.
In most implementations, fragment memory 210 uses a paging scheme to read and write fragment stacks (e.g., fragment stacks 212 and 235, respectively). These implementations take into account that the average polygon size is fifty pixels or less. As such, the average width of the polygon is eight pixels or less. Because of the amount of data stored for each fragment, and the need to keep memory pages small for quick access to prevent visual artifacts, most polygons will cross at least one memory page boundary. That is, the page position is fixed but the polygon position is not (in an animated image), and therefore it is probable that polygons will cross the page boundary.
Prior Art FIG. 3 illustrates a polygon 310 spanning multiple memory pages 320, 322, 324 and 326 in a fragment memory 210 used with a computer graphics system. Fragment memory 210 is aligned with the display screen coordinates. Accordingly, a fragment stack (exemplified by fragment stack 315) is associated with a particular pixel location on the display screen.
Each fragment in fragment stack 315 typically is associated with the following information that must be stored in fragment memory 210: a transparency flag (one bit); a transparency value (eight bits); a stencil flag (one bit); a stencil value (eight bits); a fragment color (red, green, and blue, typically at least eight bits each; some systems use ten bits each); a fragment mask (typically, at least 16 bits); a fragment z-dimension (typically, 24 bits); various stack control flags (typically, 6 bits); and an offset to the next fragment in the stack (at least eight bits). The total number of bits required for each fragment in a stack is therefore at least 96 bits, or 12 bytes. In some implementations, the stack also includes a header word that holds the pixel location, current final color, touched flag, and an offset to the first entry in the stack.
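The per-fragment storage figure can be checked by summing the field widths listed above. The field names and the function fragment_bits in this sketch are illustrative labels only; the bit widths are the minimum values given in the text.

```python
# Illustrative arithmetic only: field names are hypothetical labels.

def fragment_bits(color_bits_per_channel=8):
    """Sum the minimum per-fragment field widths, in bits."""
    fields = {
        "transparency_flag": 1,
        "transparency_value": 8,
        "stencil_flag": 1,
        "stencil_value": 8,
        "color_rgb": 3 * color_bits_per_channel,  # red, green, and blue
        "fragment_mask": 16,
        "z_dimension": 24,
        "stack_control_flags": 6,
        "next_fragment_offset": 8,
    }
    return sum(fields.values())

bits_8 = fragment_bits(8)     # eight bits per color channel
bytes_8 = bits_8 // 8
bits_10 = fragment_bits(10)   # ten bits per color channel
```

With eight bits per channel the fields sum to 96 bits (12 bytes); with ten bits per channel the same fields sum to 102 bits.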
With reference back to Prior Art FIG. 2, A-buffer architecture 200 has several disadvantages. One disadvantage to the prior art is associated with the paging scheme used by fragment memory 210. Swapping memory pages is time-consuming in the context of a processing pipeline (e.g., fragment evaluation pipeline 230a) that is trying to keep up with a high performance graphics system that may be producing tens of millions of fragments (or more) per second. The size of the memory page must be tuned to the graphics system in order to optimize performance. If the page size is made too small, then memory pages must be swapped more often. On the other hand, if the page size is made too large, then it takes more time to swap them, and this may cause visible artifacts in the display.
Also associated with this disadvantage is the depth (number of stages) in fragment evaluation pipelines 230a and 230b. To gain the necessary data throughput, fragment evaluation pipelines 230a and 230b must include multiple stages. Generally, there are at least eight stages to a pipeline: pixel ownership test, scissors test, alpha test, stencil test, depth test, blend, dither, and logic operations. Each stage of fragment evaluation pipelines 230a and 230b will be operating on a different fragment stack. Typically, a fragment stack moves from one stage in the pipeline to the next stage in the pipeline every "N" clock cycles. Each stage in the pipeline has a set number, N, of clock cycles to complete its function. The slowest stage in the pipeline (the stage with the maximum N) controls the speed of the pipeline.
With reference to Prior Art FIGS. 2 and 3, it is common for a fragment stack to be in the process of being read from one page in fragment memory 210 while another fragment stack waits to be written to a different page in fragment memory 210. That is, fragment stack 212 may be in the process of being read from memory page 322; in the meantime, fragment stack 235 has been processed in fragment evaluation pipeline 230a and needs to be written to a different memory page 320. This causes "memory (or page) thrashing" or "page misses," in which one memory page is open in order to read data and another memory page needs to be opened in order to write data (and vice versa). As a result of memory thrashing, processing is delayed until the necessary read or write can be accomplished. The processing delay may be manifested as lag or a loss in detail in the rendered image.
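The effect of the slowest stage on pipeline speed can be illustrated with a simple lockstep model, sketched below. The per-stage cycle counts are invented for the example, and the function pipeline_timing and its timing formula are assumptions about an idealized pipeline in which every stage advances together.

```python
# Illustrative sketch only: cycle counts are invented, not measured values.

STAGES = [
    "pixel ownership test", "scissors test", "alpha test", "stencil test",
    "depth test", "blend", "dither", "logic operations",
]
stage_cycles = [2, 2, 3, 4, 6, 3, 2, 2]  # hypothetical N for each stage

def pipeline_timing(stage_cycles, num_stacks):
    """Return (cycles per step, total cycles) for a lockstep pipeline."""
    step = max(stage_cycles)                # slowest stage sets the pace
    fill = step * len(stage_cycles)         # cycles until the first stack emerges
    total = fill + step * (num_stacks - 1)  # then one stack completes per step
    return step, total

step, total = pipeline_timing(stage_cycles, num_stacks=10)
slowest = STAGES[stage_cycles.index(step)]
```

Under these assumed numbers the depth test is the slowest stage, so every other stage idles for part of each step while waiting on it.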
Consequently, another problem with the prior art A-buffer architecture 200 occurs when the data that are output from pipelines 230a and 230b are written to fragment memory 210. A different memory page may be open, and thus the post-pipeline write operation is delayed until the proper memory page can be opened and the processed data (e.g., fragment stack 235) written to fragment memory 210. This delay (due to memory thrashing) will be propagated back through pipelines 230a and 230b, thereby reducing the processing efficiency and data throughput of the computer graphics system.
Similarly, if data (e.g., fragment stack 212) are to be read from one page in fragment memory 210, but a different page is open because the post-pipeline write operation is ongoing, then the read will be delayed until the proper memory page can be opened. As a result, pipelines 230a and 230b may be starved for data, with processing delayed until data are received.
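The cost of alternating reads and writes between two memory pages can be illustrated by counting how often the open page must change. The model below is a deliberate simplification (a single open page, no caches), and the function count_page_switches and the access sequences are assumptions made for this example.

```python
# Illustrative sketch only: one open page at a time, no caching.

def count_page_switches(accesses):
    """Count page swaps for a sequence of (operation, page) accesses."""
    open_page = None
    switches = 0
    for _op, page in accesses:
        if page != open_page:
            if open_page is not None:
                switches += 1  # a different page must be opened first
            open_page = page
    return switches

# Reads from page 322 alternate with delayed post-pipeline writes to page 320.
interleaved = [
    ("read", 322), ("write", 320), ("read", 322),
    ("write", 320), ("read", 322), ("write", 320),
]
# The same six accesses, grouped by page for comparison.
grouped = [
    ("read", 322), ("read", 322), ("read", 322),
    ("write", 320), ("write", 320), ("write", 320),
]

thrash = count_page_switches(interleaved)
calm = count_page_switches(grouped)
```

In this simplified model the interleaved sequence changes pages on nearly every access, while the grouped sequence changes pages once; each change corresponds to a delay of the kind described above.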
In addition, fragment evaluation pipelines are being increased in length in order to add stages that provide effects, such as shadows, that increase the realism of the rendered images. Longer pipelines can exacerbate the problem of memory thrashing by increasing the likelihood that one memory page will be open when another is needed. As described above, increasing the size of the memory page to alleviate this problem will result in processing delays because of the additional time needed to read (or write) the larger pages. Adding a cache between the end of fragment evaluation pipeline 230a and fragment memory 210, or between fragment memory 210 and the beginning of fragment evaluation pipeline 230a, also does not help, because the cache will fill up and thus may contribute to the memory thrashing problem (in effect, the cache simply adds another stage to the pipeline).
The use of multiple fragment evaluation pipelines can also aggravate the memory thrashing problem. One pipeline may be attempting to read a memory page, and another may be attempting to write to a different page. With multiple pipelines attempting to access fragment memory 210 at different times, the frequency of memory thrashing will get worse.
Yet another disadvantage associated with A-buffer architecture 200 is that the entire fragment stack (e.g., fragment stack 212) is carried all the way through fragment evaluation pipeline 230a. In some instances, the entire fragment stack 212 is not required. As described above, a significant amount of information is associated with each fragment in a fragment stack. As a result, the data throughput in A-buffer architecture 200, in particular in fragment evaluation pipeline 230a, is reduced because of the additional time needed to read/write as well as process the data.
With reference still to Prior Art FIG. 2, another problem associated with A-buffer architecture 200 is that status memory 205 is a part of fragment memory 210. Status memory 205 is assigned to hold a set of three status bits for each fragment stack. The status bits include an "exist" bit that is used to indicate whether the fragment stack currently holds any fragments, a "valid/invalid" bit to indicate whether the fragment stack at this location is valid or has been invalidated (used to erase areas of A-buffer memory), and a "size" bit to indicate whether the fragment stack has reached the defined maximum number of fragments per stack. For a typical 1280 by 1024 pixel screen, the three status bits can require almost one-half megabyte of memory.
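The status memory figure follows from the screen size and the number of status bits per fragment stack, as the short calculation below shows. The function name status_memory_bytes is an illustrative label, and one megabyte is taken here as 2^20 bytes.

```python
# Illustrative arithmetic only.

def status_memory_bytes(width, height, bits_per_stack=3):
    """Bytes needed to hold the status bits for every pixel location."""
    total_bits = width * height * bits_per_stack
    return (total_bits + 7) // 8  # round up to whole bytes

size_bytes = status_memory_bytes(1280, 1024)
size_megabytes = size_bytes / 2**20
```

For a 1280 by 1024 screen this comes to 491,520 bytes, or about 0.47 megabyte, consistent with the "almost one-half megabyte" stated above.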
Because status memory 205 is within fragment memory 210, it needs to be accessed with the same memory paging mechanism. This means that A-buffer architecture 200 requires a caching scheme to hold the status memory page while the fragment memory page is being worked on (e.g., read). However, the amount of space available on an integrated circuit is limited, and therefore the space that can be used for the caching scheme is also limited. In addition, as fragment evaluation pipelines grow in length and complexity, there may be more effective uses of the space assigned to the caching scheme.
An additional disadvantage associated with the prior art is that fragments in a fragment stack are depth-sorted (by z-dimension) in fragment memory 210. Prior art systems such as A-buffer architecture 200 rely on storing the fragment stack in sorted order to more easily identify where to insert a new fragment 202 into a fragment stack 212. Storing the fragment stack in depth-sorted order can reduce the number of operations that need to be performed on the fragment stack 212. That is, this technique makes it unnecessary to perform operations to determine whether the fragment stack 212 is in sorted order, and to perform operations to sort the fragment stack. However, the mechanism to completely re-sort the fragment stack must still be available.
Typically, fragment stack 212 is depth-sorted in fragment evaluation pipeline 230a. There are, however, several factors that require fragment stack 212 to be re-sorted prior to the depth test. These factors can include such things as the results of the stencil test, or a change in depth sorting order. Because fragment stack 212 is depth-sorted in fragment evaluation pipeline 230a, it is desirable to reduce the number of operations associated with determining whether fragment stack 212 is already in sorted order.
Accordingly, what is needed is a method and/or system that can be used in a computer graphics system to address the memory thrashing problem that occurs when reading and writing fragment stacks from fragment memory. What is also needed is a method and/or system that addresses the above need and that can increase the speed at which fragment stacks are read from and written to fragment memory. In addition, what is needed is a method and/or system that addresses the above needs and that can increase the data throughput in a computer graphics system. Furthermore, what is needed is a method and/or system that addresses the above needs and that can reduce the number of operations associated with depth-sorting the fragments in a fragment stack. What is also needed is a method and/or system that addresses the above needs and that can allow status memory to operate independently of fragment memory.
The present invention includes a method and system thereof that satisfy the above needs. These and other advantages of the present invention not specifically mentioned above will become clear within discussions of the present invention presented herein.
In one embodiment, a method and system thereof for processing data in a computer graphics system are described. In the present embodiment, when a new fragment for a particular pixel location is received, the fragment stack for that pixel location is read from fragment memory. The new fragment is appended to the fragment stack, and the resultant fragment stack is written back to fragment memory before it is processed in a computer graphics pipeline. Thus, in accordance with the present invention, the write back to fragment memory after processing in the computer graphics pipeline is eliminated.
In accordance with the present invention, fragments stored in fragment memory are not sorted according to their distance from the view plane (the z-dimension); instead, z-ordered depth sorting is performed in the computer graphics pipeline. Using an occlude command, occluded (blocked) fragments can be deleted from the fragment stack before the fragment stack is passed to the computer graphics pipeline. The computer graphics pipeline calculates a pixel color for each pixel location. Multiple computer graphics pipelines can be executed in parallel, and the pixel colors determined in each pipeline can be interleaved to improve processing efficiency.
In one embodiment, an anti-aliasing buffer architecture for processing fragments in a computer graphics system is described. The anti-aliasing buffer architecture comprises a fragment memory, one or more fragment evaluation pipelines, and a frame buffer. In one embodiment, the anti-aliasing buffer architecture further comprises a fragment memory manager and a fragment evaluation pipeline manager. In this embodiment, the fragment evaluation pipeline manager receives a new fragment for a particular pixel location and writes the new fragment to the fragment memory manager. The fragment memory manager reads the fragment stack for that pixel location and appends the new fragment to the fragment stack. In accordance with the present invention, the fragment stack (including the new fragment) is written back to fragment memory before further processing in a fragment evaluation pipeline. Thus, the fragment stack can be written to its memory page in fragment memory while that page is still open (that is, before a page swap occurs), and therefore page thrashing can be prevented.
In the present embodiment, the fragment memory manager also sends the fragment stack (including the new fragment) to the fragment evaluation pipeline manager, and in turn the fragment stack is processed in a fragment evaluation pipeline to determine a pixel color for the pixel location being evaluated. The pixel color is then stored in the frame buffer; when multiple fragment evaluation pipelines are being used, the pixel colors determined in each pipeline are interleaved to improve efficiency. Significantly, in accordance with the present invention, fragment stacks are not written to fragment memory after processing in the fragment evaluation pipeline(s).
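The ordering of operations described above (read the stack, append the new fragment, write back, and only then evaluate) can be sketched as follows. The class FragmentMemoryManager, the dictionary used as fragment memory, and the stand-in evaluate function are all hypothetical and greatly simplified; in particular, the stand-in pipeline merely picks the nearest fragment's color.

```python
# Illustrative sketch only: class, function, and field names are hypothetical.

class FragmentMemoryManager:
    def __init__(self):
        self.fragment_memory = {}  # pixel location -> list of fragments (arrival order)
        self.writes = 0            # counts write-backs to fragment memory

    def add_fragment(self, location, fragment):
        stack = list(self.fragment_memory.get(location, []))  # read the stack
        stack.append(fragment)                                 # append the new fragment
        self.fragment_memory[location] = stack                 # write back BEFORE evaluation
        self.writes += 1
        return list(stack)  # copy forwarded toward the evaluation pipeline

def evaluate(stack):
    """Stand-in for a fragment evaluation pipeline: color of the nearest fragment."""
    return min(stack, key=lambda frag: frag["z"])["color"]

manager = FragmentMemoryManager()
frame_buffer = {}
location = (10, 20)

for fragment in ({"z": 0.6, "color": (0, 0, 255)}, {"z": 0.2, "color": (255, 0, 0)}):
    stack = manager.add_fragment(location, fragment)
    frame_buffer[location] = evaluate(stack)  # pipeline output goes only to the frame buffer
```

In this sketch the only writes to fragment memory happen inside add_fragment, one per incoming fragment; nothing is written back after evaluate returns.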
In the present embodiment, each fragment evaluation pipeline performs z-ordered depth sorting of the fragment stack as a pipeline stage (where the z-dimension indicates distance to a view plane). Thus, in accordance with the present invention, fragments can be stored in fragment memory in arbitrary z-dimension order within a fragment stack (that is, fragments in a fragment stack do not need to be sorted according to the z-dimension before the fragment stack is stored in fragment memory). Therefore, the operations associated with depth-sorting the fragments in a fragment stack are relegated to one or more pipeline stages and do not affect overall throughput.
In the present embodiment, when a fragment evaluation pipeline determines that a fragment is blocked from view (and thus will not contribute to the pixel color at a pixel location), an occlude command is sent to the fragment memory manager. The fragment memory manager sets an occlude status bit for that pixel location. In one embodiment, when the fragment stack for the pixel location is next read from fragment memory, the fragment memory manager deletes the occluded fragments from the fragment stack. Thus, when the occlude status bit is set, the fragment stack is "trimmed;" therefore, only visible fragments are forwarded to the fragment evaluation pipeline(s). Deleting occluded fragments from a fragment stack can increase throughput through an anti-aliasing buffer architecture. Making a fragment stack shorter means that it can be more quickly read from and written to fragment memory.
In one embodiment, the anti-aliasing buffer architecture further comprises a status memory for storing an additional bit (the occlude status bit) for each of the pixel locations. In this embodiment, the status memory is placed on a separate memory bus or in a memory bank separate from fragment stack memory. This allows the status memory (specifically, a status memory page) to operate independently of a fragment memory page without affecting access to fragment data.
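Depth sorting performed as a pipeline stage, rather than in fragment memory, can be sketched as below. The tuple layout (z, label) and the function name depth_sort_stage are assumptions for illustration; an actual pipeline stage would operate on full fragment records.

```python
# Illustrative sketch only: fragments are reduced to (z, label) tuples.

def depth_sort_stage(stack):
    """Pipeline stage: return the fragments ordered by z, nearest first.

    The stack held in fragment memory is left in arrival order; only the
    copy travelling through the pipeline is sorted.
    """
    return sorted(stack, key=lambda frag: frag[0])

stored_stack = [(0.7, "wall"), (0.2, "window"), (0.5, "curtain")]  # arbitrary z order
sorted_stack = depth_sort_stage(stored_stack)
```

Because the stage returns a new list, the stored stack keeps its arbitrary order, which matches the point that fragments need not be sorted before they are stored.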
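The occlude mechanism can be sketched as follows. The text does not specify how occluded fragments are identified or when the occlude status bit is cleared, so this sketch assumes that any fragment lying behind the nearest opaque, fully covering fragment is occluded, and that the bit is cleared once the stack has been trimmed. All class, function, and field names are hypothetical.

```python
# Illustrative sketch only: the occlusion rule and bit-clearing policy are assumptions.

FULL_MASK = 0xFFFF  # assumed 16-sample coverage mask with every sample covered

def trim(stack):
    """Drop fragments behind the nearest opaque fragment that covers the whole pixel."""
    blockers = [f["z"] for f in stack if f["opaque"] and f["mask"] == FULL_MASK]
    if not blockers:
        return list(stack)
    nearest_blocker = min(blockers)
    return [f for f in stack if f["z"] <= nearest_blocker]

class TrimmingMemoryManager:
    def __init__(self):
        self.fragment_memory = {}  # pixel location -> fragment stack
        self.occlude_bit = {}      # pixel location -> occlude status bit

    def occlude(self, location):
        """Handle an occlude command sent by a fragment evaluation pipeline."""
        self.occlude_bit[location] = True

    def read_stack(self, location):
        stack = self.fragment_memory.get(location, [])
        if self.occlude_bit.get(location):
            stack = trim(stack)                     # delete occluded fragments
            self.fragment_memory[location] = stack  # shorter stack stays in memory
            self.occlude_bit[location] = False      # assumed: clear the bit after trimming
        return list(stack)

manager = TrimmingMemoryManager()
location = (3, 4)
manager.fragment_memory[location] = [
    {"z": 0.9, "mask": FULL_MASK, "opaque": True},  # far wall, hidden
    {"z": 0.4, "mask": FULL_MASK, "opaque": True},  # near wall, covers the pixel
    {"z": 0.2, "mask": 0x000F, "opaque": True},     # partial fragment in front
]

before = manager.read_stack(location)  # occlude bit not set: stack is untouched
manager.occlude(location)
after = manager.read_stack(location)   # occlude bit set: stack is trimmed
```

After trimming, the stack in this example drops from three fragments to two, so later reads and writes of that stack move less data.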
In summary, in accordance with the present invention, fragment stacks are read from and written back to fragment memory before processing in the computer graphics pipeline(s). It is not necessary for the entire fragment stack to be passed all the way through a fragment evaluation pipeline before it is written back to fragment memory. Fragment stacks being processed in the fragment evaluation pipeline(s) are not held up waiting for a memory page swap to write back to fragment memory, and vice versa. Also, pixel writes to the frame buffer are not held up because of page misses in fragment memory. As a result, processing performance in the fragment evaluation pipeline(s) can be improved and data throughput in the pipeline(s) can be increased, improving the overall performance of a computer graphics system.