Graphics processing units (GPUs) are used in graphics systems to process graphics data, for example, three-dimensional (3-D) graphical images. A GPU is commonly implemented as a chip that receives a command stream from a central processing unit (CPU). GPUs typically have a pipeline architecture in which the processing operations are performed in a sequence of stages. These pipeline stages perform, for example, a predetermined sequence of vertex processing, geometry processing, and pixel processing. The individual stages are typically implemented in hardware and may also be programmable.
A conventional graphics pipeline offers a number of benefits that are derived from dedicating each stage to performing a specific function for one type of data (e.g., primitive data). There are, however, a number of disadvantages associated with the conventional pipelined GPU architecture. In particular, the conventional pipelined architecture constrains the manner in which data flows between stages and the manner in which data is interpreted and utilized by individual stages. For example, “choke points” can emerge in one or more stages of a graphics pipeline. A choke point may, for example, be a portion of a stage that is not functioning well enough to permit data from a previous stage to flow smoothly to a subsequent stage. Additionally, the conventional pipelined architecture places many constraints on data flow and data usage. Moreover, each stage in a conventional graphics pipeline typically interprets and utilizes data in narrowly defined ways determined by the dedicated function of the stage. For example, a triangle setup stage is typically responsible for calculating the slope of a triangle edge using the vertex information of the edge's two end points. Thus, the triangle setup stage expects to receive vertex information and outputs triangle information, typically in the form of a constant, an x-gradient, and a y-gradient.
For example, FIG. 1 illustrates one version of a prior art graphics pipeline 100. In this pipeline 100, a computer application 105, such as a game program, passes graphical data to the pipeline 100. Typically, this graphical data is in the form of geometric shapes known as primitives. Basic primitives include points, lines, triangles, and quadrangles. Note that a point can be rasterized as a triangle (or triangles), a quad, or a square. Headers included with each primitive indicate what type of primitive the application is passing to the pipeline. Typically, the application passes graphical data to the pipeline as a set of triangle vertices.
Graphical data is initially received at a front end stage 110 of the graphics pipeline 100. The front end stage 110 could be an interface between the GPU and other computer components and can be responsible for identifying the type of primitive received from the application. Once the front end stage 110 receives a primitive, it can pass the primitive to a geometry stage 115 that is responsible for basic transformations (e.g., rotation, translation, and scaling), space-to-space transformations, culling, clipping, etc. Depending upon the hardware implementation of the graphics pipeline 100, some of these operations could be performed in other stages of the pipeline, and other operations could be performed in the geometry stage 115.
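By way of illustration only, the basic transformations attributed to the geometry stage 115 (scaling, rotation, and translation) can be sketched in two dimensions as follows. The function name `transform_vertex`, its parameters, and the order in which the operations are applied are assumptions made for this sketch and are not taken from any particular hardware implementation.

```python
import math
from typing import Tuple

Vertex = Tuple[float, float]


def transform_vertex(
    v: Vertex,
    scale: float = 1.0,
    angle_rad: float = 0.0,
    translate: Vertex = (0.0, 0.0),
) -> Vertex:
    """Hypothetical helper: scale, then rotate about the origin, then translate."""
    # Uniform scaling.
    x, y = v[0] * scale, v[1] * scale
    # Counter-clockwise rotation by angle_rad.
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    xr = x * c - y * s
    yr = x * s + y * c
    # Translation.
    return (xr + translate[0], yr + translate[1])
```

A real geometry stage would typically operate on 3-D homogeneous coordinates with matrix multiplication; the 2-D form is used here only to keep the sketch short.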
After a received primitive has been manipulated by the geometry stage 115, the processed data is passed down the pipeline to the triangle setup stage 120. The triangle setup stage 120 is generally responsible for calculating the slope of a triangle edge using the vertex information of the edge's two end points. Slope information is only meaningful with certain primitives, such as triangles and lines. Nonetheless, because modern 3-D graphics pipelines assume that all graphical primitives are associated with slope data, the triangle setup stage 120 will attempt to process the slope for all primitives, even those without slopes. For example, if the primitive is a point, the triangle setup stage 120 will still attempt to process the non-existent slope information for the point and output a constant, an x-gradient, and a y-gradient. Because points have no slope, the slope information is treated as a null.
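One common way to express the output of such a setup step is as a linear edge equation E(x, y) = constant + x-gradient · x + y-gradient · y that evaluates to zero along the edge. The following is a minimal sketch under that assumption; the names `EdgeEquation`, `edge_setup`, and `evaluate` are hypothetical, and the source does not state which formula the triangle setup stage 120 actually uses.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class EdgeEquation(NamedTuple):
    """Assumed output format: E(x, y) = constant + x_gradient*x + y_gradient*y."""
    constant: float
    x_gradient: float
    y_gradient: float


def edge_setup(p0: Point, p1: Point) -> EdgeEquation:
    """Hypothetical setup step: derive the edge equation from two end points."""
    (x0, y0), (x1, y1) = p0, p1
    return EdgeEquation(
        constant=x0 * y1 - x1 * y0,
        x_gradient=y0 - y1,
        y_gradient=x1 - x0,
    )


def evaluate(edge: EdgeEquation, x: float, y: float) -> float:
    """Evaluate the edge equation; the result is zero for points on the edge."""
    return edge.constant + edge.x_gradient * x + edge.y_gradient * y
```

In this sketch, a point primitive can be modeled as an edge whose two end points coincide, in which case both gradients come out as zero, which loosely mirrors the null slope information described above.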
After the triangle setup stage 120 has completed its manipulation of the graphical data, the data is passed to the rasterization/setup stage 125. This passed data includes three components: a constant, an x-gradient, and a y-gradient. The triangle setup stage 120 can also pass these three components to a cache (not shown), such as a triangle RAM, so that the rasterization/setup stage 125 can later retrieve the data.
The rasterization/setup stage 125 is responsible for converting vectors associated with primitives into x-y coordinate representations. The rasterization/setup stage 125 can also be responsible for identifying the pixels touched by triangle edges and lines as well as for general pixel processing, including shading, shadowing, antialiasing, and depth buffering. As with the triangle setup stage 120, some of the operations performed by the rasterization/setup stage 125 are not necessary and are, in fact, wasteful for primitives without slope data. When a primitive does not include a slope component, it should not be rasterized. Modern 3-D graphics pipelines, however, assume that all primitives include slope data, and the rasterization/setup stage 125 will attempt to process all primitives accordingly.
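The identification of pixels touched by a triangle can be sketched, under stated assumptions, with a brute-force edge-function test at each pixel center. The names `edge_value` and `rasterize_triangle`, the pixel-center sampling rule, and the treatment of zero-area primitives are all assumptions of this sketch rather than details given for the rasterization/setup stage 125.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge_value(p0: Point, p1: Point, x: float, y: float) -> float:
    """Signed edge function: zero on the line through p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    return (x0 * y1 - x1 * y0) + (y0 - y1) * x + (x1 - x0) * y


def rasterize_triangle(
    v0: Point, v1: Point, v2: Point, width: int, height: int
) -> List[Tuple[int, int]]:
    """Hypothetical rasterizer: return pixels whose centers lie in the triangle."""
    # A zero-area primitive (for example, three coincident vertices) has no
    # meaningful edges, so this sketch skips it instead of rasterizing it.
    if edge_value(v0, v1, v2[0], v2[1]) == 0:
        return []
    covered = []
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel center
            e = (
                edge_value(v0, v1, cx, cy),
                edge_value(v1, v2, cx, cy),
                edge_value(v2, v0, cx, cy),
            )
            # Inside (or on an edge) when all three values share a sign,
            # which handles both vertex winding orders.
            if all(val >= 0 for val in e) or all(val <= 0 for val in e):
                covered.append((px, py))
    return covered
```

Hardware rasterizers generally avoid testing every pixel and apply tie-breaking rules for pixels that fall exactly on shared edges; neither refinement is modeled here.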
Once the rasterization/setup stage 125 has rasterized the graphical data, that data can be further processed on the pixel level. For example, the individual pixels could be shaded. Antialiasing and z-buffering techniques could also be applied. Finally, the processed graphical data could be stored in the frame buffer 130 for further processing or subsequent display.
Referring now to FIG. 2, a different implementation of a prior art graphics pipeline 140 is illustrated. The basic operation of this graphics pipeline 140 is similar to that described with relation to FIG. 1. In this pipeline, graphical data is passed to a vertex stage 145 that is generally responsible for manipulating the vertices associated with received primitives and for caching vertex data for potential reuse. The vertex data is then passed to the geometry/transform and lighting stage 150, which is responsible for basic transformations, space-to-space transformations, and vertex shading. A vertex shader (not shown) may be coupled to the geometry/transform and lighting stage 150 to perform the vertex shading. The output of the geometry/transform and lighting stage 150 is then passed to the triangle setup stage 160. As previously described, the triangle setup stage 160 calculates the slopes for triangle edges and lines, even if the passed graphical data is not associated with a slope. After attempting to calculate the slopes, the triangle setup stage 160 stores three entries in the triangle random access memory (RAM) 165: a constant, an x-gradient, and a y-gradient, all of which correspond to the primitive passed through the graphics pipeline 140. If the primitive is not associated with a slope, the triangle RAM 165 stores nulls for the x-gradient and y-gradient. Consequently, for a primitive with slope, the triangle RAM 165 stores three values. Unfortunately, for primitives without slope, the triangle RAM 165 also stores three values, two of which are stored needlessly.
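The storage overhead described for the triangle RAM 165 can be illustrated with a small model in which every primitive occupies three slots regardless of type. The names `SLOPED_KINDS`, `setup_entry`, and `wasted_slots`, and the use of `None` to stand for a null entry, are assumptions made purely for this sketch.

```python
from typing import List, Optional, Tuple

# Assumed layout of one entry: (constant, x-gradient, y-gradient).
Entry = Tuple[float, Optional[float], Optional[float]]

# Primitive types that carry slope data in this simplified model.
SLOPED_KINDS = {"line", "triangle"}


def setup_entry(
    kind: str, constant: float, x_gradient: float = 0.0, y_gradient: float = 0.0
) -> Entry:
    """Hypothetical setup output: always three slots, nulls when there is no slope."""
    if kind in SLOPED_KINDS:
        return (constant, x_gradient, y_gradient)
    return (constant, None, None)


def wasted_slots(ram: List[Entry]) -> int:
    """Count slots that hold nulls and therefore carry no information."""
    return sum(value is None for entry in ram for value in entry)
```

For example, a model RAM holding one triangle and two points occupies nine slots, of which four hold nulls.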
Still referring to FIG. 2, the raster stage 170 can retrieve the values stored in the triangle RAM 165 and calculate the corresponding x-y coordinates. As previously described, rasterization is not necessary on certain types of primitives. Nonetheless, current 3-D graphics pipelines attempt to rasterize all primitives regardless of type.
Finally, the processed graphical data is passed to a pixel shader 175 where color is determined for individual pixels, and then the graphical data is passed to a raster operations stage (ROP) for final processing. The data can then be stored in a frame buffer or routed through the graphics pipeline 140 again.
It can be understood from the examples of FIGS. 1 and 2 that a conventional graphics pipeline imposes very strict limitations on the manner in which graphics data is processed. Data typically flows sequentially from one stage to another, and each stage tends to have very specific rules for how data is to be interpreted and processed. Moreover, each stage typically performs a comparatively narrow function. For example, conventionally there is a dedicated vertex shader at one point in the graphics pipeline and a separate pixel shader farther down the graphics pipeline. Similarly, stages that perform data conversion are also dedicated to specific data conversion functions, such as rasterization and raster operations.
In light of the above-described problems, there arose the need to develop the inventive apparatus, system, and method described hereinafter.