Graphics processing is conventionally performed in a sequence of pipelined stages. There are a variety of different conventional pipeline architectures, depending on implementation details and the level of granularity at which certain operations are described. Conventionally, however, the pipeline stages perform geometry processing and then pixel processing. The geometry processing stages may include, for example, transform and lighting. Additionally, geometry processing may include vertex shading.
Referring to FIGS. 1 and 2, a prior art graphics processing architecture of the Nvidia Corporation of Santa Clara, Calif. included a rotating vertex cache (V-cache) to improve the efficiency of a vertex/geometry processing stage 118, which performed primitive assembly, vertex shading, and other geometry processing operations. As illustrated in FIG. 1, input vertices (VTX IN) were processed by a vertex processing engine 105 and the resulting vertices (VTX out) were buffered in slots (e.g., 32 slots) of a common vertex buffer 110. The common vertex buffer 110 provided vertex information for all of the geometry processing units 115 of geometry processing stage 118. As illustrated in FIG. 2, the common vertex buffer was implemented as a rotating vertex cache (V-cache) 200 that included slot assignments (indicated by lines around the perimeter of rotation) for an upstream unit to write vertices into the V-cache.
New vertices were added to slots in V-cache 200 in a serial fashion to fill the rotating V-cache. A rolling history was maintained to support the V-cache. Primitives were then assembled and processed by geometry processing units 115 using entries in the V-cache. A benefit of the rotating V-cache of FIG. 2 was that any triangle being processed by the geometry processing units 115 could reference any slot in the V-cache. The rotating V-cache permitted the re-use of vertices between primitives and therefore improved efficiency. Thus, for example, if the slot assignment was A B C D E F G H, then primitives could be assembled reusing any of the entries in the V-cache. In this example, primitives (triangles) ABC and DEF could be assembled from entries in the V-cache.
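By way of illustration only, the slot-filling and vertex reuse behavior described above may be sketched in Python as follows. The sketch is not taken from the prior art implementation: the class name RotatingVertexCache, its methods, the eight-slot size, and the vertex values are all hypothetical and are chosen only to mirror the A B C D E F G H example.

```python
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]


class RotatingVertexCache:
    """Hypothetical model of a shared cache whose slots are written in rotating order."""

    def __init__(self, num_slots: int = 32) -> None:
        # The text gives 32 slots as an example size.
        self.slots: List[Optional[Vertex]] = [None] * num_slots
        self.next_slot = 0  # slot the upstream unit will write next

    def write(self, vertex: Vertex) -> int:
        """Store a processed vertex serially, wrapping to slot 0 when the end is reached."""
        slot = self.next_slot
        self.slots[slot] = vertex
        self.next_slot = (slot + 1) % len(self.slots)
        return slot

    def assemble(self, a: int, b: int, c: int) -> Tuple[Vertex, Vertex, Vertex]:
        """Assemble a triangle from any three occupied slots of the cache."""
        triangle = (self.slots[a], self.slots[b], self.slots[c])
        if any(v is None for v in triangle):
            raise ValueError("triangle references an empty slot")
        return triangle


# Fill eight slots with placeholder vertices named A through H.
cache = RotatingVertexCache(num_slots=8)
ids = {name: cache.write((float(i), 0.0, 0.0)) for i, name in enumerate("ABCDEFGH")}

abc = cache.assemble(ids["A"], ids["B"], ids["C"])
def_tri = cache.assemble(ids["D"], ids["E"], ids["F"])
# Any slot may be referenced, so B and C are reused here without being processed again.
bcd = cache.assemble(ids["B"], ids["C"], ids["D"])
```

In this sketch every caller reads from the same slots list, which corresponds to the single shared resource that all of the geometry processing units 115 reference.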
The rotating V-cache architecture is a vertex-oriented architecture designed to facilitate the use (and reuse) of vertices for primitive assembly, vertex shading, and other geometry processing operations. One drawback of a rotating V-cache architecture, however, is that it limits scalability. In the rotating V-cache architecture, the slot entries of the V-cache 200 are a shared resource that all of the geometry processing units 115 use to assemble primitives and perform geometry processing, and there are practical limits on the size of a single V-cache. Additionally, since the V-cache is designed as a shared resource, it imposes a constraint that units using the V-cache 200 be communicatively coupled in a manner consistent with using a shared resource. Coordinating different units with a shared V-cache resource therefore creates problems with scalability.
Therefore, in light of the above-described problem, the apparatus, method, and system of the present invention were developed.