FIG. 1 is a block diagram illustrating a prior art graphics pipeline 100 as set forth in OpenGL, a well-known graphics application programming interface (“API”). Persons skilled in the art will recognize that FIG. 1 depicts only the logical relationships among the various elements of graphics pipeline 100 and does not necessarily show a hardware implementation of graphics pipeline 100.
As shown, imaging path 102 receives image data 104, the pixels of which are unpacked at 106. Geometry path 112 receives geometry data 114, namely, geometry primitives. At 116, vertices of geometry data 114 are unpacked. Unpacking of pixels at 106 and unpacking of vertices at 116 are both performed, in part, in response to display lists 122 input into imaging path 102 and geometry path 112, respectively. Persons skilled in the art will appreciate that conventional user programming in geometry path 112 is based on the stream of unpacked vertices provided to vertex operations 118. Further, evaluators 117 are located in front of vertex operations 118. Tessellation therefore occurs before vertex operations 118 in this particular configuration of graphics pipeline 100.
As FIG. 1 also shows, at 108, pixel operations are performed on the pixels unpacked at 106. At 118, vertex operations are performed on the vertices unpacked at 116. Texture data or texels generated from pixel operations 108 are stored in texture memory 124. At 110, image rasterization is performed on the pixel data output by pixel operations 108. Texture data as well as geometry data produced from vertex operations 118 are rasterized with geometric rasterization at 120. The outputs of image rasterization 110 and geometric rasterization 120 are combined and processed by fragment operations 126, the output of which is provided to frame buffer 128.
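By way of illustration only, the logical data flow described above can be written as a small directed graph. The following Python sketch is a hypothetical encoding: the dictionary layout and the helper name reaches are assumptions introduced here, and the placement of evaluators 117 directly between vertex unpacking 116 and vertex operations 118 is assumed from the statement that evaluators 117 are located in front of vertex operations 118.

```python
# Logical data flow of FIG. 1, keyed by the reference numerals used in the
# description. The dictionary encoding itself is hypothetical.
PIPELINE = {
    "display lists 122": ["unpack pixels 106", "unpack vertices 116"],
    "image data 104": ["unpack pixels 106"],
    "unpack pixels 106": ["pixel operations 108"],
    "pixel operations 108": ["image rasterization 110", "texture memory 124"],
    "image rasterization 110": ["fragment operations 126"],
    "geometry data 114": ["unpack vertices 116"],
    # Assumption: evaluators 117 sit between unpacking and vertex operations.
    "unpack vertices 116": ["evaluators 117"],
    "evaluators 117": ["vertex operations 118"],
    "vertex operations 118": ["geometric rasterization 120"],
    "texture memory 124": ["geometric rasterization 120"],
    "geometric rasterization 120": ["fragment operations 126"],
    "fragment operations 126": ["frame buffer 128"],
    "frame buffer 128": [],
}


def reaches(graph, src, dst):
    """Return True if dst can be reached from src by following edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False
```

In this encoding, reaches(PIPELINE, "evaluators 117", "vertex operations 118") holds while the reverse query does not, which mirrors the statement that tessellation occurs before vertex operations 118.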
Graphics processors that implement the functionality of graphics pipelines, such as graphics pipeline 100, may have user-programming capability, but such programmability typically is limited to vertex-oriented processing. For example, graphics processors with user-programmability may include one or more processing units, such as vertex engines, that are capable of processing a stream of vertices using various user-developed programs or subroutines. By providing such a programmable vertex engine, the flexibility and functionality of the graphics processor are enhanced. However, a graphics processor with this type of programmable vertex engine limits a user to influencing only how vertex data is manipulated. A more flexible graphics processor would also enable a user to influence how primitives are manipulated in the graphics pipeline.
In addition to the foregoing, in current architectures such as that depicted in graphics pipeline 100, evaluation usually is performed prior to many vertex operations, such as matrix palette skinning. As is commonly known, an evaluator is used in a graphics pipeline for a variety of functions, such as computing geometry defined by bi-variate polynomials and tessellating such geometry. A specific problem with this order of operations arises in matrix palette skinning. When a vertex program operates on one of the vertices of a triangle during matrix palette skinning, the vertex program typically selects a subset of matrices and weights from a predefined set of matrices configured for skinning operations and performs the necessary weighted transforms. If, however, that triangle derives from a patch, the vertices of the triangle have no immediately obvious matrices or weights, thereby defeating matrix palette skinning. A more feasible approach would be to apply the skinning operations to the control points of the patch and then tessellate. The current architecture, however, precludes such an approach. Further, to achieve this same effect in the current architecture, all matrices affecting the control points of a patch must influence the final evaluated positions. As all relevant operations are linear, this means that all of the active matrices have to be interpolated and applied to the evaluated positions. This requires the union of all active matrices to be included in the relevant operations, resulting in a far larger number of matrices being applied to each generated vertex, thereby making these skinning calculations largely impractical.
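The skin-then-tessellate ordering discussed above can be illustrated with a minimal Python sketch. It is not drawn from any particular implementation. For brevity it assumes 2D points, 2x3 affine matrices, and a cubic Bezier curve standing in for a patch; the function names and the example palette, control points, and weights are all hypothetical.

```python
from math import comb


def transform(m, p):
    """Apply a 2x3 affine matrix m (row-major) to a 2D point p."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def skin(p, influences, palette):
    """Matrix palette skinning of one point: a weighted sum of transforms.

    influences is a list of (palette_index, weight) pairs for this point.
    """
    sx = sy = 0.0
    for idx, w in influences:
        tx, ty = transform(palette[idx], p)
        sx += w * tx
        sy += w * ty
    return (sx, sy)


def bezier(ctrl, u):
    """Evaluate a Bezier curve with control points ctrl at parameter u."""
    n = len(ctrl) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(ctrl):
        b = comb(n, i) * u ** i * (1 - u) ** (n - i)  # Bernstein basis
        x += b * px
        y += b * py
    return (x, y)


def skin_then_tessellate(ctrl, influences, palette, segments):
    """Skin the control points first, then evaluate the skinned curve."""
    skinned = [skin(p, inf, palette) for p, inf in zip(ctrl, influences)]
    return [bezier(skinned, k / segments) for k in range(segments + 1)]


def active_matrices(influences):
    """Union of palette indices referenced by any control point."""
    return sorted({idx for inf in influences for idx, _ in inf})


# Hypothetical example data: identity, translate (0, 1), translate (2, 0).
PALETTE = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
]
CTRL = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0)]
INFLUENCES = [
    [(0, 1.0)],
    [(0, 0.5), (1, 0.5)],
    [(1, 0.5), (2, 0.5)],
    [(2, 1.0)],
]
```

In this example each control point references at most two matrices, yet active_matrices(INFLUENCES) contains three. That union is the set which, per the discussion above, would have to be brought to bear on every generated vertex if skinning were deferred until after evaluation.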
Another limitation of the current architecture pertains to load balancing. In a typical configuration, dedicated, statically partitioned hardware units perform the various operations in the graphics pipeline, such as evaluation and transform and lighting computations. Further, when one of these units has a disproportionately greater number of operations to perform, the current architecture has no way of offloading any of those calculations to other operational units in the graphics processor. As a result, one or more such units may act as a bottleneck in the graphics pipeline.
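The bottleneck effect can be illustrated with a simple back-of-the-envelope model, sketched below in Python. The model is an assumption made for illustration: stages are taken to overlap perfectly, work is measured in arbitrary operation counts, and every unit is assumed able to perform any operation in the pooled case. The function names and figures are hypothetical.

```python
def static_time(work, units):
    """Dedicated units per stage: the most heavily loaded stage sets the pace."""
    return max(work[stage] / units[stage] for stage in work)


def pooled_time(work, units):
    """Idealized case in which any unit may take work from any stage."""
    return sum(work.values()) / sum(units.values())


# Hypothetical workload: evaluation dominates, yet each stage owns two units.
WORK = {"evaluation": 900, "transform_and_lighting": 300}
UNITS = {"evaluation": 2, "transform_and_lighting": 2}
```

With these figures the statically partitioned arrangement is paced by evaluation at 900 / 2 = 450 time units, while the idealized pooled arrangement would need 1200 / 4 = 300, so the transform and lighting units sit partly idle in the static case.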