Providing competitive geometry processing performance in a graphics processing unit (GPU) typically involves multiple parallel geometry processing fixed-function pipelines (GPPs) operating concurrently. These GPPs (sometimes also called SMMs, Geometry and Setup fixed-function pipelines, or pre-tessellation and post-tessellation pipelines) include a mixture of programmable shader and fixed-function stages in the OpenGL rendering pipeline (RP). See the OpenGL (Open Graphics Library) Specification, version 4.2. Tessellation involves subdividing a patch primitive (also called an “object”) and computing vertex values for the resulting vertices. A tessellation control shader may determine how much tessellation to do by specifying a tessellation factor. The number of vertices per patch may be defined at the application level. A patch object may be a triangle or a quad (a four-sided patch whose parametric domain is a square).
Tessellation involves subdividing parametric domains associated with input patch primitives into a set of triangle primitives and computing vertex values at the tessellated domain points (coinciding with the corners of those triangle primitives). Input patch primitives may be associated with triangle or quad parametric domains. A tessellation control shader may determine how finely the domain is subdivided into triangles by specifying a set of tessellation factors for each patch. A tessellation evaluation shader may subsequently compute vertex values using a set of input control points associated with an input patch primitive as well as the domain parameters at the tessellated domain points. The number of input control points associated with patch primitives may be defined at the application level.
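The subdivision and evaluation steps described above can be illustrated with a minimal sketch. It assumes a single uniform tessellation factor applied to a quad (unit-square) domain and a bilinear patch with four control points; the OpenGL pipeline itself uses separate inner and outer factors and configurable spacing, which are omitted here. The function names `tessellate_quad_domain` and `evaluate_bilinear` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

DomainPoint = Tuple[float, float]
Point3 = Tuple[float, float, float]


def tessellate_quad_domain(factor: int) -> Tuple[List[DomainPoint], List[Tuple[int, int, int]]]:
    """Uniformly subdivide the unit-square domain into 2 * factor**2 triangles.

    Returns the tessellated domain points (u, v) and the triangles as
    index triples into that point list.
    """
    if factor < 1:
        raise ValueError("tessellation factor must be at least 1")
    n = factor
    points = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    triangles = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i      # lower-left corner of the cell
            b = a + 1                # lower-right
            c = a + (n + 1)          # upper-left
            d = c + 1                # upper-right
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return points, triangles


def evaluate_bilinear(control_points: List[Point3], u: float, v: float) -> Point3:
    """Stand-in for a tessellation evaluation shader: bilinear interpolation
    of four control points ordered (u0v0, u1v0, u0v1, u1v1)."""
    p00, p10, p01, p11 = control_points
    return tuple(
        (1 - u) * (1 - v) * p00[k] + u * (1 - v) * p10[k]
        + (1 - u) * v * p01[k] + u * v * p11[k]
        for k in range(3)
    )


if __name__ == "__main__":
    domain_points, tris = tessellate_quad_domain(2)
    patch = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 4.0)]
    vertices = [evaluate_bilinear(patch, u, v) for u, v in domain_points]
    print(len(domain_points), "domain points,", len(tris), "triangles")
    print(vertices[4])  # vertex at the domain center (0.5, 0.5)
```

In this sketch the tessellation factor plays the role of the control shader's output, and the evaluation function is applied once per tessellated domain point, mirroring the division of work between the two shader stages.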
The classic problem in parallel rendering graphics architectures is how to take advantage of parallel GPPs and Rendering or Rasterization Pipelines (RPs) while maintaining the strict in-order three-dimensional (3D) pipeline rendering model. The major issue is the arbitrary mapping of application-supplied “object-space” geometry primitives onto the rendered image during the rendering process. The industry has addressed this issue effectively with the “Sort-Middle” architecture. In this scheme, the GPU first performs full geometry processing on arbitrarily-distributed subsets (“batches”) of object-space primitives via parallel GPPs. The resulting screen-space primitives are then restored to their original submission order (i.e., temporally sorted) and distributed to RPs via a rasterization crossbar based on the screen-space regions owned by each RP.
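The Sort-Middle flow described above can be sketched in a few lines. This is a simplified model under stated assumptions, not a description of any particular hardware: each primitive is reduced to a single screen-space point, batches are assigned to GPPs round-robin, and screen regions are assigned to RPs in an interleaved tile pattern. The names `sort_middle` and `owning_rp`, the `tile_size` parameter, and the tile interleaving rule are all hypothetical.

```python
from typing import List, Tuple

# (submission_id, screen_x, screen_y); a real primitive may span several regions.
Primitive = Tuple[int, int, int]


def owning_rp(x: int, y: int, num_rps: int, tile_size: int) -> int:
    """Assumed ownership rule: screen tiles are interleaved across RPs."""
    return ((x // tile_size) + (y // tile_size)) % num_rps


def sort_middle(primitives: List[Primitive], num_gpps: int, num_rps: int,
                batch_size: int, tile_size: int = 16) -> List[List[int]]:
    # 1. Split the in-order primitive stream into batches.
    batches = [primitives[i:i + batch_size]
               for i in range(0, len(primitives), batch_size)]

    # 2. Distribute batches to GPPs (round-robin here). Geometry processing
    #    is modeled as a pass-through, tagged with the batch index.
    gpp_outputs = [[] for _ in range(num_gpps)]
    for batch_index, batch in enumerate(batches):
        gpp_outputs[batch_index % num_gpps].append((batch_index, list(batch)))

    # 3. GPP results may become available in any order; model that by
    #    gathering them in reversed GPP order, then restore submission order.
    completed = [item for gpp in reversed(gpp_outputs) for item in gpp]
    completed.sort(key=lambda item: item[0])

    # 4. Crossbar: route each primitive to the RP owning its screen region.
    rp_queues = [[] for _ in range(num_rps)]
    for _, batch in completed:
        for prim_id, x, y in batch:
            rp_queues[owning_rp(x, y, num_rps, tile_size)].append(prim_id)
    return rp_queues


if __name__ == "__main__":
    prims = [(0, 0, 0), (1, 20, 0), (2, 5, 5), (3, 40, 0), (4, 0, 0)]
    for rp, queue in enumerate(sort_middle(prims, num_gpps=2, num_rps=2, batch_size=2)):
        print("RP", rp, "receives primitives", queue)
```

The property worth noting is that every RP queue lists primitives in increasing submission order, regardless of which GPP processed them or when it finished, which is what the in-order rendering model requires.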
Increasing the number (N) of GPPs in the design will typically require deeper buffers at the output of each GPP, so that each GPP can keep working while it “waits for its turn” to output to the rasterization crossbar. Here the GPP output buffer would likely be sized to hold the output a GPP produces during the average time it takes the other (N−1) GPPs to drain their batches to the crossbar. If sufficient buffering is not provided, the overall geometry throughput tends to degrade to that of a single GPP: GPPs “waiting for their turn” quickly stall because their buffers are not drained while they wait, and when their turn comes they output to the crossbar at the GPP processing rate (which is slower than the crossbar rate).
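The buffer-sizing reasoning above can be made concrete with a back-of-the-envelope model. The formula below is an assumption introduced for illustration, not a figure from any specific design: it supposes that every batch produces the same amount of output, that the crossbar drains one GPP at a time at a fixed rate, and that a waiting GPP keeps producing output at its own processing rate. The function name and all parameter names are hypothetical.

```python
def required_buffer_depth(num_gpps: int, batch_output: float,
                          gpp_rate: float, crossbar_rate: float) -> float:
    """Estimate the per-GPP output buffer depth (in primitives).

    Assumed model:
      wait time = (N - 1) * batch_output / crossbar_rate   (cycles)
      depth     = gpp_rate * wait time                     (primitives)
    """
    if num_gpps < 1 or crossbar_rate <= 0 or gpp_rate < 0 or batch_output < 0:
        raise ValueError("invalid parameters")
    # Time for the other (N - 1) GPPs to drain one batch each to the crossbar.
    wait_time = (num_gpps - 1) * batch_output / crossbar_rate
    # Output this GPP produces, and must hold, while it waits for its turn.
    return gpp_rate * wait_time


if __name__ == "__main__":
    # Example numbers are arbitrary: 64-primitive batches, a GPP producing
    # 1 primitive/cycle, and a crossbar draining 4 primitives/cycle.
    for n in (1, 2, 4, 8):
        print(n, "GPPs ->", required_buffer_depth(n, 64, 1.0, 4.0), "primitives")
```

Under these assumptions the required depth grows linearly with (N−1), which is the scaling cost the paragraph above describes.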