Field of the Invention
The present invention relates to a graphics processing unit, and a method of operation of such a graphics processing unit, and in particular to techniques for performing tessellation within such a graphics processing unit.
Description of the Prior Art
When seeking to render complex shapes (such as higher-order smooth surfaces) in order to produce a graphics image for display, those complex shapes typically first need to be converted into meshes of standard rendering primitives, such as triangles. The desired graphics image can then be generated from the resultant mesh data. The process of converting such complex shapes into meshes of standard rendering primitives is referred to as tessellation.
In older graphics processing systems, tessellation was often implemented as a pre-processing step by software executing on a central processing unit (CPU), with the resultant mesh data then being provided directly as an input to a graphics processing unit (GPU). However, in modern graphics processing systems, tessellation is typically performed within the GPU, to enable the computational power of the GPU to be effectively utilised, and to avoid having to transfer large amounts of geometry data to the GPU every frame. In addition, such an approach allows for adaptive tessellation techniques to be performed, where the granularity of the mesh is adapted dependent on the situation, for example the resolution required, the viewing angle, etc.
The article “Fast GPU-based Adaptive Tessellation with CUDA” by M Schwarz et al, Eurographics 2009, Volume 28, Number 2, describes a framework for on-the-fly adaptive tessellation using CUDA, a non-graphics application programming interface (API) that mainly targets compute-intensive data-parallel applications. In accordance with the described technique, all surface primitive instances in the scene are adaptively tessellated in parallel and the resulting triangle meshes are output into vertex and index buffers for rendering. However, an inherent problem with the described technique is that it is not compatible with modern graphics API standards.
In particular, modern versions of popular graphics APIs (such as Microsoft's DirectX 11, or OpenGL 4.x) describe a number of discrete shader operations, each performed by an associated shader routine, which convert the vertex data originally provided by the graphics application into the mesh data used by subsequent rendering elements, such as a rasteriser, to produce the final graphics image for display. In accordance with such graphics APIs, the tessellation phase is composed of two programmable shader stages along with a fixed-function tessellator block, as shown schematically in FIG. 1.
The Hull shader stage 15 (using DirectX terminology, but also referred to as the Tessellation Control shader stage in OpenGL terminology) and the Domain shader stage 25 (using DirectX terminology, but also referred to as the Tessellation Evaluation shader stage in OpenGL terminology) are implemented by shader routines that are defined by the graphics application but executed by a shader execution unit of the GPU, whilst the fixed-function tessellator 20 is typically implemented as a dedicated hardware block. The vertex data 10 is an ordered list of vertices (which contains, at a minimum, positional data, but may contain many other per-vertex data values) and, as will be understood by those skilled in the art, is typically produced as an output from a vertex shader operation used to perform one or more transformation operations on the vertex data originally provided by the graphics application.
The Hull shader stage 15 specifies a list of vertices (which may or may not differ from the set of vertices in the vertex data 10) to be provided as an input to the Domain shader stage 25, and which will hence be referred to hereafter as “an input list of input vertices”. The Hull shader routine is executed once for each input vertex that the Hull shader stage is to generate. The Hull shader stage also produces tessellation values that are passed to the fixed-function tessellator 20, and which define the number of mesh vertices to generate. The fixed-function tessellator 20 then generates a series of mesh vertices, and for each mesh vertex output from the fixed-function tessellator, a domain shader routine is executed by the Domain shader stage 25. The Domain shader stage thus operates on each mesh vertex output by the tessellator 20 in much the same way as a vertex shader operates on a vertex. Hence, the Domain shader stage may transform the vertex's data, with the results then being written out as the mesh vertex data 30 for use in downstream processing.
In addition to generating each mesh vertex input to the Domain shader stage 25, the fixed-function tessellator 20 also generates mesh topology data 35, which is also stored for use in downstream processing.
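To make the data flow of FIG. 1 more concrete, the following Python sketch models the three stages for a single quad patch. It is a simplified, hypothetical illustration and not the DirectX 11 or OpenGL pipeline itself: the function names, the single uniform tessellation factor, the regular-grid partitioning and the bilinear domain evaluation are all assumptions chosen for brevity (real tessellators use separate edge and inside factors, several partitioning modes, and triangle, quad and isoline domains).

```python
# Hypothetical model of the Hull shader -> fixed-function tessellator -> Domain shader flow.
# All names and simplifications here are illustrative assumptions, not a real API.
Vec3 = tuple[float, float, float]
UV = tuple[float, float]
Tri = tuple[int, int, int]


def hull_control_point(patch: list[Vec3], i: int) -> Vec3:
    # Hypothetical per-vertex Hull routine, invoked once per vertex of the
    # "input list of input vertices"; here it simply passes corner i through.
    return patch[i]


def tessellator(n: int) -> tuple[list[UV], list[Tri]]:
    # Simplified fixed-function stage: an (n+1) x (n+1) grid of domain points
    # over the unit square, plus the mesh topology as triangle index triples.
    row = n + 1
    domain_points = [(i / n, j / n) for j in range(row) for i in range(row)]
    topology: list[Tri] = []
    for j in range(n):
        for i in range(n):
            a = j * row + i  # (i, j)
            b = a + 1        # (i + 1, j)
            c = a + row      # (i, j + 1)
            d = c + 1        # (i + 1, j + 1)
            topology.append((a, b, d))
            topology.append((a, d, c))
    return domain_points, topology


def domain_shader(cps: list[Vec3], uv: UV) -> Vec3:
    # Hypothetical Domain routine, run once per tessellator output point:
    # bilinear interpolation of the four corners ordered p00, p10, p01, p11.
    u, v = uv
    p00, p10, p01, p11 = cps
    return tuple(
        (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
        for a, b, c, d in zip(p00, p10, p01, p11)
    )


def run_tessellation(patch: list[Vec3], factor: int) -> tuple[list[Vec3], list[Tri]]:
    # Hull stage: one routine invocation per output vertex; the caller-supplied
    # factor stands in for the tessellation values passed to the tessellator.
    control_points = [hull_control_point(patch, i) for i in range(len(patch))]
    domain_points, topology = tessellator(factor)
    # Domain shading cannot begin until the tessellator has returned its points.
    mesh_vertices = [domain_shader(control_points, uv) for uv in domain_points]
    return mesh_vertices, topology  # analogous to mesh vertex data 30 and topology data 35
```

For a flat unit patch and a factor of 2, `run_tessellation` yields nine mesh vertices and eight triangles. Because the domain evaluation consumes the tessellator's complete output, the sketch also mirrors the ordering dependency that the fixed-function stage introduces between the two programmable stages.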
The fixed-function tessellator block 20 can potentially generate a significant amount of data, and hence the use of a dedicated hardware block has the potential to improve performance. However, a significant disadvantage is that the hardware block becomes a synchronisation point, creating a pipeline dependency within the shader execution unit. In particular, this pipeline dependency can significantly impact the performance of the Domain shader stage within the shader execution unit, since the domain shading operation cannot begin until the fixed-function tessellator has generated the required outputs.
Accordingly, it would be desirable to provide an improved technique for performing tessellation within a graphics processing unit, whilst maintaining compatibility with modern graphics APIs.