The rendering of images in computer graphics has become increasingly realistic with the advent of three-dimensional (3D) scenes. In some graphics applications, such as computer gaming, the level of detail (LOD) of a 3D object need not be constant and may depend on the distance from which the object is viewed. For example, an object far from the observer need not have the same LOD as the same object viewed close-up. One way to render detailed 3D surfaces realistically with different LODs is through the use of tessellation. In tessellation, a 3D surface may be divided into surface patches. Surface patches may, in turn, be broken up into primitives for rendering in graphics hardware. By breaking up the 3D surfaces into surface patches and primitives, the same 3D objects can be rendered in greater detail as necessary.
Another goal of using 3D surface patches followed by on-chip tessellation is to reduce the amount of information that must be transferred and processed to render smooth surfaces in graphics processors. Meshes based on quadrilateral primitives or triangle primitives may be considered as representations of 3D objects. 3D surface patches may be considered a compressed representation of such a quadrilateral mesh or triangle mesh, with a compression ratio that may range between 10 and 100, depending on the required level of detail. From this point of view, each 3D surface patch in a scene object model needs to be decompressed into a quadrilateral mesh or triangle mesh in order to be processed by a rendering pipeline. Such decompression may be referred to as a tessellation stage, and the processing rate of this stage may determine overall 3D rendering performance in graphics systems.
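As a rough illustration of the compression ratio mentioned above, the following sketch compares the number of vertices in a tessellated mesh with the number of control points in the patch it came from. It assumes a bicubic quadrilateral patch with 16 control points and a uniform grid subdivision, and it counts vertices rather than bytes; none of these choices come from the description above, and the function name is hypothetical.

```python
def compression_ratio(segments_per_side, control_points=16):
    """Ratio of tessellated mesh vertices to patch control points.

    Assumptions (not from the text): a quadrilateral patch subdivided
    uniformly into segments_per_side x segments_per_side cells, and a
    patch described by 16 control points (a bicubic patch).
    """
    mesh_vertices = (segments_per_side + 1) ** 2
    return mesh_vertices / control_points


if __name__ == "__main__":
    for segments in (12, 39):
        print(segments, compression_ratio(segments))
```

Under these assumptions, 12 segments per side gives a ratio slightly above 10 and 39 segments per side gives a ratio of 100, which is consistent with the range stated above.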
FIG. 1 shows an existing graphics pipeline 10 that includes tessellation. Input assembler (IA) 20 reads vertices out of a buffer 15 using fixed function operations, forms mesh geometry, and creates pipeline work items. Input assembler 20 also generates identifiers, or indices (IDs), for the work items. These IDs are used for ID-specific processing by other components of pipeline 10, such as vertex shader 25, hull shader 30, domain shader 40, geometry shader 45, and pixel shader 55, as indicated by the dashed lines on the right of FIG. 1.
Vertex shader (VS) 25 outputs one vertex for each vertex it receives from IA 20. Hull shader (HS) 30 operates on each vertex from VS 25 in two phases. In the control point phase, HS 30 outputs one control point per invocation. Its aggregate output is shared as input to both tessellator (TS) 35 and domain shader (DS) 40. In the patch constant phase, which is invoked once per patch, HS 30 reads all input and output control points, together with the patch constants computed so far. HS 30 outputs edge tessellation factors and other patch constant data.
Tessellator (TS) 35 receives numbers called tessellation factors (TFs) from HS 30 that define how much to tessellate. TS 35 generates domain locations and topology. For example, the tessellation factors may specify how many times a patch is subdivided on each side and in its interior. As non-limiting examples, triangle patches may have four TFs: three for the sides and one for the interior, while quadrilateral patches may have six TFs: one for each side and two for the interior. These factors may be fixed or adaptive based on software settings.
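The tessellation factor counts given above can be captured in a small data structure. The following is a minimal sketch only; the class name, field names and domain labels are hypothetical and do not correspond to any particular graphics API.

```python
from dataclasses import dataclass
from typing import Tuple

# Expected (edge, interior) factor counts per patch domain, as described above.
EXPECTED_COUNTS = {"tri": (3, 1), "quad": (4, 2)}


@dataclass(frozen=True)
class TessFactors:
    """Hypothetical container for the TFs passed from HS 30 to TS 35."""
    domain: str
    edge: Tuple[float, ...]
    inside: Tuple[float, ...]

    def __post_init__(self):
        if self.domain not in EXPECTED_COUNTS:
            raise ValueError(f"unknown domain: {self.domain}")
        n_edge, n_inside = EXPECTED_COUNTS[self.domain]
        if len(self.edge) != n_edge or len(self.inside) != n_inside:
            raise ValueError(
                f"{self.domain} patch needs {n_edge} edge and "
                f"{n_inside} interior factors"
            )

    def count(self):
        """Total number of tessellation factors for this patch."""
        return len(self.edge) + len(self.inside)


if __name__ == "__main__":
    print(TessFactors("tri", (4.0, 4.0, 4.0), (4.0,)).count())
    print(TessFactors("quad", (2.0, 2.0, 8.0, 8.0), (4.0, 4.0)).count())
```

A triangle patch built this way carries four factors and a quadrilateral patch carries six; a mismatched count is rejected at construction time.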
Domain shader (DS) 40 inputs one domain location plus shared read-only input of all HS outputs for the patch. DS 40 outputs one vertex.
Geometry shader (GS) 45 inputs one primitive and outputs up to four streams, each independently receiving zero or more primitives. As shown, an output stream from GS 45 can provide primitives to rasterizer (RS) 50; additionally or alternatively, up to four streams can be concatenated to memory-based buffer 15.
Rasterizer (RS) 50 prepares data for subsequent pixel processing. RS 50 performs clipping (including custom clip boundaries), perspective divide, viewport/scissor selection and implementation, RenderTarget selection, and primitive setup. A RenderTarget is a type of displayable frame buffer, or any memory surface whose pixels are addressed via geometry coordinates instead of linear addressing.
Pixel shader (PS) 55 inputs one pixel for processing and outputs either one pixel at the same RenderTarget position or no pixel.
Output merger (OM) 60 provides fixed function RenderTarget blend/depth/stencil operations.
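The stage order described above for pipeline 10 can be summarized in a short table. The sketch below lists the stages with the reference numerals of FIG. 1, in the order in which they are described; the helper name stages_after is hypothetical, and the optional stream output from GS 45 back to buffer 15 is not modeled.

```python
# (abbreviation, name, reference numeral in FIG. 1), in described order.
STAGES = [
    ("IA", "input assembler", 20),
    ("VS", "vertex shader", 25),
    ("HS", "hull shader", 30),
    ("TS", "tessellator", 35),
    ("DS", "domain shader", 40),
    ("GS", "geometry shader", 45),
    ("RS", "rasterizer", 50),
    ("PS", "pixel shader", 55),
    ("OM", "output merger", 60),
]

# Stages that use the IDs generated by IA 20 (dashed lines in FIG. 1).
ID_CONSUMERS = {"VS", "HS", "DS", "GS", "PS"}


def stages_after(abbr):
    """Return the abbreviations of all stages downstream of the given one."""
    names = [stage[0] for stage in STAGES]
    return names[names.index(abbr) + 1:]


if __name__ == "__main__":
    print(stages_after("HS"))
```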
FIGS. 2a, 2b and 2c show an existing iterative tessellation method used in graphics processing units (GPUs). FIG. 2a shows a tessellation block that generates primitives iteratively, one by one, in a pipeline such as that shown in FIG. 1. Being iterative, it takes a previous state Sn-1 (e.g., indices of a previous primitive or some other data), produces a new state Sn, and outputs a primitive Tn consisting of a set of vertices (a1, a2, a3)n. In this expression, a1, a2 and a3 are integer indices of vertices used for enumeration of vertex flow items. FIG. 2b shows a vertex generating block. Like the primitive generator block, it reads an old state S′m-1 and produces a new state S′m along with a vertex am represented as a set of coordinates (u,v)m. FIG. 2c shows the set of steps necessary to produce a single primitive represented as a set of coordinates. The primitive and vertex generators have to go through n and m iterations, respectively. Furthermore, the vertex generators and primitive generators must interact with each other. For example, as shown in FIG. 2c, they depend on each other through the set of shared vertices (ak, al, am). This dependency prevents fully parallel execution.
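The iterative scheme described above can be sketched in a few lines. The example below is an illustration only, not the method of FIGS. 2a to 2c: it assumes a single-row triangle strip, a primitive state made of the last two vertex indices, and a vertex state made of the last vertex index. All function names are hypothetical, and triangle winding order is ignored.

```python
def primitive_step(state):
    """Take state S(n-1), return (S(n), T(n)) for a triangle strip.

    The state is assumed to be the last two vertex indices emitted.
    """
    p, q = state
    r = q + 1
    return (q, r), (p, q, r)


def vertex_step(state, cols):
    """Take state S'(m-1), return (S'(m), (u, v)) for vertex index m.

    Vertices are assumed to alternate between the two edges of one row.
    """
    m = state + 1
    u = (m // 2) / cols
    v = float(m % 2)
    return m, (u, v)


def primitive_as_coordinates(n, cols):
    """Produce primitive n as (u, v) coordinates by stepping both generators."""
    coords = []
    vstate = -1
    for _ in range(n + 3):  # vertices 0 .. n+2 must all be generated first
        vstate, uv = vertex_step(vstate, cols)
        coords.append(uv)
    pstate = (0, 1)
    tri = None
    for _ in range(n + 1):  # primitives 0 .. n are generated one by one
        pstate, tri = primitive_step(pstate)
    return tuple(coords[i] for i in tri)


if __name__ == "__main__":
    print(primitive_as_coordinates(1, cols=4))
```

Even in this simplified form, obtaining the coordinates of primitive n requires stepping both generators from their initial states, and the primitive generator's output is usable only once the shared vertices exist. This is the kind of serial dependency the paragraph above describes.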
The existing tessellation solution described above has deficiencies that lead to poor tessellation performance, especially with small primitives (such as quadrilaterals or triangles) produced by pixel-level or sub-pixel-level subdivision. The primitive rate is normally a few times lower than the output pixel rate, especially for larger primitives that each cover several pixels. With pixel-level or sub-pixel-level subdivision, however, the output pixel rate is significantly reduced and may become less than or equal to the primitive rate. The pixel rate may fall below the primitive rate when the subdivision size becomes comparable to, or smaller than, the size of a single pixel. In addition, the use of an iterative tessellation procedure places a further limit on the primitive rate, which in turn has additional adverse effects on the pixel rate.
It may therefore be beneficial to provide a method and apparatus of tessellation that generate a larger number of pixels and sustain a high pixel rate in the case of pixel-size or sub-pixel-size subdivision.
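The relationship between primitive rate and pixel rate can be made concrete with a simplified model in which the output pixel rate equals the primitive rate multiplied by the average number of pixels each primitive covers. This model, the function name and the numbers used are assumptions for illustration; they ignore overdraw, culling and other pipeline effects.

```python
def pixel_rate(primitive_rate, pixels_per_primitive):
    """Simplified model: pixels per clock = primitives per clock x coverage."""
    return primitive_rate * pixels_per_primitive


if __name__ == "__main__":
    # Larger primitives: each covers several pixels, so pixel rate exceeds primitive rate.
    print(pixel_rate(1.0, 8.0))
    # Sub-pixel primitives: average coverage below one pixel, so pixel rate drops below primitive rate.
    print(pixel_rate(1.0, 0.5))
```

Under this model, any limit on the primitive rate, such as the one imposed by an iterative tessellator, carries over directly to the pixel rate once the average coverage approaches one pixel or less.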