Graphics processing systems are typically configured to receive graphics data, e.g. from an application running on a computer system, and to render the graphics data to provide a rendering output. For example, the graphics data provided to a graphics processing system may describe geometry within a three dimensional (3D) scene to be rendered, and the rendering output may be a rendered image of the scene. Some graphics processing systems (which may be referred to as “tile-based” graphics processing systems) use a rendering space which is subdivided into a plurality of tiles. The “tiles” are regions of the rendering space, and may have any suitable shape, but are typically rectangular (where the term “rectangular” includes square). To give some examples, a tile may cover a 16×16 block of pixels or a 32×32 block of pixels of an image to be rendered. As is known in the art, there are many benefits to subdividing the rendering space into tiles. For example, subdividing the rendering space into tiles allows an image to be rendered in a tile-by-tile manner, wherein graphics data for a tile can be temporarily stored “on-chip” during the rendering of the tile.
Tile-based graphics processing systems typically operate in two phases: a geometry processing phase and a rendering phase. In the geometry processing phase, the graphics data for a render is analysed to determine, for each of the tiles, which graphics data items are present within that tile. Then in the rendering phase, a tile can be rendered by processing those graphics data items which are determined to be present within that tile (without needing to process graphics data items which were determined in the geometry processing phase to not be present within the particular tile). The graphics data items may represent geometric shapes, which describe surfaces of structures in the scene, and which are referred to as “primitives”. A common primitive shape is a triangle, but primitives may be other 2D shapes or may be lines or points also. Objects can be composed of one or more (e.g. hundreds, thousands or millions) of such primitives.
FIG. 1 shows some elements of a graphics processing system 100 which may be used to render an image of a 3D scene. The graphics processing system 100 comprises a graphics processing unit (GPU) 102 and two portions of memory 1041 and 1042. The two portions of memory 1041 and 1042 may, or may not, be parts of the same physical memory.
The GPU 102 comprises a pre-processing module 106, a tiling unit 108 and rendering logic 110, wherein the rendering logic 110 comprises a fetch unit 112 and processing logic 113 which includes one or more processing cores 114. The rendering logic 110 is configured to use the processing cores 114 to implement hidden surface removal (HSR) and texturing and/or shading on graphics data (e.g. primitive fragments) for tiles of the rendering space.
The graphics processing system 100 is arranged such that a sequence of primitives provided by an application is received at the pre-processing module 106. In a geometry processing phase, the pre-processing module 106 performs functions such as geometry processing including clipping and culling to remove primitives which do not fall into a visible view. The pre-processing module 106 may also project the primitives into screen-space. The primitives which are output from the pre-processing module 106 are passed to the tiling unit 108 which determines which primitives are present within each of the tiles of the rendering space of the graphics processing system 100. The tiling unit 108 assigns primitives to tiles of the rendering space by creating control streams (or “display lists”) for the tiles, wherein the control stream for a tile includes indications of primitives which are present within the tile. The control streams and the primitives are outputted from the tiling unit 108 and stored in the memory 1041.
In a rendering phase, the rendering logic 110 renders graphics data for tiles of the rendering space to generate values of a render, e.g. rendered image values. The rendering logic 110 may be configured to implement any suitable rendering technique, such as rasterisation or ray tracing to perform the rendering. In order to render a tile, the fetch unit 112 fetches the control stream for a tile and the primitives relevant to that tile from the memory 1041. For example, the rendering unit may implement rasterisation according to a deferred rendering technique, such that one or more of the processing core(s) 114 are used to perform hidden surface removal to thereby remove fragments of primitives which are hidden in the scene, and then one or more of the processing core(s) 114 are used to apply texturing and/or shading to the remaining primitive fragments to thereby form rendered image values. Methods of performing hidden surface removal and texturing/shading are known in the art. The term “fragment” refers to a sample of a primitive at a sampling point, which is to be processed for rendering pixels of an image. In some examples, there may be a one to one mapping of sample positions to pixels. In other examples there may be more sample positions than pixels, and this oversampling can allow for higher quality rendering of pixel values, e.g. by facilitating anti-aliasing and other filtering that may be applied to multiple fragments for rendering each of the pixel values. The texturing and/or shading performed on the fragments which pass the HSR stage determines pixel colour values of a rendered image which can be passed to the memory 1042 for storage in a frame buffer. Texture data may be received at the rendering logic 110 from the memory 1041 in order to apply texturing to the primitive fragments, as is known in the art. Shader programs may be executed to apply shading to the primitive fragments. The texturing/shading process may include applying further processing to the primitive fragments (e.g. alpha blending and other processes), as is known in the art in order to determine rendered pixel values of an image. The rendering logic 110 processes primitives in each of the tiles and when the whole image has been rendered and stored in the memory 1042, the rendered image can be outputted from the graphics processing system 100 and used in any suitable manner, e.g. displayed on a display or stored in memory or transmitted to another device, etc.
In some systems, a particular processing core can be used to perform hidden surface removal at one point in time and texturing/shading at another point in time. In some other systems, some of the processing cores are dedicated for performing hidden surface removal whilst others of the processing cores are dedicated for performing texturing and/or shading on primitive fragments.
The graphics processing system 100 described above is a deferred rendering system because the rendering logic 110 is configured to perform the HSR processing on a primitive fragment before the texturing/shading processing is applied to the primitive fragment. Other graphics processing systems are not deferred rendering system in the sense that they are configured to perform the texturing and/or shading of primitive fragments before the HSR is performed on those primitive fragments. Deferred rendering systems avoid the processing involved in applying texturing and/or shading to at least some of the primitive fragments which are removed by the hidden surface removal process.
If the rendering logic 110 includes more than one processing core 114 then the processing cores can process different data in parallel, thereby improving the efficiency of the rendering logic 110. In some systems, the tiles are assigned to processing cores of the rendering logic 110, such that the graphics data for rendering a particular tile is processed in a single processing core. The graphics data for rendering a different tile may be processed by a different, single processing core. Processing a particular tile on a single processing core (rather than spreading the processing of the particular tile across multiple cores) can have benefits such as an improved cache hit rate. Multiple tiles may be assigned to the same processing core, which can be referred to as having “multiple tiles in flight”. When all of the tiles for a render have been processed by the rendering logic 110, the render is complete. Then the results of the render (e.g. a rendered frame) can be used as appropriate (e.g. displayed on a display or stored in a memory or transmitted to another device, etc.), and the rendering logic 110 can process tiles of a subsequent render.