Graphics processing systems are typically configured to receive graphics data, e.g. from an application running on a computer system, and to render the graphics data to provide a rendering output. For example, the graphics data provided to a graphics processing system may describe geometry within a three dimensional (3D) scene to be rendered, and the rendering output may be a rendered image of the scene. Some graphics processing systems (which may be referred to as “tile-based” graphics processing systems) use a rendering space which is subdivided into a plurality of tiles. The “tiles” are regions of the rendering space, and may have any suitable shape, but are typically rectangular (where the term “rectangular” includes square). As is known in the art, there are many benefits to subdividing the rendering space into tiles. For example, subdividing the rendering space into tiles allows an image to be rendered in a tile-by-tile manner, wherein graphics data for a tile can be temporarily stored “on-chip” during the rendering of the tile, thereby reducing the amount of data transferred between a system memory and a chip on which a graphics processing unit (GPU) of the graphics processing system is implemented.
Tile-based graphics processing systems typically operate in two phases: a geometry processing phase and a rasterisation phase. In the geometry processing phase, the graphics data for a render is analysed to determine, for each of the tiles, which graphics data items are present within that tile. Then in the rasterisation phase, a tile can be rendered by processing those graphics data items which are determined to be present within that tile (without needing to process graphics data items which were determined in the geometry processing phase to not be present within the particular tile).
FIG. 1 shows an example of a tile-based graphics processing system 100. The system 100 comprises a memory 102, geometry processing logic 104 and rasterisation logic 106. The geometry processing logic 104 and the rasterisation logic 106 may be implemented on a GPU and may share some processing resources, as is known in the art. The geometry processing logic 104 comprises a geometry fetch unit 108, geometry transform logic 110, a cull/clip unit 112 and a tiling unit 114. The rasterisation logic 106 comprises a parameter fetch unit 116, a hidden surface removal (HSR) unit 118 and a texturing/shading unit 120. The memory 102 may be implemented as one or more physical blocks of memory, and includes a graphics memory 122, a transformed parameter memory 124, a control stream memory 126 and a frame buffer 128.
The geometry processing logic 104 performs the geometry processing phase, in which the geometry fetch unit 108 fetches geometry data from the graphics memory 122 and passes the fetched data to the transform logic 110. The geometry data comprises graphics data items which describe geometry to be rendered. For example, the graphics data items may represent geometric shapes, which describe surfaces of structures in the scene, and which are referred to as “primitives”. A common primitive shape is a triangle, but primitives may be other 2D shapes and may be lines or points also. Objects can be composed of one or more such primitives. Objects can be composed of many thousands, or even millions of such primitives. Scenes typically contain many objects. Some of the graphics data items may be control points which describe a patch to be tessellated to generate a plurality of tessellated primitives.
The transform logic 110 transforms the geometry data into the rendering space and may apply lighting/attribute processing as is known in the art. The resulting data is passed to the cull/clip unit 112 which culls and/or clips any geometry which falls outside of a viewing frustum. The resulting transformed geometric data items (e.g. primitives) are provided to the tiling unit 114, and are also provided to the memory 102 for storage in the transformed parameter memory 124. The tiling unit 114 generates control stream data for each of the tiles of the rendering space, wherein the control stream data for a tile includes identifiers of transformed primitives which are to be used for rendering the tile, i.e. transformed primitives which are positioned at least partially within the tile. The control stream data for a tile may be referred to as a “display list” or an “object list” for the tile. The control stream data for the tiles is provided to the memory 102 for storage in the control stream memory 126. Therefore, following the geometry processing phase, the transformed primitives to be rendered are stored in the transformed parameter memory 124 and the control stream data indicating which of the transformed primitives are present in each of the tiles is stored in the control stream memory 126.
In the rasterisation phase, the rasterisation logic 106 renders the primitives in a tile-by-tile manner. The parameter fetch unit 116 receives the control stream data for a tile, and fetches the indicated transformed primitives from the transformed parameter memory 124, as indicated by the control stream data for the tile. The fetched transformed primitives are provided to the hidden surface removal (HSR) unit 118 which removes primitive fragments which are hidden (e.g. hidden by other primitive fragments). Methods of performing hidden surface removal are known in the art. The term “fragment” refers to a sample of a primitive at a sampling point, which is to be processed to render pixels of an image. In some examples, there may be a one to one mapping of fragments to pixels. However, in other examples there may be more fragments than pixels, and this oversampling can allow for higher quality rendering of pixel values, e.g. by facilitating anti-aliasing and other filtering that may be applied to multiple fragments for rendering each of the pixel values. Primitives which are not removed by the HSR unit 118 are provided to the texturing/shading unit 120, which applies texturing and/or shading to primitive fragments. Although it is not shown in FIG. 1, the texturing/shading unit 120 may receive texture data from the memory 102 in order to apply texturing to the primitive fragments, as is known in the art. The texturing/shading unit 120 may apply further processing to the primitive fragments (e.g. alpha blending and other processes), as is known in the art in order to determine rendered pixel values of an image. The rasterisation phase is performed for each of the tiles, such that the whole image can be rendered with pixel values for the whole image being determined. The rendered pixel values are provided to the memory 102 for storage in the frame buffer 128. The rendered image can then be used in any suitable manner, e.g. displayed on a display or stored in memory or transmitted to another device, etc.
The amount of geometry data used to represent scenes tends to increase as the complexity of computer graphics applications (e.g. game applications) increases. This means that in the system of FIG. 1, the amount of transformed geometry data which is provided from the geometry processing logic 104 to the memory 102 and stored in the transformed parameter memory 124 increases. This transfer of data from the geometry processing logic 104 (which is typically implemented “on-chip”) to the memory 102 (which is typically implemented “off-chip” as system memory) can be a relatively slow process (compared to other processes involved in rendering the geometry data) and can consume large amounts of the memory 102.
Therefore, as described in UK Patent Number GB2458488, some tile-based graphics processing systems can use “untransformed display lists”, such that the control stream data for a tile includes indications to the input geometry data, i.e. the untransformed geometry data rather than the transformed geometry data. This means that the transformed geometry data does not need to be provided from the geometry processing logic to the system memory, or stored in the system memory. These systems implement a transform unit in the rasterisation logic because the geometry data fetched by the rasterisation logic is untransformed, but in some scenarios the benefits of avoiding the delay and memory usage of transferring the transformed primitives to the system memory and storing them in the system memory may outweigh the processing costs of performing a transformation in the rasterisation phase.
FIG. 2 shows an example of a system 200 which uses untransformed display lists, similar to that described in GB2458488. The system 200 is similar to the system 100 shown in FIG. 1, and comprises a memory 202, geometry processing logic 204 and rasterisation logic 206. The geometry processing logic 204 and the rasterisation logic 206 may be implemented on a GPU and may share some processing resources, as is known in the art. The geometry processing logic 204 comprises a geometry data fetch unit 208, geometry transform logic 210, a cull/clip unit 212 and a tiling unit 214. The rasterisation logic 206 comprises a fetch unit 216, rasterisation transform logic 230, a HSR unit 218 and a texturing/shading unit 220. The memory 202 may be implemented as one or more physical blocks of memory, and includes a graphics memory 222, a control stream memory 226 and a frame buffer 228.
The geometry processing logic 204 performs the geometry processing phase, in which the geometry data fetch unit 208 fetches geometry data from the graphics memory 222 and passes the fetched data to the transform logic 210. The fetch unit 208 might fetch only data used to compute position of the graphics data items (e.g. primitives) because other data of the graphics data items (e.g. colour data or texture data to be applied during rendering to the graphics data items, etc.) is not needed by the geometry processing logic 204. This is different to the system 100 in which all of the data for graphics data items is fetched by the fetch unit 108. The transform logic 210 transforms the position data of the graphics data items into the rendering space, and the resulting data is passed to the cull/clip unit 212 which culls and/or clips any graphics data items which fall outside of a viewing frustum. The tiling unit 214 generates control stream data for each of the tiles of the rendering space, wherein the control stream data for a tile includes identifiers of graphics data items which are to be used for rendering the tile, e.g. primitives which, when transformed, are positioned at least partially within the tile. The identifiers in the control stream data identify input graphics data items, i.e. graphics data items stored in the graphics memory 222. This is different to the system 100 shown in FIG. 1 in which the identifiers in the control stream data identify transformed primitives stored in the transformed parameter memory 124. The control stream data for the tiles is provided to the memory 202 for storage in the control stream memory 226.
In the rasterisation phase, the fetch unit 216 of the rasterisation logic 206 receives the control stream data for a tile from the control stream memory 226, and fetches the indicated input graphics data items from the graphics memory 222, as indicated by the control stream data for the tile. The input graphics data items are untransformed. The transform logic 230 transforms the fetched graphics data items into the rendering space. The transformed graphics data items are provided to the HSR unit 218 which performs HSR to remove primitive fragments which are hidden. The texturing and shading unit 220 then performs processing such as texturing and/or shading to primitive fragments which are not removed by the HSR unit 218. The HSR unit 218 and the texturing and shading unit 220 operate in a similar manner to the corresponding units 118 and 120 of the system 100 shown in FIG. 1 and described above. The resulting rendered pixel values are provided to the memory 202 for storage in the frame buffer 228 and can subsequently be used, e.g. displayed on a display or stored in memory or transmitted to another device, etc.
The previous patent GB2458488 describes an optimization for the rasterisation phase in which lighting or attribute processing is deferred until after hidden surface removal has been performed. In this optimization, two transform units are implemented in the rasterisation phase: a first transform unit implemented prior to the HSR unit which transforms only “position data” of primitives (i.e. data for use in computing the position of the primitives), and a second transform unit implemented after the HSR unit which performs lighting or attribute processing for primitives which pass the depth tests of the HSR unit. In this way, non-position attributes of primitives are computed only for primitives which are not culled by the HSR unit.
The previous patent GB2458488 describes a further optimization in which position data for primitives is transformed in the geometry processing phase and then stored in a parameter buffer. The position data for primitives can then be fetched during the rasterisation phase and used by the HSR unit and other processing units. The non-position attribute data for the primitives is fetched from memory and transformed for use by the HSR unit and the other processing units. This optimization avoids the need to re-compute the transformed position data for primitives in the rasterisation phase.