Graphics processing systems are used to process graphics data. For example, an application running on a computing system may need to render an image of a three dimensional (3D) scene for display to a user. The application can send graphics data to a graphics processing system to be rendered, wherein the graphics data describes a plurality of primitives to be rendered. As is known in the art, primitives are usually convex polygons, such as triangles or convex quadrilaterals, wherein a primitive typically has its position in the rendering space of the graphics processing system defined by the position of its vertices, and may have its appearance defined by other attributes such as colour or texture attributes. An object in a scene may be represented by one or more primitives. As graphics processing systems advance, their capability to render complex images improves, and applications take advantage of this by providing more complex images for graphics processing systems to render. The number of primitives in images therefore tends to increase, so the ability of a graphics processing system to process the primitives efficiently becomes more important.
One known way of improving the efficiency of a graphics processing system is to render an image in a tile-based manner. In this way, the rendering space into which primitives are to be rendered is divided into a plurality of tiles, which can then be rendered independently from each other. In order to render primitives, a rendering unit uses memory to store intermediate results (e.g. depth values and primitive identifiers, etc.) for different sample positions in the rendering space. If the rendering unit operates on one tile at a time then most (or all) of this memory can be situated “on-chip”, i.e. on the Graphics Processing Unit (GPU), which might not be possible if the whole rendering space is rendered at once. Therefore, in a tile-based graphics system, the number of read and write operations between the GPU and an off-chip memory (which may be referred to as “system memory”) is typically reduced compared to a non-tile-based graphics system. Since read and write operations between the GPU and the system memory are typically very slow and use a large amount of power (as compared to operations performed within the GPU), tile-based graphics systems are often more efficient (in terms of power and speed) than non-tile-based graphics systems. A tile-based graphics system includes a tiling unit to tile the primitives. That is, the tiling unit determines, for a primitive, which of a plurality of tiles of a rendering space the primitive is in. Then, when a rendering unit renders the tile, it can be given information indicating which primitives should be used to render that tile.
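As a rough illustration of why per-tile intermediate storage can fit on-chip, the following minimal sketch compares the buffer size needed for a whole rendering space with that needed for a single tile. The resolution (1920×1080), the tile size (32×32 samples), one sample per pixel, and the 8 bytes per sample (a 32-bit depth value plus a 32-bit primitive identifier) are assumptions chosen for the example and are not taken from the system described above.

```python
def buffer_bytes(width, height, bytes_per_sample):
    """Bytes needed to hold intermediate results for width x height samples."""
    return width * height * bytes_per_sample

# Assumed for illustration: 32-bit depth value + 32-bit primitive identifier.
BYTES_PER_SAMPLE = 4 + 4

# Assumed rendering-space and tile dimensions (one sample per pixel).
full_frame = buffer_bytes(1920, 1080, BYTES_PER_SAMPLE)
one_tile = buffer_bytes(32, 32, BYTES_PER_SAMPLE)

print(full_frame)              # bytes for the whole rendering space
print(one_tile)                # bytes for a single tile
print(full_frame // one_tile)  # how many times smaller the tile buffer is
```

Under these assumptions the per-tile buffer is a few kilobytes, whereas the whole-frame buffer is several megabytes, which is the kind of difference that allows the intermediate results to be kept on the GPU.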
For example, FIG. 1 shows some elements of a tile-based graphics processing system 100 which may be used to render an image of a 3D scene. The graphics processing system 100 comprises a graphics processing unit (GPU) 102 and two portions of memory 104₁ and 104₂. It is noted that the two portions of memory 104₁ and 104₂ may, or may not, be parts of the same physical memory, and both memories 104₁ and 104₂ may be situated “off-chip”, i.e. not on the same chip as the GPU 102. Communication between the memories (104₁ and 104₂) and the GPU 102 may take place over a communications bus in the system 100.
The GPU 102 comprises a pre-processing module 106, a tiling unit 108 and a rendering unit 110. The tiling unit 108 comprises processing logic 112 and a data store 114, and the rendering unit 110 comprises a hidden surface removal (HSR) module 116 and a texturing/shading module 118. The graphics processing system 100 is arranged such that graphics data describing a sequence of primitives provided by an application is received at the pre-processing module 106. The pre-processing module 106 performs functions such as geometry processing including clipping and culling to remove primitives which do not fall into a visible view. The pre-processing module 106 may also project the primitives into screen-space. The pre-processing module 106 outputs primitives to the tiling unit 108.
The tiling unit 108 receives the primitives from the pre-processing module 106 and determines which of the primitives are present within each of the tiles of the rendering space of the graphics processing system 100. A primitive may be completely in one tile or may overlap two or more of the tiles of the rendering space. The tiling unit 108 assigns primitives to tiles of the rendering space by creating display lists for the tiles, wherein the display list for a tile includes indications of primitives (i.e. primitive IDs) which are present in the tile. The display lists and the primitives are output from the tiling unit 108 and stored in the memory 104₁. The rendering unit 110 fetches the display list for a tile and the primitives relevant to that tile from the memory 104₁, and the HSR module 116 performs hidden surface removal to thereby remove fragments of primitives which are hidden in the scene. The remaining fragments are passed to the texturing/shading module 118 which performs texturing and/or shading on the fragments to determine pixel colour values of a rendered image which can be passed to the memory 104₂ for storage in a frame buffer. The rendering unit 110 processes primitives in each of the tiles and when the whole image has been rendered and stored in the memory 104₂, the image can be output from the graphics processing system 100 and, for example, displayed on a display. In the example shown in FIG. 1, the tile-based graphics processing system 100 is a deferred rendering system, meaning that the rendering unit 110 performs hidden surface removal on a primitive fragment prior to performing texturing and/or shading on the primitive fragment in order to render the scene. However, in other examples, graphics processing systems might not be deferred rendering systems, such that texturing and/or shading is performed on a primitive fragment before hidden surface removal is performed on the primitive fragment.
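The per-tile display lists described above can be sketched as a mapping from a tile position to an ordered list of primitive IDs. The following is a minimal sketch only: the function names, the use of a list index as the primitive ID, the square tiles, and the deliberately crude overlap predicate (which compares only the axis-aligned extent of the primitive with the tile) are assumptions made for illustration and do not describe how the tiling unit 108 is implemented.

```python
from collections import defaultdict

def build_display_lists(primitives, tiles_x, tiles_y, tile_size, overlaps):
    """Map (row, col) -> primitive IDs present in that tile, in submission order.

    `overlaps(vertices, rect)` is a caller-supplied (hypothetical) predicate;
    `rect` is (x_min, y_min, x_max, y_max) of the tile in screen space.
    """
    display_lists = defaultdict(list)
    for prim_id, vertices in enumerate(primitives):
        for row in range(tiles_y):
            for col in range(tiles_x):
                rect = (col * tile_size, row * tile_size,
                        (col + 1) * tile_size, (row + 1) * tile_size)
                if overlaps(vertices, rect):
                    display_lists[(row, col)].append(prim_id)
    return display_lists

def extent_overlaps(vertices, rect):
    """Crude test: does the primitive's axis-aligned extent overlap the tile?"""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_min, y_min, x_max, y_max = rect
    return min(xs) < x_max and max(xs) > x_min and min(ys) < y_max and max(ys) > y_min

# Three triangles in a 2x2 array of 32x32 tiles (values chosen for the example).
primitives = [
    [(5, 5), (20, 5), (5, 20)],      # lies wholly within tile (0, 0)
    [(40, 40), (60, 40), (40, 60)],  # lies wholly within tile (1, 1)
    [(10, 10), (50, 10), (10, 50)],  # extent spans all four tiles
]
lists = build_display_lists(primitives, 2, 2, 32, extent_overlaps)
for tile in sorted(lists):
    print(tile, lists[tile])
```

A rendering unit in this sketch would look up `lists[(row, col)]` for the tile it is about to render and fetch only the primitives named there.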
FIG. 2 shows an example of a rendering space 202 which has been divided into an 8×12 array of tiles 204, wherein the tile in the mth row and the nth column is referred to as 204ₘₙ. A primitive 206 is illustrated. The tiling unit 108 operates to determine which of the tiles 204ₘₙ the primitive 206 is in. The primitive 206 is “in” a tile 204ₘₙ if the primitive 206 at least partially overlaps with the tile. The tiling unit 108 determines a bounding box 208 by finding the minimum and maximum x and y coordinates of the three vertices of the primitive 206 and forming the box 208 from those coordinates. The tiling unit 108 can thereby determine that the primitive 206 is not in any of the tiles 204ₘₙ which are not in the bounding box 208. A tile 204 is “in” the bounding box 208 if the tile at least partially overlaps with the bounding box 208. In some examples, the bounding box may be determined at tile-resolution, whereby the bounding box may be increased in size such that the edges of the bounding box fall on tile boundaries. In FIG. 2, the tiles which are dotted (i.e. the top and bottom rows of tiles, the first column and the last two columns of tiles of the rendering space 202) are outside of the bounding box 208 and therefore, on that basis, the tiling unit 108 can determine that the primitive 206 is not in those tiles. In a very simple implementation, the tiling unit 108 might simply indicate that the primitive is in all of the tiles in the bounding box 208. However, this means that the primitive is indicated as being in some tiles which it is not actually in. This can lead to additional memory consumption due to the storage of unnecessary primitives and/or primitive IDs in memory 104₁, and inefficiencies in the rendering unit 110 as primitives are read from memory 104₁ and are processed for tiles in which they are not visible. Therefore, it is generally preferable for the tiling unit 108 to determine which of the tiles in the bounding box 208 the primitive is in.
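The determination of a bounding box at tile-resolution can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the 32×32 tile size, the clamping of the box to the rendering space, and the vertex coordinates are all chosen for illustration. The coordinates were picked so that, in an array of 8 rows by 12 columns, the top and bottom rows, the first column and the last two columns fall outside the box, but they are not the coordinates of the primitive 206 in FIG. 2.

```python
import math

def tile_bounding_box(vertices, tile_w, tile_h, tiles_x, tiles_y):
    """Return (col_min, col_max, row_min, row_max), inclusive, of the tiles
    overlapped by the primitive's bounding box, clamped to the rendering
    space, or None if the box lies wholly outside the rendering space."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Snap the min/max vertex coordinates outwards to tile boundaries.
    col_min = math.floor(min(xs) / tile_w)
    col_max = math.floor(max(xs) / tile_w)
    row_min = math.floor(min(ys) / tile_h)
    row_max = math.floor(max(ys) / tile_h)
    if col_max < 0 or row_max < 0 or col_min >= tiles_x or row_min >= tiles_y:
        return None
    return (max(col_min, 0), min(col_max, tiles_x - 1),
            max(row_min, 0), min(row_max, tiles_y - 1))

# Assumed example: 12 columns x 8 rows of 32x32 tiles.
box = tile_bounding_box([(40, 50), (300, 100), (120, 200)], 32, 32,
                        tiles_x=12, tiles_y=8)
print(box)
```

In this sketch a maximum coordinate lying exactly on a tile boundary includes the tile that the box merely touches, which errs on the conservative side; a real tiling unit may treat that case differently.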
For each tile in the bounding box 208 (e.g. each of the white tiles in FIG. 2) tiling calculations can be performed to determine whether the primitive 206 is in the tile. For example, the tiling calculations to determine whether the primitive 206 is in a tile 204ₘₙ might include calculations for each edge of the primitive. For example, as illustrated in FIG. 3, equations representing edge lines (302₁, 302₂ and 302₃) defining the edges of the primitive 206 are determined using the locations of the vertices (304₁, 304₂ and 304₃) of the primitive 206. Then for each edge line 302, a test can be performed to determine whether a tile 204 is inside or outside the respective edge line 302 by comparing a position of a test point in the tile with the equation of the edge line 302. The test point in the tile may be different for testing with respect to different edges, i.e. the test point may be edge-specific. For example, for testing whether a tile is inside edge line 302₁ the test point is in the bottom left of the tile; for testing whether a tile is inside edge line 302₂ the test point is in the top left of the tile; and for testing whether a tile is inside edge line 302₃ the test point is in the bottom right of the tile. If it is determined that the tile is inside all of the edge lines 302 then it is determined that the primitive is in the tile. However, if it is determined that the tile is outside any of the edge lines 302 then it is determined that the primitive is not in the tile.
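The per-edge test with an edge-specific test point can be sketched as follows. This is a minimal sketch, not the calculation performed by the tiling unit 108: it assumes that each edge line is written as a·x + b·y + c = 0 with the sign chosen so that the interior of a non-degenerate triangle gives a non-negative value, and that the test point for an edge is the corner of the tile lying furthest towards the inside of that edge. The function names, the vertex coordinates and the tile rectangles are chosen for illustration.

```python
def edge_coefficients(p, q, r):
    """Coefficients (a, b, c) of a*x + b*y + c for the line through p and q,
    signed so that the value is >= 0 on the side containing vertex r."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    if a * r[0] + b * r[1] + c < 0:
        a, b, c = -a, -b, -c
    return a, b, c

def tile_inside_edge(edge, x_min, y_min, x_max, y_max):
    """Evaluate the edge equation at the edge-specific test point: the tile
    corner that lies furthest towards the inside of the edge."""
    a, b, c = edge
    test_x = x_max if a > 0 else x_min
    test_y = y_max if b > 0 else y_min
    return a * test_x + b * test_y + c >= 0

def primitive_in_tile(triangle, x_min, y_min, x_max, y_max):
    """The primitive is taken to be in the tile only if the tile is inside
    all three edge lines (the tile is assumed to be in the bounding box)."""
    v0, v1, v2 = triangle
    edges = (edge_coefficients(v0, v1, v2),
             edge_coefficients(v1, v2, v0),
             edge_coefficients(v2, v0, v1))
    return all(tile_inside_edge(e, x_min, y_min, x_max, y_max) for e in edges)

# Assumed example triangle, tested against three 32x32 tiles in its bounding box.
triangle = [(40, 50), (300, 100), (120, 200)]
print(primitive_in_tile(triangle, 32, 32, 64, 64))      # tile containing a vertex
print(primitive_in_tile(triangle, 128, 96, 160, 128))   # tile in the interior
print(primitive_in_tile(triangle, 288, 192, 320, 224))  # corner tile of the box
```

Choosing the test point from the signs of a and b means that only one point per edge needs to be evaluated for each tile, which mirrors the edge-specific corners (bottom left, top left, bottom right) given in the example above.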
The tiling calculations may be performed for each of the tiles in the bounding box 208 in order to determine whether the primitive is in the respective tiles. For each edge of the primitive, and for each tile in the bounding box, the comparison of the position of the edge-specific test point in the tile with the equation of the appropriate edge line typically involves performing one or more floating point operations. Floating point operations are costly to perform (in terms of time and power consumption). This may cause a problem, particularly due to the tendency for the number of primitives in an image to increase, because the number of floating point operations involved in the tiling process may become large enough to significantly degrade the performance of the graphics processing system 100. Therefore, it would generally be beneficial to reduce the time and power that is consumed in the tiling process.
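How the cost of the tiling calculations scales can be sketched with a simple count. The figures below are assumptions for illustration only: a triangle has three edges, each edge test is assumed to evaluate a·x + b·y + c (two multiplications and two additions, i.e. four floating point operations), and the number of primitives and the bounding box size are invented for the example rather than measured.

```python
def edge_tests_per_primitive(tiles_in_bbox, edges=3):
    """One test per edge for every tile in the primitive's bounding box."""
    return tiles_in_bbox * edges

# Assumed cost of evaluating a*x + b*y + c once: 2 multiplies + 2 adds.
FLOPS_PER_EDGE_TEST = 4

def tiling_flops(num_primitives, tiles_in_bbox):
    """Total floating point operations if every primitive has the same
    bounding box size (a simplifying assumption)."""
    return num_primitives * edge_tests_per_primitive(tiles_in_bbox) * FLOPS_PER_EDGE_TEST

# Assumed workload: one million triangles, each with a 2x2-tile bounding box.
print(edge_tests_per_primitive(2 * 2))
print(tiling_flops(1_000_000, 2 * 2))
```

The count grows with the product of the number of primitives and the number of tiles in each bounding box, which is why a rising primitive count makes the cost of per-tile, per-edge floating point tests increasingly significant.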