The technology described herein relates to graphics processing systems and in particular to tile-based graphics processing systems.
As is known in the art, in tile-based rendering, the two-dimensional output array or frame of the rendering process (the “render target”) (e.g., and typically, the array that will be displayed to show the scene being rendered) is sub-divided or partitioned into a plurality of smaller sub-regions, usually referred to as “tiles”, for the rendering process. The tiles (sub-regions) are each rendered separately (typically one after another). The rendered tiles (sub-regions) are then recombined to provide the complete output array (frame) (render target), e.g. for display.
The tiles can therefore be thought of as the sub-divisions of the render target area (output frame) that the rendering process operates on. In such arrangements, the render target area (output frame) is typically divided into regularly sized and shaped tiles (they are usually, e.g., squares or rectangles), but this is not essential.
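By way of illustration only, the sub-division described above can be sketched as follows. The function name split_into_tiles, the 16×16 default tile size and the clipping of edge tiles are assumptions made for this sketch and are not taken from any particular system.

```python
def split_into_tiles(width, height, tile_w=16, tile_h=16):
    """Partition a width x height render target into rectangular tiles.

    Returns a list of (x, y, w, h) tuples in row-major order.
    Tiles on the right and bottom edges are clipped to the render target.
    The 16x16 default is an illustrative assumption.
    """
    tiles = []
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            w = min(tile_w, width - x)   # clip the last column
            h = min(tile_h, height - y)  # clip the last row
            tiles.append((x, y, w, h))
    return tiles
```

The tiles returned cover the render target exactly once, so rendering each tile separately and writing it back at its (x, y) origin recombines the complete output frame.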
Other terms that are commonly used for “tiling” and “tile based” rendering include “chunking” (the sub-regions are referred to as “chunks”) and “bucket” rendering. The terms “tile” and “tiling” will be used herein for convenience, but it should be understood that these terms are intended to encompass all alternative and equivalent terms and techniques.
As is known in the art, in a tile-based graphics system, a list of drawing commands is usually built for each tile to be rendered (e.g. for each tile in the visible display area), based on which elements of the scene being rendered are visible in the tile in question. Then, when a tile is to be rendered, the list of drawing commands for that tile is allocated to the rendering processor for processing.
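A minimal sketch of building such per-tile lists is given below. It assumes, purely for illustration, that each primitive is represented by a name and an inclusive pixel bounding box with non-negative coordinates, and that a primitive is listed for every tile its bounding box touches; the name build_tile_lists is hypothetical, and real systems may use more exact tests.

```python
from collections import defaultdict

def build_tile_lists(primitives, tile_size):
    """Build a per-tile list of primitives using bounding-box binning.

    primitives: list of (name, (xmin, ymin, xmax, ymax)), inclusive pixel
                bounds, assumed non-negative (illustrative representation).
    Returns a dict mapping (tile_x, tile_y) -> list of primitive names,
    in submission order. Tiles touched by no primitive are omitted.
    """
    tile_lists = defaultdict(list)
    for name, (xmin, ymin, xmax, ymax) in primitives:
        # Every tile overlapped by the bounding box receives the primitive.
        for ty in range(ymin // tile_size, ymax // tile_size + 1):
            for tx in range(xmin // tile_size, xmax // tile_size + 1):
                tile_lists[(tx, ty)].append(name)
    return dict(tile_lists)
```

The length of each list gives one rough measure of how much work a tile represents, which is relevant to the allocation strategies discussed in this section.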
It is now known to provide tile-based graphics processing systems that include multiple independent tile rendering processors. This offers the opportunity to render plural tiles in parallel, thereby potentially reducing the time taken to render an output frame.
One issue with such multiple rendering processor arrangements is the question of how to allocate the different tiles to the different rendering processors for rendering (i.e. how to distribute the tiles among the rendering processors efficiently). A number of techniques have been proposed in the prior art for this.
For example, if it is assumed that there are n tiles on the screen, and m independent tile rendering processors, then a first known prior art strategy allocates a fixed set of n/m tiles to each processor. For example, with 2 processors, one might assign all tiles in the top half of the screen to processor 0, and all tiles in the bottom half of the screen to processor 1.
However, the Applicants have recognised that this is not optimal because there is no facility for load balancing. For example, if there is much more detail on the ground than in the sky, then processor 0 (rendering the top half of the screen) will finish early and stand idle waiting for processor 1 to catch up.
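The fixed allocation and its lack of load balancing can be illustrated with the following sketch. The function names and the per-tile cost figures are assumptions made for the example; tiles are taken to be numbered from the top of the screen downwards.

```python
def allocate_fixed(n_tiles, n_procs):
    """Give each processor a fixed, contiguous block of about n/m tiles.

    Returns one range of tile indices per processor. When n_tiles is not
    a multiple of n_procs, the first processors receive one extra tile.
    """
    base, extra = divmod(n_tiles, n_procs)
    ranges = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def finish_times(costs, allocation):
    """Total rendering time per processor for a given allocation."""
    return [sum(costs[t] for t in tiles) for tiles in allocation]
```

With eight tiles whose hypothetical costs are [1, 1, 1, 1, 3, 3, 3, 3] (cheap sky above, detailed ground below) and two processors, finish_times reports 4 time units for processor 0 and 12 for processor 1, so processor 0 is idle for 8 units.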
It is known to try to make such fixed allocation schemes more sophisticated. In this case a fixed set of tiles is still allocated to each processor, but the size of each set is chosen by making a guess at the appropriate load balance, e.g. based on tile list complexity. For example, one might allocate the top ¾ of the tiles to processor 0 in the unbalanced example above, so that even though processor 0 has more tiles to process, the total time taken by each processor will be (it is hoped) approximately the same. However, this requires extra analysis of the tile lists, usage data from the previous frame, etc.
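One way such a guessed balance might be computed is sketched below. It assumes that an estimated cost per tile is already available (for example from tile list length or from the previous frame); the function name and the cumulative-cost rule for placing the boundaries are illustrative choices only, not a description of any particular prior system.

```python
def split_by_estimated_cost(costs, n_procs):
    """Split tiles into n_procs contiguous ranges of roughly equal cost.

    costs: estimated cost per tile, in tile order (an assumed input).
    The k-th boundary is placed after the first tile at which the running
    cost reaches k/n_procs of the total estimated cost.
    """
    total = sum(costs)
    bounds = [0]
    running = 0.0
    for i, c in enumerate(costs):
        running += c
        while len(bounds) < n_procs and running >= total * len(bounds) / n_procs:
            bounds.append(i + 1)
    # Pad with empty ranges if there were fewer tiles than boundaries.
    while len(bounds) < n_procs:
        bounds.append(len(costs))
    bounds.append(len(costs))
    return [range(bounds[k], bounds[k + 1]) for k in range(n_procs)]
```

For hypothetical costs [1, 1, 1, 1, 1, 1, 3, 3] and two processors, this gives processor 0 the first six tiles (the top ¾) and processor 1 the remaining two, each with an estimated load of 6. The result is only as good as the estimates, which is the extra analysis burden noted above.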
Another known allocation strategy is to order the tiles one after another, and then assign the next tile in the order to whichever processor becomes free first. This can balance the load between processors better.
However, the Applicants have recognised that this strategy will impair the exploitation of any potential spatial coherency between tiles (the Applicants have recognised that it is quite likely that adjacent tiles will share a lot of rendering state, such as textures used, material settings, etc.). This is because with this allocation strategy each processor will typically pick up, as its next tile, a tile that is some distance away from its previous one. (The exact number of tiles that will be “leapfrogged” here will be scene dependent, but the next tile will usually be somewhere between √m and m tiles ahead of the previous one (where m is the number of processors), so this gets worse the more processors there are.)
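The "next tile to whichever processor becomes free first" strategy, and the leapfrogging it causes, can be simulated with the following sketch. The function names, the use of a priority queue, the tie-break on processor number and the per-tile costs are all assumptions of the example.

```python
import heapq

def allocate_dynamic(costs, n_procs):
    """Hand each tile, in order, to whichever processor becomes free first.

    Returns (tiles_per_proc, finish_times). Ties are broken in favour of
    the lowest-numbered processor (an arbitrary choice for this sketch).
    """
    # Heap of (time at which the processor becomes free, processor id).
    free_at = [(0, p) for p in range(n_procs)]
    heapq.heapify(free_at)
    tiles_per_proc = [[] for _ in range(n_procs)]
    for tile, cost in enumerate(costs):
        t, p = heapq.heappop(free_at)
        tiles_per_proc[p].append(tile)
        heapq.heappush(free_at, (t + cost, p))
    finish = [0] * n_procs
    for t, p in free_at:
        finish[p] = t
    return tiles_per_proc, finish

def jumps(tile_sequence):
    """Distance, in tile order, between consecutive tiles of one processor."""
    return [b - a for a, b in zip(tile_sequence, tile_sequence[1:])]
```

For hypothetical costs [1, 1, 1, 1, 3, 3, 3, 3] and two processors, both processors finish at time 8, so the load is balanced. However, with twelve equal-cost tiles and four processors, processor 0 receives tiles 0, 4 and 8: in this uniform case every jump equals the number of processors, so no processor ever renders two adjacent tiles in succession.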
Thus, the current mechanisms for allocating tiles (the command lists for tiles) to rendering processors in multi-processor tile-based graphics processing systems each suffer from one or more drawbacks.
The Applicants believe therefore that there remains scope for improved tile allocation strategies in multi-processor, tile-based, graphics processing systems.