Hardware and/or software developers frequently want to monitor, or profile, the performance of a computer system when performing certain processing tasks in order to optimise the operation or design of certain software or hardware components. One field where performance profiling is particularly useful is in graphics processing. Graphics processing concerns the rendering of two-dimensional images for display on a screen from a (typically) three-dimensional model of a scene. The image of the scene can be rendered by performing a series of processing steps referred to as a graphics pipeline.
Application developers may wish to profile the performance of the graphics system when rendering a scene according to the graphics pipeline. Sometimes, it may be sufficient to know only relatively coarse performance information, e.g. the total length of time required to render a scene. However, it is often desirable to obtain a finer granularity of performance information so that bottlenecks in the system's performance can be more readily identified. For example, it may be desirable to obtain performance information related to only part of a scene being rendered, so that the developer can optimise the relevant part of the scene so as to improve the graphics system's performance during rendering.
For certain graphics architectures, it may be relatively simple to obtain finer granularity performance information. One such architecture is pure immediate mode rendering (IMR). In immediate mode rendering, each submitted graphical object of the scene travels through the entire graphics pipeline and is processed in its entirety before the next graphical object submitted to the pipeline is processed. Thus the graphical objects stay in order, i.e. the completion order for the objects is the same as the order in which the objects were submitted to the pipeline. Information on the performance of the graphics system when rendering a particular graphical object can therefore be obtained by beginning the measurement of a given performance parameter when drawing of the object begins (e.g. when the graphical object is submitted to the beginning of the graphics pipeline) and stopping the measurement when the drawing ends (e.g. once the pixel data for the object has been generated).
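By way of illustration only, the start/stop measurement described above for immediate mode rendering may be sketched as follows. The sketch is a simplified software model rather than an implementation of any particular graphics system: the names `DrawObject`, `CycleCounter` and `render_immediate`, and the per-object cycle costs, are hypothetical and are assumed purely for the purposes of the example.

```python
from dataclasses import dataclass


@dataclass
class DrawObject:
    name: str
    cost: int  # assumed number of cycles the pipeline spends on this object


class CycleCounter:
    """Hypothetical stand-in for a hardware performance counter."""

    def __init__(self):
        self.cycles = 0

    def advance(self, n):
        self.cycles += n


def render_immediate(objects, counter):
    """Process each object through the whole pipeline before starting the next."""
    per_object = {}
    completion_order = []
    for obj in objects:
        start = counter.cycles      # measurement begins when the object is submitted
        counter.advance(obj.cost)   # the object travels through the entire pipeline
        per_object[obj.name] = counter.cycles - start  # measurement stops when drawing ends
        completion_order.append(obj.name)
    return per_object, completion_order


objects = [DrawObject("person", 120), DrawObject("tree", 45), DrawObject("sky", 30)]
stats, order = render_immediate(objects, CycleCounter())
print(stats)
print(order)
```

In this model the completion order matches the submission order, so the difference between the two counter readings for an object is attributable to that object alone.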
However, for other graphics architectures, obtaining the performance data may be less straightforward. One class of such architectures is tile-based renderers. In tile-based renderers, an image to be rendered is sub-divided into screen-space tiles, with rasterization then being performed on a per-tile basis instead of rasterizing the entire image as an immediate mode renderer would. Tile-based renderers typically exploit a high level of parallelism. In addition, it is not unusual for objects of the scene to be rendered to span multiple tiles. For example, an object of the scene (e.g. a person) may span multiple tiles, where the person occupies a relatively large proportion of certain tiles and a relatively small proportion of other tiles. The rendering of two tiles containing parts of the same object may be separated in time, so that the work performed for a single object is interleaved with the work performed for other objects. Because of this, it may be difficult to obtain fine granularity performance information for tile-based renderers.
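The difficulty may be illustrated with a simplified model, offered as a sketch only. The model assumes that tiles are rendered one after another and that each tile lists the objects it contains together with a hypothetical per-tile cost; real tile-based renderers typically process tiles in parallel, which complicates matters further. The function name `render_tiled` and all of the figures are assumptions made for the example.

```python
def render_tiled(tiles):
    """tiles: list of tiles in render order; each tile is a list of (object_name, cost)."""
    clock = 0
    first_seen, last_done, attributed = {}, {}, {}
    for tile in tiles:
        for name, cost in tile:
            first_seen.setdefault(name, clock)  # naive "start" of the measurement
            clock += cost                       # work done for this object in this tile
            last_done[name] = clock             # naive "stop" of the measurement
            attributed[name] = attributed.get(name, 0) + cost
    # A single start/stop window also counts other objects' interleaved work.
    naive_window = {n: last_done[n] - first_seen[n] for n in first_seen}
    return naive_window, attributed


tiles = [
    [("person", 50), ("sky", 10)],  # person occupies a large proportion of this tile
    [("tree", 40), ("sky", 10)],
    [("person", 5), ("tree", 20)],  # person occupies a small proportion of this tile
]
naive_window, attributed = render_tiled(tiles)
print(naive_window)
print(attributed)
```

In this example a single start/stop window around the "person" object spans 115 cycles, whereas only 55 of those cycles were spent on that object; the remainder was spent on other objects rendered in between.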