The technology described herein relates to graphics processing systems, and in particular to tile-based graphics processing systems.
As is known in the art, graphics processing is normally carried out by first dividing the output to be generated, such as a frame to be displayed, into a number of similar basic components (so-called “primitives”) to allow the graphics processing operations to be more easily carried out. These “primitives” are usually in the form of simple polygons, such as triangles.
The graphics primitives are usually generated by the applications program interface for the graphics processing system, using the graphics drawing instructions (requests) received from the application (e.g. game) that requires the graphics output.
Each primitive is at this stage usually defined by and represented as a set of vertices. Each vertex for a primitive has associated with it a set of data (such as position, colour, texture and other attributes data) representing the vertex. This data is then used, e.g., when rasterising and rendering the vertex (the primitive(s) to which the vertex relates) in order to generate the desired output of the graphics processing system.
Once primitives and their vertices have been generated and defined, they can be processed by the graphics processing system, in order, e.g., to display the frame.
This process basically involves determining which sampling points of an array of sampling points covering the output area to be processed are covered by a primitive, and then determining the appearance each sampling point should have (e.g. in terms of its colour, etc.) to represent the primitive at that sampling point. These processes are commonly referred to as rasterising and rendering, respectively.
The rasterising process determines the sample positions that should be used for a primitive (i.e. the (x, y) positions of the sample points to be used to represent the primitive in the output, e.g. scene to be displayed). This is typically done using the positions of the vertices of a primitive.
The rendering process then derives the data, such as red, green and blue (RGB) colour values and an “Alpha” (transparency) value, necessary to represent the primitive at the sample points (i.e. “shades” each sample point). This can involve, as is known in the art, applying textures, blending sample point data values, etc.
(In graphics literature, the term “rasterisation” is sometimes used to mean both primitive conversion to sample positions and rendering. However, herein “rasterisation” will be used to refer to converting primitive data to sampling point addresses only.)
These processes are typically carried out by testing sets of one, or of more than one, sampling point, and then generating for each set of sampling points found to include a sample point that is inside (covered by) the primitive in question (being tested), a discrete graphical entity usually referred to as a “fragment” on which the graphics processing operations (such as rendering) are carried out. Covered sampling points are thus, in effect, processed as fragments that will be used to render the primitive at the sampling points in question. The “fragments” are the graphical entities that pass through the rendering process (the rendering pipeline). Each fragment that is generated and processed may, e.g., represent a single sampling point or a set of plural sampling points, depending upon how the graphics processing system is configured.
(A “fragment” is therefore effectively (has associated with it) a set of primitive data as interpolated to a given output space sample point or points of a primitive. It may also include per-primitive and other state data that is required to shade the primitive at the sample point (fragment position) in question. Each graphics fragment may typically be the same size and location as a “pixel” of the output (e.g. output frame) (since as the pixels are the singularities in the final display, there may be a one-to-one mapping between the “fragments” the graphics processor operates on (renders) and the pixels of a display). However, it can be the case that there is not a one-to-one correspondence between a fragment and a display pixel, for example where particular forms of post-processing, such as downsampling, are carried out on the rendered image prior to displaying the final image.)
(It is also the case that as multiple fragments, e.g. from different overlapping primitives, at a given location may affect each other (e.g. due to transparency and/or blending), the final pixel output may depend upon plural or all fragments at that pixel location.)
(Correspondingly, there may be a one-to-one correspondence between the sampling points and the pixels of a display, but more typically there may not be a one-to-one correspondence between sampling points and display pixels, as downsampling may be carried out on the rendered sample values to generate the output pixel values for displaying the final image. Similarly, where multiple sampling point values, e.g. from different overlapping primitives, at a given location affect each other (e.g. due to transparency and/or blending), the final pixel output will also depend upon plural overlapping sample values at that pixel location.)
As is known in the art, graphics processing systems and graphics processors are typically provided in the form of graphics processing pipelines which have multiple processing stages for performing the graphics processing functions, such as fetching input data, geometry processing, vertex shading, rasterisation, rendering, etc., necessary to generate the desired set of output graphics data (which may, e.g., represent all or part of a frame to be displayed).
The processing stages of the graphics processing pipeline may, e.g., be in the form of fixed-function units (hardware), or some or all of the functional units may be programmable (be provided by means of programmable circuitry that can be programmed to perform the desired operation). For example, a graphics processing pipeline may include programmable vertex and/or fragment shaders for performing desired vertex and/or fragment shading operations.
A tile-based graphics processing pipeline will also include one or more so-called tile buffers that store rendered fragment data at the end of the pipeline until a given tile is completed and written out to an external memory, such as a frame buffer, for use. This local, pipeline memory is used to retain fragment data locally before the data is finally exported to external memory.
The data in the tile buffer is usually stored as an array of sample values, with different sets of the sample values corresponding to and being associated with respect output pixels of an array of output pixels. There may, e.g., be one sample per pixel position, but more typically there will be multiple, e.g. 4, samples per pixel, for example where rendering outputs are generated in a multisampled fashion. The tile buffer may store, e.g., a colour buffer containing colour values for the tile in question, and a depth buffer storing depth values for the tile in question.
In order to facilitate the writing back of rendered graphics data from the tile buffers to external memory, such as a frame buffer, a graphics processing pipeline will typically include write out circuitry coupled to the tile buffer pipeline memory for this purpose. The graphics processing pipeline may also be provided with fixed-function downsampling circuitry for downsampling the locally stored data before it is written out to external memory where that is required (as may, e.g., be the case where a frame to be displayed is rendered in a supersampled or multisampled manner for anti-aliasing purposes).
It is becoming increasingly desirable when performing graphics processing to perform so-called “deferred shading”. When doing deferred shading, the application performs multiple render passes, usually using multiple render targets in a first rendering pass to output colour, depth, surface normals, and potentially other attributes, to separate render targets. It then reads in the outputs from the first rendering pass to do complex light calculations and compositions to produce the final result in a second rendering pass. This requires a lot of bandwidth to read and write all of the render targets (as an application will usually, for example, write out multiple render targets in the first pass, and then use render targets as textures in the second pass to generate the final result).
In graphics processors in lower power and portable devices, the bandwidth cost of writing data to external memory from the graphics processing pipeline and for the converse operation of reading data from external memory to the local memory of the graphics processing pipeline for deferred shading can be a significant issue. Bandwidth consumption can be a big source of heat and of power consumption, and so it is generally desirable to try to reduce bandwidth consumption for external memory reads and writes in embedded graphics processing systems.
Various techniques have accordingly already been proposed to try to reduce bandwidth consumption for external memory reads and writes in graphics processing systems. These techniques include, for example, using texture and frame buffer compression to try to reduce the amount of data that must be written/read, and/or trying to eliminate unnecessary external memory (e.g. frame buffer) read and write transactions (operations).
Notwithstanding these known techniques, the Applicants believe that there remains scope for further improvements for improved techniques for performing deferred shading in graphics processing pipelines, and in particular in tile-based graphics processing pipelines.
Like reference numerals are used for like components where appropriate in the drawings.