The technology described herein relates to the processing of computer graphics, and in particular to hidden surface removal in graphics processing.
As is known in the art, graphics processing is normally carried out by first dividing the graphics processing (render) output, such as a frame to be displayed, into a number of similar basic components (so-called “primitives”) to allow the graphics processing operations to be more easily carried out. These “primitives” are usually in the form of simple polygons, such as triangles.
The primitives for an output such as a frame to be displayed are usually generated by the applications program interface for the graphics processing system, using the graphics drawing instructions (requests) received from the application (e.g. game) that requires the graphics processing.
Each primitive is at this stage usually defined by and represented as a set of vertices. Each vertex for a primitive has associated with it a set of data (such as position, colour, texture and other attributes data) representing the vertex. This data is then used, e.g., when rasterising and rendering the vertex (the primitive(s) to which the vertex relates), e.g. for display.
Once primitives and their vertices have been generated and defined, they can be processed by the graphics processing system, in order, e.g., to display the frame.
This process basically involves determining which sampling points of an array of sampling points covering the output area to be processed are covered by a primitive, and then determining the appearance each sampling point should have (e.g. in terms of its colour, etc.) to represent the primitive at that sampling point. These processes are commonly referred to as rasterising and rendering, respectively.
The rasterising process determines the sampling points that should be used for a primitive (i.e. the (x, y) positions of the sample points to be used to represent the primitive in the render output, e.g. frame to be displayed). This is typically done using the positions of the vertices of a primitive.
The rendering process then derives the data, such as red, green and blue (RGB) colour values and an “Alpha” (transparency) value, necessary to represent the primitive at the sample points (i.e. “shades” each sample point). This can involve, as is known in the art, applying textures, blending sample point data values, etc.
(In 3D graphics literature, the term “rasterisation” is sometimes used to mean both primitive conversion to sample positions and rendering. However, herein “rasterisation” will be used to refer to converting primitive data to sampling point addresses only.)
These processes are typically carried out by testing sets of one, or of more than one, sampling point, and then generating for each set of sampling points found to include a sample point that is inside (covered by) the primitive in question (being tested), a discrete graphical entity usually referred to as a “fragment” on which the graphics processing operations (such as rendering) are carried out. Covered sampling points are thus, in effect, processed as fragments that will be used to render the primitive at the sampling points in question. The “fragments” are the graphical entities that pass through the rendering process (the rendering pipeline). Each fragment that is generated and processed may, e.g., represent a single sampling point or a set of plural sampling points, depending upon how the graphics processing system is configured.
(A “fragment” is therefore effectively (has associated with it) a set of primitive data as interpolated to a given output space sample point or points of a primitive. It may also include per-primitive and other state data that is required to shade the primitive at the sample point (fragment position) in question. Each graphics fragment may typically be the same size and location as a “pixel” of the output (e.g. output frame) (since as the pixels are the singularities in the final display, there may be a one-to-one mapping between the “fragments” the graphics processor operates on (renders) and the pixels of a display). However, it can be the case that there is not a one-to-one correspondence between a fragment and a display pixel, for example where particular forms of post-processing, such as downsampling, are carried out on the rendered image prior to displaying the final image.)
(It is also the case that as multiple fragments, e.g. from different overlapping primitives, at a given location may affect each other (e.g. due to transparency and/or blending), the final pixel output may depend upon plural or all fragments at that pixel location.)
(Correspondingly, there may be a one-to-one correspondence between the sampling points and the pixels of a display, but more typically there may not be a one-to-one correspondence between sampling points and display pixels, as downsampling may be carried out on the rendered sample values to generate the output pixel values for displaying the final image. Similarly, where multiple sampling point values, e.g. from different overlapping primitives, at a given location affect each other (e.g. due to transparency and/or blending), the final pixel output will also depend upon plural overlapping sample values at that pixel location.)
In one known technique for graphics processing, which is commonly referred to as “immediate mode” graphics processing or rendering, primitives are processed (rasterised and rendered) as they are generated, one after another.
In this type of system, the primitives (their vertices) are passed to the graphics system on a first-come, first-served basis, and primitives are thus rendered in the order that they are received.
It is also known in graphics processing systems to use so-called “tile-based” or “deferred” rendering. In tile-based rendering, rather than the entire render output, e.g., frame, effectively being processed in one go as in immediate mode rendering, the render output, e.g., frame to be displayed, is divided into a plurality of smaller sub-regions, usually referred to as “tiles”. Each tile (sub-region) is rendered separately (typically one-after-another), and the rendered tiles (sub-regions) are then recombined to provide the complete render output, e.g., frame for display. In such arrangements, the render output is typically divided into regularly-sized and shaped sub-regions (tiles) (which are usually, e.g., squares or rectangles), but this is not essential.
In both immediate mode and tile-based rendering, the input to the rasterisation and rendering processes will typically include a list of graphics commands to be executed by the graphics processor. This “command list” will include, as is known in the art, commands instructing the graphics processor to draw primitives, and commands instructing other graphics processes, such as rendering state changes, start and end tile commands (in a tile-based system), etc.
In immediate mode rendering this command list will simply list the commands to be executed one-after-another, whereas in tile-based rendering the list may be, and typically will be, divided into “tiles” (i.e. will list the commands for each tile separately to the commands for the other tiles).
One drawback of current graphics processing systems is that because primitives are processed sequentially, and typically not in perfect front-to-back order, a given sampling point (and hence fragment and pixel) may be shaded multiple-times as an output is processed, e.g. for display. This occurs when a first received and rendered primitive is subsequently covered by a later primitive, such that the rendered first primitive is not in fact seen at the pixel(s) (and sampling point(s)) in question. Primitives can be overwritten many times in this manner and this typically leads to multiple, ultimately redundant, rendering operations being carried out for each render output, e.g. frame, being rendered. This phenomenon is commonly referred to as “overdraw”.
The consequences of performing such ultimately redundant operations include reduced frame rates and increased memory bandwidth requirements (e.g. as a consequence of fetching data for primitives that will be overwritten by later primitives). Both of these things are undesirable and reduce the overall performance of a graphics processing system. These problems will tend to be exacerbated as render outputs, such as frames to be rendered, become larger and more complex (as there will be more surfaces in the potentially-visible view), and as the use of programmable fragment shading increases (as the cost of shading a given fragment using programmable fragment shading is relatively greater).
The problem of “overdraw” could be significantly reduced by sending primitives for rendering in front-to-back order. However, other graphics processing requirements, such as the need for coherent access to resources such as textures, and the need to minimise the number of API calls per frame, generally mandate other ordering requirements for primitives. Also, a full front-to-back sort of primitives prior to rendering may not be practical while still maintaining a sufficient throughput of primitives to the graphics processing unit. These and other factors mean that front-to-back ordering of primitives for a given render output, e.g., frame, is generally not possible or desirable in practice.
A number of other techniques have therefore been proposed to try to reduce the amount of “overdraw” (the amount of redundant processing of hidden surfaces) that is performed when processing a render output, such as a frame for display (i.e. to avoid rendering non-visible primitives and/or fragments, etc.).
For example, it is known to carry out forms of hidden surface removal before a primitive and/or fragment is sent for rendering, to see if the primitive or fragment etc. will be obscured by a primitive that has already been rendered (in which case the new fragment and/or primitive need not be rendered). Such hidden surface removal may comprise, for example, early occlusion culling, such as early-Z (depth) and/or stencil, testing processes, as is known in the art.
These arrangements try to identify, e.g., fragments that will be occluded by already processed primitives (and therefore that do not need processing) before the later fragments are issued to the rendering pipeline. In these arrangements, the depth value, e.g., of a new fragment to be processed is compared to the current depth value for that fragment position in the depth buffer to see if the new fragment is occluded or not. This can help to avoid sending fragments that are occluded by already processed primitives through the rendering pipeline.
However, these “early” (prior to rendering) hidden surface removal techniques only take account of fragments that have completed their processing (that have already been rendered) at the time the new, e.g., primitive or fragment (the primitive or fragment being “early” tested) is being tested. This is because the relevant test data (such as the Z-buffer) only contains data from fragments that have already been processed.
In a proposal described in “Delay Streams for Graphics Hardware”, by Timo Alia, Ville Miettinen and Petri Nordlund, Siggraph 2003, a graphics processing pipeline that uses an early-Z test is modified to include a delay stream and a second early occlusion test stage. The second early occlusion test takes place before rasterising and rendering, but after a first early occlusion test stage and the delay stream.
The idea here is that by the time a given primitive reaches the second early occlusion test stage, more primitives will have contributed to the, e.g., Z-buffer data (since the deliberate delay stream allows time for more primitives to complete their processing before a primitive reaches the second early occlusion test stage), such that that second occlusion test stage can take account of more primitives than in standard, single early occlusion testing arrangements.
However, this arrangement requires some modifications to the graphics processing pipeline, such as the addition of a second early-occlusion test stage, and only operates on primitives (and so is non-exact and must be very conservative (and so can give little or no benefit in complex meshes, for example)).
It has also been proposed to use a per-sample or per-fragment sorting pass before sending any fragments to the rendering pipeline so as to identify the front-most fragment for each fragment position before the fragments are issued to the rendering pipeline. This can effectively remove all hidden surfaces, regardless of the order the primitives are received in (as it identifies the fragment that needs to be processed for each fragment position before the fragments are sent for rendering).
However, this arrangement has a fixed cost irrespective of the order that the primitives are received in (i.e. irrespective of the rendering order specified by the application), and has to apply a large number of special cases to handle things like transparency, etc., where the application specified rendering order must be preserved. The cost for this sorting also increases with the number of samples being considered (as it essentially has to sort per sample), thereby making rendering using a high number of samples, such as multi-sampled anti-aliasing, very expensive when using this technique.
It is also known for application (e.g. game) developers to configure the application such that each render output, e.g., frame, is, in effect, rendered twice, first of all to draw all the opaque geometry with all rendering states other than the Z-test and Z-write disabled, and then a second time with full render states enabled.
This has the effect that the first rendering pass effectively fills the Z-buffer with the Z (depth) value of the closest opaque primitive for each fragment (sample) position. In the second, full rendering pass, as the Z-buffer is now filled with the Z-value of the closest opaque primitive, any early-Z test on the second pass can more effectively reject occluded fragments, and in particular will take account of all the primitives rendered in the first, “Z-only” rendering pass, not just of primitives that have been rendered ahead of the fragment in question.
This technique therefore can provide a more efficient early-Z test process, but it has the disadvantage that a given render output must be generated by the application and processed by the graphics processing system twice, once for the Z-only pass, and then again for the “full” rendering pass. While this may not be too problematic for higher powered, e.g. desktop, graphics systems, other, lower powered graphics systems, such as for portable and mobile devices, may, e.g., have bandwidth constraints that make generating and rendering each entire potentially-visible render output twice undesirable.
The Applicants believe therefore that there remains scope for improved techniques for hidden surface removal in graphics processing systems.