The technology described herein relates to graphics processors, and in particular to the operation of graphics processors that include one or more programmable processing stages (“shaders”).
Graphics processing is typically carried out in a pipelined fashion, with one or more pipeline stages operating on the data to generate the final render output, e.g. frame that is displayed. Many graphics processing pipelines now include one or more programmable processing stages, commonly referred to as “shaders”. For example, a graphics processing pipeline may include one or more of, and typically all of, a geometry shader, a vertex shader and a fragment (pixel) shader. These shaders are programmable processing stages that execute shader programs on input data values to generate a desired set of output data (e.g. appropriately shaded and rendered fragment data in the case of a fragment shader) for processing by the rest of the graphics pipeline and/or for output. The shaders of the graphics processing pipeline may share programmable processing circuitry, or they may each be distinct programmable processing units.
A shader program to be executed by a given “shader” of a graphics processing pipeline will be provided by the application that requires the processing by the graphics processing pipeline using a high-level shader programming language, such as GLSL, HLSL, OpenCL, etc. This shader program will consist of “expressions” indicating desired programming steps defined in the relevant language standards (specifications). The high-level shader program is then translated by a shader language compiler to binary code for the target graphics processing pipeline. This binary code will consist of “instructions” which are specified in the instruction set specification for the given target graphics processing pipeline. The compilation process for converting the shader language expressions to binary code instructions may take place via a number of intermediate representations of the program within the compiler. Thus the program written in the high-level shader language may be translated into a compiler specific intermediate representation (and there may be several successive intermediate representations within the compiler), with the final intermediate representation being translated into the binary code instructions for the target graphics processing pipeline.
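By way of illustration only, the compilation stages described above can be modelled with a toy lowering pass. The following is a minimal sketch, assuming a nested-tuple expression tree, a three-address intermediate representation and the mnemonics "MUL" and "ADD". All of these, together with the function names `lower_to_ir` and `emit_instructions`, are hypothetical and do not correspond to any real shader compiler or graphics processor instruction set.

```python
import itertools

# Hypothetical mnemonics for illustration; not a real GPU instruction set.
OPCODES = {"*": "MUL", "+": "ADD"}


def lower_to_ir(expr):
    """Flatten a nested expression into three-address IR.

    A leaf is a variable name (str); an inner node is (op, left, right).
    Returns (ir, result), where ir is a list of (dest, op, lhs, rhs) tuples
    and result names the value holding the whole expression.
    """
    ir = []
    temps = itertools.count()

    def visit(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        lhs, rhs = visit(left), visit(right)
        dest = f"t{next(temps)}"
        ir.append((dest, op, lhs, rhs))
        return dest

    return ir, visit(expr)


def emit_instructions(ir):
    """Translate each IR tuple into one textual 'instruction'."""
    return [f"{OPCODES[op]} {dest}, {lhs}, {rhs}" for dest, op, lhs, rhs in ir]


# The single expression ((a * b) * c) * d becomes three instructions.
expression = ("*", ("*", ("*", "a", "b"), "c"), "d")
ir, result = lower_to_ir(expression)
instructions = emit_instructions(ir)
```

In this sketch one “expression” gives rise to several “instructions”, which is the distinction the two terms are intended to capture.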
Thus, references to “expressions” herein, unless the context otherwise requires, refer to shader language constructions that are to be compiled to a target graphics processor binary code (i.e. are to be expressed in hardware micro-instructions). (Such shader language constructions may, depending on the shader language in question, be referred to as “expressions”, “statements”, etc. For convenience, the term “expressions” will be used herein, but this is intended to encompass all equivalent shader language constructions such as “statements” in GLSL.) “Instructions” correspondingly refer to the actual hardware instructions (code) that are emitted to perform an “expression”.
A graphics processing pipeline shader thus performs processing by running small programs for each “work item” in an output to be generated, such as a render target, e.g. frame (a “work item” in this case would usually be a vertex or a sampling position (e.g. in the case of a fragment shader)). Where the graphics processing pipeline is being used for “compute shading” (e.g. under OpenCL or DirectCompute) then the work items will be appropriate compute shading work items. This shader operation generally enables a high degree of parallelism, in that a typical render output, e.g. frame, features a rather large number of work items (e.g. vertices or fragments), each of which can be processed independently.
In graphics shader operation, each work item is processed by means of an execution thread which will execute the shader program in question for the work item in question. As there will typically be a large number of work items (e.g. vertices or sampling positions), and thus corresponding threads, to be processed for a given shader program, a graphics processing system can be considered to be a massively multi-threaded system.
The Applicants have recognised that many graphics shader programs will include operations (expressions) that will produce identical values for sets of plural threads to be executed (e.g. for every thread in a draw call).
For example, the OpenGL ES vertex shader:
    uniform mat4 a;
    uniform mat4 b;
    uniform mat4 c;
    attribute vec4 d;
    void main()
    {
      gl_Position = a * b * c * d;
    }

will produce identical values for the computation of “a*b*c” for each thread (where each thread represents a given vertex), as the data inputs are uniform variables. Thus if this computation could be executed once and the result shared between plural threads, the execution of the shader program could be made more efficient.
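The potential saving can be illustrated with a CPU-side model of the vertex shader above. The following is a minimal sketch, not graphics processor code: the helper names (`mat_mul`, `mat_vec`, `shade_naive`, `shade_hoisted`) and the list-of-rows matrix representation are assumptions made purely for illustration. Integer inputs are used so that the two results can be compared exactly.

```python
def mat_mul(x, y):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def shade_naive(a, b, c, vertices):
    """Each thread (one per vertex) recomputes a*b*c before applying it to d."""
    return [mat_vec(mat_mul(mat_mul(a, b), c), d) for d in vertices]


def shade_hoisted(a, b, c, vertices):
    """a*b*c depends only on uniforms, so it is computed once and shared."""
    abc = mat_mul(mat_mul(a, b), c)
    return [mat_vec(abc, d) for d in vertices]


# Example uniforms: a scales x, y and z by 2, b is the identity,
# and c translates by one unit along x.
a = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
b = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
c = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
vertices = [[1, 2, 3, 1], [0, 0, 0, 1]]

naive = shade_naive(a, b, c, vertices)
hoisted = shade_hoisted(a, b, c, vertices)
```

Both functions produce the same positions, but for N vertices the first performs 2N matrix-matrix multiplications while the second performs two, regardless of N.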
The Applicants have previously proposed in their earlier UK patent application no. GB-A-2516358 the use of a “pilot” shader program that executes, once, those expressions that will produce identical values for a set of plural threads (e.g. for a draw call), followed by a “main” shader program that executes for each work item and uses the results of the “pilot” shader program instead of recalculating the common expressions each time.
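One way such a division into “pilot” and “main” programs might be derived is sketched below. This is an assumption made for illustration and is not taken from GB-A-2516358: the nested-tuple expression tree, the `pilot_N` naming scheme and the functions `is_uniform` and `split` are all hypothetical. The sketch treats a sub-expression as thread-invariant when every variable it reads is a uniform.

```python
def is_uniform(expr, uniforms):
    """True if expr reads only variables that are identical for every thread.

    A leaf is a variable name (str); an inner node is (op, left, right).
    """
    if isinstance(expr, str):
        return expr in uniforms
    _, left, right = expr
    return is_uniform(left, uniforms) and is_uniform(right, uniforms)


def split(expr, pilot, uniforms):
    """Return the expression for the main program.

    Each maximal thread-invariant sub-expression is appended to `pilot`
    as (result_name, expression) and replaced by result_name.
    """
    if isinstance(expr, str):
        return expr  # a bare variable needs no computation, so nothing to hoist
    if is_uniform(expr, uniforms):
        name = f"pilot_{len(pilot)}"
        pilot.append((name, expr))
        return name
    op, left, right = expr
    return (op, split(left, pilot, uniforms), split(right, pilot, uniforms))


# gl_Position = a * b * c * d, parsed left to right as ((a * b) * c) * d.
expression = ("*", ("*", ("*", "a", "b"), "c"), "d")
pilot_program = []
main_program = split(expression, pilot_program, uniforms={"a", "b", "c"})
```

Here the pilot program is left with the single computation of “a*b*c”, and the main program reduces to multiplying that stored result by the per-vertex attribute d.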
However, notwithstanding this, the Applicants believe that there remains scope for improvements to execution of shader programs in graphics processing pipelines that include one or more shader stages.
Like reference numerals are used for like components where appropriate in the drawings.