The technology described herein relates to graphics processing systems, and in particular to the operation of graphics processing systems that include one or more programmable processing stages (“shaders”).
Graphics processing is typically carried out in a pipelined fashion, with one or more pipeline stages operating on the data to generate the final render output, e.g. frame that is displayed. Many graphics processing pipelines now include one or more programmable processing stages, commonly referred to as “shaders”. For example, a graphics processing pipeline may include one or more of, and typically all of, a vertex shader and a fragment (pixel) shader. These shaders are programmable processing stages that execute shader programs on input data values to generate a desired set of output data (e.g. appropriately shaded and rendered fragment data in the case of a fragment shader) for processing by the rest of the graphics pipeline and/or for output.
A graphics processor (graphics processing unit (GPU)) that executes a graphics processing pipeline that includes one or more shaders will accordingly comprise one or more “shader cores” comprising appropriate programmable processing circuitry for executing the shader stages of the graphics processing pipeline. This programmable processing circuitry may comprise appropriate execution units and execution pipelines, such as one or more arithmetic execution units (arithmetic pipelines), load and store execution units (load and store pipelines), etc. The shaders of the graphics processing pipeline may share programmable processing circuitry and execution units, etc., or they may each be distinct programmable processing units and/or execution units, etc.
A graphics processing pipeline shader performs processing by running small programs for each “work item” in an output to be generated, such as a render target, e.g. frame. A “work item” in this case would usually be a vertex (in the case of a vertex shader) or a fragment (in the case of a fragment shader). Where the graphics processing pipeline is being used for “compute shading” (e.g. under OpenCL or DirectCompute) then the work items will be appropriate compute shading work items. The shader operation generally enables a high degree of parallelism, in that a typical render output, e.g. frame, will feature a large number of work items (e.g. vertices or fragments), each of which is to be subjected to similar processing and can be processed independently.
In graphics shader operation, each work item is processed by means of an execution thread which will execute the shader program in question for the work item in question. As there will typically be a large number of work items (e.g. vertices or fragments), and thus corresponding threads, to be processed for a given shader program, the graphics processing system can be considered to be a massively multi-threaded system.
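By way of illustration only, the relationship between work items and execution threads described above can be sketched on a CPU as follows. The names fragment_shader and run_shader, the layout of a fragment as a tuple, and the use of a host thread pool are all assumptions made for this sketch; a shader core would schedule its threads in hardware rather than in this way.

```python
from concurrent.futures import ThreadPoolExecutor


def fragment_shader(fragment):
    """Hypothetical shader program: scale a fragment's colour by its light value."""
    x, y, colour, light = fragment
    shaded = tuple(round(channel * light, 3) for channel in colour)
    return (x, y, shaded)


def run_shader(program, work_items, max_threads=4):
    """Run the same program once per work item, each on a pool thread.

    Each work item is processed independently of the others, which is
    what allows the work to be spread across many threads.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map returns results in the same order as the work items.
        return list(pool.map(program, work_items))


if __name__ == "__main__":
    fragments = [
        (0, 0, (1.0, 0.5, 0.0), 0.5),
        (1, 0, (0.2, 0.4, 0.8), 1.0),
    ]
    for result in run_shader(fragment_shader, fragments):
        print(result)
```

In this sketch every fragment is shaded by the same program and no fragment depends on the result for another, which corresponds to the independent, similar processing of work items described above.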
A shader program to be executed by a given “shader” of a graphics processing pipeline will be provided, in a high-level shader programming language such as GLSL, HLSL, OpenCL, etc., by the application that requires the processing by the graphics processing pipeline. The shader program will consist of “expressions” indicating desired programming steps defined in the relevant language standards (specifications). The high-level shader program is then translated by a shader language compiler to binary code for the target graphics processing pipeline (for the shader core(s) executing the target graphics processing pipeline). This binary code will consist of “instructions” which are specified in the instruction set specification for the given target graphics processing pipeline. The compilation process for converting the shader language expressions to binary code instructions may take place via a number of intermediate representations of the program within the compiler. The compilation process is typically performed by the driver for the graphics processing unit (GPU) in question (that is, e.g., executing on a host processor of the overall data processing system that the graphics processing unit and graphics processing pipeline are part of), although other arrangements are possible.
It would be desirable as part of the shader compilation process to compile a shader program so as to make execution of the shader program in use more efficient. However, graphics processing systems are massively multi-threaded systems, and may therefore frequently execute concurrently threads which relate to different work item (e.g. fragment) output coordinates, and/or to plural work items (e.g. relating to different layers) corresponding to the same output coordinate (e.g. fragment). Accordingly, the shader program performance for any single execution thread may not simply be a function of the shader program code for that thread, but may also depend on what other threads (which may be executing a completely different shader program) are executing concurrently with that thread.
This makes it difficult for the compilation process to determine a more optimised arrangement for a given shader program to be executed, as the shader execution performance will depend on the actual runtime conditions encountered by the execution threads, such as what other threads are executing at the same time. These runtime conditions may vary and depend upon, e.g., the particular content of the particular render output region or regions currently being processed, and are difficult for a shader compiler (e.g. GPU driver) to determine in advance (i.e. when it is compiling the shader program).
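By way of illustration only, the dependence of a single thread's performance on concurrently executing threads can be seen in a toy model. The model is an assumption made for this sketch and does not describe any particular graphics processor: a single shared execution unit (e.g. a load and store unit) serves one operation per cycle, taking the threads that still have work in round-robin order, with the thread of interest served first in each round.

```python
def cycles_to_finish(target_ops, co_runner_ops=()):
    """Cycles until the target thread completes its operations.

    target_ops: number of operations the thread of interest must issue.
    co_runner_ops: operations still to be issued by each concurrent thread.
    One operation is served per cycle, round-robin over threads with
    work remaining, starting each round with the target thread.
    """
    remaining = [target_ops] + list(co_runner_ops)
    cycle = 0
    while remaining[0] > 0:
        for index in range(len(remaining)):
            if remaining[index] > 0:
                remaining[index] -= 1
                cycle += 1
                if index == 0 and remaining[0] == 0:
                    return cycle
    return cycle


if __name__ == "__main__":
    print(cycles_to_finish(4))          # prints 4
    print(cycles_to_finish(4, [4, 4]))  # prints 10
```

The code of the target thread is identical in both calls, yet its completion time differs according to what runs alongside it. The co-runner workload is only known at run time, which is the information a compiler lacks when it compiles the shader program in advance.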
This is exacerbated in the case of multi-core graphics processing units that include plural processing cores, each able to execute a given shader program or programs for execution threads in parallel with other processing cores of the multi-core graphics processing unit.
The Applicants believe therefore that there remains scope for improvements to execution of shader programs in graphics processing pipelines that include one or more shader stages.
Like reference numerals are used for like components where appropriate in the drawings.