The technology described herein relates generally to the operation of a data processing systems that include programmable processing stages, such as a graphics processing system that includes one or more programmable processing stages (“shaders”).
Graphics processing is typically carried out in a pipelined fashion, with one or more pipeline stages operating on the data to generate the final render output, e.g. frame that is displayed. Many graphics processing pipelines now include one or more programmable processing stages, commonly referred to as “shaders”. For example, a graphics processing pipeline may include one or more of, and typically all of, a geometry shader, a vertex shader and a fragment (pixel) shader. These shaders are programmable processing stages that execute shader programs on input data values to generate a desired set of output data, for example appropriately shaded and rendered fragment data in the case of a fragment shader, for processing by the rest of the graphics pipeline and/or for output. The shaders of the graphics processing pipeline may share programmable processing circuitry, or they may each be distinct programmable processing units.
A graphics processing unit (GPU) shader core is thus a processing unit that performs graphics processing by running small programs for each graphics item in a graphics output to be generated, such as a render target, e.g. frame (an “item” in this regard is usually a vertex or a sampling position (e.g. in the case of a fragment shader)). This generally enables a high degree of parallelism, in that a typical render output, e.g. frame, features a rather large number of vertices and fragments, each of which can be processed independently.
In graphics shader operation, each “item” will be processed by means of an execution thread which will execute the instructions of the shader program in question for the graphics “item” in question.
A known way to improve shader execution efficiency is to group execution threads (where each thread corresponds, e.g., to one vertex or one sampling position) into “groups” or “bundles” of threads, where the threads of one group are run in lockstep, one instruction at a time. In this way, it is possible to share instruction fetch and scheduling resources between all the threads in the group. Other terms used for such thread groups include “warps” and “wavefronts”. For convenience, the term “thread group” will be used herein, but this is intended to encompass all equivalent terms and arrangements, unless otherwise indicated.
The Applicants believe that there remains scope for improvements to the handling of thread groups, particularly in shaders of graphics processing pipelines.
Like reference numerals are used for like components where appropriate in the drawings.