A graphics processing unit (GPU) is a dedicated graphics rendering device used to generate computerized graphics for display on a display device. A GPU is typically used with a general purpose central processing unit (CPU) to process graphic image data, e.g., three-dimensional computerized graphic image data. In such a case, a GPU can implement a number of primitive graphics operations to create three-dimensional images for display more quickly than a CPU could draw the same images. Typically, a GPU implements some number of the complex algorithms involved directly in hardware.
A typical GPU receives an image geometry and uses a pipeline approach to generate graphics which can be output, for example, for display on a display device. A typical graphics pipeline includes a number of stages which operate in parallel, with the output from one stage possibly being used at another stage in the pipeline. For example, a typical graphics pipeline comprises vertex shader, primitive assembly, viewport transformation, primitive setup, rasterization, hidden primitive and pixel rejection, attribute setup, attribute interpolation and fragment shader stages.
A vertex shader is applied to the image geometry for an image and generates vertex coordinates and attributes of vertices within the image geometry. Vertex attributes include, for example, color, normal, and texture coordinates associated with a vertex. Primitive assembly forms primitives, e.g., point, line, and triangle primitives, from the vertices based on the image geometry. Formed primitives can be transformed from one space to another using a transformation, e.g., a viewport transformation which transforms primitives from a normalized device space to a screen space. Primitive setup can be used to determine a primitive's area and edge coefficients, and to perform occlusion culling (e.g., backface culling) and 3-D clipping operations.
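The viewport transformation and the area computation of primitive setup can be illustrated with a minimal sketch. The function names, the [-1, 1] range for normalized device coordinates, the top-left screen origin, and the rule that a non-positive signed area marks a back-facing triangle are all assumptions made for illustration; actual hardware and graphics APIs may use different conventions.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def viewport_transform(ndc: Vec2, width: int, height: int) -> Vec2:
    """Map normalized device coordinates (assumed in [-1, 1]) to screen space."""
    x_ndc, y_ndc = ndc
    # Assumed convention: screen origin at the top-left, y increasing downward.
    x = (x_ndc + 1.0) * 0.5 * width
    y = (1.0 - y_ndc) * 0.5 * height
    return (x, y)


def signed_area(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Signed area of triangle abc; the sign encodes the vertex winding order."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def is_backfacing(a: Vec2, b: Vec2, c: Vec2) -> bool:
    """Backface test under the assumed rule: non-positive area means back-facing."""
    return signed_area(a, b, c) <= 0.0


# Example: the center of normalized device space lands at the screen center.
center = viewport_transform((0.0, 0.0), 640, 480)
```

Reversing the order of any two vertices flips the sign of the area, which is why a single sign test is enough to separate front-facing from back-facing triangles once a winding convention has been chosen.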
Rasterization converts primitives into pixels based on the XY coordinates of vertices within the primitives and the number of pixels included in the primitives. Hidden primitive and pixel rejection uses the z coordinate of the primitives and/or pixels to determine and reject those primitives and pixels determined to be hidden (e.g., a primitive or pixel located behind another primitive or pixel in the image frame, or a transparent primitive or pixel). Attribute setup determines attribute gradients for attributes associated with pixels within a primitive, e.g., a difference between the attribute value at a first pixel and the attribute value at a second pixel within the primitive, moving in either a horizontal (X) direction or a vertical (Y) direction. Attribute interpolation interpolates the attributes over the pixels within a primitive based on the determined attribute gradient values. Interpolated attribute values are sent to the fragment shader for pixel rendering. Results of the fragment shader can be output to a post-processing block and a frame buffer for presentation of the processed image on the display device.
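Attribute setup and attribute interpolation can be sketched for a single scalar attribute on a triangle. The sketch below assumes simple affine (screen-space linear) interpolation and ignores perspective correction; the function names are hypothetical and chosen only for illustration.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def attribute_gradients(p0: Vec2, p1: Vec2, p2: Vec2,
                        a0: float, a1: float, a2: float) -> Tuple[float, float]:
    """Return (dA/dx, dA/dy): the change in the attribute per one-pixel step."""
    det = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    if det == 0.0:
        raise ValueError("degenerate triangle: gradients are undefined")
    dadx = ((a1 - a0) * (p2[1] - p0[1]) - (a2 - a0) * (p1[1] - p0[1])) / det
    dady = ((a2 - a0) * (p1[0] - p0[0]) - (a1 - a0) * (p2[0] - p0[0])) / det
    return (dadx, dady)


def interpolate(p0: Vec2, a0: float, grad: Tuple[float, float], pixel: Vec2) -> float:
    """Evaluate the attribute at a pixel, starting from vertex p0 and its value a0."""
    dadx, dady = grad
    return a0 + dadx * (pixel[0] - p0[0]) + dady * (pixel[1] - p0[1])


# Example: a triangle whose attribute is 0, 8, and 4 at its three vertices.
v0, v1, v2 = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
grad = attribute_gradients(v0, v1, v2, 0.0, 8.0, 4.0)
value = interpolate(v0, 0.0, grad, (1.0, 2.0))
```

Because the gradients are constant across the primitive, they need to be computed only once per primitive, after which each pixel costs two multiplications and two additions per attribute.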
Shaders, e.g., vertex and fragment shaders, are typically computer programs that compute and control the attributes of primitives, e.g., vertices or pixels, used in graphics or other multi-media systems. Shaders are typically written in a high-level or low-level programming language. The C++ programming language is an example of a high-level language, and an assembly language is an example of a low-level language.
A shader compiler acts as a translator that translates shader program code written in a high-level or low-level language into a machine-level language. In a case that the shader is written in a high-level language, the translator translates the shader program code from the high-level language in which it is written into a low-level language and then translates the low-level shader program code into machine-level instructions. An instruction scheduler of the shader compiler reorders the machine instructions of the shader in an effort to speed up shader execution. In addition, the shader compiler addresses timing constraints of the hardware that executes the shader by inserting dummy instructions, e.g., no operations or NOPs, so that the shader conforms to those constraints.
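The interaction between instruction reordering and NOP insertion can be illustrated with a small sketch. It assumes a hypothetical machine that issues one instruction per cycle in program order and has no hardware interlocks, so that the compiler must pad with NOPs until every source register is ready. The instruction format, register names, and latencies are invented for illustration and do not describe any particular GPU.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...] = ()
    latency: int = 1  # assumed cycles after issue until dest can be read


NOP = Instr("nop", "")


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Pad the program with NOPs so no instruction reads a register too early."""
    ready_at: Dict[str, int] = {}  # register -> first cycle it may be read
    out: List[Instr] = []
    for instr in program:
        cycle = len(out)  # one instruction issues per cycle
        earliest = max([ready_at.get(s, 0) for s in instr.srcs], default=0)
        while cycle < earliest:
            out.append(NOP)
            cycle += 1
        out.append(instr)
        if instr.dest:
            ready_at[instr.dest] = cycle + instr.latency
    return out


tex = Instr("tex", "r0", ("t0",), latency=3)  # slow texture fetch
mul = Instr("mul", "r1", ("r0", "c0"))        # depends on the fetch result
add = Instr("add", "r2", ("c1", "c2"))        # independent of both

in_order = [i.op for i in insert_nops([tex, mul, add])]
reordered = [i.op for i in insert_nops([tex, add, mul])]
```

In this example, moving the independent `add` between the fetch and its consumer replaces one of the two NOPs with useful work, shortening the padded program from five instructions to four.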
It would be beneficial to be able to optimize a shader's instructions while taking into account hardware constraints.