The present invention relates to the field of computer graphics. Many computer graphic images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene from the given viewpoint, and is analogous to taking a photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
Many graphics processing subsystems are highly programmable, enabling implementation of, among other things, complicated lighting and shading algorithms. In order to exploit this programmability, applications can include one or more graphics processing subsystem programs, which are executed by the graphics processing subsystem in parallel with a main program executed by the CPU. Although not confined to merely implementing shading and lighting algorithms, these graphics processing subsystem programs are often referred to as shading programs or shaders.
Graphics processing subsystems typically use a stream-processing model, in which input elements are read and operated on successively by a chain of stream processing units. The output of one stream processing unit is the input to the next stream processing unit in the chain. Typically, data flows only one way, “downstream,” through the chain of stream processing units. Examples of stream processing units include vertex processors, which process two- or three-dimensional vertices; rasterizer processors, which process geometric primitives defined by sets of two- or three-dimensional vertices into sets of pixels or sub-pixels, referred to as fragments; and fragment processors, which process fragments. Additional types of stream processing units can be included in a chain. For example, a tessellation processor can receive descriptions of higher order surfaces and produce sets of geometric primitives defined by vertices and approximating or corresponding to the higher order surfaces.
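The one-way chain described above can be illustrated with a minimal Python sketch. The stage functions, their names, and the toy data are hypothetical and chosen only for illustration; in particular, the toy rasterizer emits one fragment per vertex, whereas an actual rasterizer processes whole geometric primitives.

```python
def vertex_stage(vertices):
    """Toy vertex processor: translate each 2D vertex by (1, 0)."""
    for x, y in vertices:
        yield (x + 1, y)


def rasterizer_stage(vertices):
    """Toy rasterizer (assumption: one fragment per vertex, snapped to the pixel grid)."""
    for x, y in vertices:
        yield (int(x), int(y))


def fragment_stage(fragments):
    """Toy fragment processor: attach a constant color to each fragment."""
    for px, py in fragments:
        yield (px, py, "red")


def run_chain(stages, inputs):
    """Feed the output of each stage into the next; data flows only downstream."""
    stream = inputs
    for stage in stages:
        stream = stage(stream)
    return list(stream)


result = run_chain(
    [vertex_stage, rasterizer_stage, fragment_stage],
    [(0.5, 2.0), (3.2, 4.9)],
)
print(result)
```

No stage in this sketch can read the output of a later stage, which mirrors the downstream-only data flow of the chain.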
Some or all of the stream processing units in a chain may be programmable, with each programmable stream processing unit having its own separate shading program operating in parallel with shading programs executing on other stream processing units. Implementations of complicated algorithms often depend on separate shading programs tailored to each stream processing unit working together to achieve the desired result. In these implementations, outputs of shading programs for initial stream processing units in a chain may be linked with the inputs of shading programs for subsequent stream processing units in the chain. Shading programs can be written in a variety of low-level and high-level programming languages, including low-level assembly, the Cg language, the OpenGL Shading Language, and the DirectX High Level Shading Language.
It is desirable to optimize shading programs to improve rendering performance and to allow applications to fully exploit the capabilities of the graphics processing subsystem. When shading programs for different stream processing units are chained together (which may be referred to as linking in some graphics API nomenclatures), there may be opportunities for optimization based upon the combination of the two or more shading programs, referred to as inter-shader optimizations.
For example, a first shading program may output a value that is unused as an input by a second chained shading program in the graphics processing stream. In this example, the portions of the first shading program used to compute the unused output may be safely omitted, thereby decreasing the execution time of the first shading program. In another example, if the output of a first shading program is constant, then the value of the constant can be propagated to the input of a second shading program chained to the first shading program, decreasing the execution time of the first shading program and potentially allowing for additional optimizations within the second shading program. 
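Both examples above can be sketched in Python over a toy shader representation. This is an illustrative sketch only, not the method of any particular compiler: the statement format, the function names `prune_unused` and `constant_outputs`, and the sample shader are all hypothetical, and each variable is assumed to be assigned exactly once.

```python
# A statement is (destination, operands). Operands are variable names
# (strings) or numeric literals. Assumption: single assignment per variable.

def prune_unused(statements, outputs, consumed):
    """Drop statements that feed only outputs the next shader never reads."""
    live = set(outputs) & set(consumed)
    kept = []
    # Walk backward so each kept statement marks its operands as live.
    for dest, operands in reversed(statements):
        if dest in live:
            kept.append((dest, operands))
            live.update(op for op in operands if isinstance(op, str))
    kept.reverse()
    return kept


def constant_outputs(statements, outputs):
    """Find outputs defined directly by a single numeric literal."""
    consts = {}
    for dest, operands in statements:
        if dest in outputs and len(operands) == 1 and not isinstance(operands[0], str):
            consts[dest] = operands[0]
    return consts


# Hypothetical first shading program in the chain.
vertex_shader = [
    ("tmp", ("position", "matrix")),
    ("fog_tmp", ("position", "fog_density")),
    ("out_pos", ("tmp",)),
    ("out_fog", ("fog_tmp",)),
    ("out_alpha", (1.0,)),
]
vertex_outputs = {"out_pos", "out_fog", "out_alpha"}
# The hypothetical second shading program never reads out_fog.
fragment_inputs = {"out_pos", "out_alpha"}

pruned = prune_unused(vertex_shader, vertex_outputs, fragment_inputs)
constants = constant_outputs(pruned, fragment_inputs)
print(pruned)
print(constants)
```

Here the two statements that compute `out_fog` are omitted because the second program never reads that value, and `out_alpha` is identified as a constant that could be substituted directly into the second program.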
Additionally, application developers prefer to write large, all-purpose shading programs for each stream processing unit. Each all-purpose shading program allows an application to select one or more operations from a set to be executed as needed by the stream processing unit. An application can implement a specific algorithm across several stream processing units by selecting the appropriate operations from each stream processing unit's shading program. Using large, all-purpose shading programs with selectable operations for each stream processing unit, rather than a number of different small shading programs each implementing a single operation, greatly simplifies application development. Unfortunately, executing large, all-purpose shading programs is slow due to a number of factors, including the time and bandwidth needed to transfer large shading programs to the graphics processing subsystem, even when only a small portion of the shading program is going to be executed. Optimizing the chaining of shading programs can also simplify such large programs to the point that they fit within hardware resource limits that the unoptimized programs might exceed.
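As a rough illustration of an all-purpose shading program with selectable operations, consider the following Python sketch. The function name, the operation names, and the color arithmetic are hypothetical and stand in for whatever operations an application might bundle together.

```python
def all_purpose_fragment_shader(color, enabled):
    """One large program; `enabled` selects which operations actually run."""
    r, g, b = color
    if "darken" in enabled:
        r, g, b = r * 0.5, g * 0.5, b * 0.5
    if "grayscale" in enabled:
        gray = (r + g + b) / 3
        r, g, b = gray, gray, gray
    if "invert" in enabled:
        r, g, b = 1 - r, 1 - g, 1 - b
    return (r, g, b)


# The application picks a subset of operations per use.
only_invert = all_purpose_fragment_shader((1.0, 0.5, 0.0), {"invert"})
darken_then_invert = all_purpose_fragment_shader((1.0, 0.5, 0.0), {"darken", "invert"})
print(only_invert, darken_then_invert)
```

Even when only `invert` is selected, the entire program body, including the unused `darken` and `grayscale` branches, still has to be delivered to and held by the processor that runs it, which is the overhead the paragraph above describes.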
Prior automatic optimization techniques consider each shading program only in isolation. Existing optimization techniques for hardware shading compilers do not analyze the relationships between chained shading programs assigned to different stream processing units to determine inter-shader optimizations. Additionally, these prior optimization techniques do not take into account the one-way data flow in a chain of stream processing units and therefore miss many other potential optimizations.
It is therefore desirable to optimize two or more shading programs chained together based upon the relationships between the chained shading programs. It is further desirable to optimize large, all-purpose shading programs to execute efficiently and without transferring large amounts of data to the graphics processing subsystem unnecessarily. It is still further desirable to be able to perform inter-shader optimizations at runtime, allowing applications to dynamically select combinations of shading programs without compromising performance.