1. Field of the Invention
The present invention relates to computer systems, and more particularly to computer shading.
2. Description of the Related Art
Graphics processing is an important feature of modern high-performance computing systems. In graphics processing, mathematical procedures are implemented to render, or draw, graphic primitives, e.g., a triangle or a rectangle, on a display to produce desired visual images. Real-time graphics processing is based on the high-speed processing of graphic primitives to produce visually pleasing moving images.
Early graphic systems were limited to displaying image objects comprised of graphic primitives having smooth surfaces. That is, visual textures, bumps, scratches, or other surface features were not modeled in the graphics primitives. To enhance image quality, texture mapping of real world attributes was introduced. In general, texture mapping is the mapping of an image onto a graphic primitive surface to create the appearance of a complex image without the high computational costs associated with rendering actual three dimensional details of an object.
Graphics processing is typically performed using application program interfaces (APIs) that provide a standard software interface that can be run on multiple platforms, operating systems, and hardware. Examples of APIs include the Open Graphics Library (OpenGL®) and D3D™. In general, such open application program interfaces include a predetermined, standardized set of commands that are executed by associated hardware. For example, in a computer system that supports the OpenGL® standard, the operating system and any application software programs can make calls according to that standard without knowing any of the specifics regarding the system hardware. Application writers can use APIs to design the visual aspects of their applications without concern as to how their commands will be implemented.
APIs are particularly beneficial when they are supported by dedicated hardware. In fact, high-speed processing of graphical images is often performed using special graphics processing units (GPUs) that are fabricated on semiconductor substrates. Beneficially, a GPU can be designed and used to rapidly and accurately process commands with little impact on other system resources.
FIG. 1 illustrates a simplified block diagram of a graphics system 100 that includes a graphics processing unit 102. As shown, the graphics processing unit 102 has a host interface/front end 104. The host interface/front end 104 receives raw graphics data from a central processing unit 103 that is running an application program stored in memory 105. The host interface/front end 104 buffers input information and supplies that information to a geometry engine 106. The geometry engine 106 has access to a frame buffer memory 120 via a frame buffer interface 116. The geometry engine 106 produces, scales, rotates, and projects three-dimensional vertices of graphics primitives in “model” coordinates that are stored in the frame buffer memory 120 into two-dimensional frame-buffer co-ordinates. Typically, triangles are used as graphics primitives for three-dimensional objects, but rectangles are often used for two-dimensional objects (such as text displays).
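By way of illustration only, the scaling, rotating, and projecting performed by a geometry engine such as the geometry engine 106 might be sketched as follows. The function name project_vertex, the choice of rotation about the y-axis, the pinhole projection with a focal length, and the top-left frame-buffer origin are all assumptions made for this sketch and are not taken from FIG. 1.

```python
import math


def project_vertex(vertex, angle_deg, scale, focal_length, width, height):
    """Scale, rotate about the y-axis, and perspective-project one vertex.

    Hypothetical helper: the parameter names and the camera model
    (camera at the origin looking down +z) are illustrative assumptions.
    """
    # Scale the model-space vertex uniformly.
    x, y, z = (c * scale for c in vertex)

    # Rotate about the y-axis by the given angle.
    a = math.radians(angle_deg)
    xr = x * math.cos(a) + z * math.sin(a)
    zr = -x * math.sin(a) + z * math.cos(a)

    if zr <= 0:
        raise ValueError("vertex is at or behind the camera")

    # Perspective divide onto the image plane.
    xp = focal_length * xr / zr
    yp = focal_length * y / zr

    # Map to two-dimensional frame-buffer coordinates (origin at top-left).
    return (width / 2 + xp, height / 2 - yp)


if __name__ == "__main__":
    print(project_vertex((1.0, 1.0, 2.0), 0.0, 1.0, 100.0, 640, 480))
```

In this sketch a vertex on the optical axis lands at the center of the frame buffer, and vertices farther from the camera move toward that center.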
The two-dimensional frame-buffer co-ordinates of the vertices of the graphics primitives from the geometry engine 106 are applied to a rasterizer 108. The rasterizer 108 identifies the positions of all of the pixels within the graphics primitives. This is typically performed along raster (horizontal) lines that extend between the lines that define the graphics primitives. The output of the rasterizer 108 is referred to as rasterized pixel data.
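As a minimal sketch of the raster-line approach described above, and not of any particular rasterizer 108, the following routine walks each horizontal line crossing a triangle, finds where that line meets the triangle's edges, and lists the pixel positions between them. The name rasterize_triangle and the inclusive fill convention (no tie-breaking rule for pixels shared between adjacent primitives) are assumptions of the sketch.

```python
import math


def rasterize_triangle(v0, v1, v2):
    """Return the integer (x, y) pixel positions covered by a triangle.

    Pixels are found one raster (horizontal) line at a time. Boundary
    pixels are included; real hardware typically applies a fill rule.
    """
    verts = [v0, v1, v2]
    edges = [(verts[i], verts[(i + 1) % 3]) for i in range(3)]
    y_min = math.ceil(min(v[1] for v in verts))
    y_max = math.floor(max(v[1] for v in verts))

    pixels = []
    for y in range(y_min, y_max + 1):
        crossings = []
        for (xa, ya), (xb, yb) in edges:
            if ya == yb:
                continue  # horizontal edge: its endpoints lie on other edges
            if min(ya, yb) <= y <= max(ya, yb):
                t = (y - ya) / (yb - ya)
                crossings.append(xa + t * (xb - xa))
        if not crossings:
            continue
        # Fill the span between the leftmost and rightmost crossings.
        for x in range(math.ceil(min(crossings)), math.floor(max(crossings)) + 1):
            pixels.append((x, y))
    return pixels


if __name__ == "__main__":
    print(len(rasterize_triangle((0, 0), (4, 0), (0, 4))))
```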
The rasterized pixel data are applied to a shader 110 that processes input data (code, position, texture, conditions, constants, etc.) using a shader program (sequence of instructions) to generate output data. While shaders are described in relation to their applications in graphics processing, shaders are, in general, useful for other functions. Shaders can be considered as a collection of processing capabilities that can handle large amounts of data at the same time, such as by parallel handling of data.
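The notion of a shader program as a sequence of instructions applied to input data can be illustrated with a toy interpreter. The three-instruction set (mul_const, add_const, clamp), the function name run_shader, and the representation of a pixel as a color tuple are hypothetical and chosen only for this sketch; they do not describe the instruction set of the shader 110.

```python
def run_shader(program, color, constants):
    """Apply a sequence of instructions to one pixel's color components.

    `program` is a list of (opcode, operand) pairs; the opcodes here are
    invented for illustration.
    """
    result = list(color)
    for opcode, operand in program:
        if opcode == "mul_const":
            k = constants[operand]
            result = [c * k for c in result]
        elif opcode == "add_const":
            k = constants[operand]
            result = [c + k for c in result]
        elif opcode == "clamp":
            result = [min(1.0, max(0.0, c)) for c in result]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return tuple(result)


if __name__ == "__main__":
    program = [("mul_const", "gain"), ("add_const", "bias"), ("clamp", None)]
    constants = {"gain": 2.0, "bias": 0.0}
    # The same program is applied independently to every pixel, which is
    # what makes parallel handling of the data possible.
    for color in [(0.5, 0.25, 1.0), (0.1, 0.2, 0.3)]:
        print(run_shader(program, color, constants))
```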
The shader 110 includes a texture engine 112 that modifies the rasterized pixel data to have the desired texture and optical features. The texture engine 112 has access to the data stored in the frame buffer memory 120 via the frame buffer interface 116. The shaded pixel data are sent to a Raster Operations Processor 114 (Raster op in FIG. 1) that optionally performs additional processing on the shaded pixel data. The result is pixel data that are stored in the frame buffer memory 120 by the frame buffer interface 116. The frame pixel data can be used for various processes such as being displayed on a display 122.
Hardwired pipeline shaders 110 are known. For example, hardwired pixel pipelines have been used to perform standard API functions, including such functions as scissor, alpha test, zbuffer, stencil, blendfunction, logicop, dither, and writemask. Also known are programmable shaders 110 that enable an application writer to control shader processing operations.
Programmable shaders enable flexibility in the achievable visual effects and can reduce the time between a graphics function being available and that function becoming standardized as part of a graphics API. Programmable shaders can have a standard API mode in which standard graphics API commands are implemented and a non-standard mode in which new graphics features can be programmed.
While shaders have proven themselves to be useful, demands for enhanced shader performance have exceeded the capabilities of existing shaders. While improving existing shaders could address some of the demands, such improvements would be difficult to implement. One nearly constant demand is faster performance. Graphical processing speed is often limited by how fast the shader 110 can process pixels. Furthermore, additional future demands can be anticipated.
In the prior art, shader programming was performed by acquiring programming instructions from the frame buffer memory (or from some other main memory) each time the shader 110 was used. This involved accessing the frame buffer memory 120 (or some other main memory), possibly through the texture engine 112, to acquire the programming instructions and then programming the shader stations of the shader 110 before each data run. Unfortunately, acquiring programming instructions involves a significant time delay. A request to obtain the programming instructions had to be formed, moved through the system, and applied to the frame buffer memory 120. The programming instructions then had to be obtained, moved back through the system to the texture engine 112, formatted into programming instructions, and, finally, used to program the various shader stations.
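The cost of the prior-art flow can be made concrete with a small model that counts memory round trips. The class names MainMemory and Shader, the representation of shader stations as Python callables, and the reuse_local_copy flag are all hypothetical. The flag exists only to provide a point of contrast and is not presented as any particular shader design.

```python
class MainMemory:
    """Stands in for the frame buffer memory (or other main memory)."""

    def __init__(self, programs):
        self.programs = programs
        self.fetch_count = 0

    def fetch(self, name):
        # Each call models one full request/response trip through the system.
        self.fetch_count += 1
        return list(self.programs[name])


class Shader:
    """Toy shader whose stations are programmed from fetched instructions."""

    def __init__(self, memory, reuse_local_copy=False):
        self.memory = memory
        self.reuse_local_copy = reuse_local_copy  # hypothetical contrast case
        self.loaded_name = None
        self.stations = []

    def run(self, program_name, pixels):
        already_loaded = self.reuse_local_copy and self.loaded_name == program_name
        if not already_loaded:
            # Prior-art behavior: acquire instructions before each data run.
            self.stations = self.memory.fetch(program_name)
            self.loaded_name = program_name
        out = []
        for p in pixels:
            for station in self.stations:
                p = station(p)
            out.append(p)
        return out


if __name__ == "__main__":
    programs = {"brighten": [lambda p: p * 2, lambda p: min(p, 255)]}
    for reuse in (False, True):
        memory = MainMemory(programs)
        shader = Shader(memory, reuse_local_copy=reuse)
        for _ in range(3):
            shader.run("brighten", [10, 200])
        print(reuse, memory.fetch_count)
```

In this model the number of main-memory fetches grows with the number of data runs whenever the instructions are acquired before each run, which is the delay the paragraph above describes.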
Therefore, a new type of programmable shader would be beneficial. Even more beneficial would be a new type of programmable shader having faster performance. Avoiding the need to acquire programming instructions from a main memory before each data processing run would be particularly helpful. Look-ahead programming instruction acquisition would also be beneficial.