1. Field of the Invention
This invention relates to computer systems, and more particularly to programmable shaders.
2. Description of the Related Art
Graphics processing is an important feature of modern high-performance computing systems. In graphics processing, mathematical procedures are implemented to render, or draw, large numbers of graphics primitives, e.g., triangles or rectangles, on a display to produce desired visual images. Real-time graphics processing is based on the high-speed processing of data to form graphics primitives that produce visually pleasing moving images.
Early graphics systems were limited to displaying image objects composed of graphics primitives having smooth surfaces. That is, visual textures, bumps, scratches, and other surface features were not modeled in the graphics primitives. To enhance image quality, texture mapping of real-world attributes was introduced. In general, texture mapping is the mapping of an image onto a graphics primitive surface to create the appearance of a complex graphics primitive without the high computational cost of rendering actual three-dimensional details.
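As a minimal illustration of the mapping just described, the following Python sketch samples a texture image at normalized coordinates (u, v) using a nearest-neighbour lookup. The function name `sample_nearest`, the list-of-rows texture layout, and the clamping behaviour are assumptions made for illustration only; practical texture engines typically also filter between neighbouring texels.

```python
def sample_nearest(texture, u, v):
    """Return the texel nearest to normalized coordinates (u, v).

    `texture` is a list of rows (row 0 at v = 0); u selects the column.
    """
    height = len(texture)
    width = len(texture[0])
    # Clamp to [0, 1] so out-of-range coordinates stay on the image edge.
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    # Scale to texel indices; min() keeps u == 1.0 on the last column.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


# A 2x2 checkerboard "image" mapped onto a primitive's surface.
checker = [[0, 255],
           [255, 0]]
print(sample_nearest(checker, 0.25, 0.25))  # 0
print(sample_nearest(checker, 0.75, 0.25))  # 255
```

Each pixel of the primitive carries its own (u, v), so the surface appears detailed even though the primitive itself is flat.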
Graphics processing is typically performed using application programming interfaces (APIs) that provide a standard software interface that can be run on multiple platforms, operating systems, and hardware. Examples of APIs include the Open Graphics Library (OpenGL®) and D3D™. In general, such APIs include a predetermined, standardized set of commands that are executed by associated hardware. For example, in a computer system that supports the OpenGL® standard, the operating system and application software programs can make calls according to that standard without knowing any of the specifics of the system hardware. Application writers can therefore use APIs to design the visual aspects of their applications without concern for how their commands will be implemented.
APIs are particularly beneficial when they are supported by dedicated hardware. In fact, high-speed processing of graphical images is often performed using special graphics processing units (GPUs) that are fabricated on semiconductor substrates. Beneficially, a GPU can be designed and used to rapidly and accurately process commands with little impact on other system resources.
FIG. 1 illustrates a simplified block diagram of a graphics system 100 that includes a graphics processing unit 102. As shown, the graphics processing unit 102 has a host interface/front end 104. The host interface/front end 104 receives raw information from a central processing unit 103 that is running an application program stored in memory 105. The host interface/front end 104 buffers input information and supplies that information to a geometry engine 106. The geometry engine has access to a frame buffer memory 120 via a frame buffer interface 116. The geometry engine 106 produces, scales, rotates, and projects three-dimensional vertices of graphics primitives in “model” coordinates that are stored in the frame buffer memory 120 into two-dimensional frame-buffer coordinates. Typically, triangles are used as graphics primitives for three-dimensional objects, while rectangles are often used for two-dimensional objects (such as text displays).
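The kind of transformation performed by the geometry engine 106 can be sketched in simplified form. The Python below is a hypothetical model, not the geometry engine itself: the function name `transform_vertex`, the rotation about a single axis, the translation `tz` along z, the `focal` scale factor, and the viewport convention (y increasing downward) are all assumptions chosen to keep the example short.

```python
import math


def transform_vertex(vertex, scale, angle_z, tz, focal, width, height):
    """Map a model-space vertex to 2-D frame-buffer coordinates.

    Steps: scale, rotate about the z axis, move away from the viewer
    along z by tz, perspective-project, then map to pixel units.
    """
    x, y, z = (c * scale for c in vertex)
    # Rotate about z by angle_z radians.
    cos_a, sin_a = math.cos(angle_z), math.sin(angle_z)
    x, y = cos_a * x - sin_a * y, sin_a * x + cos_a * y
    # Place the object in front of the viewer (depth must be positive).
    z += tz
    # Perspective projection: farther points move toward the centre.
    px, py = focal * x / z, focal * y / z
    # Viewport mapping from [-1, 1] to pixels, with y increasing downward.
    sx = (px + 1.0) * 0.5 * width
    sy = (1.0 - py) * 0.5 * height
    return sx, sy


print(transform_vertex((1.0, 0.0, 0.0), 1.0, 0.0, 2.0, 1.0, 640, 480))
# (480.0, 240.0)
```

Hardware implementations usually fold these steps into 4x4 matrix multiplications; the explicit steps here are only meant to make each operation visible.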
The two-dimensional frame-buffer coordinates of the vertices of the graphics primitives from the geometry engine 106 are applied to a rasterizer 108. The rasterizer 108 identifies the positions of all of the pixels within the graphics primitives. This is typically performed along raster (horizontal) lines that extend between the edges that define the graphics primitives. The output of the rasterizer 108 is referred to as rasterized pixel data.
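A simplified model of the rasterizer 108 follows. Rather than walking spans between the primitive's edges, this hypothetical sketch scans each raster line of the triangle's bounding box and keeps the pixels whose centres pass three edge-function tests, a common equivalent formulation. The function names and the inclusive treatment of pixels lying exactly on an edge are assumptions; production rasterizers apply stricter fill rules so that pixels on shared edges are not drawn twice.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive if p lies to the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return (x, y) for every pixel whose centre lies inside the triangle."""
    pixels = []
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # Bounding box of the triangle, clipped to the frame buffer.
    x_min, x_max = max(int(min(xs)), 0), min(int(max(xs)) + 1, width)
    y_min, y_max = max(int(min(ys)), 0), min(int(max(ys)) + 1, height)
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return pixels  # degenerate (zero-area) triangle
    for y in range(y_min, y_max):          # one raster (horizontal) line at a time
        for x in range(x_min, x_max):
            cx, cy = x + 0.5, y + 0.5      # sample at the pixel centre
            w0 = edge(*v1, *v2, cx, cy)
            w1 = edge(*v2, *v0, cx, cy)
            w2 = edge(*v0, *v1, cx, cy)
            # Inside if every edge test agrees with the triangle's winding.
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                pixels.append((x, y))
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                pixels.append((x, y))
    return pixels


print(len(rasterize_triangle((0, 0), (4, 0), (0, 4), 8, 8)))  # 10
```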
The rasterized pixel data are applied to a shader 110 that processes input data (code, position, texture, conditions, constants, etc.) using a shader program (a sequence of instructions) to generate output data. While shaders are described here in relation to graphics processing, shaders are, in general, useful for other functions as well. A shader can be considered a collection of processing capabilities that can process large amounts of data at high speed, for example by handling data in parallel.
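The notion of a shader program as a sequence of instructions applied to input data can be sketched with a tiny interpreter. The three-operand instruction format, the opcode names `ADD`, `MUL`, and `MAX`, and the register names are hypothetical and do not correspond to any particular shader instruction set. Because each pixel's computation is independent of the others, the outer loop is the part that hardware can spread across parallel processing units.

```python
# Hypothetical instruction set: each instruction is (opcode, dst, src_a, src_b).
OPS = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
    "MAX": lambda a, b: max(a, b),
}


def run_shader(program, inputs, constants):
    """Run the same program on every input record (one record per pixel)."""
    outputs = []
    for pixel in inputs:          # independent per pixel, so parallelizable
        regs = dict(constants)    # constants are visible to every pixel
        regs.update(pixel)        # per-pixel inputs (position, texture, ...)
        for op, dst, a, b in program:
            regs[dst] = OPS[op](regs[a], regs[b])
        outputs.append(regs["out"])
    return outputs


# Example program: out = max(intensity * gain + bias, floor)
program = [
    ("MUL", "t0", "intensity", "gain"),
    ("ADD", "t1", "t0", "bias"),
    ("MAX", "out", "t1", "floor"),
]
consts = {"gain": 2.0, "bias": -0.5, "floor": 0.0}
print(run_shader(program, [{"intensity": 0.75}, {"intensity": 0.1}], consts))
# [1.0, 0.0]
```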
The shader 110 includes a texture engine 112 that processes the rasterized pixel data to have the desired texture and optical features. The texture engine 112, which has access to the data stored in the frame buffer memory 120, can be implemented using a hardware pipeline that processes large amounts of data at very high speed. The shaded pixel data are then sent to a Raster Operations Processor 114 (Raster op in FIG. 1) that optionally performs additional processing on the shaded pixel data. The result is pixel data that the frame buffer interface 116 stores in the frame buffer memory 120. The stored pixel data can be used for various purposes, such as being shown on a display 122.
Hardwired shaders 110 are known. For example, shaders can include hardwired pixel processing pipelines that perform standard API functions such as scissor, alpha test, z-buffer, stencil, blend function, logic op, dither, and write mask. Also known are programmable shaders 110: devices that can be programmed and that enable an application writer to control shader processing operations.
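Several of the hardwired per-pixel functions listed above can be sketched as a fixed sequence applied to each shaded fragment. This hypothetical Python model covers only scissor, alpha test, z-buffer, blend, and write mask (stencil, logic op, and dither are omitted). The function signature, the source-over blend, and the "smaller z is nearer" depth convention are assumptions, not a description of any specific hardware.

```python
def raster_ops(fb, zbuf, x, y, z, rgba, scissor, alpha_ref, write_mask):
    """Apply simplified fixed-function tests to one fragment.

    Returns True if the fragment was written to the frame buffer.
    """
    x0, y0, x1, y1 = scissor
    # Scissor test: reject fragments outside the rectangle [x0, x1) x [y0, y1).
    if not (x0 <= x < x1 and y0 <= y < y1):
        return False
    r, g, b, a = rgba
    # Alpha test: reject fragments whose alpha is below the reference value.
    if a < alpha_ref:
        return False
    # Z-buffer test: keep the fragment only if it is nearer than the stored depth.
    if z >= zbuf[y][x]:
        return False
    zbuf[y][x] = z
    # Blend: source-over, weighting the new colour by its alpha.
    old = fb[y][x]
    blended = tuple(s * a + d * (1.0 - a) for s, d in zip((r, g, b), old))
    # Write mask: update only the colour channels that are enabled.
    fb[y][x] = tuple(n if m else o for n, o, m in zip(blended, old, write_mask))
    return True


fb = [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]   # 2x2 RGB frame buffer
zbuf = [[1.0] * 2 for _ in range(2)]             # depth cleared to "far"
ok = raster_ops(fb, zbuf, 0, 0, 0.5, (1.0, 0.5, 0.0, 0.5),
                (0, 0, 2, 2), 0.1, (True, True, True))
print(ok, fb[0][0])  # True (0.5, 0.25, 0.0)
```

Because the order and behaviour of these steps are fixed, they map naturally onto a hardwired pipeline; only their parameters (scissor rectangle, reference alpha, mask) vary.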
Programmable shaders enable great flexibility in the achievable visual effects and can reduce the time between a graphics function being made available and that function becoming standardized as part of a graphics API. Programmable shaders can have a standard API mode in which standard graphics API commands are directly implemented and a non-standard mode in which new graphics features can be programmed.
Programmable shaders usually have shader engines 112 with multiple shader processing stations, each of which can perform specified functions. FIG. 6 illustrates a prior art shader engine architecture 600. In that architecture, program information is applied via a bus 608 to multiple shader processing stations: a first computation unit 602, a texturizer 604 and a second computation unit 606. The first computation unit 602 can perform certain processing operations on pixel information applied via a bus 614. The computational results are then stored in a memory that is referred to herein as a shader register file 620. The computational results from the first computation unit 602 are recalled from the shader register file 620 by the texturizer 604, which performs further processing, and those results are then stored in the shader register file 620. Then, the second computation unit 606 recalls the results of the texturizer 604, performs other processing operations, and the results are stored back in the shader register file 620. This process enables program information to control the operations of the individual shader processing stations to produce a final result produced by multiple operations. The general scheme of FIG. 6 can be extended by adding more shader processing stations that can recall data from and store data in the shader register file 620.
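The data flow of the prior art architecture 600 can be modelled in a few lines to make the register-file traffic visible. In this hypothetical sketch the program list stands in for the program information on bus 608, and the arithmetic assigned to each station is a placeholder rather than the stations' real processing. The class and function names are assumptions; the point the model illustrates is that every intermediate result makes a round trip through the shared shader register file 620.

```python
class ShaderRegisterFile:
    """Shared storage read and written by every shader processing station."""

    def __init__(self):
        self.regs = {}
        self.reads = 0
        self.writes = 0

    def read(self, name):
        self.reads += 1
        return self.regs[name]

    def write(self, name, value):
        self.writes += 1
        self.regs[name] = value


# Placeholder arithmetic standing in for each station's real processing.
STATIONS = {
    "compute1": lambda v: v * 2,       # first computation unit 602
    "texturizer": lambda v: v + 10,    # texturizer 604
    "compute2": lambda v: v - 1,       # second computation unit 606
}


def run_engine(program, pixel_value):
    """Execute (station, source, destination) steps against the register file."""
    rf = ShaderRegisterFile()
    for station, src, dst in program:
        # A source of None means the operand arrives on the pixel bus (bus 614).
        operand = pixel_value if src is None else rf.read(src)
        # Every result, intermediate or final, is stored in the register file.
        rf.write(dst, STATIONS[station](operand))
    return rf.read(program[-1][2]), rf


program = [("compute1", None, "r0"),
           ("texturizer", "r0", "r1"),
           ("compute2", "r1", "r2")]
result, rf = run_engine(program, 5)
print(result, rf.reads, rf.writes)  # 19 3 3
```

Adding a station to `STATIONS` and a step to `program` adds another read and write on the same shared register file, mirroring how the scheme of FIG. 6 is extended.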
While the shader engine architecture 600 is useful, it is not without problems. First, it is relatively difficult to fabricate a shader register file 620 that can be accessed by multiple shader processing stations, and the more stations there are, the more difficult it becomes. Compounding that problem, testing a shader engine that is in accord with the shader engine architecture 600 is difficult, both to eliminate design flaws (including hardware and software, and specifically including compilers and other auxiliary and support services) and to verify the operation of devices after fabrication. Furthermore, physically laying out a shader engine that is in accord with the shader engine architecture 600 is itself difficult.
Therefore, a new shader engine architecture would be beneficial. Particularly beneficial would be a new shader engine architecture in which fewer shader processing stations access data in a shader register file, and whose shader register file is easier to test. Methods of operating a shader engine having multiple shader processing stations that do not require storing every intermediate result would also be useful.