1. Field of the Invention
The present invention relates to computer systems, and more particularly to memory devices used to buffer fragment processor (referred to herein as a shader processor) outputs.
2. Description of the Related Art
Graphics processing is an important feature of modern high-performance computing systems. In graphics processing, mathematical procedures are implemented to render, or draw, graphic primitives, e.g., a triangle or a rectangle, on a display to produce desired visual images. Real time graphics processing is based on the high-speed processing of graphic primitives to produce visually pleasing moving images.
Early graphic systems were limited to displaying image objects comprised of graphic primitives having smooth surfaces. That is, visual textures, bumps, scratches, or other surface features were not modeled in the graphics primitives. To enhance image quality, texture mapping of real world attributes was introduced. In general, texture mapping is the mapping of an image onto a graphic primitive surface to create the appearance of a complex image without the high computational costs associated with rendering actual three dimensional details of an object.
Graphics processing is typically performed using application program interfaces (APIs) that provide a standard software interface that can be run on multiple platforms, operating systems, and hardware. Examples of APIs include the Open Graphics Library (OpenGL®) and D3D™. In general, such open application programs include a predetermined, standardized set of commands that are executed by associated hardware. For example, in a computer system that supports the OpenGL® standard, the operating system and application software programs can make calls according to that standard without knowing any of the specifics regarding the system hardware. Application writers can use APIs to design the visual aspects of their applications without concern as to how their commands will be implemented.
APIs are particularly beneficial when they are supported by dedicated hardware. In fact, high-speed processing of graphical images is often performed using special graphics processing units (GPUs) that are fabricated on semiconductor substrates. Beneficially, a GPU can be designed and used to rapidly and accurately process commands with little impact on other system resources.
FIG. 1 illustrates a simplified block diagram of a graphics system 100 that includes a graphics processing unit 102. As shown, that graphics processing unit 102 has a host interface/front end 104. The host interface/front end 104 receives raw graphics data from a central processing unit 103 that is running an application program stored in memory 105. The host interface/front end 104 buffers input information and supplies that information to a geometry engine 106. The geometry engine 106 has access to a frame buffer memory 120 via a frame buffer interface 116. The geometry engine 106 produces, scales, rotates, and projects the three-dimensional vertices of graphics primitives, which are stored in the frame buffer memory 120 in “model” coordinates, into two-dimensional frame-buffer coordinates. Typically, triangles are used as graphics primitives for three-dimensional objects, but rectangles are often used for two-dimensional objects (such as text displays).
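The scale, rotate, and project sequence attributed to the geometry engine 106 can be illustrated with a minimal sketch. The function name, the choice of rotation about the vertical axis, the pinhole-style perspective divide, and the parameters `focal` and `camera_z` are assumptions made for illustration only; the text above does not specify how the geometry engine 106 carries out these steps.

```python
import math


def project_vertex(vertex, angle_rad, scale, width, height,
                   focal=1.0, camera_z=3.0):
    """Map one model-space vertex to 2D frame-buffer coordinates.

    Hypothetical helper: the transform order and the camera model are
    illustrative assumptions, not details taken from the description.
    """
    x, y, z = vertex

    # Scale uniformly in model space.
    x, y, z = x * scale, y * scale, z * scale

    # Rotate about the vertical (y) axis.
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    x_rot = c * x + s * z
    z_rot = -s * x + c * z

    # Push the object in front of an assumed camera at the origin.
    depth = z_rot + camera_z
    if depth <= 0.0:
        raise ValueError("vertex is behind the assumed camera")

    # Perspective divide gives normalized coordinates in [-1, 1].
    ndc_x = focal * x_rot / depth
    ndc_y = focal * y / depth

    # Viewport mapping to pixel units; y is flipped so +y points up.
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py
```

Under these assumptions, a vertex at the model origin lands at the center of the frame buffer, for example `project_vertex((0.0, 0.0, 0.0), 0.0, 1.0, 640, 480)`.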
The two-dimensional frame-buffer coordinates of the vertices of the graphics primitives from the geometry engine 106 are applied to a rasterizer 108. The rasterizer 108 identifies the positions of all of the pixels within the graphics primitives. This is typically performed along raster (horizontal) lines that extend between the lines that define the graphics primitives. The output of the rasterizer 108 is referred to as rasterized pixel data.
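Identifying pixels along raster lines that extend between the edges of a primitive can be sketched as a simple scanline fill of a triangle. The function name, the sampling of pixel centers at half-integer positions, and the half-open coverage rule are illustrative assumptions; the rasterizer 108 is not stated to work this way.

```python
import math


def rasterize_triangle(v0, v1, v2):
    """Return the (x, y) pixels whose centers lie inside a 2D triangle.

    Hypothetical scanline sketch: pixel (x, y) is sampled at its center
    (x + 0.5, y + 0.5), and spans are half-open so that neighboring
    primitives would not both claim the same pixel.
    """
    verts = (v0, v1, v2)
    edges = ((v0, v1), (v1, v2), (v2, v0))

    # Rows whose center line can intersect the triangle.
    y_first = math.ceil(min(v[1] for v in verts) - 0.5)
    y_last = math.floor(max(v[1] for v in verts) - 0.5)

    pixels = []
    for y in range(y_first, y_last + 1):
        y_center = y + 0.5
        crossings = []
        for (xa, ya), (xb, yb) in edges:
            if ya == yb:
                continue  # A horizontal edge never crosses a raster line.
            if min(ya, yb) <= y_center < max(ya, yb):
                t = (y_center - ya) / (yb - ya)
                crossings.append(xa + t * (xb - xa))
        if len(crossings) < 2:
            continue

        # Fill the span between the left-most and right-most crossings.
        x_first = math.ceil(min(crossings) - 0.5)
        x_last = math.ceil(max(crossings) - 0.5) - 1
        pixels.extend((x, y) for x in range(x_first, x_last + 1))
    return pixels
```

For a right triangle with vertices (0, 0), (4, 0), and (0, 4), this sketch yields six pixels: three on the first raster line, two on the second, and one on the third.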
The rasterized pixel data are applied to a shader processor 110 that processes input data (code, position, texture, conditions, constants, etc.) using a shader program (sequence of instructions) to generate output data. While shader processors are described in relation to graphics processing, shader processors are, in general, useful for many other functions. Shader processors can be considered as a collection of processing capabilities that can handle large amounts of data at the same time, such as by parallel handling of data.
The shader processor 110 includes a texture engine 112 that modifies the rasterized pixel data to have the desired texture and optical features. The texture engine 112, which has access to the data stored in the frame buffer memory 120, can be implemented using a hardware pipeline that processes large amounts of data at very high speed. The shaded pixel data is then sent to a Raster Operations Processor 114 (Raster op in FIG. 1) that optionally performs additional processing on the shaded pixel data. The result is pixel data that is stored in the frame buffer memory 120 by the frame buffer interface 116. The frame pixel data can be used for various processes such as being displayed on a display 122.
As shown in FIG. 1, the shader processor 110 and the Raster Operations Processor 114 are sequential modules. If the Raster Operations Processor 114 stalls, as it may during normal operation for any number of reasons, the shader processor 110 will also stall, degrading system-level performance, unless a buffer, such as a first-in first-out (FIFO) register, is inserted between the output of the shader processor 110 and the input of the Raster Operations Processor 114. Such an arrangement is shown in the system 500 of FIG. 5. That system includes a FIFO buffer 502 that is inserted between an optionally multi-ported shader processor register file 504 and a Raster Operations Processor 114. The system 500 includes multiple computation units 506 that communicate through the shader register file 504. Programming instructions are applied via a bus 508, and data is applied to one of the computation units 506 on a bus 510. When the shader processor (elements 504-510) processes data, intermediate results are stored in the shader register file 504. When shader processor operations are completed, data is clocked out of the shader register file 504 into the FIFO buffer 502. When the Raster Operations Processor 114 performs its operations, it can drain data from the FIFO buffer 502. This architecture tolerates delays between the operations of the shader processor and those of the Raster Operations Processor 114 without slowdowns or conflicts.
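The decoupling effect of a FIFO buffer between a producer and a consumer that occasionally stalls can be illustrated with a small cycle-based simulation. The class `BoundedFifo`, the function `simulate`, the one-item-per-cycle rates, and the chosen stall pattern are all hypothetical and serve only to show the principle; they do not model the actual timing of the system 500.

```python
from collections import deque


class BoundedFifo:
    """Fixed-depth first-in first-out buffer (illustrative only)."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.depth

    def is_empty(self):
        return not self.items

    def push(self, item):
        if self.is_full():
            raise OverflowError("FIFO is full")
        self.items.append(item)

    def pop(self):
        return self.items.popleft()


def simulate(num_fragments, consumer_stall_cycles, depth):
    """Count producer stall cycles for a given FIFO depth.

    Assumed model: the producer (standing in for the shader processor)
    emits one fragment per cycle, and the consumer (standing in for the
    raster operations stage) drains one per cycle unless that cycle is
    listed in consumer_stall_cycles.
    """
    fifo = BoundedFifo(depth)
    produced = 0
    consumed = []
    producer_stalls = 0
    cycle = 0
    while len(consumed) < num_fragments:
        # Consumer side: drain one entry if not stalled this cycle.
        if cycle not in consumer_stall_cycles and not fifo.is_empty():
            consumed.append(fifo.pop())
        # Producer side: write one entry, or stall if there is no room.
        if produced < num_fragments:
            if fifo.is_full():
                producer_stalls += 1
            else:
                fifo.push(produced)
                produced += 1
        cycle += 1
    return producer_stalls, consumed
```

With eight fragments and a consumer stalled during cycles 2 through 4, a depth of one entry leaves the producer stalled for three cycles, whereas a depth of four absorbs the stall entirely. In both cases the fragments leave the buffer in the order in which they entered it.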
Unfortunately, adding a FIFO buffer 502 to the output of the shader processor 110 takes up valuable real estate on the substrate on which the shader processor 110 and/or the Raster Operations Processor 114 is fabricated. That decreases the overall yield of the finished product, driving up costs, increasing heat build-up, and decreasing reliability. Additionally, adding a FIFO buffer 502 adds to the already considerable design complexity of graphics processors.
Another problem with having a shader processor 110 feed data directly to a Raster Operations Processor 114 is that a shader processor 110 typically processes data in floating point format, e.g., 32-bit or 16-bit, while a Raster Operations Processor 114 typically processes fixed point formatted data. Thus, conversion of the floating point shader processor 110 output to fixed point values is typically required. This can seriously complicate feeding data into the Raster Operations Processor 114, particularly when a FIFO buffer 502 is inserted between the shader processor 110 and the Raster Operations Processor 114.
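The kind of floating point to fixed point conversion referred to above can be sketched as follows. The two target formats (an 8-bit unsigned normalized value and a signed 16-bit value with 8 fractional bits), the clamping behavior, and the function names are assumptions chosen for illustration; the text does not state which fixed point formats the Raster Operations Processor 114 uses.

```python
def float_to_unorm8(value):
    """Convert a finite float to an 8-bit unsigned normalized integer.

    Assumed format: 0.0 maps to 0 and 1.0 maps to 255; out-of-range
    inputs are clamped and the result is rounded to nearest.
    """
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)


def float_to_fixed(value, frac_bits=8, total_bits=16):
    """Convert a finite float to signed two's-complement fixed point.

    Assumed format: total_bits wide with frac_bits fractional bits.
    Results outside the representable range saturate. Python's round()
    resolves exact ties to the nearest even integer.
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))
```

Under these assumed formats, a color component of 0.5 converts to 128, and the value 1.5 converts to 384, which is 1.5 multiplied by 2 to the power 8.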
Therefore, a shader design that avoids the necessity of adding a special buffer between the shader processor and subsequent modules, such as a Raster Operations Processor 114, would be beneficial. Even more beneficial would be a new, high performance programmable shader architecture that enables data buffering within a shader processor. Also beneficial would be a programmable shader architecture that enables both data buffering within the shader processor and automatic data format conversion.