In three-dimensional graphics, surfaces are typically rendered by assembling a plurality of polygons in a desired shape. The polygons (which are typically triangles) are defined by vertices, and each vertex is defined by three-dimensional coordinates in world space, by color values, and by texture coordinates.
The surface determined by an assembly of polygons is typically intended to be viewed in perspective. To display the surface on a computer monitor, the three-dimensional world space coordinates of the vertices are transformed into screen coordinates in which horizontal and vertical values (x, y) define screen position and a depth value z determines how near a vertex is to the screen and thus whether that vertex is visible relative to other points at the same screen coordinates. The color values define the brightness of each of the red, green, and blue (r, g, b) components at each vertex and thus the color (often called diffuse color) at each vertex. Texture coordinates (u, v) define texture map coordinates for each vertex on a particular texture map defined by values stored in memory.
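The transformation into screen coordinates can be illustrated with a minimal sketch. It assumes a simple pinhole camera at the origin looking along +z, with the point already expressed relative to that camera; the function name `project_vertex` and the `focal`, `width`, and `height` parameters are hypothetical and are not taken from the system described here.

```python
def project_vertex(x, y, z, focal=1.0, width=640, height=480):
    """Map a camera-relative 3-D point to (screen_x, screen_y, depth).

    Assumes a pinhole camera at the origin looking along +z (an
    illustrative assumption, not the described hardware).
    """
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective divide: farther points move toward the screen center.
    ndc_x = focal * x / z
    ndc_y = focal * y / z
    # Map the range [-1, 1] to pixel coordinates; screen y grows downward.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    # The depth value is kept so that nearer points can hide farther ones.
    return screen_x, screen_y, z
```

In this sketch a point on the viewing axis, such as `project_vertex(0.0, 0.0, 5.0)`, lands at the screen center `(320.0, 240.0)` with depth `5.0`.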
The world space coordinates for the vertices of each polygon are processed to determine the two-dimensional coordinates at which those vertices are to appear on the two-dimensional screen space of an output display. If a triangle's vertices are known in screen space, the positions of all pixels of the triangle vary linearly along scan lines within the triangle in screen space and can thus be determined. Typically, a rasterizer uses (or a vertex processor and a rasterizer use) the three-dimensional world coordinates of the vertices of each polygon to determine the position of each pixel of each surface (“primitive surface”) bounded by one of the polygons.
The color values of each pixel of a primitive surface (sometimes referred to herein as a “primitive”) vary linearly along lines through the primitive in world space. A rasterizer performs (or a rasterizer and a vertex processor perform) processes based on linear interpolation of pixel values in screen space, linear interpolation of depth and color values in world space, and perspective transformation between the two spaces to provide pixel coordinates and color values for each pixel of each primitive. The end result of this is that the rasterizer outputs a sequence of red/green/blue color values (conventionally referred to as diffuse color values) for each pixel of each primitive.
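One common way to reconcile linear variation in world space with linear stepping in screen space is perspective-correct interpolation, sketched below for a single attribute between two vertices. The function name and parameters are hypothetical, and this is offered only as an illustration of the general technique, not as the method used by the rasterizer described here.

```python
def perspective_interpolate(a0, a1, z0, z1, t):
    """Interpolate an attribute between two vertices at screen fraction t.

    a0, a1: attribute values (e.g., one color channel) at the vertices.
    z0, z1: positive depths of the two vertices.
    t:      fraction of the way from vertex 0 to vertex 1 in screen space.
    """
    # 1/z and attribute/z both vary linearly in screen space.
    inv_z = (1.0 - t) / z0 + t / z1
    a_over_z = (1.0 - t) * a0 / z0 + t * a1 / z1
    # Dividing recovers the attribute that varies linearly in world space.
    return a_over_z / inv_z
```

When both vertices have the same depth, the result reduces to ordinary linear interpolation. When the depths differ, the result at the screen-space midpoint is weighted toward the nearer vertex.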
One or more of the vertex processor, the rasterizer, and a texture processor compute texture coordinates for each pixel of each primitive. The texture coordinates of each pixel of a primitive vary linearly along lines through the primitive in world space. Thus, texture coordinates of a pixel at any position in the primitive can be determined in world space (from the texture coordinates of the vertices) by a process of perspective transformation, and the texture coordinates of each pixel to be displayed on the display screen can be determined. A texture processor can use the texture coordinates (of each pixel to be displayed on the display screen) to index into a corresponding texture map to determine texels (texture color values at the position defined by the texture coordinates for each pixel) to vary the diffuse color values for the pixel. Often the texture processor interpolates texels at a number of positions surrounding the texture coordinates of a pixel to determine a texture value for the pixel. The end result of this is that the texture processor generates data determining a textured version of each pixel (of each primitive) to be displayed on the display screen.
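The interpolation of texels surrounding a pixel's texture coordinates can be sketched as bilinear filtering. The sketch assumes a single-channel texture stored as a list of rows, with (u, v) in the range 0 to 1 and clamped at the edges; the function name `sample_bilinear` and this storage layout are illustrative assumptions only.

```python
def sample_bilinear(texture, u, v):
    """Blend the four texels surrounding (u, v) in a row-major texture."""
    height = len(texture)
    width = len(texture[0])
    # Clamp to [0, 1], then scale to texel-space coordinates.
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(x), int(y)
    # Neighboring texel indices, clamped at the texture edge.
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two rows, then vertically between them.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

For the 2x2 texture `[[0.0, 1.0], [1.0, 2.0]]`, sampling at `(0.5, 0.5)` blends all four texels equally and yields `1.0`.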
A texture map typically describes a pattern to be applied to a primitive to vary the color of each pixel of the primitive in accordance with the pattern. The texture coordinates of the vertices of the primitive fix the position of the vertices of a polygon on the texture map and thereby determine the texture detail applied to each of the other pixels of the primitive in accordance with the pattern.
FIG. 1 is a block diagram of a pipelined graphics processing system that can embody the present invention. Preferably, the FIG. 1 system is implemented as an integrated circuit (including other elements not shown in FIG. 1). Alternatively, at least one portion (e.g., frame buffer 50) of the FIG. 1 system is implemented as a chip (or portion of a chip) and at least one other portion thereof (e.g., all elements of FIG. 1 other than frame buffer 50) is implemented as another chip (or portion of another chip). Vertex processor 101 of FIG. 1 generates vertex data indicative of the coordinates of the vertices of each primitive (typically a triangle) of each image to be rendered, and attributes (e.g., color values) of each vertex.
Rasterizer 201 generates pixel data in response to the vertex data from processor 101. The pixel data are indicative of the coordinates of a full set of pixels for each primitive, and attributes of each pixel (e.g., color values for each pixel and values that identify one or more textures to be blended with each set of color values). Rasterizer 201 generates packets that include the pixel data and asserts the packets to pixel shader 30. Each packet can but need not have the format to be described with reference to FIG. 3. Each packet includes the pixel data for one or more pixels and also all information that determines the state associated with each such pixel. The state information for a pixel includes a pointer to the next instruction to be executed by pixel shader 30 to accomplish the appropriate processing on the pixel, condition codes that can be used as predicates in subsequent instructions, and a set of arbitrary-use bit locations that can contain color values for pixels, iterated vertex data, texels (e.g., color data from a texture map), intermediate results from previous pixel shader instructions, or other data.
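The packet contents listed above can be modeled, purely for illustration, as a small record type. The class and field names below are hypothetical, and the layout is not the packet format of FIG. 3; the sketch only mirrors the items named in the text (pixel data, a pointer to the next instruction, condition codes, and arbitrary-use storage).

```python
from dataclasses import dataclass, field


@dataclass
class PixelPacket:
    """Illustrative stand-in for a packet sent to the pixel shader."""
    # Pixel data for one or more pixels, e.g. (x, y, (r, g, b)) tuples.
    pixels: list
    # Index of the next shader instruction to execute for these pixels.
    next_instruction: int = 0
    # Condition codes usable as predicates by later instructions.
    condition_codes: int = 0
    # Arbitrary-use storage: texels, iterated vertex data, intermediates.
    scratch: dict = field(default_factory=dict)


packet = PixelPacket(pixels=[(10, 20, (1.0, 0.5, 0.0))])
packet.scratch["texel"] = (0.5, 0.5, 0.5)
```

Because the state travels with the pixel data, a shader stage in this model needs nothing beyond the packet itself to know which instruction to run next.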
Typically, pixel shader 30 combines the pixel data in each packet received from rasterizer 201 with texture data determined by such packet. For example, a packet specifies one or more texture maps (for a pixel) and a set of texels of each texture map, and pixel shader 30 implements an algorithm to generate a texel average in response to the specified texels of each texture map (by retrieving the texels from memory 25 coupled to pixel shader 30 and computing an average of the texels of each texture map) and to generate textured pixel data by combining the pixel with each of the texel averages. In typical implementations, pixel shader 30 can perform various operations in addition to (or instead of) texturing each pixel, such as one or more of the well known operations of format conversion, input swizzle (e.g., duplicating and/or reordering an ordered set of components of a pixel), scaling and biasing, inversion (and/or one or more other logic operations), clamping, and output swizzle.
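The texel averaging and two of the listed operations (swizzle and clamping) can be sketched as follows. The text does not specify how a pixel is combined with a texel average, so the per-channel multiplication in `modulate` is an assumption; all function names are hypothetical.

```python
def average_texels(texels):
    """Average a non-empty list of (r, g, b) texels channel by channel."""
    count = len(texels)
    return tuple(sum(t[i] for t in texels) / count for i in range(3))


def modulate(color, texel):
    """Combine a pixel color with a texel by per-channel multiplication
    (an assumed combining rule, chosen only for illustration)."""
    return tuple(c * t for c, t in zip(color, texel))


def swizzle(components, order):
    """Duplicate and/or reorder components, e.g. order (2, 1, 0)."""
    return tuple(components[i] for i in order)


def clamp(components, low=0.0, high=1.0):
    """Limit each component to the range [low, high]."""
    return tuple(min(max(c, low), high) for c in components)
```

For example, `swizzle((1, 2, 3), (2, 1, 0))` reorders the components to `(3, 2, 1)`, and `swizzle((1, 2, 3), (0, 0, 0))` duplicates the first component to give `(1, 1, 1)`.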
When processing each packet, pixel shader 30 updates elements of the packet (e.g., replaces color values with partially processed color values, or with fully processed color values indicative of blends of original color values and texels) but preserves the basic packet structure. Thus, when pixel shader 30 has completed all required processing operations on a packet, it has generated a modified version of the packet (an “updated” packet). In some implementations, pixel shader 30 asserts each updated packet to pixel processor 40, and pixel processor 40 performs additional processing on the updated packets while preserving the basic packet structure. Alternatively, pixel processor 40 performs the required additional processing on textured pixel data generated by pixel shader 30, but after the data have been extracted from the updated packets generated in shader 30 and without preserving packet structure. For example, an input stage of pixel processor 40 extracts textured pixel data from updated packets received from pixel shader 30, and asserts the extracted textured pixel data to other circuitry within processor 40 that performs the required processing thereon.
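The idea of updating a packet in place while preserving its structure can be sketched with a toy shader loop. The `Packet` class, the two instruction functions, and `run_program` are hypothetical and greatly simplified; they are not the instruction set or control logic of pixel shader 30.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Toy packet: one color plus the state needed to process it."""
    color: tuple
    next_instruction: int = 0
    scratch: dict = field(default_factory=dict)


def modulate(packet):
    # Replace the color with a blend of the color and a stored texel.
    texel = packet.scratch["texel"]
    packet.color = tuple(c * t for c, t in zip(packet.color, texel))


def clamp(packet):
    # Limit each channel to the range [0, 1].
    packet.color = tuple(min(max(c, 0.0), 1.0) for c in packet.color)


def run_program(packet, program):
    """Execute instructions until the packet's pointer passes the end."""
    while packet.next_instruction < len(program):
        program[packet.next_instruction](packet)
        packet.next_instruction += 1
    return packet  # Same object and same fields; only the values changed.


updated = run_program(
    Packet(color=(2.0, 0.5, 1.0), scratch={"texel": (1.0, 0.5, 0.0)}),
    [modulate, clamp],
)
```

After both instructions run, `updated.color` is `(1.0, 0.25, 0.0)` and `updated.next_instruction` is `2`, while the packet keeps the same fields it started with.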
In variations on the system of FIG. 1, pixel processor 40 is omitted. In this case, pixel shader 30 is coupled directly to frame buffer 50, pixel shader 30 performs all required processing of the pixels generated by rasterizer 201 (by operating on packets containing the pixels to generate updated packets), and pixel shader 30 is configured to extract the fully processed pixels from the updated packets and assert the fully processed pixels to frame buffer 50.
Although pixel shader 30 is sometimes referred to herein as a “texture processor,” in typical implementations it can perform various operations in addition to (or instead of) texturing each pixel, such as one or more of the conventional operations of culling, frustum clipping, polymode operations, polygon offsetting, and fragmenting. Alternatively, texture processor 30 performs all required texturing operations and pixel processor 40 performs some or all required non-texturing operations for each pixel.
Graphics application program interfaces (APIs) have been instrumental in allowing applications to be written to a standard interface and to be run on multiple platforms, i.e., operating systems. Examples of such graphics APIs include Open Graphics Library (OpenGL®) and D3D™ transform and lighting pipelines. OpenGL® is the computer industry's standard graphics API for defining 2-D and 3-D graphic images. With OpenGL®, an application can create the same effects in any operating system using any OpenGL®-adhering graphics adapter. OpenGL® specifies a set of commands or immediately executed functions. Each command directs a drawing action or causes special effects.
Thus, in any computer system which supports the OpenGL® standard, the operating system(s) and application software programs can make calls according to the standard, without knowing exactly any specifics regarding the hardware configuration of the system. This is accomplished by providing a complete library of low-level graphics manipulation commands, which can be used to implement graphics operations.
A significant benefit is afforded by providing a predefined set of commands in graphics APIs such as OpenGL®. By restricting the allowable operations, such commands can be highly optimized in the driver and hardware implementing the graphics API. On the other hand, a major drawback of this approach is that changes to the graphics API are difficult and slow to be implemented. It may take years for a new feature to be broadly adopted across multiple vendors.
With the impending integration of transform operations into high speed graphics chips and the higher integration levels allowed by semiconductor manufacturing, it is now possible to make at least part of the geometry pipeline accessible to the application writer. There is thus a need to exploit this trend in order to afford increased flexibility in visual effects. In particular, there is a need for a graphics processor that is compatible with modified or new graphics APIs, and also with currently established graphics APIs.
Above-referenced U.S. application Ser. No. 09/960,630, filed Sep. 20, 2001, discloses a programmable pipelined graphics processor (a vertex processor in some embodiments), computer code for programming such a processor, and methods for programmable pipelined computer graphics processing (including pipelined vertex processing). In some embodiments, the processor executes branching instructions during operation. The full text of above-referenced U.S. application Ser. No. 09/960,630 is incorporated herein by reference.
The present inventors have recognized that it would be desirable to implement a programmable graphics processor with at least two (and preferably three) processing pipelines, such that the processor can process multiple threads of graphics data in interleaved fashion in each pipeline, and such that each data thread is processed in accordance with a different program that can include branch instructions. The inventors have also recognized that it would be desirable for such a processor to be operable in a parallel processing mode in which the same program is executed in parallel on different streams of data in multiple pipelines. However, until the present invention it was not known how to implement such a processor to handle branch instructions which could cause conflicts during parallel processing mode operation, since execution of a branch instruction might require that one or more (but not all) of the pipelines take a branch. Nor was it known until the present invention how to implement such a processor to efficiently handle instructions whose execution would cause conflicts between pipelines (in use of resources shared by the pipelines) during parallel processing mode operation.