In three-dimensional graphics, surfaces are typically rendered by assembling a plurality of polygons in a desired shape. The polygons (which are typically triangles) are defined by vertices, and each vertex is defined by three-dimensional coordinates in world space, by color values, and by texture coordinates and other attributes.
The surface determined by an assembly of polygons is typically intended to be viewed in perspective. To display the surface on a computer monitor, the three-dimensional world space coordinates of the vertices are transformed into screen coordinates in which horizontal and vertical values (x, y) define screen position and a depth value z determines how near a vertex is to the screen and thus whether that vertex is visible relative to other points at the same screen coordinates. The color values define the brightness of each of the red, green, and blue (r, g, b) color components at each vertex and thus the color (often called diffuse color) at each vertex. Texture coordinates (u, v) define texture map coordinates for each vertex on a particular texture map defined by values stored in memory.
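The transformation into screen coordinates described above can be illustrated by the following minimal sketch. It assumes that the vertex has already been expressed in camera space with the viewer looking along the positive z axis, and it uses a simple pinhole model. The function name project_vertex and the parameters focal, width, and height are hypothetical and are introduced only for illustration.

```python
def project_vertex(x, y, z, focal=1.0, width=640, height=480):
    """Project a camera-space point to (screen_x, screen_y, depth).

    Assumption: a pinhole camera at the origin looking along +z.
    """
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective divide: more distant points move toward the screen center.
    ndc_x = focal * x / z
    ndc_y = focal * y / z
    # Map the range [-1, 1] to pixel coordinates; screen y grows downward.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    # Depth is kept so that nearer points can hide farther ones.
    return screen_x, screen_y, z
```

The returned depth value is what allows a later stage to decide which of several points at the same (x, y) screen position is visible.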
The world space coordinates for the vertices of each polygon are processed to determine the two-dimensional coordinates at which those vertices are to appear on the two-dimensional screen space of an output display. If a triangle's vertices are known in screen space, the positions of all pixels of the triangle vary linearly along scan lines within the triangle in screen space and can thus be determined.
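As one possible illustration of how pixel positions vary linearly along scan lines, the following sketch finds the horizontal extent of a triangle on a given scan line by linear interpolation along the triangle's edges. The function name scanline_span and the representation of a triangle as a list of three (x, y) tuples are assumptions made for this example, not a description of any particular rasterizer.

```python
def scanline_span(tri, y):
    """Return (x_left, x_right) where scan line y crosses triangle tri.

    tri is a list of three (x, y) screen-space vertices.
    Returns None if the scan line misses the triangle.
    """
    crossings = []
    for i in range(3):
        (x0, y0), (x1, y1) = tri[i], tri[(i + 1) % 3]
        if y0 == y1:
            continue  # a horizontal edge adds no new crossing
        if min(y0, y1) <= y <= max(y0, y1):
            # x varies linearly with y along the edge.
            t = (y - y0) / (y1 - y0)
            crossings.append(x0 + t * (x1 - x0))
    if not crossings:
        return None
    return min(crossings), max(crossings)
```

Every pixel whose x coordinate lies between the two returned values on that scan line belongs to the triangle.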
Typically, a rasterizer uses (or a vertex processor and a rasterizer use) the three-dimensional world coordinates of the vertices of each polygon to determine the position of each pixel of each surface (“primitive surface”) bounded by one of the polygons.
The color values of each pixel of a primitive surface (sometimes referred to herein as a “primitive”) vary linearly along lines through the primitive in world space. A rasterizer performs (or a rasterizer and a vertex processor perform) processes based on linear interpolation of pixel values in screen space, linear interpolation of depth and color values in world space, and perspective transformation between the two spaces to provide pixel coordinates and color values for each pixel of each primitive. The end result of this is that the rasterizer outputs a sequence of red/green/blue color values (conventionally referred to as diffuse color values) for each pixel of each primitive.
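The interplay between linear variation in world space and linear interpolation in screen space can be sketched as follows. The example relies on the commonly used observation that, for an attribute a and a per-vertex depth-like value w, the quantities a/w and 1/w vary linearly in screen space. The function name perspective_correct and its parameters are hypothetical, and the sketch interpolates along a single edge only.

```python
def perspective_correct(a0, w0, a1, w1, t):
    """Interpolate attribute a at screen-space fraction t between two vertices.

    a0, a1: attribute values (e.g., one color channel) at the vertices.
    w0, w1: positive depth-like values at the vertices.
    t: fraction of the screen-space distance from vertex 0 to vertex 1.
    """
    # 1/w and a/w are both linear in screen space.
    inv_w = (1.0 - t) / w0 + t / w1
    a_over_w = (1.0 - t) * a0 / w0 + t * a1 / w1
    # Dividing recovers the attribute that is linear in world space.
    return a_over_w / inv_w
```

When both vertices are at the same depth, the result reduces to ordinary linear interpolation. When they are not, the midpoint on the screen is weighted toward the nearer vertex.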
One or more of the vertex processor, the rasterizer, and a texture processor compute texture coordinates for each pixel of each primitive. The texture coordinates of each pixel of a primitive vary linearly along lines through the primitive in world space. Thus, texture coordinates of a pixel at any position in the primitive can be determined in world space (from the texture coordinates of the vertices) by a process of perspective transformation, and the texture coordinates of each pixel to be displayed on the display screen can be determined. A texture processor can use the texture coordinates (of each pixel to be displayed on the display screen) to index into a corresponding texture map to determine texels (texture color values at the position defined by the texture coordinates for each pixel) to vary the diffuse color values for the pixel. Often the texture processor interpolates texels at a number of positions surrounding the texture coordinates of a pixel to determine a texture value for the pixel. The end result of this is that the texture processor generates data determining a textured version of each pixel (of each primitive) to be displayed on the display screen.
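The interpolation of texels surrounding a pixel's texture coordinates can be illustrated with bilinear filtering, which is one common choice among several. The sketch below assumes a single-channel texture stored as a list of rows and texture coordinates expressed in texel units. The function name sample_bilinear is hypothetical.

```python
import math

def sample_bilinear(texture, u, v):
    """Blend the four texels surrounding (u, v), given in texel units.

    texture is a list of rows, each row a list of single-channel values.
    """
    height, width = len(texture), len(texture[0])
    # Clamp the coordinates to the texture (one possible edge policy).
    u = min(max(u, 0.0), width - 1)
    v = min(max(v, 0.0), height - 1)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Interpolate horizontally on the two rows, then vertically.
    top = texture[y0][x0] * (1.0 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1.0 - fx) + texture[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```

A coordinate that falls exactly on a texel returns that texel's value, and a coordinate midway between four texels returns their average.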
A texture map typically describes a pattern to be applied to a primitive to vary the color of each pixel of the primitive in accordance with the pattern. The texture coordinates of the vertices of the primitive fix the position of the vertices of a polygon on the texture map and thereby determine the texture detail applied to each of the other pixels of the primitive in accordance with the pattern.
FIG. 1 is a block diagram of a pipelined graphics processing system that can embody the present invention. Preferably, the FIG. 1 system is implemented as an integrated circuit (including other elements not shown in FIG. 1). Alternatively, at least one portion (e.g., frame buffer 50) of the FIG. 1 system is implemented as a chip (or portion of a chip) and at least one other portion thereof (e.g., all elements of FIG. 1 other than frame buffer 50) is implemented as another chip (or portion of another chip). Vertex processor 10 of FIG. 1 generates vertex data indicative of the coordinates of the vertices of each primitive (typically a triangle) of each image to be rendered, and attributes (e.g., color values) of each vertex.
Rasterizer 20 generates pixel data in response to the vertex data from processor 10. The pixel data are indicative of the coordinates of pixels for each primitive, and attributes of each pixel (e.g., color values for each pixel and values that identify one or more textures to be blended with each set of color values). Rasterizer 20 asserts the pixel data to pixel shader 30.
Typically, pixel shader 30 combines the pixel data received from rasterizer 20 with texture data and may execute shader programs. For example, one or more texture maps (and a set of texels of each texture map), or no texture maps, are specified for each pixel, and pixel shader 30 implements an algorithm to generate a texel average in response to the specified texels of each texture map (by retrieving the texels from memory 25 coupled to pixel shader 30 and computing an average of the texels of each texture map) and to generate textured pixel data by combining the pixel with each of the texel averages. In typical implementations, pixel shader 30 can perform various operations in addition to (or instead of) texturing each pixel, such as one or more of the well known operations of format conversion, input swizzle (e.g., duplicating and/or reordering an ordered set of components of a pixel), scaling and biasing, inversion (and/or one or more other logic operations), clamping, and output swizzle.
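Several of the operations named above can be sketched in a few lines. The sketch assumes that colors are tuples of floating-point components and that a pixel is combined with a texel average by component-wise multiplication, which is only one of many possible combining rules. All function names are hypothetical and do not describe the circuitry of pixel shader 30.

```python
def texel_average(texels):
    """Average a non-empty list of equal-length color tuples."""
    n = len(texels)
    return tuple(sum(t[i] for t in texels) / n for i in range(len(texels[0])))

def modulate(pixel, texel):
    """Combine a pixel with a texel value (assumed rule: multiply)."""
    return tuple(p * t for p, t in zip(pixel, texel))

def swizzle(color, pattern):
    """Duplicate and/or reorder components, e.g. 'bgr' or 'rrr'."""
    index = {"r": 0, "g": 1, "b": 2}
    return tuple(color[index[ch]] for ch in pattern)

def scale_bias(color, scale, bias):
    """Apply the same scale and bias to every component."""
    return tuple(c * scale + bias for c in color)

def clamp01(color):
    """Clamp every component to the range [0, 1]."""
    return tuple(min(max(c, 0.0), 1.0) for c in color)
```

For example, clamp01(scale_bias(color, 2.0, -1.0)) expands the upper half of the input range to the full range and discards the lower half.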
When pixel shader 30 has completed all required processing operations on a quantity of pixel data, it asserts the updated (e.g., textured and/or programmably shaded) pixel data to pixel processor 40, and pixel processor 40 performs additional processing on the updated data. In variations on the system of FIG. 1, pixel processor 40 is omitted. In this case, pixel shader 30 is coupled directly to frame buffer 50, pixel shader 30 performs all required processing of the pixels generated by rasterizer 20, and pixel shader 30 is configured to assert the fully processed pixels to frame buffer 50. Pixel processor 40 and/or pixel shader 30 typically include the OpenGL® “fragment operations.”
Although pixel shader 30 is sometimes referred to herein as a “texture processor,” in typical implementations it can perform various operations in addition to (or instead of) texturing each pixel, such as one or more of the conventional operations of culling, frustum clipping, polymode operations, polygon offsetting, and fragmenting. Alternatively, texture processor 30 performs all required texturing operations and pixel processor 40 performs some or all required non-texturing operations for each pixel.
In typical implementations of pipelined (and other) graphics processors, there is a need to perform reciprocal and reciprocal square root operations (as well as other mathematical operations) on data values. Such operations are commonly performed in vertex processing, pixel shading, and pixel processing units of graphics processors.
Reciprocal and reciprocal square root functions have typically been implemented in hardware using variations of the conventional technique known as Newton-Raphson iteration. However, the inventors have recognized that generation of the reciprocal (or the reciprocal of the square root) of an input value in pipelined fashion using Newton-Raphson iteration would require pipelined processing circuitry having an undesirably large number of pipeline stages and an undesirably large footprint (in the case that the processing circuitry is implemented as an integrated circuit or portion of an integrated circuit).
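For reference, the conventional iteration can be sketched in software as follows. The sketch uses the standard update rules x = x(2 - ax) for the reciprocal and y = y(1.5 - 0.5ay²) for the reciprocal square root, and it assumes that the caller supplies an initial estimate close enough to the result for the iteration to converge (in hardware such an estimate is often taken from a small lookup table). The function names and parameters are hypothetical.

```python
def reciprocal_newton(a, estimate, iterations):
    """Refine an estimate of 1/a; converges when 0 < estimate < 2/a (a > 0)."""
    x = estimate
    for _ in range(iterations):
        # Each pass roughly doubles the number of correct bits.
        x = x * (2.0 - a * x)
    return x

def rsqrt_newton(a, estimate, iterations):
    """Refine an estimate of 1/sqrt(a) for a > 0, starting near the result."""
    y = estimate
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * a * y * y)
    return y
```

Each pass through either loop depends on the result of the previous pass and involves several multiplications. Unrolling the loop so that a new input can be accepted on every clock cycle therefore dedicates separate multiplier hardware to every pass, which is consistent with the stage count and footprint concerns noted above.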