This invention relates to computing systems generally, to three-dimensional computer graphics, more particularly, and most particularly to a structure and method for performing tangent space lighting in a three-dimensional graphics processor implementing deferred shading features.
Computer graphics is the art and science of generating pictures with a computer. Generation of pictures, or images, is commonly called rendering. Generally, in three-dimensional (3D) computer graphics, geometry that represents surfaces (or volumes) of objects in a scene is translated into pixels stored in a frame buffer, and then displayed on a display device. Real-time display devices, such as CRTs used as computer monitors, refresh the display by continuously displaying the image over and over. This refresh usually occurs row-by-row, where each row is called a raster line or scan line. In this document, raster lines are numbered from bottom to top, but are displayed in order from top to bottom.
In a 3D animation, a sequence of images is displayed, giving the illusion of motion in three-dimensional space. Interactive 3D computer graphics allows a user to change his viewpoint or change the geometry in real-time, thereby requiring the rendering system to create new images on-the-fly in real-time.
In 3D computer graphics, each renderable object generally has its own local object coordinate system, and therefore needs to be translated (or transformed) from object coordinates to pixel display coordinates. Conceptually, this is a 4-step process: 1) translation (including scaling for size enlargement or shrink) from object coordinates to world coordinates, which is the coordinate system for the entire scene; 2) translation from world coordinates to eye coordinates, based on the viewing point of the scene; 3) translation from eye coordinates to perspective translated eye coordinates, where perspective scaling (farther objects appear smaller) has been performed; and 4) translation from perspective translated eye coordinates to pixel coordinates, also called screen coordinates. Screen coordinates are points in three-dimensional space, and can be in either screen-precision (i.e., pixels) or object-precision (high precision numbers, usually floating-point), as described later. These translation steps can be compressed into one or two steps by precomputing appropriate translation matrices before any translation occurs. Once the geometry is in screen coordinates, it is broken into a set of pixel color values (that is, "rasterized") that are stored into the frame buffer. Many techniques are used for generating pixel color values, including Gouraud shading, Phong shading, and texture mapping.
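By way of illustration only, the four conceptual steps may be sketched as a chain of 4x4 matrix multiplications followed by a perspective divide and a mapping to pixel units. The sketch below is not taken from any particular renderer; the function names, the simple perspective matrix, and the viewport convention (normalized coordinates in the range -1 to 1 mapped onto a width-by-height window) are assumptions made for the example.

```python
def identity():
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

def translation(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def scaling(s):
    m = identity()
    m[0][0] = m[1][1] = m[2][2] = s
    return m

def simple_perspective(d):
    """Assumed projection: leaves x, y, z unchanged and sets w = z / d."""
    m = identity()
    m[3][2] = 1.0 / d
    m[3][3] = 0.0
    return m

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_screen(point_obj, object_to_world, world_to_eye, projection, width, height):
    # Steps 1 to 3 are compressed into one precomputed matrix.
    combined = mat_mul(projection, mat_mul(world_to_eye, object_to_world))
    x, y, z, w = mat_vec(combined, [point_obj[0], point_obj[1], point_obj[2], 1.0])
    # Perspective divide (w is zero only for points in the plane of the eye).
    x_ndc, y_ndc = x / w, y / w
    # Step 4: map to pixel units; z is kept for hidden surface removal.
    x_screen = (x_ndc + 1.0) * 0.5 * width
    y_screen = (y_ndc + 1.0) * 0.5 * height
    return (x_screen, y_screen, z)

# Hypothetical usage: scale by 2, then move 4 units along the look direction.
model = mat_mul(translation(0.0, 0.0, 4.0), scaling(2.0))
screen_point = to_screen((1.0, 1.0, 0.0), model, identity(),
                         simple_perspective(1.0), 640, 480)
```

In this sketch the x and y results carry fractional pixel positions, while z remains the eye-space distance, so smaller z values are closer to the viewing point.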
A summary of the prior art rendering process can be found in: "Fundamentals of Three-dimensional Computer Graphics", by Watt, Chapter 5: The Rendering Process, pages 97 to 113, published by Addison-Wesley Publishing Company, Reading, Mass., 1989, reprinted 1991, ISBN 0-201-15442-0 (hereinafter referred to as the Watt Reference).
FIG. 1 shows a three-dimensional object, a tetrahedron, with its own coordinate axes (xobj, yobj, zobj). The three-dimensional object is translated, scaled, and placed in the viewing point's coordinate system based on (xeye, yeye, zeye). The object is projected onto the viewing plane, thereby correcting for perspective. At this point, the object appears to have become two-dimensional; however, the object's z-coordinates are preserved so they can be used later by hidden surface removal techniques. The object is finally translated to screen coordinates, based on (xscreen, yscreen, zscreen), where zscreen is going perpendicularly into the page. Points on the object now have their x and y coordinates described by pixel location (and fractions thereof) within the display screen and their z coordinates in a scaled version of distance from the viewing point.
Because many different portions of geometry can affect the same pixel, the geometry representing the surfaces closest to the scene viewing point must be determined. Thus, for each pixel, the visible surfaces within the volume subtended by the pixel's area determine the pixel color value, while hidden surfaces are prevented from affecting the pixel. Non-opaque surfaces closer to the viewing point than the closest opaque surface (or surfaces, if an edge of geometry crosses the pixel area) affect the pixel color value, while all other non-opaque surfaces are discarded. In this document, the term "occluded" is used to describe geometry which is hidden by other non-opaque geometry.
Many techniques have been developed to perform visible surface determination, and a survey of these techniques is incorporated herein by reference to: "Computer Graphics: Principles and Practice", by Foley, van Dam, Feiner, and Hughes, Chapter 15: Visible-Surface Determination, pages 649 to 720, 2nd edition published by Addison-Wesley Publishing Company, Reading, Mass., 1990, reprinted with corrections 1991, ISBN 0-201-12110-7 (hereinafter referred to as the Foley Reference). In the Foley Reference, on page 650, the terms "image-precision" and "object-precision" are defined: "Image-precision algorithms are typically performed at the resolution of the display device, and determine the visibility at each pixel. Object-precision algorithms are performed at the precision with which each object is defined, and determine the visibility of each object."
As a rendering process proceeds, most prior art renderers must compute the color value of a given screen pixel multiple times because multiple surfaces intersect the volume subtended by the pixel. The average number of times a pixel needs to be rendered, for a particular scene, is called the depth complexity of the scene. Simple scenes have a depth complexity near unity, while complex scenes can have a depth complexity of ten or twenty. As scene models become more and more complicated, renderers will be required to process scenes of ever increasing depth complexity. Thus, for most renderers, the depth complexity of a scene is a measure of the wasted processing. For example, for a scene with a depth complexity of ten, 90% of the computation is wasted on hidden pixels. This wasted computation is typical of hardware renderers that use the simple Z-buffer technique (discussed later herein), generally chosen because it is easily built in hardware. Methods more complicated than the Z-buffer technique have heretofore generally been too complex to build in a cost-effective manner. An important feature of the method and apparatus invention presented here is the avoidance of this wasted computation by eliminating hidden portions of geometry before they are rasterized, while still being simple enough to build in cost-effective hardware.
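The 90% figure follows from simple arithmetic: if, on average, d surfaces are rendered per pixel and only one of them ends up visible, the wasted fraction is 1 - 1/d. The short sketch below illustrates this under the stated assumption that exactly one opaque surface per pixel contributes to the final color; the function names are hypothetical.

```python
def depth_complexity(fragments_per_pixel):
    """Average number of times a pixel is rendered, given a count per pixel."""
    return sum(fragments_per_pixel) / len(fragments_per_pixel)

def wasted_fraction(complexity):
    """Fraction of pixel computation spent on hidden surfaces.

    Assumes exactly one rendered surface per pixel is finally visible.
    """
    if complexity < 1.0:
        raise ValueError("depth complexity cannot be below one")
    return 1.0 - 1.0 / complexity

# A depth complexity of ten wastes about nine tenths of the work.
waste_at_ten = wasted_fraction(10.0)
```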
When a point on a surface (frequently a polygon vertex) is translated to screen coordinates, the point has three coordinates: 1) the x-coordinate in pixel units (generally including a fraction); 2) the y-coordinate in pixel units (generally including a fraction); and 3) the z-coordinate of the point in either eye coordinates, distance from the virtual screen, or some other coordinate system which preserves the relative distance of surfaces from the viewing point. In this document, positive z-coordinate values are used for the "look direction" from the viewing point, and smaller values indicate a position closer to the viewing point.
When a surface is approximated by a set of planar polygons, the vertices of each polygon are translated to screen coordinates. For points in or on the polygon (other than the vertices), the screen coordinates are interpolated from the coordinates of vertices, typically by the processes of edge walking and span interpolation. Thus, a z-coordinate value is generally included in each pixel value (along with the color value) as geometry is rendered.
Many hardware renderers have been developed, and an example is incorporated herein by reference: "Leo: A System for Cost Effective 3D Shaded Graphics", by Deering and Nelson, pages 101 to 108 of SIGGRAPH93 Proceedings, Aug. 1-6, 1993, Computer Graphics Proceedings, Annual Conference Series, published by ACM SIGGRAPH, New York, 1993, Softcover ISBN 0-201-58889-7 and CD-ROM ISBN 0-201-56997-3 (hereinafter referred to as the Deering Reference). The Deering Reference includes a diagram of a generic 3D graphics pipeline (i.e., a renderer, or a rendering system) that it describes as "truly generic, as at the top level nearly every commercial 3D graphics accelerator fits this abstraction", and this pipeline diagram is reproduced here as FIG. 2. Such pipeline diagrams convey the process of rendering, but do not describe any particular hardware. This document presents a new graphics pipeline that shares some of the steps of the generic 3D graphics pipeline. Each of the steps in the generic 3D graphics pipeline will be briefly explained here, and is also shown in the method flow diagram of FIG. 3. Processing of polygons is assumed throughout this document, but other methods for describing 3D geometry could be substituted. For simplicity of explanation, triangles are used as the type of polygon in the described methods.
As seen in FIG. 2, the first step within the floating-point intensive functions of the generic 3D graphics pipeline after the data input (Step 212) is the transformation step (Step 214), which was described above. The transformation step is also shown in FIG. 3 as the first step in the outer loop of the method flow diagram, and also includes "get next polygon". The second step, the clip test, checks the polygon to see if it is at least partially contained in the view volume (sometimes shaped as a frustum) (Step 216). If the polygon is not in the view volume, it is discarded; otherwise processing continues. The third step is face determination, where polygons facing away from the viewing point are discarded (Step 218). Generally, face determination is applied only to objects that are closed volumes. The fourth step, lighting computation, generally includes the set up for Gouraud shading and/or texture mapping with multiple light sources of various types, but could also be set up for Phong shading or one of many other choices (Step 222). The fifth step, clipping, deletes any portion of the polygon that is outside of the view volume because that portion would not project within the rectangular area of the viewing plane (Step 224). Generally, polygon clipping is done by splitting the polygon into two smaller polygons that both project within the area of the viewing plane. Polygon clipping is computationally expensive. The sixth step, perspective divide, does perspective correction for the projection of objects onto the viewing plane (Step 226). At this point, the points representing vertices of polygons are converted to pixel space coordinates by step seven, the screen space conversion step (Step 228). The eighth step (Step 230), set up for incremental render, computes the various begin, end, and increment values needed for edge walking and span interpolation (e.g.: x, y, and z-coordinates; RGB color; texture map space u and v-coordinates; and the like).
Within the drawing intensive functions, edge walking (Step 232) incrementally generates horizontal spans for each raster line of the display device by incrementing values from the previously generated span (in the same polygon), thereby "walking" vertically along opposite edges of the polygon. Similarly, span interpolation (Step 234) "walks" horizontally along a span to generate pixel values, including a z-coordinate value indicating the pixel's distance from the viewing point. Finally, the z-buffered blending, also referred to as Testing and Blending (Step 236), generates a final pixel color value. The pixel values also include color values, which can be generated by simple Gouraud shading (i.e., interpolation of vertex color values) or by more computationally expensive techniques such as texture mapping (possibly using multiple texture maps blended together), Phong shading (i.e., per-fragment lighting), and/or bump mapping (perturbing the interpolated surface normal). After drawing intensive functions are completed, a double-buffered MUX output look-up table operation is performed (Step 238). In this figure the blocks with rounded corners typically represent functions or process operations, while sharp cornered rectangles typically represent stored data or memory.
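Edge walking and span interpolation may be illustrated, in simplified form, by the following sketch for a single triangle whose vertices are given as (x, y, z) in screen coordinates. It is an assumption-laden example rather than a description of any particular hardware: for clarity it evaluates the two active edges directly on each raster line instead of incrementing them, samples pixels at integer coordinates, and interpolates only z. The z value is stepped incrementally along each span, as in span interpolation. All names are hypothetical.

```python
import math

def lerp(a, b, t):
    return a + (b - a) * t

def edge_at(v0, v1, y):
    """Return (x, z) where the edge v0 -> v1 crosses raster line y."""
    t = (y - v0[1]) / (v1[1] - v0[1])
    return lerp(v0[0], v1[0], t), lerp(v0[2], v1[2], t)

def rasterize_triangle(v0, v1, v2):
    """Return a list of (x, y, z) samples covered by the triangle."""
    a, b, c = sorted((v0, v1, v2), key=lambda v: v[1])  # a lowest, c highest
    samples = []
    if a[1] == c[1]:
        return samples  # degenerate: zero height
    for y in range(math.ceil(a[1]), math.floor(c[1]) + 1):
        # "Edge walking": the long edge a -> c is active on every raster line;
        # the other active edge is a -> b below b and b -> c above it.
        x_long, z_long = edge_at(a, c, y)
        if y < b[1]:
            x_short, z_short = edge_at(a, b, y)
        elif b[1] != c[1]:
            x_short, z_short = edge_at(b, c, y)
        else:
            x_short, z_short = b[0], b[2]
        (x_left, z_left), (x_right, z_right) = sorted(
            [(x_long, z_long), (x_short, z_short)])
        # "Span interpolation": step z by a constant increment per pixel.
        dz_dx = 0.0 if x_right == x_left else (z_right - z_left) / (x_right - x_left)
        x = math.ceil(x_left)
        z = z_left + (x - x_left) * dz_dx
        while x <= math.floor(x_right):
            samples.append((x, y, z))
            x += 1
            z += dz_dx
    return samples

# Hypothetical triangle lying in the plane z = x + 2y.
samples = rasterize_triangle((0, 0, 0), (4, 0, 4), (0, 4, 8))
```

A fuller implementation would carry color and texture coordinates through the same begin, end, and increment values computed in the set up step.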
By comparing the generated z-coordinate value to the corresponding value stored in the Z Buffer, the z-buffered blend either keeps the new pixel values (if they are closer to the viewing point than the previously stored value for that pixel location) by writing them into the frame buffer, or discards the new pixel values (if they are farther). At this step, antialiasing methods can blend the new pixel color with the old pixel color. The z-buffered blend generally includes most of the per-fragment operations, described below.
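The keep-or-discard decision can be illustrated with a minimal sketch. The buffer layout (lists of rows), the use of infinity as the initial depth, and the function names are assumptions made for the example; blending and antialiasing are omitted.

```python
FAR = float("inf")  # assumed initial depth: farther than any real surface

def make_buffers(width, height, background=(0, 0, 0)):
    frame = [[background] * width for _ in range(height)]
    zbuf = [[FAR] * width for _ in range(height)]
    return frame, zbuf

def z_buffered_write(frame, zbuf, x, y, z, color):
    """Keep the new pixel value only if it is closer (smaller z) than the stored one."""
    if z < zbuf[y][x]:
        zbuf[y][x] = z
        frame[y][x] = color
        return True
    return False

# Three surfaces cover the same pixel; each is shaded, but only the nearest survives.
frame, zbuf = make_buffers(2, 2)
kept = [
    z_buffered_write(frame, zbuf, 0, 0, 5.0, (255, 0, 0)),
    z_buffered_write(frame, zbuf, 0, 0, 2.0, (0, 255, 0)),
    z_buffered_write(frame, zbuf, 0, 0, 9.0, (0, 0, 255)),
]
```

Note that the first surface is written and later overwritten, which is the wasted computation associated with depth complexity.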
The generic 3D graphics pipeline includes a double buffered frame buffer, so a double buffered MUX is also included. An output lookup table is included for translating color map values. Finally, digital to analog conversion makes an analog signal for input to the display device.
A major drawback to the generic 3D graphics pipeline is that its drawing intensive functions are not deterministic at the pixel level given a fixed number of polygons. That is, given a fixed number of polygons, more pixel-level computation is required as the average polygon size increases. However, the floating-point intensive functions are proportional to the number of polygons, and independent of the average polygon size. Therefore, it is difficult to balance the amount of computational power between the floating-point intensive functions and the drawing intensive functions because this balance depends on the average polygon size.
Prior art Z Buffers are based on conventional Random Access Memory (RAM or DRAM), Video RAM (VRAM), or special purpose DRAMs. One example of a special purpose DRAM is presented in "FBRAM: A New Form of Memory Optimized for 3D Graphics", by Deering, Schlapp, and Lavelle, pages 167 to 174 of SIGGRAPH94 Proceedings, Jul. 24-29, 1994, Computer Graphics Proceedings, Annual Conference Series, published by ACM SIGGRAPH, New York, 1994, Softcover ISBN 0201607956.
OpenGL is a software interface to graphics hardware which consists of several hundred functions and procedures that allow a programmer to specify objects and operations to produce graphical images. The objects and operations include appropriate characteristics to produce color images of three-dimensional objects. Most of OpenGL (Version 1.2) assumes or requires that the graphics hardware include a frame buffer even though the object may be a point, line, polygon, or bitmap, and the operation may be an operation on that object. The general features of OpenGL (just one example of a graphical interface) are described in the reference "The OpenGL® Graphics System: A Specification (Version 1.2)", edited by Mark Segal and Kurt Akeley, Version 1.2, March 1998, which is hereby incorporated by reference. Although reference is made to OpenGL, the invention is not limited to structures, procedures, or methods which are compatible or consistent with OpenGL, or with any other standard or non-standard graphical interface. Desirably, the inventive structure and method may be implemented in a manner that is consistent with OpenGL, or another standard graphical interface, so that a data set prepared for one of the standard interfaces may be processed by the inventive structure and method without modification. However, the inventive structure and method provides some features not provided by OpenGL, and even when such generic input/output is provided, the implementation is provided in a different manner.
The phrase "pipeline state" does not have a single definition in the prior art. The OpenGL specification, for example, sets forth the type and amount of the graphics rendering machine or pipeline state in terms of items of state and the number of bits and bytes required to store that state information. In the OpenGL definition, pipeline state tends to include object vertex pertinent information including, for example, the vertices themselves, the vertex normals, and color, as well as "non-vertex" information.
When information is sent into a graphics renderer, at least some object geometry information is provided to describe the scene. Typically, the object or objects are specified in terms of vertex information, where an object is modeled, defined, or otherwise specified by points, lines, or polygons (object primitives) made up of one or more vertices. In simple terms, a vertex is a location in space and may be specified for example by a three-space (x,y,z) coordinate relative to some reference origin. Associated with each vertex is other information, such as a surface normal, color, texture, transparency, and the like information pertaining to the characteristics of the vertex. This information is essentially "per-vertex" information. Unfortunately, forcing a one-to-one relationship between incoming information and vertices as a requirement for per-vertex information is unnecessarily restrictive. For example, a color value may be specified in the data stream for a particular vertex and then not respecified in the data stream until the color changes for a subsequent vertex. The color value may still be characterized as per-vertex data even though a color value is not explicitly included in the incoming data stream for each vertex.
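The color example can be made concrete with a small sketch in which the incoming data stream is a sequence of commands and the most recently specified color is attached to every subsequent vertex. The command names, the tuple layout, and the default color are hypothetical choices for the illustration and are not drawn from any particular interface.

```python
def expand_stream(commands, default_color=(1.0, 1.0, 1.0)):
    """Attach the current color to each vertex in an incoming command stream.

    `commands` is a sequence of ("color", rgb) and ("vertex", xyz) tuples.
    """
    current_color = default_color
    vertices = []
    for kind, value in commands:
        if kind == "color":
            current_color = value  # remembered until it is respecified
        elif kind == "vertex":
            vertices.append((value, current_color))
        else:
            raise ValueError("unknown command: " + repr(kind))
    return vertices

# One color command serves the first two vertices; the third gets a new color.
stream = [
    ("color", (1.0, 0.0, 0.0)),
    ("vertex", (0.0, 0.0, 0.0)),
    ("vertex", (1.0, 0.0, 0.0)),
    ("color", (0.0, 0.0, 1.0)),
    ("vertex", (0.0, 1.0, 0.0)),
]
expanded = expand_stream(stream)
```

Each expanded vertex carries a color even though the stream contains only two color values, which is why such data may still be regarded as per-vertex.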
Texture mapping presents an interesting example of information or data which could be considered as either per-vertex information or pipeline state information. For each object, one or more texture maps may be specified, each texture map being identified in some manner, such as with a texture coordinate or coordinates. One may consider the texture map to which one is pointing with the texture coordinate as part of the pipeline state while others might argue that it is per-vertex information.
Other information, not related on a one-to-one basis to the geometry object primitives, used by the renderer such as lighting location and intensity, material settings, reflective properties, and other overall rules on which the renderer is operating may more accurately be referred to as pipeline state. One may consider that everything that does not or may not change on a per-vertex basis is pipeline state, but for the reasons described, this is not an entirely unambiguous definition. For example, one may define a particular depth test (see later description) to be applied to certain objects to be rendered; the depth test may, for instance, require that the z-value be strictly "greater-than" for some objects and "greater-than-or-equal-to" for other objects. These particular depth tests, which change from time to time, may be considered to be pipeline state at that time.
Parameters considered to be renderer (pipeline) state in OpenGL are identified in Section 6.2 of the afore referenced OpenGL Specification (Version 1.2, at pages 193-217).
Essentially then, there are two types of data or information used by the renderer: (i) primitive data which may be thought of as per-vertex data, and (ii) pipeline state data (or simply pipeline state) which is everything else. This distinction should be thought of as a guideline rather than as a specific rule, as there are ways of implementing a graphics renderer treating certain information items as either pipeline state or non-pipeline state.
In the generic 3D graphics pipeline, the "z-buffered blend" step actually incorporates many smaller "per-fragment" operational steps.
Application Program Interfaces (APIs), such as OpenGL (Open Graphics Library) and D3D, define a set of per-fragment operations (See Chapter 4 of Version 1.2 OpenGL Specification). We briefly review some exemplary OpenGL per-fragment operations so that any generic similarities and differences between the inventive structure and method and conventional structures and procedures can be more readily appreciated.
Under OpenGL, a frame buffer stores a set of pixels as a two-dimensional array. Each picture-element or pixel stored in the frame buffer is simply a set of some number of bits. The number of bits per pixel may vary depending on the particular GL implementation or context.
Corresponding bits from each pixel in the framebuffer are grouped together into a bitplane; each bitplane containing a single bit from each pixel. The bitplanes are grouped into several logical buffers referred to as the color, depth, stencil, and accumulation buffers. The color buffer in turn includes what is referred to under OpenGL as the front left buffer, the front right buffer, the back left buffer, the back right buffer, and some additional auxiliary buffers. The values stored in the front buffers are the values typically displayed on a display monitor while the contents of the back buffers and auxiliary buffers are invisible and not displayed. Stereoscopic contexts display both the front left and the front right buffers, while monoscopic contexts display only the front left buffer. In general, the color buffers must have the same number of bitplanes, but particular implementations or contexts may not provide right buffers, back buffers, or auxiliary buffers at all, and an implementation or context may additionally provide or not provide stencil, depth, or accumulation buffers.
Under OpenGL, the color buffers consist of either unsigned integer color indices or R, G, B, and, optionally, A unsigned integer values; and the number of bitplanes in each of the color buffers, the depth buffer (if provided), the stencil buffer (if provided), and the accumulation buffer (if provided), is fixed and window dependent. If an accumulation buffer is provided, it should have at least as many bit planes per R, G, and B color component as do the color buffers.
A fragment produced by rasterization with window coordinates of (xw, yw) modifies the pixel in the framebuffer at that location based on a number of tests, parameters, and conditions. Noteworthy among the several tests that are typically performed sequentially, beginning with a fragment and its associated data and finishing with the final output stream to the frame buffer, are, in the order performed (and with some variation among APIs): 1) pixel ownership test; 2) scissor test; 3) alpha test; 4) color test; 5) stencil test; 6) depth test; 7) blending; 8) dithering; and 9) logicop. Note that OpenGL does not provide for an explicit "color test" between the alpha test and stencil test. Per-fragment operations under OpenGL are applied after all the color computations. Each of these tests or operations is briefly described below.
2.4.1 Ownership Test
Under OpenGL, the pixel ownership test determines if the pixel at location (xw, yw) in the framebuffer is currently owned by the GL context. If it is not, the window system decides the fate of the incoming fragment. Possible results are that the fragment is discarded or that some subset of the subsequent per-fragment operations are applied to the fragment. This pixel ownership test allows the window system to properly control the GL's behavior.
Assume that in a computer having a display screen, one or several processes are running and that each process has a window on the display screen. For each process, the associated window defines the pixels the process wants to write or render to. When there are two or more windows, the window associated with one process may be in front of the window associated with another process, behind that window, or both windows may be entirely visible. Since there is only a single frame buffer for the entire display screen or desktop, the pixel ownership test involves determining which process and associated window owns each of the pixels. If a particular process does not "own" a pixel, it fails the pixel ownership test relative to the frame buffer and that pixel is thrown away. Note that under the typical paradigm, the pixel ownership test is run by each process, and that for a given pixel location in the frame buffer, that pixel may pass the pixel ownership test for one of the processes, and fail the pixel ownership test for the other process. Furthermore, in general, a particular pixel can pass the ownership test for only one process because only one process can own a particular frame buffer pixel at the same time.
In some rendering schemes the pixel ownership test may not be particularly relevant. For example, if the scene is being rendered to an off-screen buffer, and subsequently Block Transferred or "blitted" to the desktop, pixel ownership is not really even relevant. Each process automatically or necessarily passes the pixel ownership test (if it is executed) because each process effectively owns its own off-screen buffer and nothing is in front of that buffer.
If for a particular process, the pixel is not owned by that process, then there is no need to write a pixel value to that location, and all subsequent processing for that pixel may be ignored. In a typical workstation, all the data associated with a particular pixel on the screen is read during rasterization. All information for any polygon that feeds that pixel is read, including information as to the identity of the process that owns that frame buffer pixel, as well as the z-buffer, the color value, the old color value, the alpha value, stencil bits, and so forth. If a process owns the pixel, then the other downstream processes are executed (for example, scissor test, alpha test, and the like). On the other hand, if the process does not own the pixel and fails the ownership test for that pixel, the process need not consider that pixel further and that pixel is skipped for subsequent tests.
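As a simple illustration, ownership may be modeled as a per-pixel map recording which process currently owns each frame buffer location. The map layout and the process identifiers are hypothetical; an actual window system maintains this information in its own manner.

```python
def pixel_ownership_test(owner_map, x, y, process_id):
    """Return True if `process_id` owns the frame buffer pixel at (x, y)."""
    return owner_map[y][x] == process_id

# Two overlapping windows on a 3x2 desktop: the window of process "B"
# lies in front of part of the window of process "A".
owner_map = [
    ["A", "A", "B"],
    ["A", "B", "B"],
]

a_passes = pixel_ownership_test(owner_map, 1, 1, "A")  # covered by B's window
b_passes = pixel_ownership_test(owner_map, 1, 1, "B")
```

Because each location holds a single owner, a pixel location can pass the test for at most one process.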
2.4.2 Scissor Test
Under OpenGL, the scissor test determines if (xw, yw) lies within a scissor rectangle defined by four coordinate values corresponding to a left bottom (left, bottom) coordinate, a width of the rectangle, and a height of the rectangle. The values are set with the procedure "void Scissor(int left, int bottom, sizei width, sizei height)" under OpenGL. If left ≤ xw < left + width and bottom ≤ yw < bottom + height, then the scissor test passes; otherwise the scissor test fails and the particular fragment being tested is discarded. Various initial states are provided and error conditions monitored and reported.
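The scissor condition reduces to two half-open interval checks, as the following sketch shows. The function name and argument order are chosen for the illustration only.

```python
def scissor_test(xw, yw, left, bottom, width, height):
    """Pass if (xw, yw) lies inside the scissor rectangle.

    The rectangle includes its left and bottom edges and excludes
    the right and top edges.
    """
    return left <= xw < left + width and bottom <= yw < bottom + height

# A 100 x 50 rectangle whose lower-left corner is at (10, 20).
inside = scissor_test(10, 20, 10, 20, 100, 50)          # on the included edges
outside_right = scissor_test(110, 20, 10, 20, 100, 50)  # on the excluded edge
```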
In simple terms, a rectangle defines a window which may be an on-screen or off-screen window. The window is defined by an x-left, x-right, y-top, and y-bottom coordinate (even though it may be expressed in terms of a point and height and width dimensions from that point). This scissor window is useful in that only pixels from a polygon fragment that fall in that screen aligned scissor window will change. In the event that a polygon straddles the scissor window, only those pixels that are inside the scissor window may change.
When a polygon in an OpenGL machine comes down the pipeline, the pipeline calculates everything it needs to in order to determine the z-value and color of that pixel. Once z-value and color are determined, that information is used to determine what information should be placed in the frame buffer (thereby determining what is displayed on the display screen).
Just as with the pixel ownership test, the scissor test provides means for discarding pixels and/or fragments before they actually get to the frame buffer to cause the output to change.
2.4.3 Alpha Test
Color is defined by four values, red (R), green (G), blue (B), and alpha (A). The RGB values define the contribution from each of the primary colors, and alpha is related to the transparency. Typically, color is a 32-bit value, 8 bits for each component, though such representation is not limited to 32 bits. Alpha test compares the alpha value of a given pixel to an alpha reference value. The type of comparison may also be specified, so that for example the comparison may be a greater-than operation, a less-than operation, and so forth. If the comparison is a greater-than operation, then the pixel's alpha value has to be greater than the reference to pass the alpha test. So if the pixel's alpha value is 0.9, the reference alpha is 0.8, and the comparison is greater-than, then that pixel passes the alpha test. Any pixel not passing the alpha test is thrown away or discarded. The OpenGL Specification describes the manner in which alpha test is implemented in OpenGL, and we do not describe it further here.
Alpha test is a per-fragment operation and happens after all of the fragment coloring calculations and lighting and shading operations are completed. Each of these per-fragment operations may be thought of as part of the conventional z-buffer blending operations.
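The numerical example above may be expressed as a short sketch in which the comparison is supplied as a function. Passing a callable from the Python standard `operator` module is a choice made for the illustration and is not how any API specifies the comparison.

```python
import operator

def alpha_test(pixel_alpha, reference_alpha, compare=operator.gt):
    """Return True if the pixel passes; failing pixels are discarded by the caller."""
    return compare(pixel_alpha, reference_alpha)

# The example from the text: alpha 0.9 against reference 0.8 with greater-than.
passes = alpha_test(0.9, 0.8, operator.gt)
# The same pixel fails when the comparison is less-than.
fails = alpha_test(0.9, 0.8, operator.lt)
```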
2.4.4 Color Test
Color test is similar to the alpha test described hereinbefore, except that rather than performing the magnitude or logical comparisons between the pixel alpha (A) value and a reference value, the color test performs a magnitude or logical comparison between one or a combination of the R, G, or B color components and reference value(s). The comparison test may be for example, greater-than, less-than, equal-to, greater-than-or-equal-to, "greater-than-c1 and less-than c2" where c1 and c2 are some predetermined reference values, and so forth. One might for example, specify a reference minimum R value, and a reference maximum R value, such that the color test would be passed only if the pixel R value is between that minimum and maximum. Color test might, for example, be useful to provide blue-screen functionality. The comparison test may also be performed on a single color component or on a combination of color components. Furthermore, although for the alpha test one typically has one value for each component, for the color test there are effectively two values per component, a maximum value and a minimum value.
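A range-based color test of the kind described may be sketched as follows. The use of inclusive bounds, the choice to test all three components together, and the function name are assumptions for the illustration; other comparison types and component combinations are equally possible.

```python
def color_test(rgb, minimum, maximum):
    """Pass only if every component lies between its minimum and maximum.

    Bounds are treated as inclusive in this sketch. A component is left
    unconstrained by giving it the full range (0.0 to 1.0).
    """
    return all(lo <= c <= hi for c, lo, hi in zip(rgb, minimum, maximum))

# Constrain only R to the range 0.2 to 0.8; G and B are unconstrained.
r_min, r_max = (0.2, 0.0, 0.0), (0.8, 1.0, 1.0)
mid_red_passes = color_test((0.5, 0.1, 0.9), r_min, r_max)
bright_red_passes = color_test((0.95, 0.1, 0.9), r_min, r_max)
```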
2.4.5 Stencil Test
Under OpenGL, stencil test conditionally discards a fragment based on the outcome of a comparison between a value stored in a stencil buffer at location (xw, yw) and a reference value. Several stencil comparison functions are permitted such that the stencil test passes never, always, if the reference value is less than, less than or equal to, equal to, greater than or equal to, greater than, or not equal to the masked stored value in the stencil buffer. Under OpenGL, if the stencil test fails, the incoming fragment is discarded. The reference value and the comparison value can have multiple bits, typically 8 bits so that 256 different values may be represented. When an object is rendered into the frame buffer, a tag having the stencil bits is also written into the frame buffer. These stencil bits are part of the pipeline state. The type of stencil test to perform can be specified at the time the geometry is rendered.
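A masked stencil comparison may be sketched as below. In this illustration both the reference value and the stored value are ANDed with the mask before being compared, with the reference on the left-hand side of the comparison; the function name, the default mask, and the use of a callable for the comparison are assumptions of the example.

```python
import operator

def stencil_test(reference, stored, mask=0xFF, compare=operator.eq):
    """Return True if the fragment passes the stencil comparison.

    The comparison is evaluated as: (reference & mask) compare (stored & mask).
    """
    return compare(reference & mask, stored & mask)

# With 8 stencil bits, values 0 to 255 can be represented.
equal_passes = stencil_test(0x05, 0x05)
# Masking to the low nibble ignores differences in the upper bits.
masked_passes = stencil_test(0x15, 0x25, mask=0x0F)
# "Reference is less than stored value" comparison.
less_passes = stencil_test(3, 7, compare=operator.lt)
```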
The stencil bits are used to implement various filtering, masking or stenciling operations. For example, if a particular fragment ends up affecting a particular pixel in the frame buffer, then the stencil bits can be written to the frame buffer along with the pixel information.
2.4.6 Depth Buffer Test
Under OpenGL, the depth buffer test discards the incoming fragment if a depth comparison fails. The comparison is enabled or disabled with the generic Enable and Disable commands using the OpenGL symbolic constant DEPTH_TEST. When depth test is disabled, the depth comparison and subsequent possible updates to the depth buffer value are bypassed and a fragment is passed to the next operation. The stencil bits are also involved and are modified even if the test is bypassed. The stencil value is modified if the depth buffer test passed. If depth test is enabled, the depth comparison takes place and the depth buffer and stencil value may subsequently be modified. The manner in which the depth test is implemented in OpenGL is described in greater detail in the OpenGL specification at page 145.
Depth comparisons are implemented in which the possible outcomes are as follows: the depth buffer test passes never, always, or if the incoming fragment's zw value is less than, less than or equal to, equal to, greater than, greater than or equal to, or not equal to the depth value stored at the location given by the incoming fragment's (xw, yw) coordinates. If the depth buffer test fails, the incoming fragment is discarded. The stencil value at the fragment's (xw, yw) coordinate is updated according to the function currently in effect for depth buffer test failure. Otherwise, the fragment continues to the next operation and the value of the depth buffer at the fragment's (xw, yw) location is set to the fragment's zw value. In this case the stencil value is updated according to the function currently in effect for depth buffer test success. The necessary OpenGL state is an eight-valued integer and a single bit indicating whether depth buffering is enabled or disabled.
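The depth comparison and the conditional depth buffer update can be sketched as follows. This is a simplified illustration under stated assumptions: the depth buffer is modeled as a dictionary keyed by (xw, yw), the name `depth_test` is hypothetical, and the accompanying stencil updates are deliberately omitted.

```python
import operator

# Hypothetical table of the eight depth comparison functions named in the text.
DEPTH_FUNCS = {
    "never": lambda z, stored: False,
    "always": lambda z, stored: True,
    "less": operator.lt,
    "lequal": operator.le,
    "equal": operator.eq,
    "greater": operator.gt,
    "gequal": operator.ge,
    "notequal": operator.ne,
}

def depth_test(func, frag_z, depth_buffer, xy, enabled=True):
    """Return True if the fragment survives; on a pass, store its zw value.

    When the test is disabled, the comparison and the buffer update are
    both bypassed and the fragment simply continues.
    """
    if not enabled:
        return True
    if DEPTH_FUNCS[func](frag_z, depth_buffer[xy]):
        depth_buffer[xy] = frag_z
        return True
    return False
```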
2.4.7 Blending
Under OpenGL, blending combines the incoming fragment's R, G, B, and A values with the R, G, B, and A values stored in the framebuffer at the incoming fragment's (xw, yw) location.
This blending is typically dependent on the incoming fragment's alpha value (A) and that of the corresponding frame buffer stored pixel. In the following discussion, Cs refers to the source color for an incoming fragment, Cd refers to the destination color at the corresponding framebuffer location, and Cc refers to a constant color in the GL state. Individual RGBA components of these colors are denoted by subscripts of s, d, and c respectively.
Blending is basically an operation that takes color in the frame buffer and the color in the fragment, and blends them together. The manner in which blending is achieved, that is the particular blending function, may be selected from various alternatives for both the source and destination.
Blending is described in the OpenGL specification at pages 146-149, which are hereby incorporated by reference. Various blend equations are available under OpenGL. For example, an additive type blend is available wherein a blend result (C) is obtained by adding the product of a source color (Cs) and a source weighting factor quadruplet (S) to the product of a destination color (Cd) and a destination weighting factor quadruplet (D), that is, C=CsS+CdD. Alternatively, the blend equation may be a subtraction (C=CsS−CdD), a reverse subtraction (C=CdD−CsS), a minimum function (C=min(Cs, Cd)), or a maximum function (C=max(Cs, Cd)). Under OpenGL, the blending equation is evaluated separately for each color component and its corresponding weighting coefficient. Each of the four R, G, B, A components has its own weighting factor.
The blending test (or blending equation) is part of pipeline state and can potentially change for every polygon, but more typically would change only for an object made up of several polygons.
In general, blending is only performed once other tests such as the pixel ownership test and stencil test have been passed, so that it is clear that the pixel or fragment under consideration would or could have an effect in the output.
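The blend equations listed above can be illustrated with a short sketch. The function name `blend` and the mode strings are hypothetical, colors are assumed to be RGBA tuples in the range 0 to 1, and the final clamp to that range is an assumption of this example rather than something stated in the text.

```python
def blend(cs, cd, s, d, mode="add"):
    """Blend a source color cs with a destination color cd, per component.

    s and d are the source and destination weighting factor quadruplets.
    Each of the R, G, B, A components is evaluated separately.
    """
    ops = {
        "add": lambda a, b, ws, wd: a * ws + b * wd,               # C = Cs*S + Cd*D
        "subtract": lambda a, b, ws, wd: a * ws - b * wd,          # C = Cs*S - Cd*D
        "reverse_subtract": lambda a, b, ws, wd: b * wd - a * ws,  # C = Cd*D - Cs*S
        "min": lambda a, b, ws, wd: min(a, b),                     # C = min(Cs, Cd)
        "max": lambda a, b, ws, wd: max(a, b),                     # C = max(Cs, Cd)
    }
    op = ops[mode]
    # Assumed clamp of each result component to [0, 1].
    return tuple(min(1.0, max(0.0, op(a, b, ws, wd)))
                 for a, b, ws, wd in zip(cs, cd, s, d))
```

For example, with all weights set to 0.5, `blend((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0), (0.5,) * 4, (0.5,) * 4)` averages the two colors component by component.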
2.4.8 Dithering
Under OpenGL, dithering selects between two color values or indices. In RGBA mode, consider the value of any of the color components as a fixed-point value with m bits to the left of the binary point, where m is the number of bits allocated to that component in the framebuffer; call each such value c. For each c, dithering selects a value c1 such that c1 ∈ {max{0, ⌈c⌉−1}, ⌈c⌉}. This selection may depend on the xw and yw coordinates of the pixel. In color index mode, the same rule applies with c being a single color index. The value of c must not be larger than the maximum value representable in the framebuffer for either the component or the index.
Although many dithering algorithms are possible, a dithered value produced by any algorithm must generally depend only on the incoming value and the fragment's x and y window coordinates. When dithering is disabled, each color component is truncated to a fixed-point value with as many bits as there are in the corresponding framebuffer component, and the color index is rounded to the nearest integer representable in the color index portion of the framebuffer.
Dithering is described more fully in the OpenGL specification, particularly at pages 149-150, which are incorporated by reference.
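One possible dithering rule that satisfies these constraints is sketched below. It assumes that the bracketed term in the selection rule denotes the ceiling of c, and it uses a hypothetical 2x2 ordered-dither threshold pattern; neither the pattern nor the function name `dither` comes from the OpenGL specification.

```python
import math

# Hypothetical 2x2 threshold pattern; any pattern that depends only on
# the window coordinates (x, y) would satisfy the stated constraint.
THRESHOLDS = ((0.0, 0.5), (0.75, 0.25))

def dither(c, x, y):
    """Select either max(0, ceil(c) - 1) or ceil(c) for a component value c.

    The choice depends only on c and on the fragment's x, y coordinates.
    """
    hi = math.ceil(c)
    lo = max(0, hi - 1)
    frac = c - lo  # how far c lies above the lower candidate
    return hi if frac > THRESHOLDS[y % 2][x % 2] else lo
```

Over a 2x2 block of pixels, a value such as c = 2.3 is mapped to 3 at some positions and to 2 at others, so the block approximates an intermediate intensity.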
2.4.9 Logicop
Under OpenGL, there is a final logical operation applied between the incoming fragment's color or index values and the color or index values stored in the frame buffer at the corresponding location. The result of the logical operation replaces the values in the framebuffer at the fragment's (x, y) coordinates. Various logical operations may be implemented between source (s) and destination (d), including for example: clear, set, and, noop, xor, or, nor, nand, invert, copy, inverted and, equivalence, reverse or, reverse and, inverted copy, and inverted or. The logicop arguments and corresponding operations, as well as additional details of the OpenGL logicop implementation, are set forth in the OpenGL specification at pages 150-151. Logical operations are performed independently for each color index buffer that is selected for writing, or for each red, green, blue, and alpha value of each color buffer that is selected for writing. The required state is an integer indicating the logical operation, and two bits indicating whether the logical operation is enabled or disabled.
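The sixteen logical operations named above can be sketched as bitwise operations on integer values. The function name `logic_op`, the string keys, and the configurable bit width are assumptions made for this illustration.

```python
def logic_op(op, s, d, bits=8):
    """Apply a logical operation between source s and destination d.

    Values are treated as unsigned integers of the given bit width;
    the result replaces the destination value.
    """
    mask = (1 << bits) - 1
    table = {
        "clear": 0,
        "and": s & d,
        "and_reverse": s & ~d,
        "copy": s,
        "and_inverted": ~s & d,
        "noop": d,
        "xor": s ^ d,
        "or": s | d,
        "nor": ~(s | d),
        "equiv": ~(s ^ d),
        "invert": ~d,
        "or_reverse": s | ~d,
        "copy_inverted": ~s,
        "or_inverted": ~s | d,
        "nand": ~(s & d),
        "set": mask,
    }
    # Masking discards the sign bits that Python's ~ operator introduces.
    return table[op] & mask
```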
In this document, pixels are referred to as the smallest individually controllable element of the display device. But, because images are quantized into discrete pixels, spatial aliasing occurs. A typical aliasing artifact is a "staircase" effect caused when a straight line or edge cuts diagonally across rows of pixels.
Some rendering systems reduce aliasing effects by dividing pixels into sub-pixels, where each sub-pixel can be colored independently. When the image is to be displayed, the colors for all sub-pixels within each pixel are blended together to form an average color for the pixel. A renderer that uses up to 16 sub-pixels per pixel is described in "RealityEngine Graphics", by Akeley, pages 109 to 116 of SIGGRAPH93 Proceedings, Aug. 1-6, 1993, Computer Graphics Proceedings, Annual Conference Series, published by ACM SIGGRAPH, New York, 1993, Softcover ISBN 0-201-58889-7 and CD-ROM ISBN 0-201-56997-3 (hereinafter referred to as the Akeley Reference).
Another prior art antialiasing method is the A-Buffer used to perform blending (this technique is also included in the Akeley Reference), and is described in "The A-buffer, an Antialiased Hidden Surface Method" by L. Carpenter, SIGGRAPH 1984 Conference Proceedings, pp. 103-108 (hereinafter referred to as the Carpenter Reference). The A-buffer is an antialiasing technique that reduces aliasing by keeping track of the percent coverage of a pixel by a rendered polygon. The main drawback to this technique is the need to sort polygons front-to-back (or back-to-front) at each pixel in order to get acceptable antialiased polygons.
Most Content Addressable Memories (CAM) perform a bit-for-bit equality test between an input vector and each of the data words stored in the CAM. This type of CAM frequently provides masking of bit positions in order to eliminate the corresponding bit in all words from affecting the equality test. It is inefficient to perform magnitude comparisons in an equality-testing CAM because a large number of clock cycles is required to do the task. CAMs are presently used in translation look-aside buffers within virtual memory systems in some computers. CAMs are also used to match addresses in high speed computer networks.
Magnitude Comparison CAM (MCCAM) is defined here as any CAM where the stored data are treated as numbers, and arithmetic magnitude comparisons (i.e. less-than, greater-than, less-than-or-equal-to, and the like) are performed on the data in parallel. This is in contrast to ordinary CAM which treats stored data strictly as bit vectors, not as numbers. An MCCAM patent, included herein by reference, is U.S. Pat. No. 4,996,666, by Jerome F. Duluk Jr., entitled "Content-Addressable Memory System Capable of Fully Parallel Magnitude Comparisons", granted Feb. 26, 1991 (hereinafter referred to as the Duluk Patent). Structures within the Duluk Patent specifically referenced shall include the prefix "Duluk Patent" (for example, "Duluk Patent MCCAM Bit Circuit").
The basic internal structure of an MCCAM is a set of memory bits organized into words, where each word can perform one or more arithmetic magnitude comparisons between the stored data and input data. In general, for an MCCAM, when a vector of numbers is applied in parallel to an array of words, all arithmetic comparisons in all words occur in parallel. Such a parallel search comparison operation is called a "query" of the stored data.
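The behavior of an MCCAM query can be modeled in software as shown below. This is only a functional sketch: the hardware performs all comparisons simultaneously, whereas the hypothetical function `mccam_query` evaluates them one word at a time, and its argument layout is an assumption of this example rather than the structure of the Duluk Patent.

```python
import operator

def mccam_query(words, query, ops):
    """Return one match flag per stored word.

    words: list of tuples of numbers (the stored data fields).
    query: tuple of input numbers, one per field.
    ops:   tuple of comparison functions, applied as op(stored, input).
    A word matches only if every one of its field comparisons is true.
    """
    return [all(op(field, q) for field, q, op in zip(word, query, ops))
            for word in words]

# Example: find words whose first field is <= 3 and whose second field is > 4.
flags = mccam_query([(1, 5), (3, 2), (4, 9)], (3, 4), (operator.le, operator.gt))
```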
The invention described herein is a system and method for performing tangent space lighting in a deferred shading architecture. As documented in the detailed description, in a deferred shading architecture implemented in accordance with the present invention, floating point-intensive lighting computations are performed only after hidden surfaces have been removed from the graphics pipeline. This can result in dramatically fewer lighting computations than in the conventional approach described in reference to FIG. 2, where shading computations (FIG. 2, 222) are performed for nearly all surfaces before hidden pixels are removed in the z-buffered blending operation (FIG. 2, 236). To illustrate the advantages of the present invention, a description is now provided of a few conventional approaches to performing lighting computations, including bump mapping. One of the described approaches is embodied in 3D graphics hardware sold by Silicon Graphics International (SGI).
The theoretical basis and implementation of lighting computations in conventional 3D graphics systems are well known and are thoroughly documented in the following publications, which are incorporated herein by reference:
1) Phong, B. T., Illumination for Computer Generated Pictures, Communications of the ACM 18, 6 (June 1975), 311-317 (hereinafter referred to as the Phong reference);
2) Blinn, J. F., Simulation of Wrinkled Surfaces, In Computer Graphics (SIGGRAPH '78 Proceedings) (August 1978), vol. 12, pp. 286-292 (hereinafter referred to as the Blinn reference);
3) Watt, Alan, 3D Computer Graphics (2nd ed.), p. 250 (hereinafter referred to as the Watt reference);
4) Peercy, M. et al., Efficient Bump Mapping Hardware, In Computer Graphics (SIGGRAPH '97 Proceedings) (July 1997), vol. 8, pp. 303-306 (hereinafter referred to as the Peercy reference).
Generally, lighting computations generate for each pixel of a surface an RGBA color value that accounts for the surface's color, orientation and material properties; the orientation and properties of the surface illumination; and the viewpoint from which the illuminated surface is observed. The material properties can include: fog, emissive color, reflective properties (ambient, diffuse, specular) and bump effects. The illumination properties can include for one or more lights: color (global ambient, light ambient, light diffuse, light specular) and attenuation, spotlight and shadow effects.
There are many different lighting models that can be implemented in a 3D graphics system, including Gouraud shading and Phong shading. In Gouraud shading, lighting computations are made at each vertex of an illuminated surface and the resulting colors are interpolated. This technique is computationally simple but produces many undesirable artifacts, such as Mach banding. The most realistic lighting effects are provided by Phong shading, where lighting computations are made at each pixel based on interpolated and normalized vertex normals. Typically, a graphics system supports many different lighting models. However, as a focus of the present invention is to efficiently combine Phong shading and bump mapping, the other lighting models are not further described.
2.7.1 Lighting Computations
Referring to FIG. 3 there is shown a diagram illustrating the elements employed in the lighting computations of both the conventional approach and the present invention. This figure does not illustrate the elements used in bump mapping calculations, which are shown in FIG. 4. The elements shown in FIG. 3 are defined below.
2.7.1.1 Definitions of Elements of Lighting Computations
V the position of the fragment to be illuminated in eye coordinates (Vx, Vy, Vz).
N̂ the unit normal vector at the fragment (Nx, Ny, Nz).
PL the location of the light source in eye coordinates (PLx, PLy, PLz).
PLi indicates whether the light is located at infinity (0=infinity). If the light is at infinity, then PL represents the coordinates of a unit vector from the origin to the light, P̂L.
PE the location of the viewer (viewpoint). In eye coordinates the viewpoint is at either (0,0,0) or (0,0,∞). This is specified as a lighting mode.
Ê is the unit vector from the vertex to the viewpoint, PE, and is defined as follows:
\[
\hat{E} = \begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} =
\begin{cases}
\dfrac{1}{d_E} \begin{bmatrix} -V_x & -V_y & -V_z \end{bmatrix}^T, & \text{for } P_E = (0, 0, 0) \\[1ex]
\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T, & \text{for } P_E = (0, 0, \infty)
\end{cases}
\]
where
\[
d_E = \sqrt{V_x^2 + V_y^2 + V_z^2}
\]
L̂ is the unit vector from the vertex to the light, PL, and is defined as follows:
\[
\hat{L} = \begin{bmatrix} L_x \\ L_y \\ L_z \end{bmatrix} =
\begin{cases}
\dfrac{1}{d_L} \begin{bmatrix} P_{Lx} - V_x \\ P_{Ly} - V_y \\ P_{Lz} - V_z \end{bmatrix}, & \text{for } P_{Li} = \text{local} \\[1ex]
\begin{bmatrix} P_{Lx} \\ P_{Ly} \\ P_{Lz} \end{bmatrix}, & \text{for } P_{Li} = \infty
\end{cases}
\]
where
\[
d_L = \sqrt{(P_{Lx} - V_x)^2 + (P_{Ly} - V_y)^2 + (P_{Lz} - V_z)^2}
\]
Ĥ is the unit vector half way between Ê and L̂, and is defined as follows:
\[
\hat{H} = \frac{\vec{H}}{\lVert \vec{H} \rVert}
\]
where
\[
\vec{H} = \hat{E} + \hat{L}
\]
hN is the cosine of the angle between N̂ and the half way vector, Ĥ, and is defined as follows:
hN = Ĥ·N̂ = Hx·Nx + Hy·Ny + Hz·Nz
pN is the cosine of the angle between N̂ and the vector to the light, L̂, and is defined as follows:
pN = N̂·L̂
ŜD is the unit vector in the direction of the spotlight. It is a Lighting Source Parameter and is provided as a unit vector.
sc is the cosine of the angle that defines the spotlight cone. It is a Lighting Source Parameter.
sdv is the cosine of the angle between the spotlight direction, ŜD, and the vector from the light to the vertex, −L̂, and is defined as follows:
sdv = ŜD·(−L̂)
dL is the distance from the light to the vertex. See L̂ above.
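The vector definitions above can be sketched in code as follows. The helper names (`normalize`, `dot`, `eye_vector`, `light_vector`, `half_vector`) and the boolean flags are hypothetical; vectors are assumed to be 3-tuples in eye coordinates.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    """Dot product, used for the cosine terms between unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def eye_vector(V, viewer_at_infinity=False):
    """Unit vector from the vertex V to the viewpoint."""
    if viewer_at_infinity:
        return (0.0, 0.0, 1.0)
    return normalize(tuple(-c for c in V))  # viewpoint at the origin

def light_vector(V, PL, light_at_infinity=False):
    """Unit vector from the vertex V to the light PL."""
    if light_at_infinity:
        return tuple(PL)  # PL is already a unit direction in this case
    return normalize(tuple(p - v for p, v in zip(PL, V)))

def half_vector(E, L):
    """Unit vector half way between the eye and light vectors."""
    return normalize(tuple(e + l for e, l in zip(E, L)))

# Example: vertex on the negative z axis, local light to its right.
E = eye_vector((0.0, 0.0, -2.0))
L = light_vector((0.0, 0.0, -2.0), (3.0, 0.0, -2.0))
H = half_vector(E, L)
N = (0.0, 0.0, 1.0)
h_n = dot(H, N)  # cosine between the half way vector and the normal
p_n = dot(N, L)  # cosine between the normal and the light vector
```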
2.7.1.2 Lighting Equation
The "Lighting Color" of each pixel is computed according to the following lighting equation (Eq. (1)):
\[
\mathrm{LightingColor} = \mathrm{EmissiveColor} + \mathrm{GlobalAmbientColor} + \sum_{i=0}^{n-1} \left[ \mathrm{Attenuation} \cdot \mathrm{SpotLightEffect} \cdot \left( \mathrm{AmbientColor} + \mathrm{DiffuseColor} + \mathrm{SpecularColor} \right) \right] \qquad \text{Eq. (1)}
\]
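The structure of Eq. (1) can be sketched as a sum over lights. In this illustration the per-light terms are assumed to be already computed and passed in as dictionaries; the function name `lighting_color` and the dictionary keys are hypothetical.

```python
def lighting_color(emissive, global_ambient, lights):
    """Evaluate Eq. (1) per color component.

    emissive, global_ambient: color tuples (for example RGB or RGBA).
    lights: iterable of dicts with scalar 'attenuation' and 'spotlight'
            factors and 'ambient', 'diffuse', 'specular' color tuples.
    """
    total = [e + g for e, g in zip(emissive, global_ambient)]
    for light in lights:
        scale = light["attenuation"] * light["spotlight"]
        for k in range(len(total)):
            total[k] += scale * (light["ambient"][k]
                                 + light["diffuse"][k]
                                 + light["specular"][k])
    return tuple(total)

# Example with a single light at half attenuation and no spotlight falloff.
color = lighting_color(
    (0.1, 0.0, 0.0),
    (0.1, 0.1, 0.1),
    [{"attenuation": 0.5, "spotlight": 1.0,
      "ambient": (0.2, 0.2, 0.2),
      "diffuse": (0.4, 0.0, 0.0),
      "specular": (0.0, 0.0, 0.2)}],
)
```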
2.7.1.3 Lighting Equation Terms
The terms used in the lighting equation (Eq. (1)) are defined for the purposes of the present application as follows. These definitions are consistent with prior art usage.
Emissive Color The color given to a surface by its self illuminating material property without a light.
Ambient Color The color given to a surface due to a light's ambient intensity and scaled by the material's ambient reflective property. Ambient Color is not dependent on the position of the light or the viewer. Two types of ambient lights are provided, a Global Ambient Scene Light, and the ambient light intensity associated with individual lights.
Diffuse Color The color given to a surface due to a light's diffuse intensity and scaled by the material's diffuse reflective property and the direction of the light with respect to the surface's normal. Because the diffuse light reflects in all directions, the position of the viewpoint has no effect on a surface's diffuse color.
Specular Color The color given to a surface due to a light's specular intensity and scaled by the material's specular reflective property and the directions of the light and the viewpoint with respect to the surface's normal. The rate at which a material's specular reflection fades off is an exponential factor and is specified as the material's shininess factor.
Attenuation The amount that a color's intensity from a light source fades away as a function of the distance from the surface to the light. Three factors are specified per light, a constant coefficient, a linear coefficient, and a quadratic coefficient.
Spotlight A feature per light source that defines the direction of the light and its cone of illumination. A spotlight has no effect on a surface that lies outside its cone. The illumination by the spotlight inside the cone depends on how far the surface is from the center of the cone and is specified by a spotlight exponent factor.
The meaning and derivation of each of these terms is now described.
2.7.1.3.1 Emissive Color
The emissive color is just the emissive attribute of the material (Ecm). I.e.,
EmissiveColor=Ecm
2.7.1.3.2 Ambient Effects
The ambient attribute of a material, Acm, is used to scale the Global Scene Ambient Light, Acs, to determine the global ambient effect. I.e.,
GlobalAmbientColor=Acm·Acs
2.7.1.3.3 Individual light effects
Individual lights have an ambient, diffuse, and specular attribute associated with them. These attributes are affected by the ambient, diffuse, and specular attributes of the material, respectively. Each light may also have a spotlight attribute and an attenuation factor, which are expressed as follows.
2.7.1.3.3.1 Attenuation
The Attenuation factor is a fraction that reduces the lighting effect from a particular light depending on the distance of the light's position to the position of the vertex, dL. If the light's position is at infinity (PLi=0), then the attenuation factor is one and has no effect. Three positive factors are provided per light that determine the attenuation value: Kc, Kl, and Kq. These are the constant, linear, and quadratic effects, respectively. Note that eye coordinates of the surface are needed to determine the light's distance. Given these factors, Attenuation is expressed as follows:
\[
\mathrm{Attenuation} = \frac{1}{K_c + K_l \cdot d_L + K_q \cdot d_L^2}
\]
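The attenuation expression can be sketched directly. The function name `attenuation` and the `light_at_infinity` flag (standing in for PLi=0) are assumptions of this example.

```python
def attenuation(d_l, k_c, k_l, k_q, light_at_infinity=False):
    """Attenuation factor for a light at distance d_l from the vertex.

    k_c, k_l, k_q are the constant, linear, and quadratic coefficients.
    A light at infinity is not attenuated, so the factor is one.
    """
    if light_at_infinity:
        return 1.0
    return 1.0 / (k_c + k_l * d_l + k_q * d_l * d_l)
```

For instance, with d_l = 2, k_c = 1, k_l = 0.5, and k_q = 0.25, the denominator is 1 + 1 + 1, giving a factor of one third.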
2.7.1.3.3.2 Spotlight
Each light can be specified to act as a spotlight. The result of a spotlight is to diminish the effect that a light has on a vertex based upon the distance of the vertex from the direction that the spotlight is pointed. If the light is not a spotlight then there is no effect and the spotlight factor is one. The parameters needed to specify a spotlight are the position of the spotlight, PL, PLi; the unit length direction of the spotlight, ŜD; the cosine of the spotlight cutoff angle, sc; and the spotlight exponent, sE. The range of the cutoff angle cosine is 0 to 1. A negative value of sc indicates no spotlight effect. If the vertex lies within the spotlight cutoff angle, then it is lit; otherwise, it is not lit. The amount that a vertex is lit is determined by the spotlight exponent: the further the vertex is from the center of the cone, the less it is lit.
sdv, the cosine of the angle between the spotlight direction and the vector from light to vertex, is used to determine whether the vertex is lit and how far the vertex is from the center of the spotlight cone.
sdv = ŜD·(−L̂)
If sdv ≥ sc then the vertex is lit. How much it is lit depends on (sdv)^sE.
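The spotlight rule just described can be sketched as follows, taking a vertex to be lit when sdv is greater than or equal to sc and using sc = -1 to mean that the light is not a spotlight. The function name `spotlight_effect` is hypothetical.

```python
def spotlight_effect(s_dv, s_c, s_e):
    """Spotlight factor for a vertex.

    s_dv: cosine of the angle between the spotlight direction and the
          vector from the light to the vertex.
    s_c:  cosine of the cutoff angle (-1 means no spotlight effect).
    s_e:  spotlight exponent.
    """
    if s_c == -1:
        return 1.0        # not a spotlight: no effect
    if s_dv < s_c:
        return 0.0        # vertex lies outside the cone
    return s_dv ** s_e    # falls off away from the cone center
```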
To summarize:
\[
\mathrm{SpotlightEffect} =
\begin{cases}
1, & \text{for } s_c = -1 \\
0, & \text{for } s_c \neq -1 \text{ and } s_{dv} < s_c \\
(s_{dv})^{s_E}, & \text{for } s_c \neq -1 \text{ and } s_{dv} \geq s_c
\end{cases}
\]
The ambient effect of local lights is the Local Ambient Light, Acl, scaled by the ambient attribute of a material, Acm.
AmbientColor=Acl·Acm
2.7.1.3.3.4 Diffuse Effect
The diffuse light effect is determined by the position of the light with respect to the normal of the surface. It does not depend on the position of the viewpoint. It is determined by the diffuse attribute of the material, Dcm, the diffuse attribute of the light, Dcl, the position of the light, PL, PLi, the position of the vertex, V, and the unit vector normal of the vertex, {circumflex over (N)}.
{circumflex over (L)} is the unit length vector from the vertex to the light position. If the light position is at infinity (PL=0), then only the light position is used, PL, and the eye coordinates of the vertex are not needed.
The diffuse effect can be described as Dcl, the diffuse light, scaled by Dcm, the diffuse material, and finally scaled by pN, the cosine of the angle between the direction of the light and the surface normal. This cosine is limited between 0 and 1. If the cosine is negative, then the diffuse effect is 0.
$$\mathrm{DiffuseColor} = \begin{cases} 0, & \text{for } p_N \le 0 \\ D_{cl} \cdot D_{cm} \cdot p_N, & \text{for } p_N > 0 \end{cases}$$
where
$$p_N = \hat{N} \cdot \hat{L}$$
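The diffuse term above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `diffuse_color`, the RGB-tuple representation of Dcl and Dcm, the component-wise reading of "scaled by", and the helper functions are not part of the original description, and a local (non-infinite) light is assumed.

```python
import math

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def diffuse_color(d_cl, d_cm, normal, light_pos, vertex):
    """Diffuse term Dcl * Dcm * pN, or black when pN <= 0.

    d_cl, d_cm        : RGB tuples (hypothetical representation)
    normal            : surface normal at the vertex (any non-zero length)
    light_pos, vertex : 3D points in the same coordinate system
    """
    n_hat = normalize(normal)
    # Unit vector from the vertex to the light position.
    l_hat = normalize(tuple(p - v for p, v in zip(light_pos, vertex)))
    p_n = dot(n_hat, l_hat)  # cosine of the angle between light and normal
    if p_n <= 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(cl * cm * p_n for cl, cm in zip(d_cl, d_cm))
```

A light directly above the surface gives pN = 1, so the result is simply the product of the light and material colors; a light behind the surface yields black.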
2.7.1.3.3.5 Specular Effect
The specular light effect is determined by the position of the light with respect to the normal of the surface and the position of the viewpoint. It is determined by the specular color of the material, Scm, the specular exponent (shininess) of the material, Snm, the specular attribute of the light, Scl, the position of the light, PL, PLi, the unit eye vector $\hat{E}$ (described below), the position of the vertex, V, and the unit vector normal of the vertex, $\hat{N}$.
$\hat{L}$ is the unit length vector from the vertex to the light position. If the light position is at infinity (PLi=0), then only the light position, PL, is used and $\hat{L}$ is independent of the vertex's eye coordinates.
$\hat{E}$ is the unit length vector from the vertex to the viewpoint. If the viewpoint position is at infinity, then $\hat{E} = [0\ 0\ 1]^T = \hat{Z}$ and is independent of the vertex's eye coordinates.
$\hat{H}$ is the unit length vector halfway between $\hat{L}$ and $\hat{E}$.
$$\hat{H} = \frac{\vec{H}}{\|\vec{H}\|} = \frac{\hat{L} + \hat{E}}{\|\hat{L} + \hat{E}\|}$$
If the light position is infinite and the viewpoint is infinite, then the halfway vector, $\hat{H}$, is independent of the vertex position and is provided as a light parameter.
The specular effect can be described as Scl, the specular light, scaled by Scm, the specular material, and finally scaled by (hN)^Snm, the cosine of the angle between the halfway vector and the surface normal raised to the power of the shininess. The cosine is limited between 0 and 1. If the cosine is negative, then the specular effect is 0.
$$\mathrm{SpecularColor} = \begin{cases} 0, & \text{for } h_N \le 0 \\ S_{cl} \cdot S_{cm} \cdot (h_N)^{S_{nm}}, & \text{for } h_N > 0 \end{cases}$$
where
$$h_N = \hat{N} \cdot \hat{H}$$
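A minimal Python sketch of the specular term follows. The function name `specular_color`, the RGB-tuple representation, and the choice to pass the light and eye directions as vectors (rather than positions) are assumptions made for the example, not details taken from the description.

```python
import math

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def specular_color(s_cl, s_cm, s_nm, normal, light_dir, eye_dir):
    """Specular term Scl * Scm * (hN ** Snm), or black when hN <= 0.

    light_dir : vector from the vertex toward the light
    eye_dir   : vector from the vertex toward the viewpoint
    The case light_dir == -eye_dir (zero halfway vector) is not handled.
    """
    n_hat = normalize(normal)
    l_hat = normalize(light_dir)
    e_hat = normalize(eye_dir)
    # Halfway vector: normalized sum of the unit light and eye vectors.
    h_hat = normalize(tuple(l + e for l, e in zip(l_hat, e_hat)))
    h_n = dot(n_hat, h_hat)
    if h_n <= 0.0:
        return (0.0, 0.0, 0.0)
    scale = h_n ** s_nm
    return tuple(cl * cm * scale for cl, cm in zip(s_cl, s_cm))
```

When the normal, light, and eye vectors coincide, hN = 1 and the shininess exponent has no effect; as the halfway vector tilts away from the normal, a larger Snm makes the highlight fall off more quickly.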
2.7.1.3.4 Infinite Viewpoint and Infinite Light Effect
In OpenGL, a light's position can be defined as having a distance of infinity from the origin but still have a vector pointing to its position. This definition is used in simplifying the calculation needed to determine the vector from the vertex to the light (in other APIs, which do not define the light's position in this way, this simplification cannot be made). If a light is at infinity, then this vector is independent of the position of the vertex, is constant for every vertex, and does not need the vertex's eye coordinates. This simplification is used for spotlights, diffuse color, and specular color.
The viewpoint is defined as being at the origin or at infinity in the z direction. This is used to simplify the calculation for specular color. If the viewer is at infinity then the vector from the vertex to the viewpoint is independent of the position of the vertex, is constant for every vertex, and does not need the vertex's eye coordinates. This vector is then just the unit vector in the z direction, $\hat{Z}$.
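The two simplifications can be illustrated with a short Python sketch. It assumes the light position is given as a homogeneous 4-tuple whose last component is zero for a light at infinity (the OpenGL convention), and that the vertex is expressed in eye coordinates with a local viewpoint at the origin; the function names are hypothetical.

```python
import math

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def light_vector(p_l, vertex):
    """Unit vector from the vertex to the light.

    p_l is (x, y, z, w). When w == 0 the light is at infinity and the
    result does not depend on the vertex at all.
    """
    x, y, z, w = p_l
    if w == 0:
        return normalize((x, y, z))
    local = (x / w, y / w, z / w)
    return normalize(tuple(p - v for p, v in zip(local, vertex)))

def eye_vector(vertex, viewer_at_infinity):
    """Unit vector from the vertex to the viewpoint.

    An infinite viewer gives the constant +z unit vector; a local viewer
    is assumed to sit at the origin of eye space.
    """
    if viewer_at_infinity:
        return (0.0, 0.0, 1.0)
    return normalize(tuple(-c for c in vertex))
```

With both the light and the viewer at infinity, neither function reads the vertex, which is why the halfway vector can be computed once per light instead of once per vertex.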
2.7.1.4 Calculation Cases Summary
The following table (Table 1) summarizes the calculations needed for lighting depending on whether local or infinite light position and viewer are specified.
2.7.2 Bump Mapping
In advanced lighting systems, the lighting computations can account for bump mapping effects. As described in the Blinn reference, bump mapping produces more realistic lighting by simulating the shadows and highlights resulting from illumination of a surface on which the effect of a three dimensional texture is imposed/mapped. An example of such a textured surface is the pebbled surface of a basketball or the dimpled surface of a golf ball.
Generally, in a lighting system that supports bump mapping, a texture map (e.g., a representation of the pebbled basketball surface) is used to perturb the surface normal (N) used in the fragment-lighting calculation (described above). This gives a visual effect of 3-dimensional structure to the surface that cannot be obtained with conventional texture mapping. It also assumes per-fragment lighting is being performed. Bump mapping requires extensions to the OpenGL standard. The theoretical basis of bump mapping is now described with reference to FIG. 4. This approach is common to both of the most common bump mapping methods: the SGI approach and the Blinn approach.
Referring to FIG. 4, there are illustrated some of the elements employed in bump mapping computations. The illustrated approach is described in depth in the Blinn reference and is briefly summarized herein.
2.7.2.1 Bump Mapping Background
Bump Mapping is defined as a perturbation of the Normal Vector, $\vec{N}$, resulting in the perturbed Vector $\vec{N}'$.
The perturbed vector can be calculated by defining $\vec{V}'_e$ to be the location of a point, $\vec{V}_e$, after it has been moved ("bumped") a distance h in the direction of the Normal, $\vec{N}$. Define the unit vector in the Normal direction as
$$\hat{N} = \frac{\vec{N}}{\|\vec{N}\|}$$
Then,
$$\vec{V}'_e = \vec{V}_e + h \cdot \hat{N} \qquad [1]$$
The surface tangents, $\vec{V}_s$ and $\vec{V}_t$, are defined as the partial derivatives of $\vec{V}_e$:
$$\vec{V}_s = \frac{\partial \vec{V}_e}{\partial s}, \qquad \vec{V}_t = \frac{\partial \vec{V}_e}{\partial t}$$
The Normal Vector can be defined as the cross product of the surface tangents:
$$\vec{N} = \vec{V}_s \times \vec{V}_t$$
Then the Perturbed Normal can be defined as the cross product of the surface tangents of the bumped point:
$$\vec{N}' = \vec{V}'_s \times \vec{V}'_t \qquad [2]$$
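Expanding the partials from [1] gives:
$$\vec{V}'_s = \vec{V}_s + \frac{\partial h}{\partial s} \cdot \hat{N} + h \cdot \frac{\partial \hat{N}}{\partial s}, \qquad \vec{V}'_t = \vec{V}_t + \frac{\partial h}{\partial t} \cdot \hat{N} + h \cdot \frac{\partial \hat{N}}{\partial t}$$
Since $\partial \hat{N} / \partial s$ and $\partial \hat{N} / \partial t$ are relatively small, they are dropped.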
Let $h_s = \partial h / \partial s$ and $h_t = \partial h / \partial t$ be defined as Height Gradients. Then, substituting back into [2],
$$\begin{aligned} \vec{N}' &= (\vec{V}_s + h_s \cdot \hat{N}) \times (\vec{V}_t + h_t \cdot \hat{N}) \\ &= (\vec{V}_s \times \vec{V}_t) + (\vec{V}_s \times h_t \cdot \hat{N}) + (h_s \cdot \hat{N} \times \vec{V}_t) + (h_s \cdot \hat{N} \times h_t \cdot \hat{N}) \end{aligned}$$
Define Basis Vectors:
$$\vec{b}_s = \hat{N} \times \vec{V}_t, \qquad \vec{b}_t = \vec{V}_s \times \hat{N} \qquad [3]$$
Then, since $\hat{N} \times \hat{N} = 0$ and $\vec{V}_s \times \vec{V}_t = \vec{N}$,
$$\vec{N}' = \vec{N} + h_s \cdot \vec{b}_s + h_t \cdot \vec{b}_t \qquad [4]$$
This equation [4] is used to perturb the Normal, $\vec{N}$, given Height Gradients, $h_s$ and $h_t$, and Basis Vectors, $\vec{b}_s$ and $\vec{b}_t$.
How the Height Gradients and Basis Vectors are specified depends on the model used.
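The derivation of equation [4] can be checked numerically with a small Python sketch. The function names are hypothetical, and the inputs (surface tangents and height gradients at a single point) are assumed to be already available; how they are obtained depends on the model, as noted above.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def perturb_normal(v_s, v_t, h_s, h_t):
    """Perturbed normal N' = N + h_s * b_s + h_t * b_t (equation [4]).

    N is the unnormalized cross product of the surface tangents;
    b_s and b_t are the basis vectors of equation [3].
    """
    n = cross(v_s, v_t)
    n_hat = normalize(n)
    b_s = cross(n_hat, v_t)
    b_t = cross(v_s, n_hat)
    return tuple(nc + h_s * bs + h_t * bt
                 for nc, bs, bt in zip(n, b_s, b_t))

def perturb_normal_direct(v_s, v_t, h_s, h_t):
    """Same quantity from equation [2] with the small terms dropped:
    (V_s + h_s * N_hat) x (V_t + h_t * N_hat)."""
    n_hat = normalize(cross(v_s, v_t))
    vs_p = tuple(v + h_s * n for v, n in zip(v_s, n_hat))
    vt_p = tuple(v + h_t * n for v, n in zip(v_t, n_hat))
    return cross(vs_p, vt_p)
```

For a flat patch with tangents along x and y, a positive height gradient in s tilts the normal toward negative x, as expected for a surface that rises in the s direction.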
2.7.2.2 Basis Vectors
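Basis Vectors can be calculated using [5], where $(x_s, y_s, z_s)$ and $(x_t, y_t, z_t)$ are the components of the surface tangents $\vec{V}_s$ and $\vec{V}_t$:
$$\begin{aligned} b_{xs} &= \hat{N}_y \cdot z_t - \hat{N}_z \cdot y_t, & b_{xt} &= \hat{N}_z \cdot y_s - \hat{N}_y \cdot z_s \\ b_{ys} &= \hat{N}_z \cdot x_t - \hat{N}_x \cdot z_t, & b_{yt} &= \hat{N}_x \cdot z_s - \hat{N}_z \cdot x_s \\ b_{zs} &= \hat{N}_x \cdot y_t - \hat{N}_y \cdot x_t, & b_{zt} &= \hat{N}_y \cdot x_s - \hat{N}_x \cdot y_s \end{aligned} \qquad [5]$$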
This calculation for Basis Vectors is the one proposed by Blinn and requires Surface Tangents, a unit Normal Vector, and a cross product.
From the diagram, if the Surface Tangents are orthogonal, the Basis can be approximated by:
$$\begin{aligned} b_{xs} &= -x_s, & b_{xt} &= -x_t \\ b_{ys} &= -y_s, & b_{yt} &= -y_t \\ b_{zs} &= -z_s, & b_{zt} &= -z_t \end{aligned} \qquad [6]$$
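The relationship between the cross-product form of the Basis Vectors and the approximation in [6] can be illustrated with a short Python sketch. The function names are hypothetical. Note, as an observation rather than a statement from the description, that the two forms agree exactly only when the tangents are orthogonal and of equal length; with orthogonal tangents of unequal length they differ by a scale factor.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def basis_exact(n_hat, v_s, v_t):
    """Basis vectors as cross products: b_s = N_hat x V_t, b_t = V_s x N_hat."""
    return cross(n_hat, v_t), cross(v_s, n_hat)

def basis_approx(v_s, v_t):
    """Approximation [6]: b_s = -V_s, b_t = -V_t (no cross product needed)."""
    return tuple(-c for c in v_s), tuple(-c for c in v_t)
```

For unit tangents along x and y with the normal along z, both functions return (-1, 0, 0) and (0, -1, 0), so the approximation saves two cross products at no cost in accuracy for that case.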
2.7.2.3 Height Gradients
The Height Gradients, hs and ht, are provided per fragment in the conventional approaches.
2.7.2.4 Surface Tangent Generation
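The partial derivatives,
$$\vec{V}_s = \frac{\partial \vec{V}_e}{\partial s} \quad \text{and} \quad \vec{V}_t = \frac{\partial \vec{V}_e}{\partial t}$$
are called Surface Tangents. If the user does not provide the Surface Tangents per Vertex, then they need to be generated. The vertices V1 and V2 of a triangle can be described relative to V0 as:
$$\begin{aligned} \vec{V}_1 &= \vec{V}_0 + \frac{\partial \vec{V}_e}{\partial s} \cdot (s_1 - s_0) + \frac{\partial \vec{V}_e}{\partial t} \cdot (t_1 - t_0) \\ \vec{V}_2 &= \vec{V}_0 + \frac{\partial \vec{V}_e}{\partial s} \cdot (s_2 - s_0) + \frac{\partial \vec{V}_e}{\partial t} \cdot (t_2 - t_0) \end{aligned}$$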
Let
$$\begin{aligned} \hat{V}_1 &= \vec{V}_1 - \vec{V}_0, & \hat{x}_1 &= x_1 - x_0, & \hat{y}_1 &= y_1 - y_0, & \hat{z}_1 &= z_1 - z_0 \\ \hat{V}_2 &= \vec{V}_2 - \vec{V}_0, & \hat{x}_2 &= x_2 - x_0, & \hat{y}_2 &= y_2 - y_0, & \hat{z}_2 &= z_2 - z_0 \\ \hat{s}_1 &= s_1 - s_0, & \hat{t}_1 &= t_1 - t_0 \\ \hat{s}_2 &= s_2 - s_0, & \hat{t}_2 &= t_2 - t_0 \end{aligned}$$
Then,
$$\hat{V}_1 = \vec{V}_s \cdot \hat{s}_1 + \vec{V}_t \cdot \hat{t}_1, \qquad \hat{V}_2 = \vec{V}_s \cdot \hat{s}_2 + \vec{V}_t \cdot \hat{t}_2$$
Solving for the partials:
$$\vec{V}_s = \frac{\hat{V}_1 \cdot \hat{t}_2 - \hat{V}_2 \cdot \hat{t}_1}{\hat{s}_1 \cdot \hat{t}_2 - \hat{s}_2 \cdot \hat{t}_1}, \qquad \vec{V}_t = \frac{\hat{s}_1 \cdot \hat{V}_2 - \hat{s}_2 \cdot \hat{V}_1}{\hat{s}_1 \cdot \hat{t}_2 - \hat{s}_2 \cdot \hat{t}_1}$$
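or, component by component,
$$\begin{aligned} \frac{\partial x_e}{\partial s} &= \frac{D_{xt}}{D_{st}}, & \frac{\partial x_e}{\partial t} &= \frac{D_{sx}}{D_{st}} \\ \frac{\partial y_e}{\partial s} &= \frac{D_{yt}}{D_{st}}, & \frac{\partial y_e}{\partial t} &= \frac{D_{sy}}{D_{st}} \\ \frac{\partial z_e}{\partial s} &= \frac{D_{zt}}{D_{st}}, & \frac{\partial z_e}{\partial t} &= \frac{D_{sz}}{D_{st}} \end{aligned}$$
where:
$$D_{ij} = \hat{i}_1 \hat{j}_2 - \hat{i}_2 \hat{j}_1$$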
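The surface tangent generation described above amounts to solving a 2x2 linear system per coordinate. A minimal Python sketch follows; the function name `surface_tangents` and the argument layout (three vertex positions and three (s, t) texture coordinate pairs) are assumptions made for the example.

```python
def surface_tangents(v0, v1, v2, st0, st1, st2):
    """Return (V_s, V_t) for a triangle from positions and (s, t) coordinates.

    Each component uses D_ij = i1 * j2 - i2 * j1 on the differences
    relative to vertex 0:
        dV/ds = D_Vt / D_st,   dV/dt = D_sV / D_st
    """
    dv1 = tuple(a - b for a, b in zip(v1, v0))
    dv2 = tuple(a - b for a, b in zip(v2, v0))
    ds1, dt1 = st1[0] - st0[0], st1[1] - st0[1]
    ds2, dt2 = st2[0] - st0[0], st2[1] - st0[1]

    d_st = ds1 * dt2 - ds2 * dt1
    if d_st == 0:
        # Degenerate texture mapping: the (s, t) triangle has zero area.
        raise ValueError("texture coordinates are degenerate")

    v_s = tuple((c1 * dt2 - c2 * dt1) / d_st for c1, c2 in zip(dv1, dv2))
    v_t = tuple((ds1 * c2 - ds2 * c1) / d_st for c1, c2 in zip(dv1, dv2))
    return v_s, v_t
```

For a triangle whose edges from V0 are (2, 0, 0) and (0, 3, 0), mapped to unit steps in s and t respectively, the function returns those same edges as the tangents. The explicit check on D_st is a design choice of this sketch; the description does not say how a degenerate mapping is handled.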
Two different conventional approaches to implementing bump mapping in accordance with the preceding description are now described with reference to FIGS. 5A, 5B, 6A and 6B.
2.7.2.5 SGI Bump Mapping
Referring to FIG. 5A, there is shown a functional flow diagram illustrating a bump mapping approach proposed by Silicon Graphics (SGI). The functional blocks include: "compute perturbed normal" SGI10, "store texture map" SGI12, "perform lighting computations" SGI14 and "transform eye space to tangent space" SGI16. In the typical embodiment of this approach the steps SGI10 and SGI12 are performed in software and the steps SGI14 and SGI16 are performed in 3D graphics hardware. In particular, the step SGI16 is performed using the same hardware that is optimized to perform Phong shading. The SGI approach is documented in the Peercy reference.
A key aspect of the SGI approach is that all lighting and bump mapping computations are performed in tangent space, which is a space defined for each surface/object by orthonormal vectors comprising a unit surface normal (N) and two unit surface tangents (T and B). The basis vectors could be explicitly defined at each vertex by an application program or could be derived by the graphics processor from a reference frame that is local to each object. However the tangent space is defined, the components of the basis vectors are given in eye space. A standard theorem from linear algebra states that the matrix used to transform from coordinate system A (e.g., eye space) to system B (e.g., tangent space) can be formed from the coordinates of the basis vectors of system B in system A. Consequently, a matrix M whose columns comprise the basis vectors N, T and B represented in eye space coordinates can be used to transform eye space vectors into corresponding tangent space vectors. As described below, this transformation is used in the SGI pipeline to enable the lighting and bump mapping computations to be done in tangent space.
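The eye-space to tangent-space transformation can be sketched in Python as three dot products. The function name and the (T, B, N) ordering of the output components are assumptions made for the example; the sketch also assumes the basis vectors are orthonormal and expressed in eye space, as stated above.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def eye_to_tangent(v, t, b, n):
    """Express eye-space vector v in the tangent-space frame (T, B, N).

    Because T, B and N are orthonormal, each tangent-space component is
    just the projection of v onto the corresponding basis vector. This is
    the same as multiplying the row vector v by a matrix whose columns
    are T, B and N.
    """
    return (dot(v, t), dot(v, b), dot(v, n))
```

Applied to the light vector L and the half angle vector H, this yields their tangent-space counterparts, so the lighting computation can use the perturbed normal directly without transforming it back to eye space.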
The elements employed in the illustrated SGI approach include the following:
u one coordinate of tangent space in plane of surface
v one coordinate of tangent space in plane of surface
N surface normal at each vertex of a fragment to be illuminated;
Pu surface tangent along the u axis at each vertex of a fragment to be illuminated;
Pv surface tangent along the v axis at each vertex of a fragment to be illuminated;
fu(u,v) partial derivative along the u axis of the input texture map computed at each point of the texture map (NOTE: according to the OpenGL standard, an input texture map is a 1, 2 or 3-dimensional array of values f(u,v) that define a height field in (u,v) space. In the SGI approach this height field is converted to a collection of partial derivatives fu(u,v), fv(u,v) that gives the gradient in two directions (u and v) for each point of the height field);
fv(u,v) partial derivative along the v axis of the input texture map computed at each point of the texture map (see discussion of fu(u,v));
L light vector in eye space;
H half angle vector in eye space;
LTS light vector in tangent space;
HTS half angle vector in tangent space;
T unit surface tangent along Pu;
B unit surface binormal, defined as the cross product of N and T.
Note: the preceding discussion uses notation from the Peercy paper; other portions of this application (e.g., the remainder of the background and the detailed description) use different notation for similar parameters. The correspondence between the two systems is shown below, with the Peercy notation listed under the column labelled "SGI" and the other notation listed under the column labelled "Raycer".
In the SGI approach an input texture map comprising a set of partial derivatives fu(u,v), fv(u,v) is used in combination with the surface normal (N) and tangents (Pu, Pv) and basis vectors B and T to compute the perturbed normal in tangent space (N′TS) at each point of the height field according to the following equations (step SGI10):
$$N'_{TS} = \frac{(a, b, c)}{\sqrt{a^2 + b^2 + c^2}}$$
where:
$$a = -f_u (B \cdot P_v)$$
$$b = -f_v |P_u| - f_u (T \cdot P_v)$$
$$c = |P_u \times P_v|$$
The coefficients a, b and c are the unnormalized components of the perturbed normal N′TS in tangent space (i.e., the coefficient c is in the normal direction and the coefficients a and b represent perturbations to the normal in the u and v directions). In step (SGI12) these coefficients are stored as a texture map TMAP, which is provided to the SGI 3D hardware in a format specified by an appropriate API (e.g., OpenGL).
Using the linear algebra theorem mentioned above, the light and half angle vectors (L, H) are transformed to the tangent space using a matrix M (shown below) whose columns comprise the eye space (i.e., x, y and z) coordinates of the tangent, binormal and normal (T, B, N) (SGI16):
$$M = \begin{bmatrix} T_x & B_x & N_x \\ T_y & B_y & N_y \\ T_z & B_z & N_z \end{bmatrix}$$
Thus, the vectors LTS and HTS are computed as follows:
$$L_{TS} = L \cdot M$$
$$H_{TS} = H \cdot M$$
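By way of illustration only, the transformation of step SGI16 may be sketched in Python as follows. Because the columns of M are T, B and N expressed in eye space, multiplying a row vector by M reduces to three dot products. The function names, the sample frame and the sample light vector are assumptions made for this example and are not taken from the Peercy reference.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def to_tangent_space(v, t, b, n):
    """Compute v . M, where the columns of M are T, B and N in eye space."""
    return (dot(v, t), dot(v, b), dot(v, n))


# Hypothetical orthonormal frame given in eye space coordinates.
n = (1.0, 0.0, 0.0)          # unit surface normal
t = (0.0, 1.0, 0.0)          # unit surface tangent
b = cross(n, t)              # binormal, defined as the cross product of N and T

light_eye = normalize((1.0, 2.0, 2.0))     # hypothetical light vector L
light_ts = to_tangent_space(light_eye, t, b, n)
print(light_ts)
```

In this sketch the third component of the result is the projection of the light vector onto the surface normal, which is the quantity a diffuse lighting term would use for an unperturbed surface.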
The resulting tangent space versions LTS and HTS of the light and half angle vectors are output to the Phong lighting and bump mapping step (SGI14) along with the input normal N and the texture map TMAP. In the Phong lighting and bump mapping step (SGI14) the graphics hardware performs all lighting computations in tangent space using the tangent space vectors previously described. In particular, if bump mapping is required the SGI system employs the perturbed vector N′TS (represented by the texture map TMAP components) in the lighting computations. Otherwise, the SGI system employs the input surface normal N in the lighting computations. Among other things, the step SGI14 involves:
1. interpolating the N′TS, LTS, HTS and NTS vectors for each pixel for which illumination is calculated;
2. normalizing the interpolated vectors;
3. performing the illumination computations.
A disadvantage of the SGI approach is that it requires a large amount of unnecessary information to be computed (e.g., for vertices associated with pixels that are not visible in the final graphics image). This information includes:
N′TS for each vertex of each surface;
LTS for each vertex of each surface;
HTS for each vertex of each surface.
The SGI approach requires extension to the OpenGL specification. In particular, extensions are required to support the novel texture map representation. These extensions are defined in: SGI OpenGL extension: SGIX_fragment_lighting_space, which is incorporated herein by reference.
FIG. 5B shows a hypothetical hardware implementation of the SGI bump mapping/Phong shading approach that is proposed in the Peercy reference. In this system, note that the surface normal N and the transformed light and half-angle vectors LTS, HTS are interpolated at the input of the block SGI14. The LTS and HTS interpolations could be done multiple times, once for each of the active lights. The switch S is used to select the perturbed normal N′TS when bump mapping is in effect or the unperturbed surface normal N when bump mapping is not in effect. The resulting normal and the interpolated light and half-angle vectors are then normalized, and the normalized vectors are input to the illumination computation, which outputs a corresponding pixel value.
Problems with SGI bump mapping include:
1. The cost of transforming the L and H vectors to tangent space, which increases with the number of lights in the lighting computation;
2. It is only suited for use in 3D graphics pipelines where most graphics processing (e.g., lighting and bump mapping) is performed fragment by fragment; in other embodiments, where fragments are processed in parallel, the amount of data that would need to be stored to allow the bump mapping computations to be performed would be prohibitive;
3. Interpolating in the lighting hardware, which is a time consuming operation that also requires all vertex information to be available (this is not possible in a deferred shading environment); and
4. Interpolating whole vectors (e.g., LTS, HTS) results in approximation errors that result in visual artifacts in the final image.
2.7.2.6 "Blinn" Bump Mapping
Referring to FIG. 6A, there is shown a functional flow diagram illustrating the Blinn bump mapping approach. The functional blocks include: "generate gradients" B10, "compute perturbed normal" B12 and "perform lighting computations" B14. In the typical embodiment of this approach the step B10 is performed in software and the steps B12 and B14 are performed in dedicated bump mapping hardware. The Blinn approach is described in the Blinn and Peercy references.
The elements employed in the illustrated Blinn approach include the following:
s one coordinate of bump space grid
t one coordinate of bump space grid
N surface normal at each vertex of a fragment to be illuminated;
vs surface tangent along the s axis at each vertex of a fragment to be illuminated;
vt surface tangent along the t axis at each vertex of a fragment to be illuminated;
hs(s,t) partial derivative along the s axis of the bump height field computed at each point of the height field (NOTE: according to the OpenGL standard, an input texture map is a 1, 2 or 3-dimensional array of values h(s,t) that define a height field in (s,t) space. The API converts this height field to a collection of partial derivatives hs(s,t), ht(s,t) that gives the gradient in two directions (s and t) at each point of the height field);
ht(s,t) partial derivative along the t axis of the bump height field computed at each point of the texture map (see discussion of hs(s,t));
L light vector in eye space;
H half angle vector in eye space;
bs basis vector enabling bump gradients hs to be mapped to eye space;
bt basis vector enabling bump gradients ht to be mapped to eye space.
The Blinn approach presumes that a texture to be applied to a surface is initially defined by a height field h(s, t). The Blinn approach does not directly use this height field, but requires that the texture map representing the height field be provided by the API as a set of gradients hs(s, t) and ht(s, t) (B10). That is, rather than providing the perturbed normal N′ (as in the SGI approach), the Blinn texture map provides two scalar values hs, ht that represent offsets/perturbations to the normal. For the offsets to be applied to the normal N, two basis vectors bs and bt are needed that define (in eye space) the reference frame in which the offsets are provided. The two possible sources of these vectors are:
1) Provision of the vectors by the user.
2) Automatic generation by the graphics hardware by forming partial derivatives of the per-vertex texture coordinates with respect to eye space. The justification for this definition can be found in the Watt reference.
In step (B12) the Blinn bump mapping approach perturbs the normal vector N according to the following equation:
$$\vec{N}' = \vec{N} + h_s \cdot \vec{b}_s + h_t \cdot \vec{b}_t$$
where hs and ht are the height gradients read from texture memory and $\vec{b}_s$ and $\vec{b}_t$ are the basis vectors. See the Watt reference for a derivation of this equation, including derivation of the basis vectors bs and bt. Computation of the perturbed normal includes:
1. interpolation of elements (−Vt×N, −N×Vs, Vs×Vt) used to compute the perturbed normal N′;
2. computation of the perturbed normal N′ using the interpolated elements.
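The perturbation equation of step B12 may be illustrated with the following minimal Python sketch. It assumes the basis vectors bs and bt have already been obtained (by either of the two sources listed above) and shows only the weighted sum and the subsequent normalization. The function names and the numeric values are hypothetical.

```python
import math


def perturb_normal(n, b_s, b_t, h_s, h_t):
    """Return N' = N + h_s * b_s + h_t * b_t (not yet normalized)."""
    return tuple(nc + h_s * sc + h_t * tc
                 for nc, sc, tc in zip(n, b_s, b_t))


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


# Hypothetical inputs: a normal along z and basis vectors along x and y.
n = (0.0, 0.0, 1.0)
b_s = (1.0, 0.0, 0.0)
b_t = (0.0, 1.0, 0.0)

# Height gradients as they might be read from a texture map.
h_s, h_t = 0.75, 0.0

n_perturbed = perturb_normal(n, b_s, b_t, h_s, h_t)
n_unit = normalize(n_perturbed)
print(n_perturbed, n_unit)
```

With a zero gradient in both directions the function returns the input normal unchanged, which corresponds to a flat region of the height field.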
Once the perturbed normal N′ has been computed the graphics hardware performs the lighting computations (B14). Functions performed in the step B14 include:
1. interpolation of the L and H vectors;
2. normalization of the perturbed normal N′ and the L and H vectors; and
3. lighting computations.
FIG. 6B shows a hypothetical hardware implementation of the Blinn bump mapping approach that is proposed in the Peercy reference. In this system, note the multiple vector cross-products that must be computed and the required number of interpolations and normalizations. The extra operations are required in the Blinn approach to derive the basis vectors at each pixel (i.e., for each illumination calculation). Moreover, the three interpolation operations applied to the cross-products (Bt×N), (N×Bs), (Ns×Bt) are required to be wide floating point operations (i.e., 32 bit operations) due to the possible large range of the cross-product values.
Computer graphics is the art and science of generating pictures or images with a computer. This picture generation is commonly referred to as rendering. The appearance of motion, for example in a 3-Dimensional animation is achieved by displaying a sequence of images. Interactive 3-Dimensional (3D) computer graphics allows a user to change his or her viewpoint or to change the geometry in real-time, thereby requiring the rendering system to create new images on-the-fly in real-time. Therefore, real-time performance in color, with high quality imagery is becoming increasingly important.
The invention is directed to a new graphics processor and method and encompasses numerous substructures including specialized subsystems, subprocessors, devices, architectures, and corresponding procedures. Embodiments of the invention may include one or more of deferred shading, a tiled frame buffer, and multiple-stage hidden surface removal processing, as well as other structures and/or procedures. In this document, this graphics processor is hereinafter referred to as the DSGP (for Deferred Shading Graphics Processor), or the DSGP pipeline, but is sometimes referred to as the pipeline.
The present invention includes numerous embodiments of the DSGP pipeline. Embodiments of the present invention are designed to provide high-performance 3D graphics with Phong shading, subpixel anti-aliasing, and texture- and bump-mapping in hardware. The DSGP pipeline provides these sophisticated features without sacrificing performance.
The DSGP pipeline can be connected to a computer via a variety of possible interfaces, including, but not limited to, an Accelerated Graphics Port (AGP) and/or a PCI bus interface. VGA and video output are generally also included. Embodiments of the invention support both the OpenGL and Direct3D APIs. The OpenGL specification, entitled "The OpenGL Graphics System: A Specification (Version 1.2)" by Mark Segal and Kurt Akeley, edited by Jon Leech, is incorporated herein by reference.
An exemplary embodiment, or version, of a Deferred Shading Graphics Pipeline is now described. Several more exemplary embodiments of a Deferred Shading Graphics Pipeline are described in U.S. Provisional Patent Application Serial No. 60/097,336, filed Aug. 20, 1998, entitled "Graphics Processor with Deferred Shading," which is incorporated herein by reference.
Following description of this embodiment, a description is provided of an exemplary embodiment of a Phong shading system and method that can be employed in any of the Deferred Shading Graphics Pipelines. The described Phong shading system and method is compatible with the conventional approaches to bump mapping that are supported by 3D graphics standards, such as the OpenGL specification.
Several versions or embodiments of the Deferred Shading Graphics Pipeline are described here, and embodiments having various combinations of features may be implemented. Furthermore, features of the invention may be implemented independently of other features. Most of the important features described above can be applied to all versions of the DSGP pipeline.
3.1.1 Tiles, Stamps, Samples, and Fragments
Each frame (also called a scene or user frame) of 3D graphics primitives is rendered into a 3D window on the display screen. A window consists of a rectangular grid of pixels, and the window is divided into tiles (hereinafter tiles are assumed to be 16×16 pixels, but could be any size). If tiles are not used, then the window is considered to be one tile. Each tile is further divided into stamps (hereinafter stamps are assumed to be 2×2 pixels, thereby resulting in 64 stamps per tile, but stamps could be any size within a tile). Each pixel includes one or more samples, where each sample has its own color values and z-value (hereinafter, pixels are assumed to include four samples, but any number could be used). A fragment is the collection of samples covered by a primitive within a particular pixel. The term "fragment" is also used to describe the collection of visible samples within a particular primitive and a particular pixel.
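The subdivision just described can be illustrated with a short Python sketch that locates a window pixel within its tile and stamp, using the sizes assumed above (16×16 pixel tiles, 2×2 pixel stamps, four samples per pixel). The function name and the choice of window-relative integer pixel coordinates are assumptions made for this example.

```python
TILE_SIZE = 16          # pixels per tile edge, as assumed in the text
STAMP_SIZE = 2          # pixels per stamp edge, as assumed in the text
SAMPLES_PER_PIXEL = 4   # samples per pixel, as assumed in the text

# Number of stamps in one tile: (16 / 2) squared.
STAMPS_PER_TILE = (TILE_SIZE // STAMP_SIZE) ** 2


def locate_pixel(x, y):
    """Return (tile, stamp within tile, pixel within stamp) for pixel (x, y)."""
    tile = (x // TILE_SIZE, y // TILE_SIZE)
    in_tile_x, in_tile_y = x % TILE_SIZE, y % TILE_SIZE
    stamp = (in_tile_x // STAMP_SIZE, in_tile_y // STAMP_SIZE)
    pixel_in_stamp = (in_tile_x % STAMP_SIZE, in_tile_y % STAMP_SIZE)
    return tile, stamp, pixel_in_stamp


print(STAMPS_PER_TILE)
print(locate_pixel(37, 21))
```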
3.1.2 Deferred Shading
In ordinary Z-buffer rendering, the renderer calculates the color value (RGB or RGBA) and z value for each pixel of each primitive, then compares the z value of the new pixel with the current z value in the Z-buffer. If the z value comparison indicates the new pixel is "in front of" the existing pixel in the frame buffer, the new pixel overwrites the old one; otherwise, the new pixel is thrown away.
Z-buffer rendering works well and requires no elaborate hardware. However, it typically results in a great deal of wasted processing effort if the scene contains many hidden surfaces. In complex scenes, the renderer may calculate color values for ten or twenty times as many pixels as are visible in the final picture. This means the computational cost of any per-pixel operation, such as Phong shading or texture mapping, is multiplied by ten or twenty. The number of surfaces per pixel, averaged over an entire frame, is called the depth complexity of the frame. In conventional z-buffered renderers, the depth complexity is a measure of the renderer's inefficiency when rendering a particular frame.
In a pipeline that performs deferred shading, hidden surface removal (HSR) is completed before any pixel coloring is done. The objective of a deferred shading pipeline is to generate pixel colors for only those primitives that appear in the final image (i.e., exact HSR). Deferred shading generally requires the primitives to be accumulated before HSR can begin. For a frame with only opaque primitives, the HSR process determines the single visible primitive at each sample within all the pixels. Once the visible primitive is determined for a sample, then the primitive's color at that sample location is determined. Additional efficiency can be achieved by determining a single per-pixel color for all the samples within the same pixel, rather than computing per-sample colors.
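The difference in shading work between the two approaches can be illustrated for a single sample with the following Python sketch. It is a simplified model under stated assumptions: all primitives are opaque, the depth test is fixed at less-than, and the `shade` callable stands in for any expensive per-pixel operation. The names and the sample data are hypothetical.

```python
def zbuffer_render(fragments, shade):
    """Shade every incoming fragment, then keep it only if it passes the depth test."""
    depth, color, shaded = float("inf"), None, 0
    for z, primitive in fragments:
        candidate = shade(primitive)   # color is computed before the z comparison
        shaded += 1
        if z < depth:
            depth, color = z, candidate
    return color, shaded


def deferred_render(fragments, shade):
    """Resolve visibility first, then shade only the surviving primitive."""
    depth, visible = float("inf"), None
    for z, primitive in fragments:
        if z < depth:
            depth, visible = z, primitive
    if visible is None:
        return None, 0
    return shade(visible), 1


# Hypothetical fragments covering one sample, listed in time order as (z, name).
fragments = [(5.0, "wall"), (3.0, "table"), (9.0, "sky"), (1.0, "cup")]


def shade(primitive):
    """Placeholder for a costly operation such as Phong shading."""
    return "color-of-" + primitive


print(zbuffer_render(fragments, shade))
print(deferred_render(fragments, shade))
```

Both functions return the same color for the sample; they differ only in how many times `shade` is invoked, which for the z-buffer case equals the depth complexity at that sample.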
For a frame with at least some alpha blending (as defined in the aforementioned OpenGL specification) of primitives (generally due to transparency), there are some samples that are colored by two or more primitives. This means the HSR process must determine a set of visible primitives per sample.
In some APIs, such as OpenGL, the HSR process can be complicated by other operations (that is, by operations other than the depth test) that can discard primitives. These other operations include: pixel ownership test, scissor test, alpha test, color test, and stencil test (as described elsewhere in this specification). Some of these operations discard a primitive based on its color (such as alpha test), which is not determined in a deferred shading pipeline until after the HSR process (this is because alpha values are often generated by the texturing process, included in pixel fragment coloring). For example, a primitive that would normally obscure a more distant primitive (generally at a greater z-value) can be discarded by alpha test, thereby causing it to not obscure the more distant primitive. An HSR process that does not take alpha test into account could mistakenly discard the more distant primitive. Hence, there may be an inconsistency between deferred shading and alpha test (similarly, with color test and stencil test); that is, pixel coloring is postponed until after hidden surface removal, but hidden surface removal can depend on pixel colors. Simple solutions to this problem include: 1) eliminating non-depth-dependent tests from the API, such as alpha test, color test, and stencil test, but this potential solution might prevent existing programs from executing properly on the deferred shading pipeline; and 2) having the HSR process do some color generation, only when needed, but this potential solution would complicate the data flow considerably. Therefore, neither of these choices is attractive. A third alternative, called conservative hidden surface removal (CHSR), is one of the important innovations provided by the inventive structure and method. CHSR is described in great detail in subsequent sections of the specification.
Another complication in many APIs is their ability to change the depth test. The standard way of thinking about 3D rendering assumes visible objects are closer than obscured objects (i.e., at lesser z-values), and this is accomplished by selecting a "less-than" depth test (i.e., an object is visible if its z-value is "less-than" other geometry). However, most APIs support other depth tests such as: greater-than, greater-than-or-equal-to, equal, less-than-or-equal-to, not-equal, and the like algebraic, magnitude, and logical relationships. This essentially "changes the rules" for what is visible. This complication is compounded by an API allowing the application program to change the depth test within a frame. Different geometry may be subject to drastically different rules for visibility. Hence, the time order of primitives with different rendering rules must be taken into account. For example, in the embodiment illustrated in FIG. 4, three primitives are shown with their respective depth test (only the z dimension is shown in the figure, so this may be considered the case for one sample). If they are rendered in the order A, B, then C, primitive B will be the final visible surface. However, if the primitives are rendered in the order C, B, then A, primitive A will be the final visible surface. This illustrates how a deferred shading pipeline must preserve the time ordering of primitives, and correct pipeline state (for example, the depth test) must be associated with each primitive.
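The order dependence described above can be demonstrated for a single sample with the following Python sketch. FIG. 4 is not reproduced here, so the z-values, the depth test assigned to each primitive and the cleared depth value of 1.0 are hypothetical; they were chosen only so that the two rendering orders yield the outcomes stated in the text.

```python
import operator

# A small subset of the depth tests an API may offer.
DEPTH_TESTS = {
    "less-than": operator.lt,
    "greater-than": operator.gt,
}


def final_visible(primitives, clear_depth=1.0):
    """Apply each primitive's own depth test in time order at one sample."""
    stored_z, visible = clear_depth, None
    for name, z, test in primitives:
        if DEPTH_TESTS[test](z, stored_z):
            stored_z, visible = z, name
    return visible


# Hypothetical primitives as (name, z-value, depth test in effect).
A = ("A", 0.3, "less-than")
B = ("B", 0.6, "greater-than")
C = ("C", 0.8, "less-than")

print(final_visible([A, B, C]))
print(final_visible([C, B, A]))
```

Because each primitive carries its own depth test, the stored z-value after each step depends on everything rendered before it, which is why the depth test must travel with the primitive through the pipeline.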
3.1.3 Deferred Shading Graphics Pipeline, First Embodiment (Version 1)
A conventional 3D graphics pipeline is illustrated in FIG. 2. We now describe a first embodiment of the inventive 3D Deferred Shading Graphics Pipeline Version 1 (hereinafter "DSGPv1"), relative to FIG. 8. It will be observed that the inventive pipeline (FIG. 8) has been obtained from the generic conventional pipeline (FIG. 2) by replacing the drawing intensive functions 231 with: (1) a scene memory 250 for storing the pipeline state and primitive data describing each primitive, called scene memory in the figure; (2) an exact hidden surface removal process 251; (3) a fragment coloring process 252; and (4) a blending process 253.
The scene memory 250 stores the primitive data for a frame, along with their attributes, and also stores the various settings of pipeline state throughout the frame. Primitive data includes vertex coordinates, texture coordinates, vertex colors, vertex normals, and the like. In DSGPv1, primitive data also includes the data generated by the setup for incremental render, which includes spatial, color, and edge derivatives.
When all the primitives in a frame have been processed by the floating-point intensive functions 213 and stored into the scene memory 250, then the HSR process commences. The scene memory 250 can be double buffered, thereby allowing the HSR process to perform computations on one frame while the floating-point intensive functions perform computations on the next frame. The scene memory can also be triple buffered. The scene memory could also be a scratchpad for the HSR process, storing intermediate results for the HSR process, allowing the HSR process to start before all primitives have been stored into the scene memory.
In the scene memory, every primitive is associated with the pipeline state information that was valid when the primitive was input to the pipeline. The simplest way to associate the pipeline state with each primitive is to include the entire pipeline state within each primitive. However, this would introduce a very large amount of redundant information because much of the pipeline state does not change between most primitives (especially when the primitives are in the same object). The preferred way to store information in the scene memory is to keep separate lists: one list for pipeline state settings and one list for primitives. Furthermore, the pipeline state information can be split into a multiplicity of sub-lists, and additions to each sub-list occur only when part of the sub-list changes. The preferred way to store primitives is to store a series of vertices, along with the connectivity information to re-create the primitives. This preferred way of storing primitives eliminates redundant vertices that would otherwise occur in polygon meshes and line strips.
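One possible arrangement of such separate lists is sketched below in Python. It is an illustrative model only: the class name, the sub-list names, the use of list indices as pointers and the detection of shared vertices by value are all assumptions made for this example and are not taken from the described hardware.

```python
class SceneMemory:
    """Toy model: state sub-lists, a shared vertex list and a primitive list."""

    def __init__(self, sublist_names):
        self.state_lists = {name: [] for name in sublist_names}
        self.vertices = []
        self.primitives = []      # (vertex indices, {sub-list name: entry index})
        self._vertex_index = {}   # assumption: identical vertices are shared

    def set_state(self, name, value):
        """Append to a sub-list only when that part of the state changes."""
        entries = self.state_lists[name]
        if not entries or entries[-1] != value:
            entries.append(value)

    def add_primitive(self, vertices):
        """Store connectivity as indices and record the current state pointers."""
        indices = []
        for vertex in vertices:
            if vertex not in self._vertex_index:
                self._vertex_index[vertex] = len(self.vertices)
                self.vertices.append(vertex)
            indices.append(self._vertex_index[vertex])
        pointers = {name: len(entries) - 1
                    for name, entries in self.state_lists.items()}
        self.primitives.append((tuple(indices), pointers))

    def state_for(self, primitive_index):
        """Recreate the state that was valid when the primitive was input."""
        _, pointers = self.primitives[primitive_index]
        return {name: self.state_lists[name][i]
                for name, i in pointers.items() if i >= 0}


scene = SceneMemory(["material", "depth_test"])
scene.set_state("material", "red")
scene.set_state("depth_test", "less-than")
scene.add_primitive([(0, 0, 0), (1, 0, 0), (0, 1, 0)])
scene.add_primitive([(1, 0, 0), (1, 1, 0), (0, 1, 0)])
scene.set_state("material", "blue")
scene.add_primitive([(0, 1, 0), (1, 1, 0), (0, 2, 0)])

print(len(scene.vertices), scene.state_lists)
print(scene.state_for(0), scene.state_for(2))
```

In this example three triangles reference nine vertex positions but only five distinct vertices are stored, and the unchanged depth test occupies a single entry for the whole frame.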
The HSR process described relative to DSGPv1 is required to be an exact hidden surface removal (EHSR) because it is the only place in the DSGPv1 where hidden surface removal is done. The exact hidden surface removal (EHSR) process 251 determines precisely which primitives affect the final color of the pixels in the frame buffer. This process accounts for changes in the pipeline state, which introduces various complexities into the process. Most of these complications stem from the per-fragment operations (ownership test, scissor test, alpha test, and the like), as described above. These complications are solved by the innovative conservative hidden surface removal (CHSR) process, described later, so that exact hidden surface removal is not required.
The fragment coloring process generates colors for each sample or group of samples within a pixel. This can include: Gouraud shading, texture mapping, Phong shading, and various other techniques for generating pixel colors. This process is different from edge walk 232 and span interpolation 234 because this process must be able to efficiently generate colors for subsections of primitives. That is, a primitive may be partially visible, and therefore, colors need to be generated for only some of its pixels, whereas edge walk and span interpolation assume the entire primitive must be colored. Furthermore, the HSR process may generate a multiplicity of visible subsections of a primitive, and these may be interspersed in time amongst visible subsections of other primitives. Hence, the fragment coloring process 252 should be capable of generating color values at random locations within a primitive without needing to do incremental computations along primitive edges or along the x-axis or y-axis.
The blending process 253 of the inventive embodiment combines the fragment colors together to generate a single color per pixel. In contrast to the conventional z-buffered blend process 236, this blending process 253 does not include z-buffer operations because the exact hidden surface removal process 251 has already determined which primitives are visible at each sample. The blending process 253 may keep separate color values for each sample, or sample colors may be blended together to make a single color for the entire pixel. If separate color values are kept per sample and are stored separately into the Frame buffer 240, then final pixel colors are generated from sample colors during the scan out process as data is sent to the digital to analog converter 242.
The pipeline renders primitives, and the invention is described relative to a set of renderable primitives that include: 1) triangles, 2) lines, and 3) points. Polygons with more than three vertices are divided into triangles in the Geometry block, but the DSGP pipeline could be easily modified to render quadrilaterals or polygons with more sides. Therefore, since the pipeline can render any polygon once it is broken up into triangles, the inventive renderer effectively renders any polygon primitive.
To identify what part of a 3D window on the display screen a given primitive may affect, the pipeline divides the 3D window being drawn into a series of smaller regions, called tiles and stamps. The pipeline performs deferred shading, in which pixel colors are not determined until after hidden-surface removal. The use of a Magnitude Comparison Content Addressable Memory (MCCAM) allows the pipeline to perform hidden geometry culling efficiently.
3.2.1 Conservative Deferred Shading
One of the central ideas or inventive concepts provided by the invention pertains to Conservative Hidden Surface Removal (CHSR). The CHSR process handles each primitive in time order and, for each sample that a primitive touches, makes a conservative decision based on the various API state variables, such as depth test and alpha test. One of the important features of the CHSR process is that color computation does not need to be done during hidden surface removal even though non-depth-dependent tests from the API, such as alpha test, color test, and stencil test can be performed by the DSGP pipeline. The CHSR process can be considered a finite state machine (FSM) per sample. Hereinafter, each per-sample FSM is called a sample finite state machine (SFSM). Each SFSM maintains per-sample data including: (1) z-coordinate information; (2) primitive information (any information needed to generate the primitive's color at that sample or pixel); and (3) one or more sample state bits (for example, these bits could designate the z-value or z-values to be accurate or conservative). While multiple z-values per sample can be easily used, multiple sets of primitive information per sample would be expensive. Hereinafter, it is assumed that the SFSM maintains primitive information for one primitive. The SFSM may also maintain transparency information, which is used for sorted transparencies, described in the next section.
3.2.2 Two Modes of DSGP Operation
The DSGP can operate in two distinct modes: 1) Time Order Mode, and 2) Sorted Transparency Mode. Time Order Mode is described above, and is designed to preserve, within any particular tile, the same temporal sequence of primitives. The Sorted Transparency mode is described immediately below. In the preferred embodiment, the control of the pipeline operating mode is done in the Sort Block.
The Sort Block is located in the pipeline between a Mode Extraction Unit (MEX) and Setup (STP) unit. Sort Block operates primarily to take geometry scattered around the display window and sort it into tiles. Sort Block also manages the Sort Memory, which stores all the geometry from the entire scene before it is rasterized, along with some mode information. Sort memory comprises a double-buffered list of vertices and modes. One page collects a scene's geometry (vertex by vertex and mode by mode), while the other page is sending its geometry (primitive by primitive and mode by mode) down the rest of the pipeline.
When a page in sort memory is being written, vertices and modes are written sequentially into the sort memory as they are received by the sort block. When a page is read from sort memory, the read is done on a tile-by-tile basis, and the read process operates in two modes: (1) time order mode, and (2) sorted transparency mode.
3.2.3 Time-Ordered Mode
In time ordered mode, the time order of vertices and modes is preserved within each tile, where a tile is a portion of the display window bounded horizontally and vertically. By time order preserved, we mean that for a given tile, vertices and modes are read in the same order as they are written.
3.2.4 Sorted Transparency Mode
In sorted transparency mode, reading of each tile is divided into multiple passes, where, in the first pass, guaranteed opaque geometry is output from the sort block, and in subsequent passes, potentially transparent geometry is output from the sort block. Within each sorted transparency mode pass, the time ordering is preserved, and mode data is inserted in its correct time-order location. Sorted transparency mode may be performed in either back-to-front or front-to-back order. In the preferred embodiment, the sorted transparency method is performed jointly by the Sort Block and the Cull Block.
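The two read modes can be contrasted with the following simplified Python sketch. It models primitives by an inclusive pixel bounding box and an opacity flag, bins them into 16×16 pixel tiles in the order received, and then reads one tile in each mode. The data layout and function names are hypothetical, mode data insertion is omitted, and the potentially transparent geometry is emitted as a single subsequent pass because the text does not specify how later passes are divided.

```python
from collections import defaultdict


def bin_by_tile(primitives, tile_size=16):
    """Append each primitive, in arrival order, to every tile its bounding box touches."""
    tiles = defaultdict(list)
    for primitive in primitives:
        x0, y0, x1, y1 = primitive["bbox"]   # inclusive pixel coordinates
        for ty in range(y0 // tile_size, y1 // tile_size + 1):
            for tx in range(x0 // tile_size, x1 // tile_size + 1):
                tiles[(tx, ty)].append(primitive)
    return tiles


def read_time_order(tile_primitives):
    """Time order mode: read back in the same order as written."""
    return [p["name"] for p in tile_primitives]


def read_sorted_transparency(tile_primitives):
    """First pass: guaranteed opaque geometry; later pass: potentially transparent."""
    opaque = [p["name"] for p in tile_primitives if p["opaque"]]
    transparent = [p["name"] for p in tile_primitives if not p["opaque"]]
    return [opaque, transparent]


# Hypothetical scene, listed in the order received by the sort block.
scene = [
    {"name": "a", "bbox": (0, 0, 10, 10), "opaque": True},
    {"name": "b", "bbox": (12, 4, 20, 8), "opaque": False},
    {"name": "c", "bbox": (2, 2, 5, 5), "opaque": False},
    {"name": "d", "bbox": (3, 3, 30, 6), "opaque": True},
]

tiles = bin_by_tile(scene)
print(read_time_order(tiles[(0, 0)]))
print(read_time_order(tiles[(1, 0)]))
print(read_sorted_transparency(tiles[(0, 0)]))
```

Within each pass of the sorted transparency read, the relative order of the primitives is the same as their arrival order, consistent with the requirement that time ordering be preserved within a pass.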
3.2.5 Multiple-step Hidden Surface Removal
Conventionally, hidden surfaces are removed using either an "exact" hidden surface removal procedure or z-buffers. In one embodiment of the inventive structure and method, a two-step approach is implemented wherein (i) a "conservative" hidden surface removal is followed by (ii) a z-buffer based procedure. In a different embodiment, a three-step approach is implemented: (i) a particular spatial Cull procedure, (ii) conservative hidden surface removal, and (iii) z-buffer. Various embodiments of conservative hidden surface removal (CHSR) have already been described elsewhere in this disclosure.
3.2.6 Pipeline State Preservation and Caching
Each vertex includes a color pointer, and as vertices are received, the vertices including the color pointer are stored in sort memory data storage. The color pointer is a pointer to a location in the polygon memory vertex storage that includes a color portion of the vertex data. Associated with all of the vertices, of either a strip or a fan, is a Material-Lighting-Mode (MLM) pointer set. MLM includes six main pointers plus two other pointers as described below. Each of the six main pointers comprises an address to the polygon memory state storage, which is a sequential storage of all of the state that has changed in the pipeline (for example, changes in the texture, the pixel, lighting and so forth). As a need arises at any time in the future, the state needed to render a vertex (or the object formed from one or more vertices) can be recreated from the MLM pointers associated with the vertex, by looking up the MLM pointers and going back into the polygon memory state storage to find the state that existed at the time.
The Mode Extraction Block (MEX) is a logic block between Geometry and Sort that collects temporally ordered state change data, stores the state in Polygon memory, and attaches appropriate pointers to the vertex data it passes to Sort Memory. In the normal OpenGL pipeline, and in embodiments of the inventive pipeline up to the Sort block, geometry and state data is processed in the order in which it was sent down the pipeline. State changes for material type, lighting, texture, modes, and stipple affect the primitives that follow them. For example, each new object will be preceded by a state change to set the material parameters for that object.
In the inventive pipeline, on the other hand, fragments are sent down the pipeline in Tile order after the Cull block. The Mode Injection Block figures out how to preserve state in the portion of the pipeline that processes data in spatial (Tile) order instead of time order. In addition to geometry data, Mode Extraction Block sends a subset of the Mode data (cull_mode) down the pipeline for use by Cull. Cull_mode packets are produced in Geometry Block. Mode Extraction Block inserts the appropriate color pointer in the Geometry packets.
Pipeline state is broken down into several categories to minimize storage as follows: (1) Spatial pipeline state includes data headed for Sort that changes every vertex; (2) Cull_mode state includes data headed for Cull (via Sort) that changes infrequently; (3) Color includes data headed for Polygon memory that changes every vertex; (4) Material includes data that changes for each object; (5) TextureA includes a first set of state for the Texture Block for textures 0 and 1; (6) TextureB includes a second set of state for the Texture Block for textures 2 through 7; (7) Mode includes data that hardly ever changes; (8) Light includes data for Phong; (9) Stipple includes data for polygon stipple patterns. Material, Texture, Mode, Light, and Stipple data are collectively referred to as MLM data (for Material, Light and Mode). We are particularly concerned with the MLM pointers for state preservation.
State change information is accumulated in the MEX until a primitive (Spatial and Color packets) appears. At that time, any MLM data that has changed since the last primitive is written to Polygon Memory. The Color data, along with the appropriate pointers to MLM data, is also written to Polygon Memory. The spatial data is sent to Sort, along with a pointer into Polygon Memory (the color pointer). Color and MLM data are all stored in Polygon memory. Allocation of space for these records can be optimized in the micro-architecture definition to improve performance.
All of these records are accessed via pointers. Each primitive entry in Sort Memory contains a Color Pointer to the corresponding Color entry in Polygon Memory. The Color Pointer includes a Color Address, Color Offset and Color Type that allows us to construct a point, line, or triangle and locate the MLM pointers. The Color Address points to the final vertex in the primitive. Vertices are stored in order, so the vertices in a primitive are adjacent, except in the case of triangle fans. The Color Offset points back from the Color Address to the first dualoct for this vertex list. (We will refer to a point list, line strip, triangle strip, or triangle fan as a vertex list.) This first dualoct contains pointers to the MLM data for the points, lines, strip, or fan in the vertex list. The subsequent dualocts in the vertex list contain Color data entries. For triangle fans, the three vertices for the triangle are at Color Address, (Color Address - 1), and (Color Address - Color Offset + 1). Note that this is not quite the same as the way pointers are stored in Sort memory.
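The fan addressing rule above can be illustrated with a short sketch. The function names and the use of plain integers as dualoct addresses are assumptions made for illustration only; the strip case (three consecutive addresses) is inferred from the statement that vertices in a primitive are stored adjacently and is not spelled out in the text.

```python
def mlm_dualoct_address(color_address, color_offset):
    """Address of the first dualoct of the vertex list (holds the MLM pointers)."""
    return color_address - color_offset


def triangle_vertex_addresses(color_address, color_offset, is_fan):
    """Return the Polygon Memory addresses of a triangle's three vertices.

    color_address: address of the final vertex of the triangle.
    color_offset:  distance back from color_address to the first dualoct
                   of the vertex list.
    """
    if is_fan:
        # The fan's shared vertex is the first Color entry, which sits
        # immediately after the MLM pointer dualoct.
        return (color_address,
                color_address - 1,
                color_address - color_offset + 1)
    # Assumed for strips and lists: the three vertices are adjacent.
    return (color_address, color_address - 1, color_address - 2)
```

For example, if a fan's vertex list begins at address 100 (MLM dualoct) and the triangle's final vertex is at address 105, the Color Offset is 5 and the shared fan vertex is found at address 101.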
State is a time varying entity, and MEX accumulates changes in state so that state can be recreated for any vertex or set of vertices. The MIJ block is responsible for matching state with vertices downstream. Whenever a vertex comes into MEX and certain indicator bits are set, a subset of the pipeline state information needs to be saved. Only the states that have changed are stored, not all states, since the complete state can be created from the cumulative changes to state. The six MLM pointers for Material, TextureA, TextureB, Mode, Light, and Stipple identify address locations where the most recent changes to the respective state information are stored. Each change in one of these states is identified by an additional entry at the end of a sequentially ordered state storage list stored in a memory. Effectively, all state changes are stored, and when the particular state corresponding to a point in time (or receipt of a vertex) is needed, the state is reconstructed from the pointers.
The packets of mode data that are saved are referred to as mode packets, although the phrase is used to refer to the mode data changes that are stored, as well as to larger sets of mode data that are retrieved or reconstructed by MIJ prior to rendering.
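The accumulate-then-store behavior described above can be sketched as follows. This is a minimal illustration, not the hardware design: the class and function names are hypothetical, a Python list stands in for the polygon memory state storage, and each stored entry is assumed to hold the complete state for its category.

```python
MLM_CATEGORIES = ("Material", "TextureA", "TextureB", "Mode", "Light", "Stipple")


class ModeExtractionSketch:
    def __init__(self):
        self.state_storage = []   # sequential, append-only state storage
        self.pending = {}         # changes accumulated since the last primitive
        self.pointers = {c: None for c in MLM_CATEGORIES}

    def state_change(self, category, value):
        # A later change to the same category replaces the earlier pending one.
        self.pending[category] = value

    def primitive(self):
        """Called when a primitive arrives; returns its six MLM pointers."""
        for category, value in self.pending.items():
            # Only categories that changed are appended to storage.
            self.state_storage.append((category, value))
            self.pointers[category] = len(self.state_storage) - 1
        self.pending.clear()
        return dict(self.pointers)


def reconstruct_state(state_storage, pointers):
    """Recreate the state in effect for a primitive from its MLM pointers."""
    return {c: state_storage[p][1] for c, p in pointers.items() if p is not None}
```

Two primitives that differ only in material share the same Light entry in storage; only the new Material entry is appended for the second primitive.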
We particularly note that the entire state can be recreated from the information kept in the relatively small color pointer.
Polygon memory vertex storage stores just the color portion. Polygon memory stores the part of pipeline state that is not needed for hidden surface removal, and it also stores the part of the vertex data that is not needed for hidden surface removal (predominantly the items needed to make colors).
3.2.7 Texel Reuse Detection and Tile Based Processing
The inventive structure and method may advantageously make use of trilinear mapping of multiple layers (resolutions) of texture maps.
Texture maps are stored in a Texture Memory which may generally comprise a single-buffered memory loaded from the host computer's memory using the AGP interface. In the exemplary embodiment, a single polygon can use up to four textures. Textures are MIP-mapped. That is, each texture comprises a series of texture maps at different levels of detail or resolution, each map representing the appearance of the texture at a given distance from the eye point. To produce a texture value for a given pixel fragment, the Texture block performs tri-linear interpolation from the texture maps, to approximate the correct level of detail. The Texture block can alternatively perform other interpolation methods, such as anisotropic interpolation.
The Texture block supplies interpolated texture values (generally as RGBA color values) to the Phong block on a per-fragment basis. Bump maps represent a special kind of texture map. Instead of a color, each texel of a bump map contains a height field gradient.
The multiple layers are MIP layers, and interpolation is within and between the MIP layers. The first interpolation is within each layer; the results are then interpolated between the two adjacent layers, one nominally having resolution greater than required and the other having less resolution than required, so that the interpolation is done 3-dimensionally to generate an optimum resolution.
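The two-stage interpolation described above (within each layer, then between the two adjacent layers) can be sketched as follows. The texel addressing convention (samples at grid nodes, coordinates normalized to [0, 1], clamping at the edges) and the function names are assumptions for illustration, not details taken from this disclosure.

```python
import math


def bilinear(level, s, t):
    """Bilinearly sample a 2D grid of texel values at normalized (s, t)."""
    h, w = len(level), len(level[0])
    x, y = s * (w - 1), t * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)   # clamp at the edge
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mip_levels, s, t, lod):
    """Blend bilinear samples from the two MIP levels that bracket lod.

    Four texels are read from each of the two levels, eight in total.
    """
    lo = max(0, min(int(math.floor(lod)), len(mip_levels) - 1))
    hi = min(lo + 1, len(mip_levels) - 1)
    frac = min(max(lod - lo, 0.0), 1.0)
    fine = bilinear(mip_levels[lo], s, t)
    coarse = bilinear(mip_levels[hi], s, t)
    return fine * (1 - frac) + coarse * frac
```

Here level 0 is taken to be the highest-resolution map, so a fractional `lod` of 0.25 weights the finer level by 0.75 and the coarser level by 0.25.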
The inventive pipeline includes a texture memory which includes a texture cache (really a texture reuse register, because the structure and operation are different from conventional caches). The host also includes storage for texture, which may typically be very large, but in order to render a texture, it must be loaded into the texture cache, which is also referred to as texture memory. Associated with each VSP are S and T values. In order to perform trilinear MIP mapping, we necessarily blend eight (8) samples, so the inventive structure provides a set of eight content addressable (memory) caches running in parallel. In one embodiment, the cache identifier is one of the content addressable tags, and that is the reason the tag part of the cache and the data part of the cache are located separately from one another. Conventionally, the tag and data are co-located so that a query on the tag gives the data. In the inventive structure and method, the tags and data are split up and indices are sent down the pipeline.
The data and tags are stored in different blocks, and the content addressable lookup is a lookup or query of an address; even the "data" stored at that address is itself an index that references the actual data, which is stored in a different block. The indices are determined and sent down the pipeline so that the data referenced by the index can be determined. In other words, the tag is in one location, the texture data is in a second location, and the indices provide a link between the two storage structures.
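The split between tag lookup and data read can be sketched as follows. This is an illustrative model only: the class name, the round-robin replacement rule, and the `fetch_texel` callback (standing in for a texture memory read) are assumptions, and texel addresses are assumed never to be `None`.

```python
class TexelReuseSketch:
    def __init__(self, num_entries, fetch_texel):
        self.tags = [None] * num_entries   # tag store: one texel address per entry
        self.data = [None] * num_entries   # data store, kept in a separate block
        self.fetch_texel = fetch_texel     # stands in for a texture memory read
        self.next_victim = 0
        self.misses = 0

    def lookup(self, texel_address):
        """Tag-side query: return the index of the entry holding this texel."""
        if texel_address in self.tags:             # content-addressable match
            return self.tags.index(texel_address)
        # Miss: fetch the texel and replace an entry (round-robin, assumed).
        index = self.next_victim
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        self.tags[index] = texel_address
        self.data[index] = self.fetch_texel(texel_address)
        self.misses += 1
        return index

    def read(self, index):
        """Data-side read, performed later in the pipeline using only the index."""
        return self.data[index]
```

Only the index travels down the pipeline between `lookup` and `read`. The sketch does not model the hazard of an entry being replaced while an index that refers to it is still in flight, which a real design would have to prevent.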
In one embodiment of the invention, Texel Reuse Detection Registers (TRDR) comprise a multiplicity of associative memories, generally located on the same integrated circuit as the texel interpolator. In the preferred embodiment, the texel reuse detection method is performed in the Texture Block.
In conventional 3-D graphics pipelines, an object in some orientation in space is rendered. The object has a texture map on it, and it is represented by many triangle primitives. The procedure, implemented in software, will instruct the hardware to load the particular object texture into a DRAM. Then all of the triangles that are common to the particular object and therefore have the same texture map are fed into the unit, and texture interpolation is performed to generate all of the colored pixels needed to represent that particular object. When that object has been colored, the texture map in DRAM can be destroyed since the object has been rendered. If there is more than one object that has the same texture map, such as a plurality of identical objects (possibly at different orientations or locations), then all objects of that type may desirably be textured before the texture map in DRAM is discarded. Different geometry may be fed in, but the same texture map could be used for all, thereby eliminating any need to repeatedly retrieve the texture map from host memory and place it temporarily in one or more pipeline structures.
In more sophisticated conventional schemes, more than one texture map may be retrieved and stored in the memory; for example, two or several maps may be stored depending on the available memory, the size of the texture maps, the need to store or retain multiple texture maps, and the sophistication of the management scheme. In each of these conventional texture mapping schemes, spatial object coherence is of primary importance. At least for an entire single object, and typically for groups of objects using the same texture map, all of the triangles making up the object are processed together. The phrase spatial coherency is applied to such a scheme because the triangles form the object and are connected in space, and are therefore spatially coherent.
In the inventive deferred shader structure and method we do not necessarily rely on or derive appreciable benefit from this type of spatial object coherence. Embodiments of the inventive deferred shader operate on tiles instead. Any given tile might have an entire object, a plurality of objects, some entire objects, or portions of several objects, so that spatial object coherence over the entire tile is typically absent.
The inventive structure and method break with that conventional concept completely because they are directed to a deferred shader. Even if a tile should happen to have an entire object, there will typically be different background, and the inventive Cull Block and Cull procedure will typically generate and send VSPs in a completely jumbled and spatially incoherent order, even if the tile might support some degree of spatial coherency. As a result, the pipeline and texture block are advantageously capable of changing the texture map on the fly in real-time and in response to the texture required for the object primitive (e.g. triangle) received. Any requirement to repeatedly retrieve the texture map from the host to process the particular object primitive (for example, a single triangle) just received, and then to dispose of that texture when the next object primitive needing a different texture map arrives, would be problematic to say the least and would preclude fast operation.
In the inventive structure and method, a sizable memory is supported on the card. In one implementation 128 megabytes are provided, but more or fewer megabytes may be provided. For example, 34 Mb, 64 Mb, 256 Mb, 512 Mb, or more may be provided, depending upon the needs of the user, the real estate available on the card for memory, and the density of memory available.
Rather than reading the 8 texels for every visible fragment, using them, and throwing them away so that the 8 texels for the next fragment can be retrieved and stored, the inventive structure and method stores and reuses them when there is a reasonable chance they will be needed again.
It would be impractical to read and throw away the eight texels every time a visible fragment is received. Rather, it is desirable to reuse these texels. When marching along in tile space, the pixel grid within the tile (typically processed along sequential rows in the rectangular tile pixel grid) could be such that, while the same texture map is not needed for sequential pixels, the same texture map might be needed for several pixels clustered in an area of the tile, and hence needed only a few process steps after the first use. Desirably, the invention uses the texels that have been read over and over: once we have seen one fragment requiring a particular texture map, chances are good that for some period of time afterward, while we are in the same tile, we will encounter another fragment from the same object that will need the same texture. So we save the texels in this cache, and then on the fly we look up from the cache (texture reuse register) which ones we need. If there is a cache miss, for example, when a fragment and texture map are encountered for the first time, that texture map is retrieved and stored in the cache.
3.2.8 Fragment Coloring
Fragment coloring is performed for two-dimensional display space and involves an interpolation of the color from for example the three vertices of a triangle primitive, to the sampled sub-sample of the displayed pixel. Essentially, fragment coloring involves applying an interpolation function to the colors at the three fragment vertices to determine a color for a location spatially located between or among the three vertices. Typically, but optionally, some account will be taken of the perspective correctness in performing the interpolation.
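The interpolation of vertex colors to a sample location can be sketched with barycentric weights in two-dimensional display space. This is one common way to realize such an interpolation function and is offered as an assumption, not as the method of this disclosure; the function names are hypothetical, and no perspective correction is applied.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_color(p, verts, colors):
    """Interpolate per-vertex colors (tuples of channels) at sample point p."""
    wa, wb, wc = barycentric_weights(p, *verts)
    # zip(*colors) regroups the data per channel: (value at a, at b, at c).
    return tuple(wa * ca + wb * cb + wc * cc for ca, cb, cc in zip(*colors))
```

A degenerate (zero-area) triangle makes `det` zero and raises `ZeroDivisionError`; a real pipeline would discard such primitives before this stage.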
3.2.9 Interpolation of Normals
Various compromises have conventionally been accepted relative to the computation of surface normals, particularly a surface normal that is interpolated between or among other surface normals, in the 3D graphics environment. The compromises have typically traded-off accuracy for computational ease or efficiency. Ideally, surface normals should be interpolated angularly, that is based on the actual angular differences in the angles of the surface normals on which the interpolation is based. In fact such angular computation is not well suited to 3D graphics applications.
Therefore, more typically, surface normals are interpolated based on linear interpolation of the input normals. For low to moderate quality rendering, linear interpolation of the composite surface normals may provide adequate accuracy; however, considering a two-dimensional interpolation example, when one vector (surface normal) has, for example, a larger magnitude than the other vector, but comparable angular change to the first vector, the resultant vector will be overly influenced by the larger magnitude vector in spite of the comparable angular difference between the two vectors. This may result in objectionable error; for example, some surface shading or lighting calculation may provide an anomalous result and detract from the output scene.
In the inventive structure and method the magnitude is interpolated separately from the direction or angle. The interpolated magnitudes are computed, and then the direction vectors, which are of equal size, are interpolated. The separately interpolated magnitudes and directions are then recombined, and the direction is normalized.
The ideal angular interpolation would provide the greatest accuracy; however, it involves three points on the surface of a sphere and various great-circle calculations. This sort of mathematical complexity is not well suited for real-time fast pipeline processing. The single step linear interpolation is much easier but is susceptible to greater error. In comparison to each of these procedures, the inventive surface normal interpolation procedure has greater accuracy than conventional linear interpolation, and lower computational complexity than conventional angular interpolation.
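The difference between single-step linear interpolation and the separate interpolation of magnitude and direction can be sketched for two input normals. The function names are hypothetical, and the two-vector case with a single parameter `t` is a simplification of the three-vertex case; the sketch assumes the inputs are non-zero and not exactly opposite.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def lerp(a, b, t):
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def interpolate_linear(n0, n1, t):
    """Conventional single-step linear interpolation of the raw vectors."""
    return lerp(n0, n1, t)


def interpolate_separately(n0, n1, t):
    """Interpolate magnitude and direction separately, then recombine."""
    m0, m1 = norm(n0), norm(n1)
    u0 = tuple(x / m0 for x in n0)            # equal-size (unit) directions
    u1 = tuple(x / m1 for x in n1)
    d = lerp(u0, u1, t)
    d_len = norm(d)
    direction = tuple(x / d_len for x in d)   # normalize interpolated direction
    magnitude = (1 - t) * m0 + t * m1         # magnitude interpolated on its own
    return tuple(magnitude * x for x in direction)
```

For inputs (1, 0, 0) and (0, 4, 0) at `t = 0.5`, the linear result (0.5, 2, 0) leans strongly toward the longer vector, whereas the separate interpolation yields a direction halfway between the two inputs with magnitude 2.5.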
3.2.10 Variable Scale Bump Maps
Generating variable scale bump maps involves one or both of two separate procedures: automatic basis generation and automatic gradient field generation. Consider a gray scale image and its derivative in intensity space. Automatic gradient field takes a derivative, relative to gray scale intensity, of a gray scale image, and uses that derivative as a surface normal perturbation to generate a bump for a bump map. Automatic basis generation saves computation, memory storage in polygon memory, and input bandwidth in the process.
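Automatic gradient field generation can be sketched as a finite-difference derivative of the gray scale image along the s and t directions. Reading the derivative this way is an interpretation of the passage above, and the central-difference stencil, the clamped border handling, and the function name are assumptions made for illustration.

```python
def gradient_field(height):
    """Finite-difference gradients (hs, ht) of a 2D gray scale image.

    height[t][s] is the intensity at column s, row t. Border texels use
    clamped neighbors, which is an assumed boundary rule.
    """
    rows, cols = len(height), len(height[0])
    field = []
    for t in range(rows):
        row = []
        for s in range(cols):
            s0, s1 = max(s - 1, 0), min(s + 1, cols - 1)
            t0, t1 = max(t - 1, 0), min(t + 1, rows - 1)
            # Divide by the actual sample spacing (1 at borders, 2 inside).
            hs = (height[t][s1] - height[t][s0]) / max(s1 - s0, 1)
            ht = (height[t1][s] - height[t0][s]) / max(t1 - t0, 1)
            row.append((hs, ht))
        field.append(row)
    return field
```

Each resulting pair (hs, ht) plays the role of the height field gradient that a bump map texel carries.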
For each triangle vertex, an s,t and surface normal are specified. But the s and t aren't color; rather, they are two-dimensional surface normal perturbations to the texture map, and therefore a texture bump map. The s and t are used to specify the directions in which to perturb the surface normals in order to create a usable bump map. The s,t give us an implied coordinate system and reference from which we can specify perturbation direction. Use of the s,t coordinate system at each pixel eliminates any need to specify the surface tangent and the bi-normal at the pixel location. As a result, the inventive structure and method save computation, memory storage and input bandwidth.
3.2.11 Performing Tangent Space Lighting in a Deferred Shading Environment
The background describes two exemplary approaches to performing bump mapping in a conventional 3D graphics system. These approaches compute for each vertex of a surface a perturbed surface normal N′ that accounts for bump effects and then employ in lighting computations the perturbed normal N′ instead of the input surface normal N.
One of the approaches (the "SGI approach") attempts to reduce the number of bump mapping computations by storing in a texture map precomputed components of the perturbed normals N′ of the surfaces involved in the lighting computation. The components of the perturbed surface normals N′ are defined in "tangent space", which differs from the "eye space" in which many elements of the lighting equation are defined.
To efficiently use this tangent space information the SGI approach performs all lighting computations in tangent space. This allows the perturbed normals N′ to be used directly from the texture map. However, this also requires that vectors used in the lighting equation (e.g., the light and halfangle vectors L and H) first be transformed from eye space to tangent space. As described in the background, this transformation is done for each vertex using a transformation matrix comprising surface tangent, binormal and normal vectors (T, B, N).
In the conventional manner, the SGI approach performs all graphics processing steps prior to the final pixel output step one primitive (i.e., polygon, triangle, etc.) at a time. One result of this approach is that unnecessary, numerically intensive tangent space transformations and lighting computations are likely to be performed for hidden surfaces whose pixels will be discarded in the z-buffer removal step. Another result of this approach is that in the SGI pipeline there is no need to retain any of the lighting state for primitives other than the one being currently processed.
In contrast, in a deferred shading graphics pipeline (DSGP) implemented in accordance with the present invention, the lighting computations are not performed until after hidden surfaces have been conservatively removed. Implementing the SGI approach to bump mapping in such a DSGP would require the graphics pipeline to retain the lighting state for all visible surfaces. Retaining this lighting state could require significant storage per fragment. For this reason it would not be practical to implement the SGI approach in a deferred shading environment.
Even though it is not practical to employ the Blinn and SGI approaches in a DSGP implemented in accordance with the present invention, many graphics applications that employ bump mapping provide texture/bump maps in the Blinn or SGI formats, or other tangent space formats (e.g., 3D Studio Max). Additionally, these formats are supported in 3D graphics specifications, such as OpenGL. For these reasons, it is desirable for all graphics pipelines, including the DSGP of the present invention, to support the Blinn, SGI and other common texture map formats. Therefore, it is a goal of the present invention to provide systems and methods for use in a DSGP that efficiently perform lighting and bump mapping using conventional lighting and texture map information provided to the DSGP.
In accordance with this goal, the present invention is a system and method for performing tangent space lighting in a DSGP. In particular, the present invention is a system and method for performing bump mapping and lighting computations in eye space using texture information represented in tangent space.
One embodiment encompasses blocks of the DSGP that preprocess data (referred to collectively as the preprocessor hereinafter) and a Phong shader (implemented as hardware or software). The preprocessor receives texture maps specified in a variety of formats and converts those texture maps to a common format for use by the Phong shader. The preprocessor also provides basis vectors (bs, bt, n), a vector Tb that represents in tangent/object space texture/bump data, light data, material data, eye coordinates and other information used by the Phong shader to perform the lighting and bump mapping computations. The data from the preprocessor is provided for each fragment for which lighting effects need to be computed.
The Phong shader computes the RGBA value for the pixels in a fragment using the information provided by the preprocessor. The Phong shader performs all lighting computations in eye space, which requires it first to transform bump data from tangent space to eye space. In one embodiment the Phong hardware does this by multiplying a matrix M whose columns comprise the eye space basis vectors (bs, bt, n) and the vector Tb of bump map data. The eye space basis vectors are defined by the DSGP preprocessor so that the multiplication (M×Tb) gives the perturbed normal N′ in eye space in accordance with the Blinn bump mapping equation:
$$N' = N + b_s h_s + b_t h_t.$$
The Phong shader uses the resulting perturbed normal N′ in the lighting equations. One advantage of this approach over the prior art is that it is necessary to transform only a single vector (the perturbed normal) to eye space whereas, in the SGI approach, it is necessary to transform both the light and half angle vectors (L, H) to tangent space for multiple lights.
In one embodiment, the preprocessor provides the basis vectors (bs, bt, n) as a set of unit vectors $(\hat{b}_s, \hat{b}_t, \hat{n})$ and their associated magnitudes $(m_{bs}, m_{bt}, m_n)$, which allows the Blinn bump equation to be rewritten as follows:
$$N' = \hat{n}\, m_n + \hat{b}_s\, m_{bs} h_s + \hat{b}_t\, m_{bt} h_t.$$
Providing the basis information in this way allows the Phong hardware to compute the perturbed normal N′ in eye space using the following matrix computation:
$$N' = \begin{bmatrix} \hat{b}_s & \hat{b}_t & \hat{n} \end{bmatrix} \begin{bmatrix} m_{bs} h_s \\ m_{bt} h_t \\ m_n \end{bmatrix},$$
where $\begin{bmatrix} \hat{b}_s & \hat{b}_t & \hat{n} \end{bmatrix} = M'$ is expanded as:
$$M' = \begin{bmatrix} \hat{b}_{xs} & \hat{b}_{xt} & \hat{n}_x \\ \hat{b}_{ys} & \hat{b}_{yt} & \hat{n}_y \\ \hat{b}_{zs} & \hat{b}_{zt} & \hat{n}_z \end{bmatrix}.$$
In one embodiment the Phong shader performs the bump mapping and lighting computations using floating point hardware.
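A floating point version of the perturbed normal computation can be sketched as a single matrix-vector product whose matrix columns are the unit basis vectors and whose vector holds the scaled height gradients and the normal magnitude. The function names and the representation of vectors as 3-tuples are assumptions for illustration.

```python
def mat_vec(columns, v):
    """Multiply a 3x3 matrix, given as three column vectors, by a 3-vector."""
    return tuple(sum(col[i] * v[k] for k, col in enumerate(columns))
                 for i in range(3))


def perturbed_normal(bs_hat, bt_hat, n_hat, m_bs, m_bt, m_n, hs, ht):
    """Eye space perturbed normal from unit basis vectors and magnitudes.

    The matrix has columns (bs_hat, bt_hat, n_hat); the bump vector is
    (m_bs * hs, m_bt * ht, m_n).
    """
    tb = (m_bs * hs, m_bt * ht, m_n)
    return mat_vec((bs_hat, bt_hat, n_hat), tb)
```

With an axis-aligned basis, magnitudes (2, 3, 1) and gradients (0.5, -1), the result is (1, -3, 1), which matches evaluating the Blinn equation directly with N = (0, 0, 1), bs = (2, 0, 0) and bt = (0, 3, 0).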
In another embodiment the Phong shader is optimized to store each component of the matrix M′ as a fixed point value. In yet another embodiment, the components of the updated bump vector $T_b' = (m_{bs} h_s, m_{bt} h_t, m_n)$ are scaled by scale factors $(s_s, s_t, s_n)$ selected to allow each component of the resulting bump vector $T_b'' = (m_{bs} h_s s_s, m_{bt} h_t s_t, m_n s_n)$ to be stored as a vector of fixed-point values. This enables the Phong shader to be configured to perform all or a substantial portion of the matrix multiplication M×Tb using fixed point hardware, which reduces hardware complexity.
A significant advantage of the present invention is that the Phong shader does not need to interpolate any vectors (e.g., the tangent space perturbed normal N′, light L or half angle H vectors). Instead, the preprocessor performs whatever vertex interpolations are necessary and provides the interpolated vectors to the Phong shader referenced to the (s, t) bump grid along with a fragment located at the same grid position. This greatly reduces the complexity of the bump operations, which, as a result, can be integrated with the Phong shader whether implemented in hardware or software.
Note that the preprocessor performs vector interpolation by separating each vector into a unit vector and an associated magnitude, interpolating the unit vectors and magnitudes separately, and combining the interpolated unit vector and magnitude. This procedure is more accurate and produces fewer artifacts than when non-normalized vectors are directly interpolated, as in the prior art. For example, one artifact that results from normalizing non-unit vectors is an approximation error directly related to the magnitudes of the vectors being interpolated.
In one embodiment of the DSGP the preprocessor passes the Phong shader at least one packet of texture information (a texel) for each fragment to be illuminated. Among other things, a texel provides the bump mapping data to be used for the fragment. In one embodiment, the information content of a texel used to provide bump mapping data depends on the format of the texture information provided to the DSGP. For example, when the texture information is provided in the SGI format the texel vector Tb provides the components $n'_x, n'_y, n'_z$ of the perturbed surface normal. When the input is provided in the Blinn format, the texel vector Tb provides the surface gradients $h_s, h_t$ of the unperturbed surface normal.
Accordingly, in one embodiment, when the texel provides SGI-type data the Phong hardware determines the perturbed normal in eye space by multiplying the matrix M by a vector Tb that comprises the three texel components $(n'_x, n'_y, n'_z)$. When the texel provides Blinn-type data the Phong hardware determines the perturbed normal in eye space by multiplying the matrix M by a vector Tb that comprises the two texel components $h_s, h_t$ and a third component that is 1. The third component that is 1 accounts for the fact that the Blinn approach applies the height gradients $(h_s, h_t)$ to the unperturbed surface normal.
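The format-dependent construction of the vector Tb can be sketched as follows. The function name and the string labels used to select the format are hypothetical; only the two formats discussed above are handled.

```python
def bump_vector(texel, texture_format):
    """Build the three-component vector Tb from a texel.

    SGI-type texels carry the three perturbed normal components; Blinn-type
    texels carry the two height gradients, and a third component of 1 is
    appended so that the unperturbed normal is included in the product.
    """
    if texture_format == "SGI":
        nx, ny, nz = texel
        return (nx, ny, nz)
    if texture_format == "Blinn":
        hs, ht = texel
        return (hs, ht, 1.0)
    raise ValueError("unsupported texture format: " + str(texture_format))
```

Because both cases yield a three-component vector, the same matrix multiplication by M can be applied regardless of the input format.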
In one embodiment the preprocessor passes the Phong hardware the following fragment information for each fragment being illuminated:
tangent space components nx, ny, nz of the surface normal unit vector;
magnitude mn of the surface normal;
surface tangent unit vector bs along the s tangent space axis;
surface tangent unit vector bt along the t tangent space axis;
surface tangent bs magnitude;
surface tangent bt magnitude;
eye coordinates x, y, z.
In one embodiment, the preprocessor computes the basis vectors in a manner that is consistent with the content of the lighting information input to the DSGP. That is, when the lighting information is in the SGI format the preprocessor defines the basis vectors as: $b_s = -v_s$, $b_t = -v_t$; or, defined as unit vectors and associated magnitudes: $\hat{b}_s = -\hat{v}_s$, $m_{bs} = m_{vs}$ and $\hat{b}_t = -\hat{v}_t$, $m_{bt} = m_{vt}$. When the lighting texture information is in the Blinn format the preprocessor defines the basis vectors as: $b_s = \hat{n} \times v_t$, $b_t = v_s \times \hat{n}$; or, defined as unit vectors and associated magnitudes: $b_s = \hat{n} \times \hat{v}_t$, $m_{bs} = m_{vt}$ and $b_t = \hat{v}_s \times \hat{n}$, $m_{bt} = m_{vs}$.
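The two basis definitions can be sketched in the unit-vector-plus-magnitude form. The function names and format labels are hypothetical, and vs and vt are taken to be the non-zero surface tangent vectors referred to above.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Return (unit vector, magnitude) for a non-zero vector."""
    m = math.sqrt(sum(x * x for x in v))
    return tuple(x / m for x in v), m


def basis_vectors(n_hat, vs, vt, texture_format):
    """Return ((bs, m_bs), (bt, m_bt)) for the given input format."""
    vs_hat, m_vs = normalize(vs)
    vt_hat, m_vt = normalize(vt)
    if texture_format == "SGI":
        # bs = -vs_hat, bt = -vt_hat; magnitudes carried over unchanged.
        return ((tuple(-x for x in vs_hat), m_vs),
                (tuple(-x for x in vt_hat), m_vt))
    if texture_format == "Blinn":
        # bs = n_hat x vt_hat with magnitude m_vt; bt = vs_hat x n_hat with m_vs.
        return ((cross(n_hat, vt_hat), m_vt),
                (cross(vs_hat, n_hat), m_vs))
    raise ValueError("unsupported texture format: " + str(texture_format))
```

Note that in the Blinn case the magnitudes are swapped relative to the SGI case, since bs is built from vt and bt is built from vs.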
Defining the basis vectors in this manner enables the same Phong shader to perform bump computations in a variety of formats, including, at a minimum, Blinn and SGI formats.