This invention relates to three-dimensional computer graphics computer systems, and more particularly to a system for performing conservative hidden surface removal in a graphics processor with deferred shading.
Computer graphics is the art and science of generating pictures with a computer. This picture or image generation process is commonly called rendering. Generally, in three-dimensional (3D) computer graphics, geometry that represents surfaces (or volumes) of objects in a scene is translated into pixels stored in a frame buffer, and then displayed on a display device. Real-time display devices, such as CRTs used as computer monitors, refresh the display by continuously displaying the image over and over. This refresh usually occurs row-by-row, where each row is called a raster line or scan line. In this document, raster lines are numbered from bottom to top, but are displayed in order from top to bottom.
In a 3D animation, a sequence of images is displayed, giving the illusion of motion in three-dimensional space. Interactive 3D computer graphics allows a user to change his viewpoint or change the geometry in real-time, thereby requiring the rendering system to create new images on-the-fly in real-time.
In 3D computer graphics, each renderable object generally has its own local object coordinate system, and therefore needs to be translated (or transformed) from object coordinates to pixel display coordinates. Conceptually, this is a 4-step process: 1) translation (including scaling for size enlargement or shrink) from object coordinates to world coordinates, which is the coordinate system for the entire scene; 2) translation from world coordinates to eye coordinates, based on the viewing point of the scene; 3) translation from eye coordinates to perspective translated eye coordinates, where perspective scaling (farther objects appear smaller) has been performed; and 4) translation from perspective translated eye coordinates to pixel coordinates, also called screen coordinates. Screen coordinates are points in three-dimensional space, and can be in either screen-precision (i.e., pixels) or object-precision (high precision numbers, usually floating-point), as described later. These translation steps can be compressed into one or two steps by pre-computing appropriate translation matrices before any translation occurs. Once the geometry is in screen coordinates, it is broken into a set of pixel color values (that is xe2x80x9crasterizedxe2x80x9d) that are stored into the framer buffer. Many techniques are used for generating pixel color values, including Gouraud shading, Phong shading, and texture mapping.
A summary of the prior art rendering process can be found in: xe2x80x9cFundamentals of Three-dimensional Computer Graphicsxe2x80x9d, by Watt, Chapter 5: The Rendering Process, pages 97 to 113, published by Addison-Wesley Publishing Company, Reading, Mass., 1989, reprinted 1991, ISBN 0-201-15442-0 (hereinafter referred to as the Watt Reference).
FIG. 1 shows a three-dimensional object, a tetrahedron, with its own coordinate axes (Xobj,Yobj,Zobj). The three-dimensional object is translated, scaled, and placed in the viewing point""s coordinate system based on (Xeye,Yeye,Zeye). The object is projected onto the viewing plane, thereby correcting for perspective. At this point, the object appears to have become two-dimensional; however, the object""s z coordinates are preserved so they can be used later by hidden surface removal techniques. The object is finally translated to screen coordinates, based on (Xscreen,Yscreen,Zscreen), where zscreen is going perpendicularly into the page. Points on the object now have their x and y coordinates described by pixel location (and fractions thereof) within the display screen and their z coordinates in a scaled version of distance from the viewing point.
Because many different portions of geometry can affect the same pixel, the geometry representing the surfaces closest to the scene viewing point must be determined. Thus, for each pixel, the visible surfaces within the volume subtended by the pixel""s area determine the pixel color value, while hidden surfaces are prevented from affecting the pixel. Non-opaque surfaces closer to the viewing point than the closest opaque surface (or surfaces, if an edge of geometry crosses the pixel area) affect the pixel color value, while all other non-opaque surfaces are discarded. In this document, the term xe2x80x9coccludedxe2x80x9d is used to describe geometry which is hidden by other non-opaque geometry.
Many techniques have been developed to perform visible surface determination, and a survey of these techniques are incorporated herein by reference to: xe2x80x9cComputer Graphics: Principles and Practicexe2x80x9d, by Foley, van Dam, Feiner, and Hughes, Chapter 15: Visible-Surface Determination, pages 649 to 720, 2nd edition published by Addison-Wesley Publishing Company, Reading, Mass., 1990, reprinted with corrections 1991, ISBN0-201-12110-7 (hereinafter referred to as the Foley Reference). In the Foley Reference, on page 650, the terms xe2x80x9cimage-precisionxe2x80x9d and xe2x80x9cobject-precisionxe2x80x9d are defined: xe2x80x9cImage-precision algorithms are typically performed at the resolution of the display device, and determine the visibility at each pixel. Object-precision algorithms are performed at the precision with which each object is defined, and determine the visibility of each object.xe2x80x9d
As a rendering process proceeds, most prior art renderers must compute the color value of a given screen pixel multiple times because multiple surfaces intersect the volume subtended by the pixel. The average number of times a pixel needs to be rendered, for a particular scene, is called the depth complexity of the scene. Simple scenes have a depth complexity near unity, while complex scenes can have a depth complexity of in or twenty. As scene models become more and more complicated, renderers will be required to process scenes of ever increasing depth complexity. Thus, for most renders, the depth complexity of a scene is a measure of the wasted processing. For example, for a scene with a depth complexity of ten, 90% of the computation is wasted on hidden pixels. This wasted computation is typical of hardware renderers that use the simple Z buffer technique (discussed later herein), generally chosen because it is easily built in hardware. Methods more complicated than the Z Buffer technique have heretofore generally been too complex to build in a cost-effective manner. An important feature of the method and apparatus invention presented here is the avoidance of this wasted computation by eliminating hidden portions of geometry before they are rasterized, while still being simple enough to build in cost-effective hardware.
When a point on a surface (frequently a polygon vertex) is translated to screen coordinates, the point has three coordinates: 1) the x coordinate in pixel units (generally including a fraction); 2) the y coordinate in pixel units (generally including a fraction); and 3) the z coordinate of the point in either eye coordinates, d from the virtual screen, or some other coordinate system which preserves the relative distance of surfaces from the viewing point. In this document, positive z coordinate values are used for the xe2x80x9clook directionxe2x80x9d from the viewing point, and smaller values indicate a position closer to the viewing point.
When a surface is approximated by a set of planar polygons, the vertices of each polygon are translated to screen coordinates. For points in or on the polygon (other than the vertices), the screen coordinates are interpolated from the coordinates of vertices, typically by the processes of edge walking and. span interpolation. Thus, a z coordinate value is generally included in each pixel value (along with the color value) as geometry is rendered.
Many hardware renderers have been developed, and an example is incorporated herein by reference: xe2x80x9cLeo: A System for Cost Effective 3D Shaded Graphicsxe2x80x9d, by Deering and Nelson, pages 101 to 108 of SIGGRAPH93 Proceedings, Aug. 1-6, 1993, Computer Graphics Proceedings, Annual Conference Series, published by ACM SIGGRAPH, New York, 1993, Softcover ISBN 0-201-58889-7 and CD-ROM ISBN 0-201-56997-3 (hereinafter referred to as the Deering Reference). The Deering Reference includes a diagram of a generic 3D graphics pipeline (i.e., a renderer, or a rendering system) that it describes as xe2x80x9ctruly generic, as at the top level nearly every commercial 3D graphics accelerator fits this abstractionxe2x80x9d, and this pipeline diagram is reproduced here as FIG. 2. Such pipeline diagrams convey the process of rendering, but do not describe any particular hardware. This document presents a new graphics pipeline that shares some of the steps of the generic 3D graphics pipeline. Each of the steps in the generic 3D graphics pipeline will: be briefly explained here. Processing of polygons is assumed throughout this document, but other methods for describing 3D geometry could be substituted. For simplicity of explanation, triangles are used as the type of polygon in the described methods.
As seen in FIG. 2, the first step within the floating-point intensive functions of the generic 3D graphics pipeline after the data input (step 212) is the transformation step (step 214), which was described above. The second step, the clip test, checks the polygon to see if it is at least partially contained in the view volume (sometimes shaped as a frustum) (step 216). If the polygon is not in the view volume, it is discarded; otherwise processing continues. The third step is face determination, where polygons facing away from the viewing point are discarded (step 218). Generally, face determination is applied only to objects that are closed volumes. The fourth step, lighting computation, generally includes the set up for Gouraud shading and/or texture mapping with multiple light sources of various types, but could also be set up for Phong shading or one of many other choices (step 222). The fifth step, chipping, deletes any portion of the polygon that is outside of the view volume because that portion would not project within the rectangular area of toe viewing plane (step 224). Generally, polygon clipping is done by splitting the polygon into two smaller polygons that both project within the area of the viewing plane. Polygon clipping is computationally expensive. The sixth step, perspective divide, does perspective correction for the projection of objects onto the viewing plane,(step 226). At this point, the points representing vertices of polygons are converted to pixel space coordinates by step seven, the screen space conversion step (step 228). The eighth step (step 230), set up for incremental render, computes the various begin, end, and increment values needed for edge walking and span interpolation (e.g.: x, y, and z coordinates; RGB color; texture map space u and v coordinates; and the like).
Within the drawing intensive functions, edge walking (step 232) incrementally generates horizontal spans for each raster line of the display device by incrementing values from the previously generated span (in the same polygon), thereby xe2x80x9cwalkingxe2x80x9d vertically along opposite edges of the polygon. Similarly, span interpolation (step 234) xe2x80x9cwalksxe2x80x9d horizontally along a span to generate pixel values, including a z coordinate value indicating the pixel""s distance from the viewing point. Finally, the z buffered blending also referred to as Testing and Blending (step 236) generates a final pixel color value. The pixel values also include color values, which can be generated by simple Gouraud shading (i.e., interpolation of vertex color values) or by more computationally expensive techniques such as texture mapping (possibly using multiple texture maps blended together),; Phong shading (i.e., per-fragment lighting), and/or bump mapping (perturbing the interpolated surface normal). After drawing intensive functions are completed, a double-buffered MUX output look-up table operation is performed (step 238). In this figure the blocks with rounded corners typically represent functions or process operations, while sharp cornered rectangles typically represent stored data or memory.
By comparing the generated z coordinate value to the corresponding value stored in the Z Buffer, the z buffered blend either keeps the new pixel values (if it is closer to the viewing point than previously stored value for that pixel location) by writing it into the frame buffer, or discards the new pixel values (if it is farther). At this step, antialiasing methods can blend the new pixel color with the old pixel color. The z buffered blend generally includes most of the per-fragment operations, described below.
The generic 3D graphics pipeline includes a double buffered frame buffer, so a double buffered MUX is also included. An output lookup table is included for translating color map values. Finally, digital to analog conversion makes an analog signal for input to the display device.
A major drawback to the generic 3D graphics pipeline is its drawing intensive functions are not deterministic at the pixel level given a fixed number of polygons. That is, given a fixed number of polygons, more pixel-level computation is required as the average polygon size increases. However, the floating-point intensive functions are proportional to the number of polygons, and independent of the average polygon size. Therefore, it is difficult to balance the amount of computational power between the floating-point intensive functions and the drawing intensive functions because this balance depends on the average polygon size.
Prior art Z Buffers are based on conventional Random Access Memory (RAM or DRAM), Video RAM (VRAM), or special purpose DRAMs. One example of a special purpose DRAM is presented in xe2x80x9cFBRAM: A new Form of Memory Optimized for 3D Graphicsxe2x80x9d, by Deering, Schlapp, and Lavelle, pages 167 to 174 of SIGGRAPH94 Proceedings, Jul. 24-29, 1994, Computer Graphics Proceedings, Annual Conference Series, published by ACM SIGGRAPH, New York, 1994, Softcover ISBN 0201607956.
OpenGL(copyright) is a software interface to graphics hardware which consists of several hundred functions and procedures that allow a programmer to specify objects and operations to produce graphical images. The objects and operations include appropriate characteristics to produce color images of three-dimensional objects. Most of OpenGL(copyright) (Version 1.2) assumes or requires a that the graphics hardware include a frame buffer even though the object may be a point, line, polygon, or bitmap, and the operation may be an operation on that object. The general features of OpenGL(copyright) (just one example of a graphical interface) are described in the reference xe2x80x9cThe OpenGL(copyright) Graphics System: A Specification (Version 1.2) edited by Mark Segal and Kurt Akeley, Version Mar. 12, 1998; and hereby incorporated by reference. Although, reference is made to OpenGL(copyright), the invention is not limited to structures, procedures, or methods which are compatible or consistent with OpenGL(copyright), or with any other standard or non-standard graphical interface, Desirably, the inventive structure and method may be implemented in a manner that is consistent with the OpenGL(copyright), or other standard graphical interface, so that a data set prepared for one of the standard interfaces may be processed by the inventive structure and method without modification. However, the inventive structure and method provides some features not provided by OpenGL(copyright), and even when such generic input/output is provided, the implementation is provided in a different manner.
The phrase pipeline statexe2x80x9d does not have a single definition in the prior-art. The OpenGL(copyright) specification, for example, sets forth the type and amount of the graphics rendering machine or pipeline state in terms of items of state and the number of bits and bytes required to store that state information. In the OpenGL(copyright) definition, pipeline state tends to include object vertex pertinent information including for example, the vertices themselves the vertex normals, and color as well as xe2x80x9cnon-vertexxe2x80x9d information.
When information is sent into a graphics renderer, at least some object geometry information is provided to describe the scene. Typically, the object or objects are specified in terms of vertex information, where an object is modeled, defined, or otherwise specified by points, lines, or polygons (object primitives) made up of one or more vertices. In simple terms, a vertex is a location in space and may be specified for example by a three-space (x,y,z) coordinate relative to some reference origin., Associated with each vertex is other information, such as a surface normal, color, texture, transparency, and the like information pertaining to the characteristics of the vertex. This information is essentially xe2x80x9cper-vertexxe2x80x9d information.
Unfortunately, forcing a one-to-one relationship between incoming information and vertices as a requirement for per-vertex information is unnecessarily restrictive. For example, a color value may be specified in the data stream for a particular vertex and then not respecified in the data stream until the color changes for a subsequent vertex. The color value may still be characterized as per-vertex data even though a color value is not explicitly included in the incoming data stream for each vertex.
Texture mapping presents an interesting example of information or data which could be considered as either per-vertex information or pipeline state information. For each object, one or more texture maps may be specified, each texture map being identified in some manner, such as with a texture coordinate or coordinates. One may consider the texture map to which one is pointing with the texture coordinate as part of the pipeline state while others might argue that it is per-vertex information.
Other information, not related on a one-to-one basis to the geometry object primitives, used by the renderer such as lighting location and intensity, material settings, reflective properties, and other overall rules on which the renderer is operating may more accurately be referred to as pipeline state. One may consider that everything that does not or may not change on a per-vertex basis is pipeline state, but for the reasons described, this is not an entirely unambiguous definition. For example, one may define a particular depth test (See later description) to be applied to certain objects to be rendered, for example the depth test may require that the z value be strictly xe2x80x9cgreater-thanxe2x80x9d for some objects and xe2x80x9cgreater-than-or-equal-toxe2x80x9d for other objects. These particular depth tests which change from time to time, may be considered to be pipeline state at that time.
Parameters considered to be renderer (pipeline) state in OpenGL(copyright) are identified in Section 6.2 of the aforementioned OpenGL(copyright) Specification (Version 1.2, at pages 193-217).
Essentially then, there are two types of data or information used by the renderer: (1) primitive data which may be thought of as per-vertex data, and (ii) pipeline state data (or simply pipeline state) which is everything else,. This distinction should be thought of as a guideline rather than as a specific rule, as there are ways of implementing a graphics renderer treating certain information items as either pipeline state or non-pipeline state.
In the generic 3D graphics pipeline, the xe2x80x9cz buffered blendxe2x80x9d step actually incorporates many smaller xe2x80x9cper-fragmentxe2x80x9d operational steps.
Application Program Interfaces (APIs), such as OpenGL(copyright) (Open Graphics Library) and D3D, define a set of per-fragment operations (See Chapter 4 of Version 1.2 OpenGL(copyright) Specification). Some exemplary OpenGL(copyright) per-fragment operations are briefly reviewed so that any generic similarities and differences between the inventive structure and method and conventional structures and procedures can be more readily appreciated.
Under OpenGL(copyright), a frame buffer stores a set of pixels as a two-dimensional array. Each picture-element or pixel stored in the frame buffer is simply a set of some number of bits. The number of bits per pixel may vary depending on the particular GL implementation or context.
Corresponding bits from each pixel in the frame buffer are grouped together into a bitplane; each bitplane containing a single bit from each pixel. The bitplanes are grouped into several logical buffers referred to as the color, depth, stencil, and accumulation buffers. The color buffer in turn includes what is referred to under OpenGL(copyright) as the front left buffer, the front right buffer, the back left buffer, the back right buffer, and some additional auxiliary buffers. The values stored in the front buffers are the values typically displayed on a display monitor while the contents of the back buffers and auxiliary buffers are invisible and not displayed. Stereoscopic contexts display both the front left and the front right buffers, while monoscopic contexts display only the front left buffer. In general, the color buffers must have the same number of bitplanes, but particular implementations of context may not provide right buffers, back buffers, or auxiliary buffers at all, and an implementation or context may additionally provide or not provide stencil, depth, or accumulation buffers.
Under OpenGL(copyright), the color buffers consist of either unsigned integer color indices or R, G, B, and, optionally, a number xe2x80x9cAxe2x80x9d of unsigned integer values; and the number of bitplanes in each of the color buffers, the depth buffer (if provided), the stencil buffer (if provided), and the accumulation buffer (if provided), is fixed and window dependent. If an accumulation buffer is provided, it should have at least as many bit planes per R, G, and B color component as do the color buffers.
A fragment produced by rasterization with window coordinates of Xw, Yw) modifies the pixel in the frame buffer at that location based on a number of tests, parameters, and conditions. Noteworthy among the several tests that are typically performed sequentially beginning with a fragment and its associated data and finishing with, the final output stream to the frame buffer are in the order performed (and with some variation among APIs): 1) pixel ownership test; 2) scissor test; 3) alpha test; 4) Color Test; 5) stencil test; 6) depth test; 7) blending; 8) dithering; and 9) logicop. Note that the OpenGL(copyright) does not provide for an explicit xe2x80x9ccolor testxe2x80x9d between the alpha test and stencil test. Per-Fragment operations under OpenGL(copyright) are applied after all the color computations. Each of these tests or operations is briefly described below.
2.3.1 Ownership Test
Under OpenGL(copyright), the pixel ownership test determines if the pixel at location (xw, yw) in the frame buffer is currently owned by the GL context. If it is not, the window system decides the fate of the incoming fragment. Possible results are that the fragment is discarded or that some subset of the subsequent per-fragment operations are applied to the fragment. This pixel ownership test allows the window system to properly control the GL""s behavior.
Assumes that in a computer having a display screen, one or several processes are running and that each process has a window on the display screen. For each process, the associated window defines the pixels the process wants to write or render to. When there are two or more windows, the window associated with one process may be in front of the window associated with another process, behind that window, or both windows may be entirely visible. Since there is only a single frame buffer for the entire display screen or desktop, the pixel ownership test involves determining which process and associated window owns each of the pixels. If a particular process does not xe2x80x9cownxe2x80x9d a pixel, it fails the pixel ownership test relative to the frame buffer and that pixel is thrown away. Note that under the typical paradigm, the pixel ownership test is run by each process, and that for a give pixel location in the frame buffer, that pixel may pass the pixel ownership test for one of the processes, and fail the pixel ownership test for the other process. Furthermore, in general, a particular pixel can pass the ownership test for only one process because only one process can own a particular frame buffer pixel at the same time.
In some rendering schemes the pixel ownership test may not be particularly relevant. For example, if the scene is being rendered to an off-screen buffer, and subsequently Block Transferred or xe2x80x9cblittedxe2x80x9d to the desktop, pixel ownership is not really even relevant. Each process automatically or necessarily passes the pixel ownership test (if it is executed) because each process effectively owns its own off-screen buffer and nothing is in front of that buffer.
If for a particular process, the pixel is not owned by that process, then there is no need to write a pixel value to that location, and all subsequent processing for that pixel may be ignored. In a typical workstation, all the data associated with a particular pixel on the screen is read during rasterization. All information for any polygon that feeds that pixel is read, including information as to the identity of the process that owns that frame buffer pixel, as well as the Z buffer, the color value, the old color value, the alpha value, stencil bits, and so forth. If a process owns the pixel, then the other downstream process are executed (for example, scissor test, alpha test, and the like) On the other hand, if the process does not own the pixel and fails the ownership test for that pixel, the process need not consider that pixel further and that pixel is skipped for subsequent tests.
2.3.2 Scissor Test
Under OpenGL(copyright), the scissor test determines if (Xw, Yw) lies within a scissor rectangle defined by four coordinate values; corresponding to a left bottom (left, bottom) coordinate, a width of the rectangle, and a height of the rectangle. The values are set with the procedure xe2x80x9cvoid Scissor(int left, int bottom, sizei width, sizei height)xe2x80x9d under OpenGL(copyright). If leftxe2x89xa6Xw less than left+width and bottomxe2x89xa6Yw less than bottom+height, then the scissor test passes; otherwise the scissor test fails and the particular fragment being tested is discarded. Various initial states are provided and error conditions monitored and reported.
In simple terms, a rectangle defines a window which may be an on-screen or off-screen window. The window is defined by an x-left, x-right, y-top, and y-bottom coordinate (even though it may be expressed in terms of a point and height and width dimensions from that point). This scissor window is useful in that only pixels from a polygon fragment that fall in that screen aligned scissor window will change. In the event that a polygon straddles the scissor window, only those pixels that are inside the scissor window may change.
When a polygon in an OpenGL(copyright) machine comes down the pipeline, the pipeline calculates everything it needs to in order to determine the z value and color of that pixel. Once z value and color are determined, that information is used to determine what information should be placed in the frame buffer (thereby determining what is displayed on the display screen).
Just as with the pixel ownership test, the scissor test provides means for discarding pixels and/or fragments before they actually get to the frame buffer to cause the output to change.
2.3.3 Alpha Test
Color is! defined by four values, red (R), green (G), blue (B), and alpha (A). The RGB values define the contribution from each of the primary colors, and alpha is related to the transparency. Typically, color is a 32-bit value, 8-bits for each component, though such representation is not limited to 32-bits. Alpha test compares the alpha value of a given pixel to an alpha reference value. The type of comparison may also be specified, so that for example the comparison may be a greater-than operation, a less-than operation, and so forth. If the comparison is a greater-than operation, then the pixel""s alpha value has to be greater than the reference to pass the alpha test. Any pixel not passing the alpha test is thrown away or discarded. The OpenGL(copyright) Specification describes the manner in which alpha test is implemented in OpenGL(copyright).
Alpha test is a per-fragment operation and after all of the fragment coloring calculations and lighting and shading operations are completed. Each of these per-fragment operations may be thought of as part of the conventional z buffer blending operations.
2.3.4 Color Test
Color test is similar to the alpha test described hereinbefore, except that rather than performing the magnitude or logical comparisons between the pixel alpha (A) value and a reference value, the color test performs a magnitude or logical comparison between one or a combination of the R, G, or B color components and reference value(s). The comparison test may be for example, greater-than, less-than, equal-to, greater than-or-equal-to, xe2x80x9cgreater-than-c1 and less-than c2 where c1 and c2 are sore predetermined reference values, and so forth. One might for example, specify a reference minimum R value, and a reference maximum R value, such that the color test would be passed only if the pixel R value is between that minimum and maximum. Color test might, for example, be useful to provide blue-screen functionality. The comparison test may also be performed on a single color component or on a combination of color components. Furthermore, although for the alpha test one typically has one value for each component, for the color test there are effectively two values per component, a maximum value and a minimum value.
2.3.5 Stencil test
Under OpenGL(copyright), stencil test conditionally discards a fragment based on the outcome of a comparison between a value stored in a stencil buffer at location (Xw, yw) and a reference value. Several stencil comparison functions are permitted such that whether the stencil test passes can depend upon whether the reference value is less than, less than or equal to, equal to, greater than or equal to, greater than, or not equal to the masked stored value in the stencil buffer. The Under OpenGL(copyright), if the stencil test fails, the incoming fragment is discarded. The reference value and the comparison value can have multiple bits, typically 8 bits so that 256 different values may be represented. When an object is rendered into the frame buffer, a tag having the stencil bits is also written into the frame buffer. These stencil bits are part of the pipeline state. The type of stencil test to perform can be specified at the time the geometry is rendered.
The stencil bits are used to implement various filtering, masking or stenciling operations. For example, if a particular fragment ends up affecting a particular pixel in the frame buffer, then the stencil bits can be written to the frame buffer along with the pixel information.
2.3.6 Depth Buffer Test
Under OpenGL(copyright), the depth buffer test discards the incoming fragment if a depth comparison fails. The comparison is enabled or disabled with the generic Enable and Disable commands using the OpenGL(copyright) symbolic constant DEPTH_TEST. When depth test is disabled, the depth comparison and subsequent possible updates to the depth buffer value are bypassed and a fragment is passed to the next operation. The stencil bits are also involved and are modified even if the test is bypassed. The stencil value is modified if the depth buffer test passed. If depth test is enabled, the depth comparison takes place and the depth buffer and stencil value may subsequently be modified. The manner in which the depth test is implemented in OpenGL(copyright) is described in greater detail in the OpenGL(copyright) specification at page 145.
Depth comparisons are implemented in which possible outcomes are as follows: the depth buffer test either never passes or always passes, if the incoming fragment""s zw value is less than, less than or equal to, equal to, greater than, greater than or equal to, or not equal to the depth value stored at the location given by the incoming fragment""s (Xw, Yw) coordinates. If the depth buffer test fails, the incoming fragment is discarded. The stencil value at the fragment""s (Xw, Yw) coordinate is updated according to the function currently in effect for depth buffer test failure. Otherwise, the fragment continues to the next operation and the value of the depth buffer at the fragment""s (Xw, Yw) location is set to the fragment""s zw value. In this case the stencil value is updated according to the function currently in effect for depth buffer test success. The necessary OpenGL(copyright) state is an eight-valued integer and a single bit indicating whether depth buffering is enabled or disabled.
2.3.7 Alpha Blending
Under OpenGL(copyright), alpha blending (also referred to as blending) combines the incoming fragments R, G. B, and A values with the R, G. B, and A values stored in the frame buffer at the incoming fragment""s (Xw, Yw) location.
This blending is typically dependent on the incoming fragment""s alpha value (A) and that of the corresponding frame buffer stored pixel. In the following discussion, Cs refers to the source color for an incoming fragment, Cd refers to the destination color at the corresponding frame buffer location, and Cc refers to a constant color in-the GL state. Individual RGBA components of these colors are denoted by subscripts of s, d, and c respectively.
Blending is basically an operation that takes color in the frame buffer and the color in the fragment, and blends them together. The manner in which blending is achieved, that is the particular blending function, may be selected from various alternatives for both the source and destination.
Blending is described in the OpenGL(copyright) specification at page 146-149 and is hereby incorporated by reference. Various blend equations are available under OpenGL(copyright). For example, an additive type blend is available wherein a blend result (C) is obtained by adding the product of a source color (Cs) by a source weighting factor,quadruplet (S) to the product of a destination color (Cd) and a destination weighting factor (D) quadruplet, that is C=CsS+CdD. Alternatively, the blend equation may be a subtraction (C=CsSxe2x88x92CdD), a reverse subtraction (C=CsS+CdD), a minimum function (C=min(Cs, Cd)), a maximum function (C=max(Cs, Cd)),. Under OpenGL(copyright), the blending equation is evaluated separately for each color component and its corresponding weighting coefficient. Each of the four R, G, B, A components has its own weighting factor.
The blending test (or blending equation) is part of pipeline state and can potentially change for every polygon, but more typically would change only for the object made up or several polygons.
In generally, blending is only performed once other tests such as the pixel ownership test and stencil test have been passed so that it is clear that the pixel or fragment under consideration would or could have an effect in the output.
2.3.8 Dithering Under OpenGL(copyright), dithering selects between two color values or indices. In RGBA mode, consider the value of any of, the color components as a fixed-point value with m bits to the left of the binary point, where m is the number of bits allocated to that component in the frame buffer; call each such value c. For each c, dithering selects a value c1 such that c1xcex5{max{0, [c]xe2x88x921, [c]}. This selection may depend on the Xw and Yw coordinates of the pixel. In color index mode, the same rule applies with c being a single color index. The value of c must not be larger than the maximum value representable in the frame buffer for either the component or the index.
Although many dithering algorithms are possible, a dithered value produced by any algorithm must generally depend only the incoming value and the fragment""s x and y window coordinates. When dithering is disabled, each color component is truncated to a fixed-point value with as many bits as there are in the corresponding frame buffer component, and the color index is rounded to the nearest integer representable in the color index portion of the frame buffer.
The OpenGL(copyright) Specification of dithering is described more fully in the OpenGL(copyright) specification, particularly at pages 149-150, which are incorporated by reference.
2.3.9 Logicop
Under OpenGL(copyright), there is a final logical operation applied between the incoming fragment""s color or index values and the color or index values stored in the frame buffer at the corresponding location. The result of the logical operation replaces the values in the frame buffer at the fragment""s (x, y) coordinates. Various logical operations may be implemented between source (s) and destination (d), including for example: clear, set, and, noop, xor, or, nor, nand, invert, copy, inverted and, equivalence, reverse or, reverse and, inverted copy, and inverted or. The logicop arguments and corresponding operations, as well as additional details of the OpenGL(copyright) logicop implementation, are set forth in the OpenGL(copyright) specification at pates 150-151. Logical operations are performed independently for each color index buffer that is selected for writing, or for each red, green, blue, and alpha value of each color buffer that is selected for writing. The required state is an integer indicating the logical operation, and two bits indicating whether the logical operation is enabled or disabled.
In this document, pixels are referred to as the smallest individually controllable element of the display device. But, because images are quantized into discrete pixels, spatial aliasing occurs. A typical aliasing artifact is a staircasexe2x80x9d effect caused when a straight line or edge cuts diagonally across rows of pixels.
Some rendering systems reduce aliasing effects by dividing pixels into sub-pixels, where each sub-pixel can be colored independently. When the image is to be displayed, the colors for all sub-pixels within each pixel are blended together to form an average color for the pixel. A renderer that uses up to 16 sub-pixels per pixel is described in xe2x80x9cRealityEngine Graphicsxe2x80x9d, by Akeley, pages 109 to 116 of SIGGRAPH93 Proceedings, Aug. 1-6, 1993, Computer Graphics Proceedings, Annual Conference Series, published by
ACM SIGGRAPH, New York, 1993, Softcover ISBN 0-201-58889-7 and CD-ROM ISBN 0-201-56997-3 (hereinafter referred to as the Akeley Reference).
Another prior art antialiasing method is the A-Buffer used to perform blending (this technique is also included in the Akeley Reference), and is described in xe2x80x9cThe A-buffer, an Antialiased Hidden Surface Methodxe2x80x9d by L. Carpenter, SIGGRAPH 1984 Conference Proceedings, pp.103-108 (hereinafter referred to as the Carpenter Reference). The A-buffer is an antialiasing technique that reduces aliasing by keeping track of the percent coverage of a pixel by a rendered polygon. The main drawback to this technique is the need to sort polygons front-to-back (or back-to-front) at each pixel in order to get acceptable antialiased polygons.
Most Content Addressable Memories (CAM) perform a bit-for-bit equality test between an input vector and each of the data words stored in the CAM. This type of CAM frequently provides masking of bit positions in order to eliminate the corresponding bit in all words from affecting the equality test. It is inefficient to perform magnitude comparisons in a equality-testing CAM because a large number of clock cycles is required to do the task. CAMs are presently used in translation look-aside buffers within a virtual memory systems in some computers. CAMs are also used to match addresses in high speed computer networks.
Magnitude comparison CAM (MCCAM) is defined here as any content addressible memory where the stored data are treated as numbers, and arithmetic magnitude comparisons (i.e. less-than, greater-than, less-than-or-equal-to, and the like) are performed on the data in parallel. This is in contrast to ordinary CAM which treats stored data strictly as bit vectors, not as numbers. One exemplary magnitude comparison content addressable memory is described in, and incorporated herein by reference, is U.S. Pat. No. 4,996,666, by Jerome F. Duluk Jr., entitled xe2x80x9cContent-Addressable Memory System Capable of Fully Parallel Magnitude Comparisonsxe2x80x9d, granted Feb. 26, 1991 (hereinafter referred to as the Duluk Patent). Structures within the Duluk Patent specifically referenced shall include the prefix xe2x80x9cDuluk Patentxe2x80x9d (for example, xe2x80x9cDuluk Patent MCCAM Bit Circuitxe2x80x9d). Other types of magnitude comparison content addressable memories may also be used. The xe2x80x9cMCCAMxe2x80x9d abbreviation is conveniently used in this description to refer to various types;, structures, and methods for magnitude comparison content addressable memory and is not limited to the particular magnitude comparison content addressable memory described in U.S. Pat. No. 4,996,666.
The basic internal structure of an MCCAM is a set of memory bits organized into words, where each word can perform one or more arithmetic magnitude comparisons between the stored data and input data. In general, for an MCCAM, when a vector of numbers is applied in parallel to an array of words, all arithmetic comparisons in all words occur in parallel. Such a parallel search comparison operation is called a xe2x80x9cqueryxe2x80x9d of the stored data.
The intention described here augments the capability of the MCCAM by adding various features, 38 including the ability to output all the query result bits every clock cycle and to logically xe2x80x9corxe2x80x9d together these output query result bits to form additional outputs.
The inventive apparatus and method provide conservative hidden surface removal (CHSR) in a deferred shading graphics pipeline (DSGP). The pipeline renders primitives, and the invention is described relative to a set of renderable primitives that include: 1) triangles, 2) lines, and 3) points. Polygons with more than three vertices are divided into triangles in the Geometry block (described hereinafter), but the DSGP pipeline could be easily modified to render quadrilaterals or polygons with more sides. Therefore, since the pipeline can render any polygon once it is broken up into triangles, the inventive renderer effectively renders any polygon primitive. The invention advantageously takes into account whether and in what part of the display screen a given primitive may appear or have an effect. To identify what part of a 3D window on the display screen a given primitive may affect, the pipeline divides the 3D window being drawn into a series of smaller regions, called tiles and stamps. The pipeline performs deferred shading, in which pixel colors are not determined until after hidden-surface removal. The use of a Magnitude Comparison Content Addressable Memory (MCCAM) advantageously allows the pipeline to perform hidden geometry culling efficiently.
Implementation of the inventive Conservative Hidden Surface Removal procedure, advantageously maintains compatibility with other standard APIs, such as OpenGL(copyright), including their support of dynamic rule changes for the primitives (e.g. changing the depth test or stencil test during a scene). In embodiments of the inventive deferred shader, the conventional rendering paradigm, wherein non-deferred shaders typically execute a sequence of rules for every geometry item and then check the final rendered result, is broken. The inventive structure and method anticipate or predict what geometry will actually affect the final values in the frame buffer without having to make or generate all the colors for every pixel inside of every piece of geometry. In principle, the spatial position of the geometry is examined, and a determination is made for any particular sample, the one geometry item that affects the final color in the z buffer, and then generates only at color.
In one embodiment, the CHSR processes each primitive in time order and, for each sample that a primitive touche s, CHSR makes conservative decision based on the various Application Program Interface (API) state variables, such as depth test and alpha test. One of the advantageous features of the CHSR process is that color computation does not need to be done during hidden surface removal even though non-depth-dependent tests from the API, such as alpha test, color test, and stencil test can be performed by the DSGP pipeline. The CHSR process can be considered a finite state machine (FSM) per sample.
Hereinafter, each per-sample FSM is called a sample finite state machine. Each sample FSM maintains per-sample data including: (1) z coordinate information; (2) primitive information (any information needed to generate the primitive""s color at that sample or pixel, or a pointer to such information); and (3) one or more sample state bits (for example, these bits could designate the z value or z values to be accurate or conservative). While multiple z values per sample can be easily used, multiple sets of primitive information per sample would be expensive. Hereinafter, it is assumed that the sample FSM maintains primitive information for one primitive. Each sample FSM may also maintain transparency information, which is used for sorted transparencies.
The DSGP can operate in two distinct modes: 1) time order mode, and 2) sorted transparency mode. Time order mode is designed to preserve, within any particular tile, the same temporal sequence of primitives. In time order mode, time order of vertices and modes are preserved within each tile, where a tile is a portion of the display window bounded horizontally and vertically. By time order preserved, we mean that for a given tile, vertices and modes are read in the same order as they are written. In sorted transparency mode; the process of reading geometry from a tile is divided into multiple passes. In the first pass, the opaque geometry(i.e., geometry that can completely hide more distant geometry) is processed, and in subsequent passes, potentially transparent geometry is processed. Within each sorted transparency mode pass, the time ordering is preserved, and mode data is inserted in its correct time-order location. Sorted transparency mode can spatially sort (on a sample-by-sample basis) the geometry into either back-to-front or front-to-back order, thereby providing a mechanism for the visible transparent objects to be blended in spatial order (rather than time order), resulting in a more correct rendering. In a preferred embodiment, the sorted transparency method is performed jointly by the Sort block and the Cull block.
The inventive structure and method may be implemented in various embodiments. In one aspect, the invention provides structure and method for performing hidden surface removal wherein the structure is advantageously implemented as a computer graphics pipeline and wherein the inventive hidden surface removal method includes the following steps or procedures. First, an object primitive (current primitive) is selected from a group of primitives, each primitive comprising a plurality of stamps. Next, stamps in the current primitive are compared to stamps from previously evaluated primitives in the group of primitives, and a first stamp is selected from the current primitive by the stamp selection process as a current stamp (CS), and optionally by the SAM for performance reasons. CS is compared to a second stamp or a CPVS selected from previously evaluated stamps that have not been discarded. The second stamp is discarded when no part of the second stamp would affect a final graphics display image based on the comparison with the CS. If part, but not all, of the second stamp would not affect the final image based on the comparison with the CS, then the part of second stamp that would not affect the final image is deleted from the second stamp. The CS is discarded when no part of the second stamp would affect a final graphics display image based on the comparison with the second stamp. If part, but not all, of the CS would not affect the final image based on the comparison with the second stamp, then the part of CS that would not affect the final image is deleted from the CS. When all stamps in all primitives within a region of the display screen have been evaluated, the stamps that have not been discarded have their pixels, or samples, colored by the part of the pipeline downstream from these first steps in performing hidden surface removal. In one embodiment, the set of non-discarded stamps can be limited to one stamp per sample. In this embodiment, when the second stamp and the CS include the same sample and both can not be discarded, the second stamp is dispatched and the CS is kept in the list of non-discarded stamps. Also for this alternate embodiment, when the visibility of the second stamp and the CS depends on parameters evaluated later in the computer graphics pipeline, the second stamp and the CS are dispatched. As an alternate embodiment, the selection of the first stamp by for example the SAM and the stamp selection process, as a current stamp (CS) is based on a relationship test of depth states of samples in the first stamp with depth states of samples of previously evaluated stamps; and an aspect of the inventive apparatus simultaneously performs the relationship test on a multiplicity of stamps.
In another aspect of the inventive structure and method for performing hidden surface removal, a set of currently, potentially visible stamps (CPVSs) is maintained separately from the set of current depth values (CDVs), wherein the inventive hidden surface removal method includes the following steps or procedures. First, an object primitive (current primitive) is selected from a group of primitives, each primitive comprising a plurality of stamps. Next, a first stamp from the current primitive is selected as a currently stamp (CS). Next, a currently potentially visible stamp (CPVS) is selected from the set of CPVSs such that the CPVS overlaps the CS. For each sample that is overlapped by both the selected CPVS and the CS, the depth value of the CS is compared to the corresponding value in the set of CDVs, and this comparison operation takes into account the pipeline state and updates the CDVs. Samples in the selected CPVS that are determined to be not visible are deleted for the selected CPVS. If all samples in the selected CPVS are deleted, the selected CPVS is deleted from the set of CPVS""s. If any sample in the CS is determined to be visible, the CS is added to the set of the CPVS""s with only its visible samples included. If for any sample both the CS and selected CPVS are visible, then at least those visible samples in the selected CPVS are sent down the pipeline for color computations. If the visibility of a sample included in both the CS and CPVS depend on parameters evaluate later in the computer graphics pipeline, at least those samples are sent down the pipeline for color computations. The invention provides structure and method for processing in parallel all CPVS""s that overlap the CS. Furthermore, the parallel processing is pipelined such that a CS can be processed at the rate of one CS per clock cycle. Also multiple CS""s can be processed in parallel.
In another aspect, the invention provides structure and method for a hidden surface removal system for a deferred shader computer graphics pipeline, wherein the pipeline includes a Magnitude Comparison Content Addressable Memory (MCCAM) Cull unit for identifying a first group of potentially visible samples associated with a; current primitive; a Stamp Selection unit, coupled to the MCCAM cull unit, for identifying, based on the first group and a perimeter of the primitive, a second group of potentially visible samples associated with the primitive; a Z-Cull unit, coupled to the stamp selection unit and the MCCAM cull unit, for identifying visible stamp portions by evaluating a pipeline state, and comparing depth states of the second group with stored depth state values; and a Stamp Portion Memory unit, coupled to the Z-Cull unit, for storing visible stamp portions based on control signals received from the Z-Cull unit, wherein the Stamp Portion Memory unit dispatches stamps having a visibility dependent on parameters evaluated later in the computer graphics pipeline.
In yet another aspect, the invention provides structure and method of rendering a graphics image including the steps of: receiving a plurality of primitives to be rendered; selecting a sample location;
rendering a front most opaque sample at the selected sample location, and defining the z value of the front most opaque sample as Zfar; comparing z values of a first plurality of samples at the selected sample location; defining to be Znear a first sample, at the selected sample location, having a z value which is less than Zfar and which is nearest to Zfar of the first plurality of samples; rendering the first sample; setting Zfar to the value of Znear; comparing z values of a second plurality of samples at the selected sample location;
defining as Znear the z value of a second sample at the selected sample location, having a z value which is less than Zfar and which is nearest to Zfar of the second plurality of samples; and rendering the second sample.