In the field of computer-generated graphics, three-dimensional ("3D") objects in a graphics scene are represented by data structures called object models. These models represent the surface of an object using a surface model such as a polygon mesh, parametric surface or quadric surface. The surface model defines the location of points on an object's surface in 3D space, normally the local coordinate system of the object model. For example, a polygon mesh consists of a collection of geometric primitives including edges, vertices, and polygons. The object model also stores attributes of the surface such as its color, texture and shading. In a polygon mesh, attributes are typically stored at the vertices of the surface polygons.
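As a minimal sketch of such an object model, the polygon mesh described above might be held in a structure like the following. The class and field names (`Vertex`, `PolygonMesh`, `color`) are illustrative assumptions and are not taken from any particular system; only a color attribute is stored per vertex for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    # Position in the object's local coordinate system.
    position: tuple
    # Surface attribute stored at the vertex (RGB color, assumed here).
    color: tuple = (1.0, 1.0, 1.0)


@dataclass
class PolygonMesh:
    vertices: list = field(default_factory=list)
    # Each polygon is a tuple of indices into `vertices`.
    polygons: list = field(default_factory=list)

    def edges(self):
        """Derive the set of undirected edges from the polygon index loops."""
        result = set()
        for poly in self.polygons:
            for i, a in enumerate(poly):
                b = poly[(i + 1) % len(poly)]
                result.add((min(a, b), max(a, b)))
        return result


# A unit square split into two triangles that share the diagonal edge (0, 2).
mesh = PolygonMesh(
    vertices=[Vertex((0, 0, 0)), Vertex((1, 0, 0)),
              Vertex((1, 1, 0)), Vertex((0, 1, 0))],
    polygons=[(0, 1, 2), (0, 2, 3)],
)
```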
To generate a two-dimensional image from the graphical objects and their attributes in a graphics scene, the object models are rendered with respect to a viewing specification, which defines the position of a camera (the eye point) and the two-dimensional coordinate space of the image, often called "screen coordinates." In the rendering process, a geometry processing stage transforms the object geometry to world coordinates, a coordinate system for the scene, and then to the screen coordinates, a 2D coordinate system representing the location of pixel elements or pixels in an output image. To compute a 2D image, the surface polygons are scan converted into pixel values (e.g., RGB color values), typically by interpolating attributes stored at the polygon's vertices to pixel locations in screen coordinates.
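The interpolation step of scan conversion can be sketched as follows for a single triangle, assuming barycentric weights computed in screen coordinates. The function names are hypothetical, and the sketch performs plain linear (affine) interpolation; it does not apply the perspective correction that a full renderer would typically use.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    # Twice the signed area of the whole triangle.
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if area == 0:
        raise ValueError("degenerate triangle")
    # Each weight is the sub-triangle area opposite the vertex, normalized.
    wa = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area
    wb = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area
    return wa, wb, 1.0 - wa - wb


def interpolate_color(pixel, corners, colors):
    """Blend the three vertex colors at a pixel location in screen coordinates."""
    weights = barycentric(pixel, *corners)
    return tuple(
        sum(w * color[channel] for w, color in zip(weights, colors))
        for channel in range(3)
    )


corners = [(0, 0), (4, 0), (0, 4)]                # screen-space vertices
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]        # red, green, blue
shade = interpolate_color((1, 1), corners, colors)  # (0.5, 0.25, 0.25)
```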
In real time graphics applications, the graphics scene changes over time, usually in response to user input. The appearance and position of the objects change from one frame to the next, and the camera position changes as well. As objects move around the scene, some surfaces become visible while others are occluded. To enhance realism, the output image has to be recomputed several times a second, usually at a rate of 12 Hz or more (rates of 75 Hz or higher are needed to prevent the perception of flicker). Due to this rigorous time constraint, rendering computations must be very fast and efficient.
One important part of the rendering process is referred to as visible surface determination (also referred to as "hidden surface removal"). Visible surface determination is the process of determining which surfaces of the objects in a scene are visible from the perspective of the camera. The graphics rendering pipeline uses a method for visible surface determination to identify which surfaces are visible and should contribute to the pixel values in the output image.
There are a variety of methods for performing visible surface determination. One way is to "paint" the polygons into the frame buffer in order of decreasing distance from the viewpoint (the location of the scene's camera, also referred to as the eye point). See Newell, M. E., R. G. Newell, and T. L. Sancha, "A Solution to the Hidden Surface Problem," Proc. ACM National Conf., 1972, which discloses a method for performing hidden surface removal that we refer to as NNS.
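The back-to-front "painting" order can be sketched as a sort on distance from the eye point. This is an illustrative sketch only: the choice of the farthest vertex as the sort key and the `(name, vertices)` representation are assumptions, and a plain distance sort of this kind is not guaranteed to yield a correct visibility ordering.

```python
import math


def painters_order(polygons, eye):
    """Sort (name, vertices) pairs by decreasing distance from the eye.

    The key is the distance of each polygon's farthest vertex (an assumption);
    polygons drawn later overwrite those drawn earlier in the frame buffer.
    """
    def farthest(entry):
        _, vertices = entry
        return max(math.dist(v, eye) for v in vertices)

    return sorted(polygons, key=farthest, reverse=True)


eye = (0.0, 0.0, 0.0)
scene = [
    ("near", [(0, 0, 1), (1, 0, 1), (0, 1, 1)]),
    ("far", [(0, 0, 5), (1, 0, 5), (0, 1, 5)]),
    ("middle", [(0, 0, 3), (1, 0, 3), (0, 1, 3)]),
]
draw_order = [name for name, _ in painters_order(scene, eye)]
```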
NNS sorts a set of polygons by furthest depth and tests whether the resulting order is actually a visibility ordering. The depth-sorted list of polygons is traversed: if the next polygon does not overlap in depth with the remaining polygons in the list it can be removed and placed in the ordered output. Otherwise, the collection of polygons that overlap in depth must be further examined using a series of occlusion tests of increasing complexity. If the polygon does not occlude any of these overlapping polygons, it can be sent to the output; otherwise, it is marked and reinserted behind the overlapping polygons. When such a marked polygon is again encountered, a cyclic occlusion is indicated and the polygon is split to remove the cycle.
The following steps summarize the test for occluders at the polygon level:
a. Perform z overlap test.
b. Test the x coordinate of screen bounding boxes of a pair of overlapping polygons, P and Q (if disjoint, neither polygon occludes the other).
c. Test the y coordinate of screen bounding boxes (same as above).
d. Test the vertices of P in plane of Q, and if all are behind this plane from the eye, then P cannot occlude Q.
e. Test vertices of Q in plane of P, and if all are in front of this plane (with respect to the eyepoint), then P cannot occlude Q.
f. Do an exact test for screen overlap between P and Q.
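Steps a through e can be sketched as a chain of early-out tests. The sketch below rests on several assumptions that are not stated in the text: polygons are planar and given as (x, y, z) vertices already in screen space, z increases away from the viewer, the viewer looks along +z, and step a is interpreted as "P lies entirely behind Q in depth." The function names are hypothetical. A return value of `True` means only that the exact overlap test of step f is still required.

```python
EPS = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def extent(poly, axis):
    values = [v[axis] for v in poly]
    return min(values), max(values)


def disjoint(p, q, axis):
    p_lo, p_hi = extent(p, axis)
    q_lo, q_hi = extent(q, axis)
    return p_hi <= q_lo or q_hi <= p_lo


def viewer_side(poly):
    """Return f(point), positive on the viewer's side of poly's plane.

    Returns None when the plane is edge-on to the viewer (test inconclusive).
    """
    n = cross(sub(poly[1], poly[0]), sub(poly[2], poly[0]))
    if abs(n[2]) < EPS:
        return None
    if n[2] > 0:                      # orient the normal toward the viewer (-z)
        n = (-n[0], -n[1], -n[2])
    d = dot(n, poly[0])
    return lambda point: dot(n, point) - d


def may_occlude(p, q):
    """False if tests a-e prove P cannot occlude Q; True if step f is needed."""
    if extent(p, 2)[0] >= extent(q, 2)[1]:            # a. P wholly behind Q
        return False
    if disjoint(p, q, 0) or disjoint(p, q, 1):        # b, c. bounding boxes
        return False
    side_q = viewer_side(q)
    if side_q and all(side_q(v) <= EPS for v in p):   # d. P behind plane of Q
        return False
    side_p = viewer_side(p)
    if side_p and all(side_p(v) >= -EPS for v in q):  # e. Q in front of plane of P
        return False
    return True


# Q is tilted (z = 1 + x); P is a small triangle at z = 3 lying behind Q's plane.
tilted = [(0, 0, 1), (4, 0, 5), (4, 4, 5), (0, 4, 1)]
small = [(0, 0, 3), (1, 0, 3), (0, 1, 3)]
```

In this example the two polygons overlap in z and in both screen axes, so steps a through c are inconclusive; it is the plane test of step d that shows `small` cannot occlude `tilted`.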
While NNS is an effective method for visible surface determination at the polygon level, it has a number of drawbacks. First, it does not perform a visibility sort on objects, e.g., graphical objects comprising many interconnected polygons. In fact, the special sort to resolve visibility between polygons with overlapping z only works for polygons and not aggregate sets of polygons. Second, it does not perform visibility sorting coherently. The term "coherent" in this context refers to using the results of a previous visibility ordering to reduce the overhead of the computation for each re-calculation of visibility ordering. A "coherent" method is incremental in the sense that results from the previous iteration of the visibility sort or occlusion detection routine are used as a starting point for the next iteration. In graphics applications where objects and the camera have motion that is substantially continuous, the last iteration will provide a good starting point for the next iteration because the changes in object and camera positions are likely to be small. This is critical in animated scenes where visibility is typically re-computed for every frame, or at least each time the output image or parts of it are updated. In real time applications where output images are generated at least at a rate of 12 Hz, it is essential that the visibility sort be fast and efficient so that more computational resources (e.g., memory and processor cycles) can be allocated to generating realistic images.
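The benefit of coherence can be illustrated with an insertion sort that starts from the previous frame's ordering. This is a sketch of the general idea only, applied to a simple depth key rather than to a full visibility sort; the function name and the `depth` mapping are assumptions. When objects move only slightly between frames, few elements change position and the work done is close to linear.

```python
def resort_coherently(previous_order, depth):
    """Re-sort back-to-front, starting from last frame's order.

    Returns the new order and the number of element shifts performed, which
    stays small when the previous order is nearly correct.
    """
    order = list(previous_order)
    shifts = 0
    for i in range(1, len(order)):
        item = order[i]
        j = i - 1
        # Move nearer objects one slot later until `item` fits.
        while j >= 0 and depth[order[j]] < depth[item]:
            order[j + 1] = order[j]
            j -= 1
            shifts += 1
        order[j + 1] = item
    return order, shifts


last_frame = ["a", "b", "c", "d"]                       # back-to-front last frame
depth_now = {"a": 10.0, "b": 8.0, "c": 9.0, "d": 4.0}   # c has moved behind b
new_order, shifts = resort_coherently(last_frame, depth_now)
```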
The NNS approach is not coherent because it performs a new depth sort for each frame rather than starting from the visibility-sorted list of the previous frame, and so it cannot reuse the results of the previous visibility sort. In addition, the NNS approach is inefficient because it repeats expensive occlusion tests on polygons and has to adjust the ordering of polygons whenever the depth sort is not identical to a visibility ordering of the polygons.
It is important to note that a depth sort does not necessarily imply that the depth sorted objects or polygons are in visibility order. FIG. 1 shows an example of a case where the minimum depth ordering of objects A and B does not provide an accurate visibility ordering of these objects. While the minimum depth of object B is smaller than the minimum depth of object A with respect to the eye point E (z_B < z_A), A still occludes B.
A visibility ordering on objects identifies the occlusion relationship among the objects. A visibility sort can be "front-to-back", in which case no object in a visibility ordering of the objects is occluded by any objects following it in the ordering. A visibility sort may also be "back-to-front", in which case no object in the ordering occludes any object following it.
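The two definitions can be checked mechanically once the occlusion relation is known. The sketch below assumes the relation is supplied as a set of pairs `(a, b)` meaning "a occludes b"; how that relation is computed is outside the scope of the sketch, and the function names are hypothetical. The ordering itself is obtained with the standard library's `graphlib.TopologicalSorter`, which raises `CycleError` when no visibility ordering exists.

```python
from graphlib import TopologicalSorter


def is_front_to_back(order, occludes):
    """True if no object is occluded by any object that follows it."""
    rank = {obj: i for i, obj in enumerate(order)}
    return all(rank[a] < rank[b] for a, b in occludes)


def is_back_to_front(order, occludes):
    """True if no object occludes any object that follows it."""
    rank = {obj: i for i, obj in enumerate(order)}
    return all(rank[a] > rank[b] for a, b in occludes)


def front_to_back_order(objects, occludes):
    """Order objects so that every occluder precedes what it occludes."""
    predecessors = {obj: set() for obj in objects}
    for a, b in occludes:
        predecessors[b].add(a)        # a must come before b
    return list(TopologicalSorter(predecessors).static_order())


objects = ["A", "B", "C"]
occludes = {("A", "B"), ("B", "C")}   # A is in front of B, B is in front of C
f2b = front_to_back_order(objects, occludes)
b2f = list(reversed(f2b))
```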
A third drawback of the NNS approach is that it resolves cyclic occlusions by splitting geometry. While an analog of polygon splitting exists for aggregate polyhedral objects, it is an expensive operation to split such objects, since the subset of polygons that require splitting must be computed and then each polygon in the subset must be split. A more practical approach is to simply detect the cyclic occlusions so that the aggregate objects forming the cycle can be handled specially. For example, in layered rendering systems, objects forming occlusion cycles can be grouped into a single layer so that the hardware z-buffer is used to resolve occlusions.
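Detecting occlusion cycles without splitting geometry can be sketched as finding the strongly connected components of the occlusion graph; each component with more than one object is a cycle whose members are grouped into a single layer. The sketch below uses Tarjan's algorithm as one possible technique, not necessarily the one any given system uses, and assumes the occlusion relation is given as pairs `(a, b)` meaning "a occludes b". The function name is hypothetical.

```python
def occlusion_layers(objects, occludes):
    """Group objects into layers, one per strongly connected component.

    With edges pointing from occluder to occluded, Tarjan's algorithm emits
    components sinks-first, so the layers come out in back-to-front order.
    """
    graph = {obj: [] for obj in objects}
    for a, b in occludes:
        graph[a].append(b)

    index, low, on_stack, stack, layers = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:        # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            layers.append(sorted(component))

    for obj in objects:
        if obj not in index:
            visit(obj)
    return layers


# A, B and C occlude one another cyclically; the cycle is in front of D, behind E.
objects = ["A", "B", "C", "D", "E"]
occludes = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "A")]
layers = occlusion_layers(objects, occludes)
```

The recursive form is kept for brevity; a scene with a very long chain of occlusions would call for an iterative variant to stay within Python's recursion limit.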
There are a number of important applications where it is useful to have a visibility ordering of objects rather than individual polygons. A visibility ordering of objects is particularly useful in applications where the object geometry in a scene is factored into separate layers (e.g., factoring foreground and background objects to separate layers). In these applications, factored layers are combined in a function called image compositing to produce output images. One principal advantage of constructing images in layers is that it enables parts of a scene to be rendered at independent update rates with independent quality parameters (e.g., shading complexity, spatial resolution, etc.).
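Why compositing depends on a visibility ordering can be seen in a sketch of one common compositing function, the "over" operator, applied to a single pixel from each layer in back-to-front order. The use of "over" and of premultiplied RGBA pixel values are assumptions made for illustration; the text does not specify which compositing function is used.

```python
def over(front, back):
    """Composite a premultiplied RGBA pixel `front` over `back`."""
    front_alpha = front[3]
    return tuple(f + (1.0 - front_alpha) * b for f, b in zip(front, back))


def composite(pixels_back_to_front):
    """Fold one pixel from each layer, starting with the backmost layer."""
    result = (0.0, 0.0, 0.0, 0.0)     # fully transparent starting value
    for pixel in pixels_back_to_front:
        result = over(pixel, result)
    return result


background = (0.0, 0.0, 1.0, 1.0)     # opaque blue
foreground = (0.5, 0.0, 0.0, 0.5)     # half-transparent red, premultiplied
pixel = composite([background, foreground])   # (0.5, 0.0, 0.5, 1.0)
```

Reversing the layer order gives a different result (the opaque background would hide the foreground entirely), which is why the layers must be supplied in a valid visibility order.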
While constructing images in layers has advantages, one difficulty is addressing cases where no visibility ordering exists for objects in a scene. As noted above, one way to address an occlusion cycle at the object level is to render all of the objects in a cycle into a single image layer. This can be inefficient because it tends to increase the overhead required to re-render this image layer. One advantage of the layered graphics rendering pipeline is that the update rate of each image layer can be reduced by exploiting the temporal coherence of object movements. It is difficult to exploit this advantage if objects are combined into a single layer. As more objects are rendered to a single layer, the polygon count for that layer increases and the likelihood that the image layer can be re-used decreases. In layered graphics rendering pipelines, it is generally more efficient to generate images by re-using image layers as much as possible. Thus, it would be advantageous to deal with cyclic occlusions without having to combine the objects in each cycle into a single layer.