The animation picture industry was founded on the realization that animation or apparent movement could be achieved by successively presenting to the human eye still images at high frequency, each representing a small, incremental movement with respect to its predecessor. Provided that the frequency at which the still images are presented to the human eye exceeds the so-called "frequency of fusion", the eye is unable to detect that it is, in reality, seeing only discrete images, and the eye-brain combination labors under the illusion that it is seeing a continuous, moving picture.
Modern graphics systems which exploit this fact abound and what all such systems have in common is the capture of successive frames of digital information which are then displayed on a suitable display screen at sufficiently high frequency. When it is desired to display such images at high resolution, then, of course, the display monitor itself must possess a large number of pixels and this in turn demands that each displayed frame of digital image data requires a large volume of data. In practice, this means not only that large memories are required for storing the digital image data but, more importantly, that very fast processing is required in order to be able to process each frame of image data presented to the display monitor at a rate no less than the frequency of fusion, i.e. about 30 Hz.
A graphics model is generally constructed from static objects representing a fixed background scene and one or more dynamic objects which move within the fixed background scene. In practice, a moving picture is derived by generating a large number of frames of pixel data each representing slight incremental movements between one frame and its successor. The visible pixels in each frame represent the instantaneous view of an object as seen by an observer. This instantaneous view is subject to change between successive frames owing to the movement of the dynamic objects within the static scene and changing perspective of the viewer.
Visibility calculation is one of the most important tasks in computer graphics. Given a geometric model of a scene and a viewpoint, the goal of visibility calculation (also known as hidden surface removal) is to find which parts of the model are visible from the viewpoint. The performance of the visibility calculation stage can greatly affect that of the entire rendering process, because if an element of the geometric model is found to be invisible then various other time-consuming calculations (such as shading) do not have to be performed for this element.
It should be noted, however, that whilst visibility calculation is normally a precursor to displaying graphic images, the display of such images is independent of the visibility calculation and is not always performed. Moreover, the enhanced display of graphic images may not be the only consideration which leads to the desire to speed up visibility calculation of graphic images. For example, a graphic model may be stored in a computer which is remote from a user, and dynamic changes to the graphic model may be performed by an operator located at the remote computer. Such changes must be reflected in the model stored locally at the user's site by sending update information to the user's computer so that his version of the graphic model can be modified. In such a case, it is desirable to minimize as far as possible the volume of update data which needs to be communicated to the user's computer, since the communication channel (typically a network) is usually the major bottleneck in computer systems and is responsible for degradation of system performance. It may also be the case that the updated graphic model at the user's site is not itself displayed but forms the basis for further computation and processing, for example, as part of a simulation machine. In other words, in all cases the speed of rendering graphic images is paramount, whereas the display of the graphic image may be optional.
In the following discussion of prior art approaches to improving the speed of visibility algorithms, reference will be made to the following publications:
1. P. S. Heckbert and M. Garland, "Multiresolution modeling for fast rendering", in Proceedings of Graphics Interface '94, (Banff, Alberta), (May 1994).
2. T. A. Funkhouser, "RING: A client-server system for multi-user virtual environments", in Proceedings of the 1995 Symposium on Interactive 3D Graphics, (Monterey, Calif.), pp. 85-92, ACM SIGGRAPH, (April 1995).
3. N. Greene, M. Kass and G. Miller, "Hierarchical Z-buffer visibility", in SIGGRAPH '93 Conference Proceedings, (Anaheim, Calif.), pp. 231-238, (August 1993). ACM Computer Graphics, 27(4).
4. B. F. Naylor, "Partitioning tree image representation and generation from 3D geometric models", in Proceedings of Graphics Interface '92, (Vancouver), pp. 201-212, (May 1992).
5. N. Greene and M. Kass, "Error-bounded antialiased rendering of complex environments", in SIGGRAPH '94 Conference Proceedings, (Orlando, Fla.), pp. 59-66, (July 1994). ACM Computer Graphics, 28(4).
6. O. Sudarsky and C. Gotsman, "Output-sensitive visibility algorithms for dynamic scenes with applications to virtual reality", in Computer Graphics Forum, September 1996 (Proceedings of Eurographics '96: Aug. 26, 1996).
7. R. A. Earnshaw, N. Chilton and I. J. Palmer, "Visualization and virtual reality on the Internet", in Proceedings of the Visualization Conference, (Jerusalem, Israel), (November 1992).
8. B. F. Naylor, J. Amanatides and W. C. Thibault, "Merging BSP trees yields polyhedral set operations", in SIGGRAPH '90 Conference Proceedings, (Dallas, Tex.), pp. 115-124, (August 1990). ACM Computer Graphics, 24(4).
The visibility calculation's runtime can become a problem with big, complex models featuring large numbers of graphic primitives. Consider, for example, a detailed model of a big building. Although it might include millions of polygons, only a small fraction of them will be visible from any single viewpoint. In such scenes, it would be preferable if the visibility calculation algorithm's runtime were linearly proportional just to the number of visible primitives, rather than to the total number of primitives in the model.
A visibility algorithm is called output-sensitive if its runtime per frame (excluding any initialization) is linearly proportional to n+f(N), where N is the number of primitives in the entire model, n is the number of visible primitives and f(N) is significantly smaller than N. f(N) is the (inevitable) overhead imposed by the algorithm. An output-sensitive visibility calculation algorithm is also called an occlusion culling or visibility culling algorithm. However, most visibility algorithms are not output-sensitive. For example, the well-known Z-buffer visibility algorithm is not output-sensitive, because it examines each polygon in the model. Even if every polygon is handled very quickly (e.g. in hardware), the runtime is still proportional to the total number of polygons.
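The point about the plain Z-buffer can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions that are not in the text: primitives are axis-aligned screen rectangles at constant depth, and the function name `zbuffer_render` and the tuple layout are hypothetical. The `examined` counter grows with the total number of primitives N, however few of them end up visible.

```python
import math

def zbuffer_render(primitives, width, height):
    """Plain Z-buffer over axis-aligned rectangles of constant depth.

    Each primitive is (x0, y0, x1, y1, z, label) with half-open pixel
    bounds; this layout is an assumption made for the sketch.
    Returns (set of visible labels, number of primitives examined).
    """
    depth = [[math.inf] * width for _ in range(height)]
    owner = [[None] * width for _ in range(height)]
    examined = 0
    for x0, y0, x1, y1, z, label in primitives:
        examined += 1  # every primitive is touched, hidden or not
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                if z < depth[y][x]:  # smaller z means nearer
                    depth[y][x] = z
                    owner[y][x] = label
    visible = {lab for row in owner for lab in row if lab is not None}
    return visible, examined
```

With one near rectangle covering the screen and a hundred rectangles behind it, only one label is visible, yet all 101 primitives are examined.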
In a recent survey on real-time 3D rendering [1], Heckbert and Garland claimed that output-sensitive visibility algorithms are essential in future generation graphics systems. Since the Z-buffer visibility algorithm is the best-known and most popular visibility algorithm but suffers from the drawback that it is not output-sensitive, much recent research effort has been devoted to extending the Z-buffer visibility algorithm so as to make it output-sensitive.
Such research on output-sensitive visibility calculation has only begun recently, and has yet to reach commercial systems. For example, both SGI's IRIS Performer high-performance graphics package and IBM's 3DIX architectural and mechanical model visualizer incorporate view frustum culling and multiple-resolution representations (level-of-detail switching) to speed up rendering, but neither employs visibility culling.
If significant parts of the model are dynamic, then its complexity becomes even more of a problem. The known output-sensitive visibility algorithms become ineffective in such cases. Furthermore, in addition to the time it takes to render the model's visible parts, considerable time is spent just keeping it up-to-date. An example of a large model with numerous dynamic objects is an environment which multiple users roam simultaneously, such as Funkhouser's RING system [2] and Worlds Inc.'s AlphaWorld. With existing visibility algorithms, the model in each user's workstation must reflect the other users' current whereabouts. In a distributed environment, it might take much time to update this model, and even more time to transmit the other users' movements over communication lines.
The use of hierarchical data structures to subdivide object space would appear to be an intrinsic property of all output-sensitive visibility algorithms: a hierarchical spatial data structure is needed to quickly cull large, occluded regions of space, without explicitly considering every object within those regions. Such an approach is employed in Greene et al.'s hierarchical Z-buffer algorithm [3] and in Naylor's BSP tree projection method [4]. However, the spatial data structure does not have to be a hierarchy in the strict sense of the word. For example, it may be a Directed Acyclic Graph, and sibling nodes do not have to represent disjoint regions of space.
The hierarchical Z-buffer algorithm is based on the ordinary Z-buffer, but uses two hierarchical data structures: an octree and a Z-pyramid. The lowest level of the pyramid is a plain Z-buffer; in all other levels, there is a pixel for every 2×2 square of pixels in the next lower level, with a value equal to the greatest (farthest) z among these four pixels.
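The Z-pyramid construction just described can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of Greene et al.; it assumes a square Z-buffer whose side is a power of two, stored as a list of rows, and the name `build_z_pyramid` is hypothetical.

```python
def build_z_pyramid(zbuffer):
    """Return [level0, level1, ...], where level0 is the plain Z-buffer.

    Each higher level holds one pixel per 2x2 block of the level below,
    storing the greatest (farthest) z of the four pixels in that block.
    Assumes a square buffer whose side is a power of two.
    """
    levels = [zbuffer]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels
```

The top level then holds a single value: the farthest z anywhere in the image.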
At the algorithm's initialization stage, an octree is constructed for the entire model. This operation is very time-consuming, and takes much longer than just calculating visibility from a single viewpoint; however, assuming the model is static, the same octree can be used to calculate visibility from many different viewpoints.
To calculate visibility from a viewpoint, the Z-pyramid is first initialized to infinity at all pixels in all levels. Then, recursively from the octree's root, each encountered octree node is checked for occlusion by the current contents of the Z-pyramid. If a node is totally hidden, it can be ignored; otherwise, the primitives directly associated with the node are rendered, the Z-pyramid is updated accordingly, and the eight child nodes are traversed recursively, from near to far. Because of this front-to-back order, there is a good chance that farther nodes will be discovered to be occluded by primitives in nearer ones, thus saving the handling of all the subtrees associated with the farther nodes.
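The recursive traversal described above might be sketched as follows. The occlusion test and the rendering step (which would also update the Z-pyramid) are passed in as callbacks, so the sketch shows only the control flow. The class `OctreeNode`, the function `traverse`, and the choice to order sibling nodes by the distance of their centres from the viewpoint are assumptions made for illustration.

```python
class OctreeNode:
    """Hypothetical octree node: a centre point, the primitives stored
    directly at the node, and up to eight child nodes."""
    def __init__(self, center, primitives=(), children=()):
        self.center = center
        self.primitives = list(primitives)
        self.children = list(children)

def traverse(node, viewpoint, is_hidden, render):
    """Visit the octree from near to far, skipping occluded subtrees.

    is_hidden(node) -> bool tests the node against the current Z-pyramid;
    render(primitive) draws a primitive and updates the Z-pyramid.
    Both are supplied by the caller.
    """
    if is_hidden(node):
        return  # the whole subtree is culled without being examined
    for primitive in node.primitives:
        render(primitive)

    def dist2(child):
        return sum((c - v) ** 2 for c, v in zip(child.center, viewpoint))

    # Nearer children first, so that they can occlude farther ones.
    for child in sorted(node.children, key=dist2):
        traverse(child, viewpoint, is_hidden, render)
```

In use, `is_hidden` would typically wrap a Z-pyramid test of the node's projected bounding box, and a subtree rejected by it costs a single test regardless of how many primitives it contains.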
The pyramid is used for fast visibility checking of nodes and primitives: find the lowest pyramid level where a single pixel still covers the entire projection of the primitive or node. If the z value registered at that pixel is closer than the closest z of the projection, then the entire primitive or node is invisible. Otherwise, the projection is divided into four, and checked against each of the four corresponding pixels in the next lower level.
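A possible Python sketch of this pyramid test is given below. It makes several simplifying assumptions that are not in the text: the projection is approximated by an axis-aligned pixel rectangle with half-open bounds lying inside the buffer, a single nearest depth `nearest_z` stands for the whole projection, and the function names are hypothetical. A small pyramid builder is included only to keep the example self-contained.

```python
def build_z_pyramid(zbuffer):
    """Farthest-z pyramid over a square, power-of-two Z-buffer."""
    levels = [zbuffer]
    while len(levels[-1]) > 1:
        p, h = levels[-1], len(levels[-1]) // 2
        levels.append([[max(p[2*y][2*x], p[2*y][2*x+1],
                            p[2*y+1][2*x], p[2*y+1][2*x+1])
                        for x in range(h)] for y in range(h)])
    return levels

def is_occluded(pyramid, x0, y0, x1, y1, nearest_z):
    """True if the rectangle [x0, x1) x [y0, y1), whose nearest depth is
    nearest_z, is entirely hidden by what the pyramid already records."""
    # Lowest level at which one pyramid pixel covers the whole rectangle.
    level = 0
    while (x0 >> level) != ((x1 - 1) >> level) or \
          (y0 >> level) != ((y1 - 1) >> level):
        level += 1

    def hidden(level, px, py):
        size = 1 << level  # side of this pyramid pixel in level-0 pixels
        if (px * size >= x1 or (px + 1) * size <= x0 or
                py * size >= y1 or (py + 1) * size <= y0):
            return True   # pixel lies outside the rectangle: ignore it
        if pyramid[level][py][px] < nearest_z:
            return True   # farthest stored z is still in front
        if level == 0:
            return False  # a full-resolution pixel may be visible
        # Otherwise subdivide into the four pixels of the next lower level.
        return all(hidden(level - 1, 2 * px + dx, 2 * py + dy)
                   for dy in (0, 1) for dx in (0, 1))

    return hidden(level, x0 >> level, y0 >> level)
```

The test is conservative: it can report a hidden primitive as potentially visible, but never the reverse, which is what a culling step requires.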
A more recent version of the hierarchical Z-buffer algorithm proposed by Greene and Kass [5] uses an image-space quadtree instead of a Z-pyramid. This worsens performance to some extent, but enables effective antialiasing.
Naylor's projection algorithm performs output-sensitive visibility calculation using the same principle as the hierarchical Z-buffer algorithm: elimination of large parts of the model at an early stage of the calculation, using a data structure constructed at preprocessing time. However, Naylor uses more sophisticated data structures: BSP (binary space partitioning) trees.
A BSP tree [4] can be defined in any number of dimensions. It is a binary tree, in which each node represents some hyperplane; the left subtree of the node corresponds to the negative half-space of the hyperplane, while the right subtree corresponds to the positive half-space. For example, FIG. 1a shows the 2D case, wherein each node represents a line, and each subtree represents a region in the plane. FIG. 1b shows the hierarchical relationship of the nodes and corresponding regions in the plane, starting from the root node A. Each region is denoted numerically to distinguish it from the nodes themselves, which are denoted alphabetically. Additional data characterizing each region, such as color data, may also be stored.
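For the 2D case, the structure can be sketched in Python as follows. The sketch assumes that each internal node stores a line a*x + b*y + c = 0 and that leaves are plain region labels; the names `BSPNode` and `locate`, and the convention that points on the line go to the positive side, are illustrative choices rather than part of the definition above.

```python
class BSPNode:
    """Internal node of a 2D BSP tree: the line a*x + b*y + c = 0.

    `negative` and `positive` are the subtrees for the two half-planes;
    anything that is not a BSPNode is treated as a leaf (a region label).
    """
    def __init__(self, a, b, c, negative, positive):
        self.a, self.b, self.c = a, b, c
        self.negative = negative
        self.positive = positive

def locate(tree, x, y):
    """Return the label of the region containing the point (x, y)."""
    while isinstance(tree, BSPNode):
        side = tree.a * x + tree.b * y + tree.c
        # Points on the line itself are sent to the positive side here.
        tree = tree.positive if side >= 0 else tree.negative
    return tree
```

For example, a root node on the line x = 0 whose positive subtree is split again by the line y = 0 divides the plane into three regions, and `locate` reaches the correct one after at most two comparisons.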
In the 3D case, a BSP tree is a proper generalization of an octree: the planes dividing each node do not have to be in the middle of the node, and are not necessarily axis-parallel. In fact, if the model consists entirely of planar, polygonal faces, then the BSP tree is general enough to represent the scene itself accurately, without need for any additional data structure; a boolean "in/out" attribute is simply maintained with each leaf node. This is in contrast to an octree, which usually serves only as an auxiliary data structure in computer graphics, and not as a representation of the model itself.
Naylor [4] suggests using 2D BSP trees to represent images, and scan-converting them into raster images only as a last stage, for actual display. He presents an algorithm to project a 3D BSP tree, representing a scene model, into a 2D BSP tree representing its image. This algorithm traverses the input BSP tree recursively, from near to far, discarding all regions of space occluded by model faces. Output sensitivity is achieved for the same reason it is attained in the hierarchical Z-buffer algorithm: wholesale elimination of large, hidden parts of space, without specifically examining each object in these parts. Contrary to the hierarchical Z-buffer algorithm, Naylor's projection algorithm needs no further data structures beyond those representing the model and the image. Again, the construction of the hierarchical spatial data structure (in this case, the 3D BSP tree) is very time-consuming; but it is only constructed once, as a preprocessing stage, and subsequently used for visibility calculation from many different viewpoints.
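The near-to-far traversal on which the projection relies can be illustrated with a short Python sketch. It shows only the ordering step, in 2D for brevity, and does not attempt the projection into an image BSP tree or the discarding of occluded regions. The names `BSPNode` and `front_to_back` and the leaf-as-label convention are assumptions made for the example.

```python
class BSPNode:
    """Internal node of a 2D BSP tree: the line a*x + b*y + c = 0.
    Anything that is not a BSPNode is treated as a leaf region label."""
    def __init__(self, a, b, c, negative, positive):
        self.a, self.b, self.c = a, b, c
        self.negative = negative
        self.positive = positive

def front_to_back(node, eye):
    """Yield leaf regions in near-to-far order as seen from `eye`.

    At each node, the half-space containing the eye is visited first,
    because nothing in the other half-space can occlude it.
    """
    if not isinstance(node, BSPNode):
        yield node
        return
    side = node.a * eye[0] + node.b * eye[1] + node.c
    if side >= 0:
        near, far = node.positive, node.negative
    else:
        near, far = node.negative, node.positive
    yield from front_to_back(near, eye)
    yield from front_to_back(far, eye)
```

A projection algorithm built on this ordering could stop descending into a far subtree as soon as the image region it would project onto is already fully covered, which is where the output sensitivity comes from.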
Both output-sensitive visibility algorithms--hierarchical Z-buffer and BSP tree projection--were originally developed for static scenes. While Greene et al. suggest a certain optimization for animation sequences (yielding about a 2× speedup after rather significant overhead), these sequences are restricted to "walk-throughs", where the whole model is static and only the viewpoint may change between frames. For visibility culling algorithms to produce correct results, an up-to-date spatial data structure of the model has to be used. If any objects in the model move or deform then the underlying data structure may become incorrect, and must be updated. It is not acceptable to construct it again from scratch, because, as mentioned above, this is a very expensive operation--usually more expensive than rendering a single frame by the plain Z-buffer algorithm.
It would therefore be desirable to provide an improved method for displaying graphic models which adapts visibility culling algorithms to dynamic scenes, and also utilizes them to minimize the update overhead to those parts of the model that may be potentially visible to the user.