Spatial databases in computer graphics
In the field of three-dimensional computer graphics, a three-dimensional scene is most often stored or communicated between devices in the form of a spatial database. The database stores a set of three-dimensional objects and their relative positions and orientations. Generally, each three-dimensional object is represented in the database by a set of planar convex polygons that approximate the object's surface. The polygons are most often triangles or quadrilaterals. The database may define objects through their inclusion in a data structure, such as a display list used in the PHIGS standard, or by sets of calls to software routines that describe objects or polygons, such as those used in the Silicon Graphics GL language. The task of generating a two-dimensional view of the three-dimensional scene is called rendering.
Steps in rendering three-dimensional graphics
A three-dimensional graphics renderer is a device that translates a scene containing a database of three-dimensional objects into a two-dimensional view for display on a computer screen or other two-dimensional medium. The renderer can be special-purpose hardware, software running on a general-purpose CPU, or a combination of hardware and software. Hardware renderers improve system performance by taking over work that would otherwise be done by slower software. If the renderer is fast enough, real-time animation on a computer screen is possible.
Rendering is a multi-step process, and one way of listing the steps is as follows:
1) transformation to eye coordinates
2) view volume filtering (optional)
3) perspective projection and translation
4) back-face culling (optional)
5) depth complexity reduction (optional)
6) rasterization and writing to the frame buffer
7) display or store two-dimensional image
A different set of steps could certainly be defined: other steps could be added, or these steps could be split into finer ones. The present invention focuses on step 5, Depth Complexity Reduction (DCR). DCR is a process that detects polygons in a scene that are hidden behind other polygons or objects and eliminates them from the scene. If a significant number of polygons are hidden, performing DCR can significantly speed up the subsequent rasterization and display steps of the rendering process.
Transformation to Eye Coordinates
Every two-dimensional projection of a three-dimensional scene is generated assuming a particular location for an observer. The two-dimensional scene is then calculated as it would be seen from the observer's frame of reference. The calculation is simplified if a coordinate system is adopted in which the observer is located at the origin, looking down the z axis in the positive z direction. This kind of coordinate system is called an eye coordinate system. Spatial databases are frequently stored in generalized coordinates, which may have as their origin a location other than that of the observer. All objects that may be displayed must have their points expressed in eye coordinates. Eye coordinates can be derived from generalized coordinates through a coordinate transformation. This changes the original coordinates of each object, polygon, line, or point to the coordinate system in which the observer is located at the origin. The surface of each object is constructed out of planar polygons; each vertex of every polygon is represented by a vector that must be transformed. The transformation is performed by multiplying each vector by an appropriate matrix for rotation, translation, or scaling; when points are written in homogeneous coordinates, a single 4x4 matrix can combine all three.
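The transformation to eye coordinates can be sketched in Python as follows. This is a minimal sketch, assuming a hypothetical camera described only by the observer's position and a heading (yaw) about the y axis; real systems typically compose further rotations and scalings in the same way. The function names (`world_to_eye`, `transform_point`, and so on) are illustrative, not part of any standard API. Each point is extended to homogeneous coordinates (w = 1) so that translation, like rotation, is a 4x4 matrix multiplication.

```python
import math

# 4x4 matrices are stored row-major as lists of four rows.

def mat_mul(a, b):
    """Return the matrix product a @ b of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """Matrix that moves a point by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(theta):
    """Rotation about the y axis; a positive theta turns +z toward +x."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def world_to_eye(observer, yaw):
    """Hypothetical camera model: move the observer to the origin, then
    undo its heading so that it looks down the +z axis."""
    ox, oy, oz = observer
    return mat_mul(rotation_y(-yaw), translation(-ox, -oy, -oz))

def transform_point(m, p):
    """Apply matrix m to point p using homogeneous coordinates (w = 1)."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * h[k] for k in range(4)) for i in range(3))
```

The same matrix is applied to every vertex of every polygon, so it is built once per frame and reused.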
View Volume Filtering
View volume filtering is the process of eliminating all objects or polygons that are completely outside of the field of view. This reduces computation in subsequent rendering steps. The field of view is defined by a frustum, as illustrated in FIG. 1. Any object or polygon that is at least partly contained in the truncated pyramid (i.e., a frustum) between the front clipping plane and the back clipping plane is retained for possible rendering. This step is listed as optional because the subsequent steps could also be performed on the polygons it would discard; those polygons would not affect the final rendered scene. This process is typically performed on objects that are expressed in eye coordinates.
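A simple, conservative form of view volume filtering is sketched below, assuming a hypothetical `Frustum` described by its front and back clipping planes and by the tangents of the half-angles of the field of view, in eye coordinates. A polygon is discarded only when all of its vertices lie outside the same bounding plane; because the polygon is convex, it then lies entirely in that outer half-space and cannot intersect the frustum. The test may retain some polygons that are in fact outside the frustum (for example, near a corner), which is acceptable because such polygons do not affect the final rendered scene.

```python
from dataclasses import dataclass

@dataclass
class Frustum:
    """Hypothetical view frustum in eye coordinates (observer at the origin,
    looking down +z)."""
    z_front: float  # front clipping plane
    z_back: float   # back clipping plane
    tan_x: float    # tangent of the horizontal half-angle of the field of view
    tan_y: float    # tangent of the vertical half-angle of the field of view

    def planes(self):
        """Each plane (a, b, c, d): a point is outside when a*x + b*y + c*z > d."""
        return [
            (0.0, 0.0, -1.0, -self.z_front),  # nearer than the front plane
            (0.0, 0.0, 1.0, self.z_back),     # farther than the back plane
            (1.0, 0.0, -self.tan_x, 0.0),     # right of the field of view
            (-1.0, 0.0, -self.tan_x, 0.0),    # left of the field of view
            (0.0, 1.0, -self.tan_y, 0.0),     # above the field of view
            (0.0, -1.0, -self.tan_y, 0.0),    # below the field of view
        ]

def can_discard(polygon, frustum):
    """True if every vertex of the convex polygon is outside one plane."""
    for a, b, c, d in frustum.planes():
        if all(a * x + b * y + c * z > d for x, y, z in polygon):
            return True
    return False
```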
Perspective projection and translation
Perspective projection is the process of mapping points from a three-dimensional object (in eye coordinates) onto a two-dimensional display screen. The two-dimensional display screen is also called the viewing plane, and it can be anywhere in relation to the front and back clipping planes. In eye coordinates, the observer is at (0, 0, 0) and the display screen is centered at (0, 0, $z_s$). A point ($x_e$, $y_e$, $z_e$) in three-dimensional eye coordinates maps into the point ($x_s$, $y_s$) in two-dimensional screen coordinates, as illustrated in FIG. 2. The mapping equations are given by:

$$x_s = \frac{x_e \, z_s}{z_e}, \qquad y_s = \frac{y_e \, z_s}{z_e}$$

Thus, a single division (computing $z_s / z_e$) and two multiplications are required for mapping each individual point into screen coordinates.
Perspective translation is similar to perspective projection, but differs in that the z coordinate (as expressed in the eye coordinate system) is preserved. Thus, the point ($x_e$, $y_e$, $z_e$) in three-dimensional eye coordinates maps into the point ($x_s$, $y_s$, $z_e$) in three-dimensional perspective translated coordinates, as illustrated in FIG. 3. Perspective translated coordinates are useful because they reveal not only how an object on the viewing plane appears to the observer, but also how far the object is from the observer along the z axis. The latter piece of information determines which objects or polygons appear "in front" and which appear "behind" from the observer's perspective. Therefore, the z coordinate is necessary for eliminating hidden objects or polygons, or hidden portions of objects or polygons. Perspective translated coordinates are used in subsequent steps because they reveal both the perceived superimposition of polygons in the xy viewing plane and the relative distances of the polygons from the observer.
In this document, the prefix "Projected" will denote a geometric entity (such as a point, polygon or object) which is perspective projected into the two-dimensional viewing plane. Geometric entities without the Projected prefix are assumed to be three-dimensional perspective translated. Every three-dimensional perspective translated geometric entity has a corresponding two-dimensional Projection, which is derived from the three-dimensional geometric entity by simply removing the z coordinate from each point and retaining the x and y coordinates. For example, every polygon (in three dimensions) has a corresponding Projected polygon (in two dimensions).
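Perspective translation and the Projected prefix can be sketched together. This minimal sketch assumes the similar-triangles mapping $x_s = x_e z_s / z_e$, $y_s = y_e z_s / z_e$ for a viewing plane at $z = z_s$, and that every vertex lies in front of the observer ($z_e > 0$). The ratio $z_s / z_e$ is computed once, so each point costs one division and two multiplications. The function names are illustrative.

```python
def perspective_translate(point, z_s):
    """Map eye coordinates (x_e, y_e, z_e) to perspective translated
    coordinates (x_s, y_s, z_e); the eye-space z is kept unchanged."""
    x_e, y_e, z_e = point
    if z_e <= 0:
        raise ValueError("point must be in front of the observer (z_e > 0)")
    scale = z_s / z_e                       # the single division
    return (x_e * scale, y_e * scale, z_e)  # the two multiplications

def projected(polygon):
    """Projected polygon: keep x and y of each vertex and drop z."""
    return [(x, y) for x, y, _ in polygon]
```

For example, with $z_s = 2$ the eye-space vertex (2, 4, 8) becomes (0.5, 1.0, 8.0), and its Projected counterpart is (0.5, 1.0).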
Stereo viewing requires two sets of eye coordinates (one for each of the observer's eyes). Therefore, the perspective projection or translation process must be performed twice, once for each set of coordinates.
Back-face culling
If an object's surface is closed (such as a sphere), then a large fraction of the polygons that make up the object face away from the observer. These polygons, as illustrated in FIG. 4, are hidden from view and need not be rendered. Back-face culling is the process of finding and removing backward-facing polygons. A standard method of identifying backward-facing polygons is illustrated in FIG. 5. It is assumed that all polygons are convex and that, after perspective translation, front-facing polygons are drawn counterclockwise and back-facing polygons are drawn clockwise. This requires objects to be designed such that, for each polygon, all vertices are listed in counterclockwise order when the object is rotated so the polygon is oriented toward the observer. When meshes are used for storing data (such as triangle meshes or quadrilateral meshes), special care must be taken when defining the order of vertices for particular polygons within a mesh.
Given three consecutive points of a polygon, two consecutive vector displacements can be generated (from point one to point two and from point two to point three). The z component of the cross product of these two vectors determines the direction the polygon is facing. A negative z component of the cross product indicates the polygon is front facing, and a positive z component indicates the polygon is back facing. A z component of zero indicates the polygon is being viewed edge-on. The cross product is given by:

$$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_2 & y_3 - y_2 & z_3 - z_2 \end{vmatrix}$$

where i, j, and k are the unit vectors for the x, y, and z axes, respectively, in the eye coordinate system. The z component of this determinant is:

$$\mathbf{k}\left[(x_2 - x_1)(y_3 - y_2) - (x_3 - x_2)(y_2 - y_1)\right]$$
The polygon is back facing and need not be rendered if

$$(x_2 - x_1)(y_3 - y_2) - (x_3 - x_2)(y_2 - y_1) > 0 \qquad \text{(Expression 1)}$$
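Expression 1 can be applied directly to the first three vertices of each perspective translated polygon, as in the minimal sketch below (the names `is_back_facing` and `cull_back_faces` are illustrative). It assumes convex polygons whose first three vertices are not collinear. Which on-screen winding yields a positive value depends on how the x and y axes are oriented (for example, whether y increases upward or downward), so the sketch implements Expression 1 literally rather than a visual "clockwise" test. A value of exactly zero (edge-on) is not treated as back facing, so such polygons are kept.

```python
def is_back_facing(polygon):
    """Expression 1: back facing if the z component of
    (p2 - p1) x (p3 - p2) is positive. Only x and y are needed."""
    (x1, y1, _), (x2, y2, _), (x3, y3, _) = polygon[:3]
    return (x2 - x1) * (y3 - y2) - (x3 - x2) * (y2 - y1) > 0

def cull_back_faces(polygons):
    """Keep only the polygons that Expression 1 does not mark as back facing."""
    return [poly for poly in polygons if not is_back_facing(poly)]
```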
Notice that this operation must be performed after perspective translation in order to avoid errors caused by parallax. Also, this back-face culling step is listed as optional, and may, in practice, be turned off for particular objects during the rendering process. It must be turned off when either the object is not a closed surface, or the object was not properly designed to allow back-face culling.
Depth Complexity Reduction
A screen pixel is a small picture element in the viewing plane that usually corresponds directly to one independently controllable area unit on the final two-dimensional picture or computer screen. A renderer must often recompute the color value of a given screen pixel multiple times, because there may be many polygons that intersect the volume subtended by the screen pixel. The average number of times a screen pixel needs to be rendered, for a particular scene, is called the depth complexity of the scene. Simple scenes have a depth complexity near unity, while complex scenes can have a depth complexity of ten or twenty. The real world has a far greater depth complexity, and as images become more realistic, renderers will be required to process scenes of increasing depth complexity.
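The definition of depth complexity can be made concrete by counting, for each screen pixel, how many rasterized polygon pixels land on it, and then averaging over the whole screen. In this minimal sketch, hypothetical pixel-aligned rectangles stand in for rasterized Projected polygons; a real renderer would obtain the counts from its rasterizer.

```python
def depth_complexity(width, height, rects):
    """Average number of times each screen pixel is rendered.

    rects: hypothetical pixel-aligned rectangles (x0, y0, x1, y1), half-open,
    each standing in for one rasterized polygon; parts off screen are ignored.
    """
    counts = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                counts[y][x] += 1
    return sum(map(sum, counts)) / (width * height)
```

For example, on a 4x4 screen, one rectangle covering the whole screen plus one covering its left half gives 24 rendered pixels over 16 screen pixels, a depth complexity of 1.5.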
In general, renderers must remove hidden lines and surfaces from the rendered scene. This can be done in the pixel domain (using a z-buffer, as described in the next subsection), in the polygon domain, or in the object domain. Hidden surface removal in the polygon or object domain is computationally expensive because objects can have irregular shapes, including concave surfaces and surfaces with holes. Hence, most renderers can eliminate entire objects only if they are behind simple surfaces, such as a wall or a building.
The present invention performs partial hidden surface removal in the polygon domain by providing a unique method for detecting polygons that are completely occulted by other polygons or objects. The invention does not provide complete hidden surface removal, and hence, it requires some form of additional hidden surface removal in the pixel domain. The Depth Complexity Reduction method described here eliminates complete polygons that are hidden in a scene rather than hidden portions of polygons that make up a surface. However, since the depth complexity of a scene can be reduced by an order of magnitude for complex scenes, total system performance can be dramatically improved by use of this invention.
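To illustrate what it means for a polygon to be completely hidden, the sketch below checks one simple sufficient condition against a single opaque, convex occluding polygon. It is not the Depth Complexity Reduction method of the present invention, and it misses many hidden polygons (for example, polygons hidden only by a combination of several others). It relies on a property of perspective translated coordinates: every line of sight from the observer becomes a line parallel to the z axis. Hence polygon P is hidden by polygon Q if Projected P lies inside Projected Q and every point of P is farther from the observer (larger z) than every point of Q. The function names are illustrative.

```python
def inside_convex(pt, poly2d):
    """True if pt lies inside or on the boundary of the convex 2-D polygon
    poly2d (vertices may be listed in either winding order)."""
    sign = 0
    n = len(poly2d)
    for i in range(n):
        (ax, ay), (bx, by) = poly2d[i], poly2d[(i + 1) % n]
        cross = (bx - ax) * (pt[1] - ay) - (by - ay) * (pt[0] - ax)
        if cross != 0:
            s = 1 if cross > 0 else -1
            if sign == 0:
                sign = s
            elif s != sign:
                return False  # pt is on the outer side of this edge
    return True

def hidden_behind(p, q):
    """Sufficient (not necessary) test that perspective translated polygon p
    is completely hidden by the opaque convex polygon q."""
    q2d = [(v[0], v[1]) for v in q]
    if not all(inside_convex((v[0], v[1]), q2d) for v in p):
        return False  # some part of Projected p escapes Projected q
    return min(v[2] for v in p) > max(v[2] for v in q)
```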
Other methods of detecting visible portions of a scene and removing invisible portions of a scene have been suggested in the literature, such as list-priority methods (e.g., depth-sort or Binary Space-Partitioning Trees), scan-line methods, and area subdivision methods (e.g., the method developed by Warnock or the method developed by Weiler and Atherton). These methods are summarized in chapter 15, pages 649 through 720, of the book "Computer Graphics: Principles and Practice (2nd Edition)", by James Foley, Andries van Dam, Steven Feiner, and John Hughes, published by Addison Wesley Publishing Co., ISBN 0-201-12110-7, and incorporated herein by reference. None of these methods achieves the same function as the invention described here.
Rasterization and Writing to the Frame Buffer
The rasterization portion of a renderer processes polygons one at a time. At this point in the process, each polygon is already in screen coordinates and must now be converted to a set of pixel color values, which are then stored in the frame buffer. The display screen is segmented into an array of screen pixels, each row of which is referred to as a raster line. Rasterization splits a polygon into lines that coincide with the raster lines of the display screen, and then into individual rasterized polygon pixels. Shading, lighting, and color of the polygon are taken into account when the color value of each rasterized polygon pixel is computed.
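One way to split a convex Projected polygon into raster lines and then into pixels is sketched below. It assumes a hypothetical sampling convention in which pixel (x, y) covers the unit square with corner (x, y) and is sampled at its center (x + 0.5, y + 0.5). On each raster line it intersects the polygon's edges with the line through the pixel centers and fills between the leftmost and rightmost crossings; edges are treated as half-open in y so that a vertex shared by two edges is not counted twice. Shading and color are omitted, and the function names are illustrative.

```python
import math

def raster_spans(poly2d, width, height):
    """Yield (y, x_start, x_end) inclusive pixel spans for a convex polygon,
    clipped to a width x height screen."""
    ys = [p[1] for p in poly2d]
    y_lo = max(0, math.ceil(min(ys) - 0.5))
    y_hi = min(height - 1, math.floor(max(ys) - 0.5))
    n = len(poly2d)
    for y in range(y_lo, y_hi + 1):
        yc = y + 0.5  # raster line through the pixel centers
        xs = []
        for i in range(n):
            (ax, ay), (bx, by) = poly2d[i], poly2d[(i + 1) % n]
            if (ay <= yc < by) or (by <= yc < ay):  # edge crosses this line
                t = (yc - ay) / (by - ay)
                xs.append(ax + t * (bx - ax))
        if len(xs) < 2:
            continue
        x_start = max(0, math.ceil(min(xs) - 0.5))
        x_end = min(width - 1, math.floor(max(xs) - 0.5))
        if x_start <= x_end:
            yield (y, x_start, x_end)

def rasterize(poly2d, width, height):
    """List the (x, y) rasterized polygon pixels of a convex polygon."""
    return [(x, y) for y, x0, x1 in raster_spans(poly2d, width, height)
            for x in range(x0, x1 + 1)]
```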
For each rasterized polygon pixel, the z coordinate of the intersection of the original polygon and the volume subtended by the screen pixel is computed. Generally, this is done by interpolating the z coordinates of the perspective translated vertices of the polygon. This is the z coordinate of the rasterized polygon pixel. Many rasterized polygons could affect the same screen pixel, but only the polygon closest to the observer affects the final value of the pixel (assuming no transparent or translucent objects). Thus, it is necessary to use the z coordinates of the rasterized polygon pixels to determine which one affects the final rendered scene. Hence, hidden surface removal is performed in the pixel domain on a pixel-by-pixel basis.
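For a planar polygon, $1/z_e$ (not $z_e$ itself) is an affine function of the screen coordinates $(x_s, y_s)$, so one common approach, sketched below for a triangle, interpolates $1/z_e$ at the pixel center with barycentric weights and then inverts the result. This is a minimal sketch; the name `pixel_z` is illustrative, and renderers differ in exactly which depth quantity they interpolate and store.

```python
def pixel_z(tri, px, py):
    """Depth z_e of the triangle's plane at screen point (px, py).

    tri: three perspective translated vertices (x_s, y_s, z_e) with z_e > 0.
    Returns None for a degenerate (zero-area) triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if area == 0:
        return None
    # Barycentric weights of (px, py) with respect to the Projected triangle.
    w2 = ((px - x1) * (y3 - y1) - (x3 - x1) * (py - y1)) / area
    w3 = ((x2 - x1) * (py - y1) - (px - x1) * (y2 - y1)) / area
    w1 = 1.0 - w2 - w3
    # Interpolate 1/z, which is affine in screen space, then invert.
    inv_z = w1 / z1 + w2 / z2 + w3 / z3
    return 1.0 / inv_z
```

For example, with $z_s = 1$ the eye-space triangle (0, 0, 2), (2, 0, 4), (0, 2, 2) perspective translates to (0, 0, 2), (0.5, 0, 4), (0, 1, 2). The eye-space point (1, 0, 3) on that triangle lands at screen point (1/3, 0), where the sketch recovers z = 3; linearly interpolating $z_e$ instead would give 10/3.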
Each screen pixel corresponds to a z-buffer memory location, which stores the z coordinate of the rasterized polygon pixel that is currently closest to the observer. When a new rasterized polygon pixel is generated, its z coordinate is compared to the value stored in the appropriate z-buffer memory location. If the new z coordinate is smaller (and therefore the new rasterized polygon pixel is closer to the observer), the new rasterized polygon pixel overwrites the old one; otherwise the new rasterized polygon pixel is discarded.
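The z-buffer comparison can be sketched as a small class. This sketch assumes a hypothetical `ZBuffer` that stores one depth and one color per screen pixel and initializes every depth to infinity (nothing drawn yet). Like the text, it keeps a new rasterized polygon pixel only when its z coordinate is strictly smaller than the stored one, so a tie keeps the earlier pixel.

```python
import math

class ZBuffer:
    """Hypothetical frame buffer with one depth and one color per pixel."""

    def __init__(self, width, height, background=(0, 0, 0)):
        self.depth = [[math.inf] * width for _ in range(height)]
        self.color = [[background] * width for _ in range(height)]

    def write(self, x, y, z, color):
        """Depth-test one rasterized polygon pixel at screen pixel (x, y).

        Returns True if it was closer than the stored pixel and was written,
        False if it was discarded.
        """
        if z < self.depth[y][x]:
            self.depth[y][x] = z
            self.color[y][x] = color
            return True
        return False
```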
The rasterizing step can also provide image anti-aliasing and many other features to enhance image quality. However, renderers must process every pixel within each polygon to be rasterized, and therefore it is important to eliminate as many polygons as possible before the rasterization step.
Display or Store Two-Dimensional Image
Once all of the objects in the scene have been rasterized and all pixels have been written to the frame buffer, the resulting image is displayed on a computer screen, or stored for later viewing or transferred to another medium such as film.