Three-Dimensional Computer Graphics
Computer graphics is the art and science of generating pictures with a computer. Generation of pictures, or images, is commonly called rendering. Generally, in three-dimensional (3D) computer graphics, geometry that represents surfaces (or volumes) of objects in a scene is translated into pixels stored in a frame buffer, and then displayed on a display device. Real-time display devices, such as CRTs or LCDs used as computer monitors, refresh the display by continuously displaying the image over and over. This refresh usually occurs row-by-row, where each row is called a raster line or scan line. In this document, raster lines are numbered from bottom to top, but are displayed in order from top to bottom.
In a 3D animation, a sequence of images is displayed, giving the appearance of motion in three-dimensional space. Interactive 3D computer graphics allows a user to change the viewpoint or the geometry in real time, thereby requiring the rendering system to create new images on the fly. Therefore, real-time performance in color, with high-quality imagery, is very important.
In 3D computer graphics, each renderable object generally has its own local object coordinate system, and therefore needs to be translated (or transformed) from object coordinates to pixel display coordinates. Conceptually, this is a 4-step process: 1) translation from object coordinates to world coordinates, which is the coordinate system for the entire scene; 2) translation from world coordinates to eye coordinates, based on the viewing point of the scene; 3) translation from eye coordinates to perspective-translated eye coordinates, where perspective scaling (farther objects appear smaller) has been performed; and 4) translation from perspective-translated eye coordinates to pixel coordinates, also called screen coordinates. Screen coordinates are points in three-dimensional space, and can be in either screen-precision (i.e., pixels) or object-precision (high precision numbers, usually floating-point), as described later. These translation steps can be compressed into one or two steps by precomputing appropriate translation matrices before any translation occurs. Once the geometry is in screen coordinates, it is broken into a set of pixel color values (that is, “rasterized”) that are stored into the frame buffer. Many techniques are used for generating pixel color values, including Gouraud shading, Phong shading, and texture mapping.
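The point about precomputing translation matrices can be illustrated with a short Python sketch. Two hypothetical affine transforms (object-to-world and world-to-eye) are multiplied once into a single 4×4 matrix, which then gives the same result as applying the steps one at a time. The matrix names and values are illustrative assumptions; the perspective and screen-mapping steps can also be written as 4×4 matrices followed by a homogeneous divide, but are omitted here for brevity.

```python
# Sketch: precompose object->world and world->eye into one 4x4 matrix.
# Matrices are row-major lists; points are homogeneous (x, y, z, 1).

def mat_mul(a, b):
    """Return the 4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, p):
    """Apply 4x4 matrix m to homogeneous point p."""
    return [sum(m[i][k] * p[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def scale(s):
    return [[s, 0, 0, 0],
            [0, s, 0, 0],
            [0, 0, s, 0],
            [0, 0, 0, 1]]

# Hypothetical transforms: scale the object by 2, then move it to x = 5;
# the eye transform then pushes the scene 10 units along +z (look direction).
object_to_world = mat_mul(translation(5, 0, 0), scale(2))
world_to_eye = translation(0, 0, 10)

# Precomputed once, then reused for every vertex of the object.
object_to_eye = mat_mul(world_to_eye, object_to_world)

vertex = [1, 1, 1, 1]
step_by_step = transform(world_to_eye, transform(object_to_world, vertex))
precomputed = transform(object_to_eye, vertex)
print(step_by_step, precomputed)  # → [7, 2, 12, 1] [7, 2, 12, 1]
```

The saving comes from doing the matrix product once per object rather than applying several matrices to every vertex.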
A summary of the prior art rendering process can be found in: “Fundamentals of Three-dimensional Computer Graphics”, by Watt, Chapter 5: The Rendering Process, pages 97 to 113, published by Addison-Wesley Publishing Company, Reading, Mass., 1989, reprinted 1991, ISBN 0-201-15442-0 (hereinafter referred to as the Watt Reference).
FIG. 1 shows a three-dimensional object 100, a tetrahedron, with its own coordinate axes (xobj, yobj, zobj). The three-dimensional object is translated, scaled, and placed in the viewing point's coordinate system based on (xeye, yeye, zeye). The object is projected onto the viewing plane, thereby correcting for perspective. At this point, the object appears to have become two-dimensional; however, in accordance with the present invention, the object's z-coordinates are preserved so they can be used later by hidden surface removal techniques. The object is finally translated to screen coordinates, based on (xscreen, yscreen, zscreen), where zscreen is going perpendicularly into the page. Points on the object now have their x and y coordinates described by pixel location (and fractions thereof) within the display screen and their z coordinates in a scaled version of distance from the viewing point.
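A minimal sketch of the projection described for FIG. 1, under assumptions that are not part of the original text: a pinhole model with the viewer at the origin looking along +z, a viewing plane at a hypothetical distance d, a square visible window, and a screen origin at the bottom-left corner (consistent with the bottom-to-top raster-line numbering used in this document). x and y are divided by z and mapped to fractional pixel units, while z is carried through unchanged so that it remains available for hidden surface removal; any monotonic rescaling of z would serve equally well.

```python
def eye_to_screen(x_eye, y_eye, z_eye, d, width, height, half_extent):
    """Return (x_screen, y_screen, z_screen) for an eye-space point.

    Hypothetical helper: d is the viewing-plane distance and half_extent
    is the half-width of the (square) visible window on that plane.
    """
    if z_eye <= 0:
        raise ValueError("point is not in front of the viewer")
    # Perspective: dividing by z moves farther points toward the center.
    x_proj = d * x_eye / z_eye
    y_proj = d * y_eye / z_eye
    # Map [-half_extent, half_extent] to pixel units, keeping fractions.
    x_screen = (x_proj / half_extent + 1.0) * 0.5 * width
    y_screen = (y_proj / half_extent + 1.0) * 0.5 * height
    # z is preserved so that depth comparisons remain possible later.
    return x_screen, y_screen, z_eye

near = eye_to_screen(1.0, 1.0, 2.0, d=1.0, width=640, height=480, half_extent=1.0)
far = eye_to_screen(1.0, 1.0, 4.0, d=1.0, width=640, height=480, half_extent=1.0)
print(near, far)  # → (480.0, 360.0, 2.0) (400.0, 300.0, 4.0)
```

The two points share x and y in eye space, but the farther one lands closer to the screen center (320, 240), while its larger z still records that it is farther away.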
Because many different portions of geometry can affect the same pixel, the geometry representing the surfaces closest to the scene viewing point must be determined. Thus, in accordance with the present invention, for each pixel, the visible surfaces within the volume subtended by the pixel's area determine the pixel color value, while hidden surfaces are prevented from affecting the pixel. Non-opaque surfaces closer to the viewing point than the closest opaque surface (or surfaces, if an edge of geometry crosses the pixel area) affect the pixel color value, while all other non-opaque surfaces are discarded. In this document, the term “occluded” is used to describe geometry which is hidden by other opaque geometry.
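The per-pixel rule above can be sketched in Python as a filter over the fragments that land on one pixel. This sketch assumes a single sample per pixel (so the case of an edge crossing the pixel area is not modeled) and the document's convention that smaller z is closer; the Fragment type and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    name: str
    z: float       # smaller z means closer to the viewing point
    opaque: bool

def visible_fragments(fragments):
    """Return the fragments allowed to affect one pixel, nearest first.

    Keeps the closest opaque fragment(s) plus every non-opaque fragment in
    front of them; everything behind the closest opaque fragment is dropped.
    """
    opaque_depths = [f.z for f in fragments if f.opaque]
    nearest_opaque = min(opaque_depths, default=float("inf"))
    kept = [f for f in fragments
            if (f.opaque and f.z == nearest_opaque)
            or (not f.opaque and f.z < nearest_opaque)]
    return sorted(kept, key=lambda f: f.z)

pixel = [Fragment("floor", 9.0, True), Fragment("wall", 5.0, True),
         Fragment("glass", 3.0, False), Fragment("smoke", 7.0, False)]
print([f.name for f in visible_fragments(pixel)])  # → ['glass', 'wall']
```

Here the floor is occluded by the wall, and the smoke is discarded because it lies behind the closest opaque surface; only the glass in front of the wall contributes along with the wall itself.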
Many techniques have been developed to perform visible surface determination, and a survey of these techniques is incorporated herein by reference to: “Computer Graphics: Principles and Practice”, by Foley, van Dam, Feiner, and Hughes, Chapter 15: Visible-Surface Determination, pages 649 to 720, 2nd edition published by Addison-Wesley Publishing Company, Reading, Mass., 1990, reprinted with corrections 1991, ISBN 0-201-12110-7 (hereinafter referred to as the Foley Reference). In the Foley Reference, on page 650, the terms “image-precision” and “object-precision” are defined: “Image-precision algorithms are typically performed at the resolution of the display device, and determine the visibility at each pixel. Object-precision algorithms are performed at the precision with which each object is defined, and determine the visibility of each object.”
As a rendering process proceeds, most prior art renderers must compute the color value of a given screen pixel multiple times because multiple surfaces intersect the volume subtended by the pixel. The average number of times a pixel needs to be rendered, for a particular scene, is called the depth complexity of the scene. Simple scenes have a depth complexity near unity, while complex scenes can have a depth complexity perhaps within the range of ten to twenty. For example, in a scene with a depth complexity of ten, 90% of the computation is wasted on hidden pixels. This wasted computation is typical of hardware renderers that use the simple Z-buffer technique (discussed later herein), generally chosen because it is easily built in hardware. Methods more complicated than the Z-buffer technique have heretofore generally been too complex to build in a cost-effective manner. An important feature of the method and apparatus invention presented here is the avoidance of this wasted computation by eliminating hidden portions of geometry before they are rasterized, while still being simple enough to build in cost-effective hardware.
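The relationship between depth complexity and wasted work can be made concrete with a short Python sketch. It assumes fully opaque geometry, so that exactly one fragment per pixel is ultimately visible. Under that assumption the wasted fraction is 1 − 1/d for depth complexity d, which gives the 90% figure for d = 10. The per-pixel fragment counts in the example are made up for illustration.

```python
def depth_complexity(fragments_per_pixel):
    """Average number of times each pixel is rendered."""
    return sum(fragments_per_pixel) / len(fragments_per_pixel)

def wasted_fraction(complexity):
    """Share of per-pixel work spent on hidden fragments, assuming only
    one fragment per pixel is finally visible (opaque geometry)."""
    return 1.0 - 1.0 / complexity

counts = [1, 3, 10, 2]  # hypothetical fragments rendered at four pixels
d = depth_complexity(counts)
print(d, wasted_fraction(d))  # → 4.0 0.75
print(wasted_fraction(10))    # → 0.9
```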
When a point on a surface (frequently a polygon vertex) is translated to screen coordinates, the point has three coordinates: 1) the x-coordinate in pixel units (generally including a fraction); 2) the y-coordinate in pixel units (generally including a fraction); and 3) the z-coordinate of the point in either eye coordinates, distance from the virtual screen, or some other coordinate system which preserves the relative distance of surfaces from the viewing point. In this document, positive z-coordinate values are used for the “look direction” from the viewing point, and smaller positive values indicate a position closer to the viewing point.
When a surface is approximated by a set of planar polygons, the vertices of each polygon are translated to screen coordinates. For points in or on the polygon (other than the vertices), the screen coordinates are interpolated from the coordinates of vertices, typically by the processes of edge walking and span interpolation. Thus, a z-coordinate value is generally included in each pixel value (along with the color value) as geometry is rendered.
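Span interpolation with a per-pixel z comparison can be sketched as follows. This is an illustrative Z-buffer-style inner loop, not the method of the present invention; the function name, the pixel-center sampling rule, and the buffer layout are assumptions. Linear interpolation across a span is exact only when the stored depth varies linearly in screen space (as a 1/z-style screen depth does for a planar polygon); interpolating eye-space z this way is an approximation.

```python
import math

def rasterize_span(zbuf, cbuf, y, x_left, z_left, x_right, z_right, color):
    """Shade pixels on scan line y whose centers lie in [x_left, x_right).

    z is interpolated linearly along the span, and a pixel is overwritten
    only if the new fragment is closer (smaller z) than the stored one.
    """
    if x_right <= x_left:
        return
    first = math.ceil(x_left - 0.5)   # first pixel with center >= x_left
    last = math.ceil(x_right - 0.5)   # exclusive: center < x_right
    for x in range(max(first, 0), min(last, len(zbuf[y]))):
        t = (x + 0.5 - x_left) / (x_right - x_left)
        z = z_left + t * (z_right - z_left)
        if z < zbuf[y][x]:            # depth test: closer fragment wins
            zbuf[y][x] = z
            cbuf[y][x] = color

width, height = 4, 1
zbuf = [[math.inf] * width for _ in range(height)]
cbuf = [[None] * width for _ in range(height)]
rasterize_span(zbuf, cbuf, 0, 0.0, 1.0, 4.0, 9.0, "A")  # sloped span
rasterize_span(zbuf, cbuf, 0, 0.0, 5.0, 4.0, 5.0, "B")  # flat span at z = 5
print(cbuf[0], zbuf[0])  # → ['A', 'A', 'B', 'B'] [2.0, 4.0, 5.0, 5.0]
```

The second span is fully rasterized, yet only two of its four fragments survive the depth test, which is the kind of wasted computation the depth-complexity discussion refers to.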
Polygons are used in 3D graphics to define the shape of objects. Texture mapping is a technique for simulating surface textures by coloring polygons with detailed images. Typically, a single texture map will cover an entire object that consists of many polygons. A texture map consists of one or more rectangular arrays of Red-Green-Blue-Alpha (RGBA) color, with alpha being the percentage of translucency. Texture coordinates are determined for each vertex of a polygon. These coordinates are interpolated for each fragment of the geometry, the corresponding texture values are looked up in the texture map, and the resulting color is assigned to the fragment.
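A sketch of the per-fragment texture lookup, assuming a triangle, texture coordinates (u, v) in [0, 1], and nearest-texel sampling with no filtering. Barycentric weights are used here as one common way to interpolate vertex attributes; they stand in for, and do not describe, the edge-walking and span-interpolation processes mentioned earlier. Perspective-correct interpolation is also omitted, and all names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights (wa, wb, wc) of 2D point p in triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    wb = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area
    wc = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area
    return 1.0 - wb - wc, wb, wc

def sample_nearest(texture, u, v):
    """Return the texel nearest to (u, v); texture is a list of rows."""
    h, w = len(texture), len(texture[0])
    ix = min(int(u * w), w - 1)
    iy = min(int(v * h), h - 1)
    return texture[iy][ix]

def shade_fragment(p, verts, uvs, texture):
    """Interpolate (u, v) at screen point p, then look up the texel."""
    wa, wb, wc = barycentric(p, *verts)
    u = wa * uvs[0][0] + wb * uvs[1][0] + wc * uvs[2][0]
    v = wa * uvs[0][1] + wb * uvs[1][1] + wc * uvs[2][1]
    return sample_nearest(texture, u, v)

# 2x2 RGBA texture (rows of (R, G, B, A) tuples).
texture = [[(255, 0, 0, 255), (0, 255, 0, 255)],
           [(0, 0, 255, 255), (255, 255, 255, 128)]]
verts = [(0, 0), (4, 0), (0, 4)]    # screen-space triangle
uvs = [(0, 0), (1, 0), (0, 1)]      # texture coordinates per vertex
print(shade_fragment((2, 1), verts, uvs, texture))  # → (0, 255, 0, 255)
```

At the fragment (2, 1) the weights are (0.25, 0.5, 0.25), so (u, v) = (0.5, 0.25), which selects the texel in row 0, column 1. The filtered lookups described in the following paragraphs replace the nearest-texel step.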
Objects appear smaller when they are farther from the viewer. Therefore, texture maps must be scaled so that the texture pattern appears the same size relative to the object being textured. To avoid scaling and filtering a texture image for each fragment, a series of pre-filtered texture maps, called mipmaps, is used. Each texture has a group of associated mipmaps. Each mipmap, also called a level of detail (LOD), is formed of an n×m array of texture elements (texels), where n and m are powers of 2. Each texel comprises an R, G, B, and A component. Typically each successive LOD has half the resolution of the previous LOD in each dimension, so a cascading series of smaller, prefiltered images is provided in advance rather than requiring such filtering to be performed in real time. For example, LOD 0 may be a 512×512 array, and LOD 9 a 1×1 array.
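The mipmap chain can be sketched as repeated 2×2 box filtering, which is one simple choice of prefilter among several. For brevity the sketch assumes a square, power-of-two, single-channel image; an RGBA texture would filter each of its four components the same way. With a 512×512 base this produces ten levels, LOD 0 through LOD 9, the last being 1×1.

```python
def build_mipmaps(base):
    """Return [LOD 0, LOD 1, ..., LOD k], where LOD k is 1x1.

    base is a square list of rows whose side is a power of two; each new
    level averages 2x2 blocks of the previous one (a box filter).
    """
    size = len(base)
    if size == 0 or size & (size - 1):
        raise ValueError("side length must be a power of two")
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [(prev[2 * i][2 * j] + prev[2 * i][2 * j + 1]
              + prev[2 * i + 1][2 * j] + prev[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(half)]
            for i in range(half)])
    return levels

ramp = [[float(4 * i + j) for j in range(4)] for i in range(4)]
chain = build_mipmaps(ramp)
print([len(level) for level in chain])  # → [4, 2, 1]
print(chain[1], chain[2])               # → [[2.5, 4.5], [10.5, 12.5]] [[7.5]]
```

Because the whole chain is computed once, ahead of rendering, the per-fragment cost of minification is reduced to a few texel reads from the appropriate levels.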
Exact texture coordinates and an LOD are typically computed for each sample pixel. The texel values surrounding these texture coordinates are then interpolated to generate texture values for the sample. In bilinear interpolation, the prestored LOD array closest to the computed LOD value is selected, and the values of the four texels in that array nearest to the texture coordinates are interpolated to generate texture values for a sample. In trilinear interpolation, the four texels closest to the texture coordinates in each of the two prestored LOD arrays bracketing the computed LOD (the next finer and the next coarser level) are used to generate the texture values for a sample. For example, if an LOD value of 3.2 is computed, then texels from LOD array 3 and LOD array 4 are used for trilinear interpolation. Trilinear interpolation thus requires eight texels per sample, which makes high memory bandwidth critical to efficient image rendering.
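The bilinear and trilinear lookups can be sketched as below. Assumptions not stated in the text: each LOD is a square single-channel array, texel centers sit at half-integer positions, out-of-range coordinates are clamped to the edge, and the fractional part of the LOD is used as the blend weight between the two bracketing levels. For an LOD of 3.2 the sketch blends LOD 3 (weight 0.8) and LOD 4 (weight 0.2), reading four texels from each, eight in total.

```python
def bilinear(level, u, v):
    """Interpolate the four texels nearest (u, v) in one square LOD array."""
    n = len(level)
    # Continuous texel-space position (texel i is centered at i + 0.5),
    # clamped so all four texels stay inside the array.
    x = min(max(u * n - 0.5, 0.0), n - 1.0)
    y = min(max(v * n - 0.5, 0.0), n - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def trilinear(mipmaps, u, v, lod):
    """Blend bilinear samples from the LODs just below and above lod."""
    lod = min(max(lod, 0.0), len(mipmaps) - 1.0)
    lo = int(lod)                        # e.g. 3 for lod = 3.2
    hi = min(lo + 1, len(mipmaps) - 1)   # e.g. 4 for lod = 3.2
    frac = lod - lo                      # weight of the coarser level
    return (bilinear(mipmaps[lo], u, v) * (1 - frac)
            + bilinear(mipmaps[hi], u, v) * frac)

# Hypothetical chain: LOD k is a constant image with value 10 * k.
mipmaps = [[[10.0 * k] * s for _ in range(s)]
           for k, s in enumerate([32, 16, 8, 4, 2, 1])]
print(trilinear(mipmaps, 0.3, 0.7, 3.2))  # blends LOD 3 and LOD 4
```

With constant levels the result is simply 0.8 × 30 + 0.2 × 40, about 32, which makes the level weighting easy to check; a real texture chain would also exercise the within-level bilinear weights.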