Graphics rendering refers generally to the process of generating a two-dimensional image from graphical models. A graphical model defines attributes of a real or imaginary object which are to be represented in a rendered image. These attributes include, for example, color, shape, and position of an object in a graphics scene. In the process of rendering these models, a graphics system generates a display image, which is comprised of an array of pixel data.
A pixel is a point or picture element in a display device, and in the context of graphics processing, also corresponds to a point in the two-dimensional space to which the graphical models are rendered. Each pixel element of a rendered image includes one or more data values describing attributes of the pixel used to display it. For instance, in color graphics, this pixel data can include intensity values for color components that give the pixel element its color. The intensity values stored in an array of pixel elements are then used to display the array on a physical output device such as a raster display device.
Graphics processing is often classified by the dimension of the models to be rendered to an image. For instance, two-dimensional graphics processing ("2-D graphics") refers to the generation of an image from graphical models having two dimensions (x and y coordinates), and three-dimensional graphics processing ("3-D graphics") refers to the processing of three-dimensional models.
Graphics processing can also be classified as "real-time," which means that 1) the display image is updated so that the user perceives continuous motion of the objects in the scene; and 2) there is minimal and predictable "transport delay" between user input, which changes the position of objects or the viewpoint of the scene, and the display of an image in response to this input. To achieve this effect, objects in the scene must be rendered within a predefined period of time.
In 3-D graphics applications, an object in a scene is represented by a 3-D graphical model, which includes geometric data used to model the surface and position of the object, and visual attributes used to model the appearance of the object. There are a number of ways that a geometric model can represent a 3-D object, including polygon meshes, parametric surfaces, or quadric surfaces. Using a polygon mesh, for example, the surface of an object is modeled with several interconnected polygons. The surface elements, in this case polygons, are referred to as geometric primitives. Visual attributes such as red, green, and blue color data, and possibly other model data, are stored at the vertices of the polygon.
In the rendering process, the geometric primitives corresponding to objects in a scene are processed to generate a display image. In the context of 3-D graphics, the rendering process includes transforming the graphical models in a scene, and rasterizing the geometric primitives in the models to generate pixel data. In some systems, this pixel data is processed further to enhance image quality. The final product of the rendering process is a display image comprised of a collection of pixel values. To display the image, these pixel values are transferred from a memory buffer, such as a frame buffer, to a display controller.
The typical graphics processing system includes a physical output device that displays rendered images. Although other forms of display devices have been developed, the predominant technology today is referred to as raster graphics. A raster display device includes an array of individual points or picture elements (i.e., pixels), arranged in rows and columns, to produce the image. In a CRT, these pixels correspond to a phosphor array provided on the glass faceplate of the CRT. The emission of light from each phosphor in the array is independently controlled by an electron beam that "scans" the array sequentially, one row at a time, in response to stored information representative of each pixel in the image. The array of pixel values that map to the screen is often referred to as a bitmap or pixmap.
The rendering process typically begins by transforming the vertices of the geometric primitives to prepare the model data for the rasterizing step. While the specific details of the transformation phase vary, a few examples will illustrate the process. The modeling transform, in some systems, is used to convert the vertices of a model from the model's local coordinates to world coordinates, the coordinates in which a complete scene is represented. The next step is to determine potentially visible objects in a 3-D space referred to as the view volume. This step is commonly performed in view reference coordinates, which describe object locations relative to a viewpoint or eyepoint. Objects that are not potentially visible at this stage can be disregarded, while objects that are at least partially in the view volume are "clipped" to the view volume.
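As a rough illustration of the modeling transform described above, the following sketch moves a model's vertices from local coordinates to world coordinates using a 4x4 homogeneous matrix. The function names, the row-major matrix layout, and the choice of a pure translation are assumptions made for this example and are not drawn from any particular system.

```python
def translation(tx, ty, tz):
    """Build a 4x4 row-major matrix that translates by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_vertex(matrix, vertex):
    """Apply a 4x4 homogeneous matrix to an (x, y, z) vertex."""
    x, y, z = vertex
    v = (x, y, z, 1.0)  # homogeneous coordinate w = 1 for a point
    out = [sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4)]
    w = out[3]
    return (out[0] / w, out[1] / w, out[2] / w)


def model_to_world(matrix, vertices):
    """Transform every vertex of a model from local to world coordinates."""
    return [transform_vertex(matrix, v) for v in vertices]


# Hypothetical usage: place a model 2 units along x and 5 units back along z.
world = model_to_world(translation(2.0, 0.0, -5.0),
                       [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
```

A viewing transform into view reference coordinates could be applied the same way, with a different matrix.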
After transforming the objects, the geometric primitives for the objects are "rasterized" or "scan converted." Rasterizing refers generally to the process of computing a pixel value for a pixel in the image being rendered based on data from the geometric primitives that project onto or "cover" the pixel. Rasterizing is sometimes referred to as "tiling" because of the analogy to tiling a floor. Imagine that the pixels are square elements or tiles, and that a polygon is the floor plan. The rasterizing step includes tiling this floor plan by computing pixel values for the pixels or "tiles" within the polygon. The pixel values can include, for example, intensity values representing the red, green, and blue (R, G, B) color components of a pixel.
While there are a number of ways to rasterize a geometric primitive, this process generally involves computing a pixel intensity value or values based on the data from a primitive that projects onto or "covers" a pixel. For example, color values stored at the vertices of a polygon can be interpolated to find a color value at a given pixel. During this process, lighting and shading models can also be used to compute pixel values for pixels across the surface of the polygon.
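The interpolation of vertex colors mentioned above can be sketched as follows. This example assumes a triangle in 2-D screen space and uses barycentric weights, which is one common way to interpolate and not necessarily the one used by any given system. The names `edge` and `interpolate_color` are hypothetical.

```python
def edge(a, b, p):
    """Signed value proportional to the area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def interpolate_color(tri, colors, p):
    """Interpolate per-vertex (R, G, B) colors at point p.

    tri is three (x, y) vertices and colors is the matching list of three
    (R, G, B) tuples. Returns None when p lies outside the triangle or the
    triangle is degenerate.
    """
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return None
    # Barycentric weights: each is the sub-triangle area opposite a vertex,
    # divided by the full area, so the winding order cancels out.
    w0 = edge(b, c, p) / area
    w1 = edge(c, a, p) / area
    w2 = edge(a, b, p) / area
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None
    return tuple(w0 * ca + w1 * cb + w2 * cc for ca, cb, cc in zip(*colors))


# Hypothetical usage: a triangle with red, green, and blue vertices.
tri = ((0, 0), (4, 0), (0, 4))
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
midpoint_color = interpolate_color(tri, colors, (2, 0))
```

At a vertex the result equals that vertex's color, and along an edge it blends only the two vertices on that edge.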
From the tiling analogy above, it is clear that discrete pixels cannot precisely represent continuous surfaces. For example, a polygon may only partially cover a pixel region. In this case, the edge or edges of a polygon cross over the pixel region. If the pixel were approximated as being fully covered by this polygon, anomalies such as jaggy edges in the rendered image would likely result. A technique known generally as anti-aliasing attempts to address this problem. In general, anti-aliasing is used to compute pixel intensities for partially covered pixels to reduce the discontinuities introduced by representing a continuous object with a discrete array of pixels.
In a given 3-D graphics scene, a number of polygons may project onto the same area of the view space, and some of these polygons may occlude others. As such, some primitives may not be "visible" in the scene. Hidden surface removal is the process of determining which objects or portions of objects are, and conversely, are not visible in the scene. There are a number of approaches to hidden surface removal. Some approaches involve pre-sorting primitives before the rasterizing step. Examples of these types of approaches include: 1) sorting primitives and then rendering the primitives in back-to-front order so that pixels for visible objects overwrite pixels for occluded objects; and 2) sorting primitives in depth order and then clipping the primitives relative to each other to eliminate hidden portions of the primitive. Of course there are a number of variations to these examples as well as a variety of additional examples.
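The first pre-sorting approach above, rendering in back-to-front order, can be sketched as follows. The representation of a primitive as a dictionary holding a single depth, a color, and a list of already-rasterized pixel positions is an assumption made to keep the example short, as is the convention that a larger depth means farther from the viewpoint.

```python
def paint_back_to_front(primitives, width, height, background=(0, 0, 0)):
    """Render primitives so that nearer ones overwrite farther ones.

    Each primitive is a dict with a "depth" (larger means farther away,
    by assumption), a "color", and the "pixels" it covers as (x, y) pairs.
    """
    image = [[background for _ in range(width)] for _ in range(height)]
    # Sort farthest first, so later (nearer) primitives overwrite earlier ones.
    for prim in sorted(primitives, key=lambda p: p["depth"], reverse=True):
        for x, y in prim["pixels"]:
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = prim["color"]
    return image


# Hypothetical usage: a near red primitive partially occludes a far blue one.
prims = [
    {"depth": 1.0, "color": (255, 0, 0), "pixels": [(0, 0), (1, 0)]},
    {"depth": 5.0, "color": (0, 0, 255), "pixels": [(1, 0), (2, 0)]},
]
image = paint_back_to_front(prims, 3, 1)
```

A single depth per primitive is the limiting assumption here: primitives that interpenetrate or overlap cyclically cannot be ordered this way, which is why the second approach clips primitives against each other.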
These approaches are generally not preferred because they are computationally complex and therefore consume precious processing resources. Additional processing is required to sort the primitives in the scene before the rasterizing step. For real-time systems where objects are in motion from scene to scene, the computations required to sort and/or clip primitives make these approaches untenable. In any practical system, there is a trade-off between image quality and computational complexity. If the hidden surface removal technique is complex, yet does not improve image quality relative to other alternatives, it is not an acceptable solution.
In other approaches, the primitives are not sorted before scan conversion, but instead, the pixel data generated from the scan conversion process includes depth values used to perform hidden surface removal. The Z-buffer is one such approach. The Z-buffer includes an array having elements for storing pixel data including depth values for every pixel location in a display image. As geometric primitives are rasterized, the depth value for newly generated pixel data is compared with a depth value of pixel data stored in the Z-buffer. If the newly generated pixel data is closer to the viewpoint, it is written over the current values in the Z-buffer for the corresponding pixel location. If not, it is disregarded.
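The depth comparison described above can be sketched as follows. The class name `ZBuffer`, the convention that a smaller z value is closer to the viewpoint, and the use of infinity as the initial depth are assumptions for this example.

```python
class ZBuffer:
    """Per-pixel color and depth storage with a closest-wins depth test."""

    def __init__(self, width, height, background=(0, 0, 0)):
        # Every location starts infinitely far away, so any pixel data wins.
        self.depth = [[float("inf")] * width for _ in range(height)]
        self.color = [[background] * width for _ in range(height)]

    def write(self, x, y, z, color):
        """Store the pixel data if it is closer than what is already there.

        Returns True if the pixel data was written, False if disregarded.
        Smaller z is assumed to mean closer to the viewpoint.
        """
        if z < self.depth[y][x]:
            self.depth[y][x] = z
            self.color[y][x] = color
            return True
        return False


# Hypothetical usage: pixel data arrives in arbitrary order, with no sorting.
zb = ZBuffer(2, 1)
zb.write(0, 0, 5.0, (0, 0, 255))   # far blue pixel is stored
zb.write(0, 0, 1.0, (255, 0, 0))   # nearer red pixel overwrites it
zb.write(0, 0, 3.0, (0, 255, 0))   # green is behind red, so it is disregarded
```

Because each location holds exactly one color and one depth, the sketch has no way to represent a pixel that is only partially covered or translucent.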
The primary advantages of the Z-buffer approach are computational speed and simplicity. However, by itself, the Z-buffer approach provides no support for dealing with partially covered pixels or translucent pixels.
One improvement to the Z-buffer approach is known as the A-buffer and is described in: L. Carpenter "The A-Buffer, An Anti-Aliased Hidden Surface Method," in Computer Graphics, SIGGRAPH '84 proceedings, July 1984, Vol. 18, No. 3, pp. 103-108.
In this approach, polygons are clipped into fragments at the boundaries of a pixel region. For example, assume a pixel is represented as a square. If a 3-sided polygon (triangle) partially covers this square, a pixel fragment is generated to represent the portion of the triangle covering the square. A bit mask, created by exclusive ORing masks representing polygon edges, is used to describe how a polygon partially covers the pixel region.
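To make the idea of a coverage bit mask concrete, the following sketch builds one for a triangle over a single pixel. Note that it uses direct point sampling on a sub-pixel grid, not the exclusive-OR of edge masks described above, because point sampling is simpler to show in a few lines. The 4x4 grid size, the bit ordering, and the function names are all assumptions.

```python
GRID = 4  # assumed 4x4 sub-pixel grid, giving a 16-bit mask


def edge(a, b, p):
    """Signed value proportional to the area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def coverage_mask(tri, px, py):
    """Return a bit mask of the sub-pixel samples that the triangle covers.

    The pixel is the unit square whose lower-left corner is (px, py).
    Bit (j * GRID + i) is set when the sample in column i, row j is covered.
    """
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return 0  # degenerate triangle covers nothing
    mask = 0
    for j in range(GRID):
        for i in range(GRID):
            sample = (px + (i + 0.5) / GRID, py + (j + 0.5) / GRID)
            tests = (edge(b, c, sample), edge(c, a, sample), edge(a, b, sample))
            # Inside when all three edge tests agree in sign with the area.
            if all(t * area >= 0 for t in tests):
                mask |= 1 << (j * GRID + i)
    return mask


# Hypothetical usage: a small triangle covering one corner of pixel (0, 0).
partial = coverage_mask(((0, 0), (0.9, 0), (0, 0.9)), 0, 0)
covered_samples = bin(partial).count("1")
```

The fraction of set bits gives an estimate of how much of the pixel the polygon covers.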
In the A-buffer approach, there are two different data types representing pixel data. One data type, called a "pixelstruct," is used to store color and depth of a fully covered pixel, and a pointer to a depth sorted fragment list for partially covered pixels. The other data type is called a "fragment," which can store a pointer to the next fragment in a linked list, color, opacity, area, an object tag, a pixel mask, and a range of z values. Pixelstructs are stored in an array having the size and shape of the final image.
Geometric primitives are rasterized to generate the pixel data in the pixelstruct array and fragment lists. After the pixel data is generated for the pixelstruct array, the pixels having fragment lists are processed in a technique called "packing." Packing refers to the process of computing pixel values from a depth sorted fragment list.
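The two data types and the packing step can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the method of the cited paper: the mask has 16 bits, the fragment list is already sorted front to back, every fragment is treated as opaque (the opacity field is stored but ignored), the area field is omitted, and the color stored in the pixelstruct fills any samples that no fragment covers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SUBSAMPLES = 16  # assumed number of bits in the pixel mask
Color = Tuple[float, float, float]


@dataclass
class Fragment:
    color: Color
    mask: int                 # which sub-pixel samples this fragment covers
    z_min: float              # range of z values
    z_max: float
    object_tag: int = 0
    opacity: float = 1.0      # stored, but ignored by pack() below
    next: Optional["Fragment"] = None  # next fragment in the sorted list


@dataclass
class PixelStruct:
    color: Color = (0.0, 0.0, 0.0)
    z: float = float("inf")
    fragments: Optional[Fragment] = None  # head of the fragment list


def pack(pixel):
    """Compute one color from a front-to-back fragment list.

    Each fragment contributes in proportion to the samples it covers that
    no nearer fragment has already covered.
    """
    uncovered = (1 << SUBSAMPLES) - 1
    total = [0.0, 0.0, 0.0]
    frag = pixel.fragments
    while frag is not None and uncovered:
        visible = frag.mask & uncovered
        weight = bin(visible).count("1") / SUBSAMPLES
        for i in range(3):
            total[i] += weight * frag.color[i]
        uncovered &= ~visible
        frag = frag.next
    # Samples left uncovered take the color stored in the pixelstruct.
    weight = bin(uncovered).count("1") / SUBSAMPLES
    for i in range(3):
        total[i] += weight * pixel.color[i]
    return tuple(total)


# Hypothetical usage: a red fragment covering half the pixel, in front of blue.
blue = Fragment(color=(0.0, 0.0, 1.0), mask=0xFFFF, z_min=4.0, z_max=5.0)
red = Fragment(color=(1.0, 0.0, 0.0), mask=0x00FF, z_min=1.0, z_max=2.0, next=blue)
packed = pack(PixelStruct(fragments=red))
```

Supporting translucency would require using the opacity of each fragment when accumulating color, which this sketch leaves out.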
The primary advantage of the A-buffer approach is that it supports high-quality anti-aliasing and translucency computations. The A-buffer improves the general Z-buffer approach so that anti-aliasing for partially covered pixels and translucency for non-opaque pixels can both be supported.
One problem with the A-buffer approach is that it requires additional memory to store the fragment lists. The memory required to store fragments for an entire frame of image data can be enormous. The Carpenter paper refers to a software implementation where the pixelstruct array is "paged in software to save virtual memory space." This approach assumes that the computer system used to render an image has enough memory to store fragments generated during the rendering process.
Carpenter uses a technique referred to as fragment merging in which two or more fragments are combined into a single fragment. According to Carpenter, "merging two or more fragments simplifies the data structure and reclaims the space used by the merged-in fragments." Carpenter's approach merges fragments if they have the same object tag and overlap in Z. The object tag identifies an object from which the fragment originates and also includes data indicating whether the surface of the object faces forward or backward. While not described fully in the paper, the Z overlap appears to refer to the case where the ranges of z values overlap for fragment merge candidates.
In Carpenter's approach, the test to determine whether to merge a fragment is performed "whenever a new fragment is added to the pixelstruct list."
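The merge test described above, applied each time a fragment is added, might look roughly like the following. This is a sketch under stated assumptions rather than the procedure from the paper: the fragment list is a plain Python list, a merged fragment takes the union of the two masks and the union of the two z ranges, and color handling is omitted entirely. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    object_tag: int
    z_min: float
    z_max: float
    mask: int


def can_merge(a, b):
    """True when two fragments share an object tag and their z ranges overlap."""
    same_object = a.object_tag == b.object_tag
    z_overlap = a.z_min <= b.z_max and b.z_min <= a.z_max
    return same_object and z_overlap


def merge(a, b):
    """Combine two fragments (assumed rule: union of masks and z ranges)."""
    return Fragment(
        object_tag=a.object_tag,
        z_min=min(a.z_min, b.z_min),
        z_max=max(a.z_max, b.z_max),
        mask=a.mask | b.mask,
    )


def add_fragment(fragments, new):
    """Add a fragment to a pixel's list, merging it if a candidate exists."""
    for i, existing in enumerate(fragments):
        if can_merge(existing, new):
            fragments[i] = merge(existing, new)
            return fragments
    fragments.append(new)
    fragments.sort(key=lambda f: f.z_min)  # keep the list in depth order
    return fragments


# Hypothetical usage: two pieces of object 7 merge; object 8 stays separate.
frags = []
add_fragment(frags, Fragment(7, 1.0, 2.0, 0x000F))
add_fragment(frags, Fragment(7, 1.5, 2.5, 0x00F0))
add_fragment(frags, Fragment(8, 0.5, 0.6, 0xFFFF))
```

Even in this reduced form, every fragment carries an object tag and both ends of a z range, and every insertion scans the list.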
While fragment merging can reduce memory consumption, Carpenter's approach has a number of drawbacks. Every fragment must have an object tag. This complicates the rasterizing process because the object tag, including data indicating a front or back facing surface, must be determined and stored for each fragment. In addition, Carpenter's method has to maintain a Zmax and Zmin value for each fragment, which complicates processing and requires more storage. Carpenter further suggests that fragments should be maintained in a sorted order. While this can improve the efficiency by reducing the number of fragments that need to be examined to merge a fragment, this type of sorting further complicates the fragment generation process.
Carpenter describes fragment processing in terms of a software implementation, which cannot generate images for real-time, interactive applications. While the Carpenter approach is an improvement to the Z-buffer algorithm, it is extremely difficult to implement in a real-time system. In a real-time system, each pixel must be processed very quickly so that a new image can be generated within rigorous timing constraints. In Carpenter's A-buffer approach, this is not a concern because it is only used to compute images in a non-real-time software system.