With the widespread use of computers in all aspects of modern life, there is an increasing demand to improve the human-machine interface through the use of visual information. Advances in graphical software and hardware have already improved the human-machine interface drastically. Interactive graphics such as windowing environments for desktop computers, for example, have greatly improved the ease of use and interactivity of computers and are commonplace today. As the price-performance ratio of hardware drops, the use of computer-generated graphics and animation will become even more pervasive. Unfortunately, the cost of producing truly interactive and realistic effects has limited their application. There is a need, therefore, for new graphics processing techniques and architectures that provide more interactive and realistic effects at a lower cost.
Although there are numerous ways to categorize graphics processing, one common approach is to describe an image in terms of the dimensions of the objects that it seeks to represent. For example, a graphics system may represent objects in two dimensions (e.g., having x and y coordinates), in which case the graphics are said to be "two-dimensional," or in three dimensions (e.g., having x, y, and z coordinates), in which case the graphics are said to be "three-dimensional."
Since display devices such as cathode ray tubes (CRTs) are two-dimensional (2-D), the images displayed by computer graphic systems are generally 2-D. As discussed in greater detail below, however, if the computer maintains a graphical model representing the imaged object in three-dimensional space, the computer can alter the displayed image to illustrate a different perspective of the object in 3-D space. In contrast, although a 2-D graphic image can be transformed prior to display (e.g., scaled or translated), the computer cannot readily depict the object's appearance from a different perspective in 3-D space.
The increasing ability of modern computers to efficiently handle 2-D and, particularly, 3-D graphics has resulted in a growing variety of applications for computers, as well as fundamental changes in the user interface (UI) between computers and their users. The availability of 3-D graphics is becoming increasingly important to the growth of entertainment-related applications, including production-quality film animation tools as well as lower-resolution games and multimedia products for the home. A few of the many other areas touched by 3-D graphics include education, video conferencing, video editing, interactive user interfaces, computer-aided design and computer-aided manufacturing (CAD/CAM), scientific and medical imaging, business applications, and electronic publishing.
A graphics processing system may be thought of as including an application model, an application program, and a graphics sub-system, as well as the conventional hardware and software components of a computer and its peripherals.
The application model represents the data or objects to be displayed, assuming of course that the image processing is based upon a model. The model includes information concerning primitives such as points, lines, and polygons that define the objects' shapes, as well as the attributes of the objects (e.g., color). The application program controls inputs to, and outputs from, the application model, effectively acting as a translator between the application model and the graphics sub-system. Finally, the graphics sub-system is responsible for passing user inputs to the application model and for producing the image from the detailed descriptions stored by the application model.
The typical graphics processing system includes a physical output device which is responsible for the output or display of the images. Although other forms of display devices have been developed, the predominant technology today is referred to as raster graphics. A raster display device includes an array of individual points or picture elements (i.e., pixels), arranged in rows and columns, to produce the image. In a CRT, these pixels correspond to a phosphor array provided on the glass faceplate of the CRT. The emission of light from each phosphor in the array is independently controlled by an electron beam that "scans" the array sequentially, one row at a time, in response to stored information representative of each pixel in the image. Interleaved scanning of alternate rows of the array is also a common technique in, for example, the television environment. The array of pixel values that map to the screen is often referred to as a bitmap or pixmap.
One problem associated with raster graphics devices is the memory required to store the bitmap for even a single image. For example, the system may require 3.75 megabytes (MB) of random access memory to support a display resolution of 1280 × 1024 (i.e., number of pixel columns and rows) and 24 bits of color information per pixel. This information, which again represents the image of a single screen, is stored in a portion of the computer's display memory known as a frame buffer.
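The 3.75 MB figure follows directly from the resolution and color depth quoted above. The short calculation below is only an illustrative check; the function name is hypothetical, and it assumes that pixels are packed with no padding and that 1 MB means 1024 × 1024 bytes.

```python
def frame_buffer_bytes(columns: int, rows: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full-screen pixmap (assumes no padding)."""
    total_bits = columns * rows * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Resolution and color depth taken from the example in the text.
size_bytes = frame_buffer_bytes(1280, 1024, 24)
size_mb = size_bytes / (1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes

print(f"{size_bytes} bytes = {size_mb} MB")
```

Under these assumptions the result is 3,932,160 bytes, which is exactly 3.75 MB.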
Another problem with conventional raster graphics devices such as CRTs is the relatively quick decay of light emitted by the device. As a result, the display must typically be "refreshed" (i.e., the raster rescanned) at a rate approaching 60 Hz or more to avoid "flickering" of the image. This places a rigorous demand on the image generation system to supply image data at a fixed rate. Some systems address this problem by employing two frame buffers, with one of the buffers being updated with pixmap information corresponding to the subsequent image frame, while the other buffer is being used to refresh the screen with the pixmap for the current image frame.
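The two-buffer arrangement described above can be sketched in software as follows. This is a minimal illustration rather than a description of any particular hardware: the class name DoubleBuffer and its front, back, and swap members are hypothetical, and a real system would typically perform the swap during the vertical retrace interval.

```python
class DoubleBuffer:
    """Two pixmaps: one is scanned out while the other is being drawn."""

    def __init__(self, columns: int, rows: int, fill: int = 0):
        self._buffers = [
            [[fill] * columns for _ in range(rows)] for _ in range(2)
        ]
        self._front = 0  # index of the buffer currently refreshing the screen

    @property
    def front(self):
        """Buffer used to refresh the screen (current image frame)."""
        return self._buffers[self._front]

    @property
    def back(self):
        """Buffer being updated with the subsequent image frame."""
        return self._buffers[1 - self._front]

    def swap(self):
        """Exchange the roles of the two buffers once rendering is done."""
        self._front = 1 - self._front


buffers = DoubleBuffer(columns=4, rows=2)
buffers.back[0][0] = 255                   # draw the next frame off screen
visible_before = buffers.front[0][0]       # screen still shows the old frame
buffers.swap()
visible_after = buffers.front[0][0]        # new frame is now displayed
```

Because drawing only ever touches the back buffer, the refresh process never reads a partially rendered frame.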
The demands placed upon the system are further exacerbated by the complexity of the information that often must be processed to render an image from the object stored by the application model. For example, the modeling of a three-dimensional surface is, in itself, a complex task. Surface modeling is performed by the application model and may involve the use of polygon meshes, parametric surfaces, or quadric surfaces. While a curved surface can be represented by a mesh of planar polygons, the "smoothness" of its appearance in the rendered image will depend both upon the resolution of the display and the number of individual polygons that are used to model the surface. The computations associated with high resolution modeling of complex surfaces based upon polygon meshes can be extremely resource intensive.
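The trade-off between polygon count and smoothness can be illustrated with a simplified two-dimensional analogue: a circle approximated by an inscribed regular polygon. The sketch below is an assumption-laden illustration, not a surface modeling method from the text. It uses the standard geometric result that the largest gap between a circle of radius r and an inscribed regular n-sided polygon is r(1 - cos(π/n)); the function name is hypothetical.

```python
import math


def max_deviation(radius: float, sides: int) -> float:
    """Largest gap between a circle and an inscribed regular polygon."""
    return radius * (1.0 - math.cos(math.pi / sides))


# Hypothetical radius in pixels; more sides means a smaller visible error.
radius = 100.0
deviations = {n: max_deviation(radius, n) for n in (8, 16, 32, 64)}
for n, gap in deviations.items():
    print(f"{n:3d} sides: maximum deviation {gap:.3f} pixels")
```

Each doubling of the number of sides reduces the deviation by roughly a factor of four, while the number of primitives to process doubles, which is one way to see why high resolution meshes become resource intensive.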
As intimated above, there is a demand to produce more realistic and interactive images. The term "real-time" is commonly used to describe interactive and realistic image processing systems. In a "real-time" system, the user should perceive a continuous motion of objects in a scene. In a video game having real-time capabilities, the active characters and view point should respond with minimal delay to a user's inputs, and should move smoothly.
To produce such real-time effects, an image rendering system has to generate a new image at a sufficiently high rate such that the user perceives continuous motion of objects in a scene. The rate at which a new image is computed for display is referred to as the "computational" rate or the "computational frame" rate. The computational rate needed to achieve realistic effects can vary depending on how quickly objects move about the scene and how rapidly the viewing perspective changes. For a typical application, a real-time graphics system recomputes a new image at least twelve times a second to generate a series of images that simulate continuous motion. For high-quality animation applications, however, the computational rate must be significantly higher.
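To make the rates above concrete, the following sketch converts a computational rate into a per-image time budget and relates it to a display refresh rate. The function names are hypothetical, and the 60 Hz refresh rate is used only as an example value.

```python
def frame_budget_ms(computational_rate_hz: float) -> float:
    """Time available to compute one image, in milliseconds."""
    return 1000.0 / computational_rate_hz


def refreshes_per_image(refresh_rate_hz: float, computational_rate_hz: float) -> float:
    """How many display refreshes show the same computed image."""
    return refresh_rate_hz / computational_rate_hz


budget = frame_budget_ms(12)            # about 83.3 ms per image at 12 Hz
repeats = refreshes_per_image(60, 12)   # each image is shown for 5 refreshes

print(f"budget: {budget:.1f} ms, refreshes per image: {repeats}")
```

At twelve images per second, the system has roughly 83 ms to compute each image, and a 60 Hz display would show each computed image five times before the next one arrives.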
Another critical issue for real-time systems is transport delay. Transport delay is the time required to compute and display an image in response to input from the user, e.g., motion of a joystick to move a character in a scene. To the extent transport delay time is noticeable to a user, "real-time" interactivity is impaired. Ideally, the user should not perceive any transport delay. However, in practice there is always some delay attributed to rendering objects in a scene in response to new inputs and generating a display image. It is highly desirable to improve real-time interactivity without discarding data, since discarding data can interfere with image quality.
As introduced above, conventional graphics systems typically include a frame buffer. To generate an image, the graphics system renders all of the objects in a scene and stores the resulting image in this frame buffer. The system then transfers the rendered image data to a display. In a conventional graphics architecture, the entire frame buffer is erased and the scene is re-rendered to create a next frame's image. In this type of system, every object must be redrawn for each frame because the frame buffer is cleared between frames. Every object therefore is updated at the same rate, regardless of its actual motion in the scene or its importance to the particular application.
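The clear-and-redraw cycle described above can be sketched as a simple loop. This is a deliberately reduced illustration: the frame buffer is a one-dimensional list, and the SceneObject class, its fields, and the render_frame function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x: int                 # position in a 1-D frame buffer, for brevity
    velocity: int          # pixels moved per frame; 0 means static
    render_count: int = 0  # how many times this object has been drawn


def render_frame(frame_buffer: list, objects: list) -> None:
    """Erase the whole frame buffer, then redraw every object."""
    for i in range(len(frame_buffer)):
        frame_buffer[i] = None             # buffer is cleared between frames
    for obj in objects:
        obj.x = (obj.x + obj.velocity) % len(frame_buffer)
        frame_buffer[obj.x] = obj.name     # later objects draw over earlier ones
        obj.render_count += 1


frame_buffer = [None] * 8
background = SceneObject("background", x=0, velocity=0)
character = SceneObject("character", x=1, velocity=1)

for _ in range(10):
    render_frame(frame_buffer, [background, character])
```

After ten frames both objects have been rendered ten times, even though the background never moved, which is the uniform update rate the text describes.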
This conventional architecture presents several hurdles to producing highly realistic and interactive graphics. First, every object in a scene for a particular frame is rendered with the same priority at the same update rate. As such, objects in the background that have little detail and are not moving are re-rendered at the same rate as objects in the foreground that are moving more rapidly and have more surface detail. As a result, processing and memory resources are consumed in re-rendering background objects even though these background objects do not change significantly from frame to frame.
Another drawback in this conventional architecture is that every object in the scene is rendered at the same resolution. In effect, the rendering resources consumed in this type of approach are related to the size of the screen area that the object occupies rather than the importance of the object to the overall scene. An example will help illustrate this problem. In a typical video game, there are active characters in the foreground that can change every frame, and a background that rarely changes from frame to frame. The cost in terms of memory usage for generating the background is much greater than generating the active characters because the background takes up more area on the screen. Image data must be stored for each pixel location that the background objects cover. For the smaller, active characters, however, image data is generated and saved for only the pixels covered by the smaller characters. As a result, the background occupies more memory even though it has lesser importance in the scene. Moreover, in a conventional architecture the entire background has to be re-rendered for every frame, consuming valuable processing resources.
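A rough calculation shows how strongly screen area dominates memory cost in the video game example. All of the numbers below are hypothetical and chosen only for illustration: a 640 × 480 background, a 32 × 32 character, and 3 bytes (24 bits) per pixel.

```python
def covered_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Image data stored for every pixel location an object covers."""
    return width * height * bytes_per_pixel


# Hypothetical sizes, not taken from the text.
background_bytes = covered_bytes(640, 480)  # full-screen background
character_bytes = covered_bytes(32, 32)     # small active character
ratio = background_bytes / character_bytes

print(background_bytes, character_bytes, ratio)
```

With these assumed sizes the background needs 300 times as much image data as the character, even though the character is the part of the scene that changes every frame.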
One way to address some of the difficulties outlined above is to use a hardware-based animation technique called "sprites". A sprite is a two-dimensional image in a small rectangular region of memory that is mixed with the rest of the frame buffer memory at the video level. The location of a sprite at any time is specified in registers in the frame buffer. To move the sprite about the screen, the system alters the values in these registers. The sprites can either hide the frame buffer values at each pixel, or can be blended with them. Sprites can be used in animation by moving or scaling the sprite or sprites on top of a background image. One of the most popular uses of sprites is in video games, where a scene is animated by moving or scaling sprites over a fixed background.
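The way a sprite is mixed with the frame buffer contents can be sketched in software. The code below is a minimal model under stated assumptions, not a description of actual sprite hardware: the function compose_scanout, its parameters, and the use of a single grayscale value per pixel are all hypothetical. The x and y arguments stand in for the position registers, and the optional alpha argument selects blending instead of hiding.

```python
def compose_scanout(frame, sprite, x, y, alpha=None):
    """Return the displayed image: frame buffer with the sprite mixed in.

    The frame buffer itself is left untouched, mirroring mixing at the
    video level. With alpha=None the sprite hides the frame buffer values;
    otherwise the two are blended with weight alpha for the sprite.
    """
    out = [row[:] for row in frame]
    for sy, sprite_row in enumerate(sprite):
        for sx, value in enumerate(sprite_row):
            px, py = x + sx, y + sy
            if 0 <= py < len(out) and 0 <= px < len(out[0]):
                if alpha is None:
                    out[py][px] = value
                else:
                    out[py][px] = alpha * value + (1 - alpha) * out[py][px]
    return out


frame = [[0] * 4 for _ in range(4)]   # fixed background
sprite = [[8, 8], [8, 8]]             # small rectangular sprite image

shown = compose_scanout(frame, sprite, x=1, y=1)              # sprite hides background
moved = compose_scanout(frame, sprite, x=2, y=1)              # only the "registers" change
blended = compose_scanout(frame, sprite, x=1, y=1, alpha=0.5)
```

Moving the sprite requires changing only the two position values; neither the sprite image nor the background is redrawn.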
While sprites provide a less costly method to achieve animation, they do not adequately address the problems associated with rendering 3-D objects in a real-time graphics system outlined above. The conventional use of sprites to generate a display image is limited because the sprites only represent static two-dimensional images. The two-dimensional image is pre-defined, cannot be updated (it is static or "fixed"), and is merely scaled or translated to achieve a simple form of animation. Some sprite processing techniques partially represent the depth dimension by allowing sprites to overlap each other. Even if the sprites are overlapped, they cannot simulate the effect of a three-dimensional object moving in three dimensions.
Because conventional sprites are limited to pre-defined images, they cannot achieve the more compelling attributes of objects moving in three dimensions in a three-dimensional scene. Consider an example of a wobbling cylinder moving from the foreground to the background of a graphics scene. A single, two-dimensional sprite image cannot accurately represent the motion of the cylinder because simple scaling and translation of a single two-dimensional image cannot capture all of the positional changes to the projection of the cylinder into two-dimensional screen space as it moves throughout the scene. A sprite of a side view of the cylinder could be translated or scaled, but this sprite could not represent complex motion or wobbling of the cylinder.
Conventional sprites are limited in this way because they do not represent corresponding 3-D graphical objects that can be re-rendered. Stated another way, conventional sprites are not updated or re-rendered in the same way that 3-D graphical objects in a graphics scene are re-rendered to generate frames of animation. Thus, while conventional sprites provide a simpler and less expensive technique to achieve animation, they do not provide the more compelling and realistic effect of a rendered 3-D object.