1. Field of the Invention
This invention relates to the field of computer graphics systems. More particularly, this invention relates to an architecture for a high performance three dimensional graphics accelerator in a computer system.
2. Art Background
A three dimensional graphics accelerator is a specialized graphics rendering subsystem for a computer system. Typically, an application program executing on a host processor of the computer system generates three dimensional geometry input data that defines three dimensional graphics elements for display on a display device. The application program typically transfers the geometry input data from the host processor to the graphics accelerator. Thereafter, the graphics accelerator renders the corresponding graphics elements on the display device.
The design architecture of a high performance three dimensional graphics system historically embodies a balance between system performance and system cost. The typical design goal is to increase system performance while minimizing increases in system cost. However, prior graphics systems usually suffer from either limited performance or high cost due to a variety of system constraints.
For example, a high performance graphics system typically implements an interleaved frame buffer comprised of multiple VRAM banks because the minimum read-modify-write cycle time for commercially available video random access memory (VRAM) chips is a fundamental constraint on rendering performance. The implementation of multiple interleaved VRAM banks enables parallel pixel rendering into the frame buffer to increase overall rendering performance. Unfortunately, the separate addressing logic required for each interleaved VRAM bank increases the cost and power consumption of such high performance systems.
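The effect of the interleave factor on rendering time can be illustrated with a minimal sketch. The mapping below (bank selected by the pixel's x coordinate modulo the number of banks) and the function names are assumptions chosen for illustration; the text above does not specify how pixels are assigned to banks. The sketch assumes each bank completes one read-modify-write per cycle and that all banks operate in parallel.

```python
# Hypothetical sketch: horizontal pixel interleaving across VRAM banks.
# The x-modulo mapping and the one-pixel-per-bank-per-cycle model are
# illustrative assumptions, not details taken from the text.

def bank_for_pixel(x: int, y: int, num_banks: int) -> int:
    """Return the bank holding pixel (x, y) under simple x-interleaving."""
    return x % num_banks


def cycles_for_span(x_start: int, x_end: int, y: int, num_banks: int) -> int:
    """Count read-modify-write cycles to fill pixels x_start..x_end-1 on row y.

    Banks work in parallel, so the busiest bank determines the cycle count.
    """
    per_bank = [0] * num_banks
    for x in range(x_start, x_end):
        per_bank[bank_for_pixel(x, y, num_banks)] += 1
    return max(per_bank)


if __name__ == "__main__":
    for banks in (1, 2, 4, 8):
        print(banks, "bank(s):", cycles_for_span(0, 16, 0, banks), "cycles")
```

Under these assumptions a 16-pixel span takes 16 cycles with a single bank and 4 cycles with four banks, which is the performance benefit that the added addressing logic pays for.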
On the other hand, a graphics system may implement a rendering processor on a single integrated circuit chip to minimize cost and power consumption. Unfortunately, such systems suffer from poor rendering performance due to the limited number of interface pins available with the single integrated circuit chip. The limited number of interface pins reduces the interleave factor for the frame buffer, thereby precluding the rendering performance benefits of parallel processing.
Another graphics system constraint is the proliferation of differing three dimensional geometry input data formats that define similar drawing functions. A graphics system is typically required to support many of the differing geometry input data formats. Some prior graphics systems support the differing geometry formats in graphics processor micro-code. However, such a solution greatly increases the size and complexity of the graphics processor micro-code, thereby increasing system cost and decreasing system performance. Other prior graphics systems support the differing geometry formats by employing a host processor to translate the differing formats into a standard format for the graphics processor. Unfortunately, such format translation by the host processor creates a system bottleneck that may severely limit overall graphics system performance.
In addition, prior graphics systems often perform transformation, clip test, face determination, lighting, clipping, screen space conversion, and setup functions using commercially available digital signal processing (DSP) chips. However, such DSP chips are typically not optimized for three dimensional computer graphics. The internal registers provided in a typical DSP chip are too few in number to accommodate the inner loops of most three dimensional graphics processing algorithms. In such systems, on-chip data caches or SRAMs are typically employed to compensate for the limited number of internal fast registers provided by the DSP chip. However, such on-chip data caches usually implement scheduling algorithms that are not controllable. Moreover, such on-chip SRAMs are usually not suitable for a multi-processing environment.
Also, DSP chips typically require an assortment of support chips to function in a multi-processing environment. Unfortunately, the addition of the support chips to a graphics system increases printed circuit board area, increases system power consumption, increases heat generation, and increases system cost.
Prior graphics systems often employ a parallel processing pipeline to increase graphics processing performance. For example, the scan conversion function for a shaded triangle in a graphics system is typically performed by a linear pipeline of edgewalking and scan interpolation. Typically in such systems, the edgewalking function is performed by an edgewalking processor, and the scan interpolation function is performed by a set of parallel scan interpolation processors that receive parameters from the edgewalking processor.
However, such systems fail to obtain parallel processing speed benefits when rendering relatively long thin triangles, which are commonly encountered in tessellated geometry. The parameter data flow between the edgewalking processor and the scan interpolation processors greatly increases when performing scan conversion on long thin triangles. Unfortunately, the increased parameter data flow slows triangle rendering and reduces graphics system performance.
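The long thin triangle problem can be made concrete with a minimal sketch of edgewalking followed by span counting. Everything in the sketch is an assumption made for illustration: the function names, the sampling of pixels at integer coordinates, and the inclusive fill rule are not taken from the text. It reads the "parameter data flow" as one parameter handoff per span from the edgewalking stage to a scan interpolation stage, which is one plausible interpretation of the pipeline described above.

```python
import math

# Hypothetical sketch of a linear edgewalk / scan-interpolation pipeline.
# Pixels are sampled at integer coordinates with an inclusive fill rule;
# both choices are illustrative assumptions.

def edge_walk(verts):
    """Yield (y, x_left, x_right) for each scanline crossing the triangle."""
    ys = [y for _, y in verts]
    for y in range(math.ceil(min(ys)), math.floor(max(ys)) + 1):
        xs = []
        for i in range(3):
            (xa, ya), (xb, yb) = verts[i], verts[(i + 1) % 3]
            if ya == yb:
                # Horizontal edge: contributes both endpoints on its own row.
                if ya == y:
                    xs.extend([xa, xb])
                continue
            if min(ya, yb) <= y <= max(ya, yb):
                t = (y - ya) / (yb - ya)
                xs.append(xa + t * (xb - xa))
        if xs:
            yield y, math.ceil(min(xs)), math.floor(max(xs))


def scan_stats(verts):
    """Return (spans, pixels): spans handed off and pixels interpolated."""
    spans = pixels = 0
    for _, x_left, x_right in edge_walk(verts):
        if x_right >= x_left:
            spans += 1
            pixels += x_right - x_left + 1
    return spans, pixels


if __name__ == "__main__":
    compact = [(0, 0), (8, 0), (0, 8)]
    thin = [(0, 0), (1, 0), (0, 40)]
    print("compact:", scan_stats(compact))
    print("thin:   ", scan_stats(thin))
```

With these inputs the compact triangle produces 9 spans covering 45 pixels, while the thin triangle produces 41 spans covering 42 pixels. The pixel counts are similar, but the thin triangle requires more than four times as many span handoffs, each doing almost no interpolation work, so the parallel scan interpolation stage sits mostly idle while the per-span parameter traffic dominates.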
As will be described, the present invention is a graphics accelerator that achieves high performance at a relatively low cost by overcoming the variety of system constraints discussed above. The present graphics accelerator comprises a command preprocessor for translating the differing geometry input data formats, a set of floating-point processors optimized for three dimensional graphics functions, and a set of draw processors that concurrently perform edgewalking and scan interpolation rendering functions for separate portions of a geometry object.