1. Field of the Invention
This invention relates generally to the field of computer graphics and, more particularly, to high performance computer graphics systems.
2. Description of the Related Art
A computer system typically relies upon its graphics system for producing visual output on a computer screen or display device. Early graphics systems were limited to two-dimensional (2D) graphics and were only responsible for taking what the processor produced as output and displaying it on the screen. In essence, they acted as simple translators or interfaces. Modern graphics systems, however, must support three-dimensional (3D) graphics with textures and special effects. Consequently, they must incorporate graphics processors with a great deal of processing power. They now act more like coprocessors than simple translators. This change is due to recent increases in the complexity and amount of data received by the graphics processor and in the amount of data sent to the display device. For example, modern computer displays have many more pixels, greater color depth, and are able to display more complex images with higher refresh rates than earlier models. Similarly, the images displayed are now more complex and may involve advanced techniques such as anti-aliasing, texture mapping, advanced shading, fogging, alpha-blending, and specular highlighting.
As a result, without considerable processing power in the graphics system, the CPU would spend a great deal of time performing graphics calculations. This could rob the computer system of the processing power needed for performing other tasks associated with program execution and thereby dramatically reduce overall system performance.
In recent years, demand for high performance graphics systems that can render complex three-dimensional (3D) objects and scenes has increased substantially. This increase is at least in part due to new applications such as computer-generated animation for motion pictures, virtual reality simulators/trainers, and interactive computer games. These new applications place tremendous demands upon graphics systems. One area in which particularly high demands are placed on graphics systems is bandwidth. This is because 3D graphics data may be several orders of magnitude larger than comparable 2D graphics data. For example, simple 2D graphics data may only comprise color information for each pixel displayed. In contrast, 3D graphics data may include many information components for each vertex of the geometric primitives used to model the objects to be imaged. These vertex information components may comprise: x, y, and z position; normal vector; front, back, and specular color; front and back transparency; 2D, 3D, and perspective surface texture; and viewport clipping information.
A number of different techniques have been proposed to reduce the bandwidth requirements of 3D graphics data. One such technique is known as geometry compression. One type of geometry compression is described in detail in U.S. Pat. No. 5,793,371, issued on Aug. 11, 1998, entitled “Method and Apparatus for Geometric Compression of Three-Dimensional Graphics Data” by Michael F. Deering, which is incorporated herein by reference in its entirety. One of the techniques used in geometry compression relies upon the removal of vertexes that are repeated, to reduce the size of the 3D graphics data.
A surface of a 3D object may be represented by specifying a number of primitive shapes, such as triangles, that conform to the surface and form a triangle mesh as shown in FIG. 1. Each triangle has three vertexes, but many triangles share vertexes. For example, in FIG. 1, vertexes 1-6-7 form a first triangle and vertexes 1-7-2 form a second triangle. Thus, vertexes 1 and 7 are shared between the two triangles. Vertex 7 is actually shared by nine different triangles in the triangle mesh shown in FIG. 1, and vertex 6 is shared by three different triangles.
To efficiently reuse vertexes, the triangle mesh shown in FIG. 1 may be encoded as one or more “triangle-strips”. For example, a triangle strip may comprise the following triangles: 6-1-7, 1-2-7, 7-2-3, 7-3-4, 7-4-8, 4-8-5, et seq. As this pattern shows, once a triangle strip is started, subsequent triangles may be specified using only a single new vertex. In general, N vertexes in a triangle strip describe N-2 triangles.
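The strip expansion described above may be sketched as follows. This is a minimal illustration, not the encoding of any particular embodiment: the function name decode_strip and the per-vertex replacement codes "oldest" and "middle" are assumptions introduced here to reproduce the example sequence, and winding order is ignored.

```python
def decode_strip(first_three, steps):
    """Expand a triangle strip into a list of (a, b, c) vertex-index triples.

    first_three: the three vertex indices of the starting triangle.
    steps: (new_vertex, replace) pairs, where replace is "oldest" or "middle"
           (a hypothetical code saying which retained vertex is dropped).
    """
    a, b, c = first_three
    triangles = [(a, b, c)]
    for new_vertex, replace in steps:
        if replace == "oldest":
            a, b, c = b, c, new_vertex   # drop the oldest vertex
        elif replace == "middle":
            a, b, c = a, c, new_vertex   # keep the oldest, drop the middle
        else:
            raise ValueError(f"unknown replacement code: {replace!r}")
        triangles.append((a, b, c))
    return triangles


# The strip from the text: 8 vertexes (6, 1, 7, 2, 3, 4, 8, 5) yield 6 triangles.
triangles = decode_strip(
    (6, 1, 7),
    [(2, "oldest"), (3, "oldest"), (4, "middle"), (8, "middle"), (5, "oldest")],
)
print(triangles)
```

Each step supplies a single new vertex, so the 8 vertexes above produce 8 - 2 = 6 triangles, with the same vertex sets as the sequence listed in the text.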
Therefore, instead of transforming and lighting three vertexes for the next triangle in the strip, it may be possible to transform and light only one new vertex and reuse the previously transformed and lit data for the other two vertexes. This may yield a significant reduction in the processing required to transform and light vertex data (e.g., approaching 67% for long strips). Furthermore, large numbers of triangles may not be required to achieve significant reductions of processing time. Four vertexes defining 2 contiguous triangles require 33% less vertex processing than two separate triangles, which have six vertexes. Six vertexes defining 4 contiguous triangles require 50% less vertex processing than four separate triangles, which have twelve vertexes.
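The percentages above follow from counting vertexes: T separate triangles require 3T vertexes, whereas a strip of T triangles requires T + 2. The short sketch below works through that arithmetic. The function name vertex_savings is introduced here for illustration only, and it assumes that processing cost is proportional to the number of vertexes transformed and lit.

```python
def vertex_savings(num_triangles):
    """Fraction of vertex processing saved by a strip versus separate triangles."""
    separate = 3 * num_triangles   # every triangle carries its own 3 vertexes
    strip = num_triangles + 2      # 3 for the first triangle, 1 for each later one
    return 1 - strip / separate


for t in (2, 4, 1000):
    print(t, round(vertex_savings(t) * 100), "%")
```

For 2 triangles the saving is one third, for 4 triangles it is one half, and as the strip grows the saving approaches two thirds.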
For the reasons set forth above, the use of geometry compression may be particularly advantageous in high performance graphics systems.
However, further increases in performance are still demanded by modern applications. Thus, additional methods for increasing the performance of graphics systems configured to utilize compressed 3D graphics data are desired. Inefficiencies in the flow of vertex data, the assembly of vertex components to form a primitive, and the launching of primitives to be processed into pixel data need to be addressed.
The problems outlined above may, in some embodiments, be solved in part by a graphics system capable of delaying the formation of independent primitives until after transformation and/or lighting, and in part by improving the rate of formation of independent primitives by the use of multiple buffers, queues, and/or caches in order to perform process steps in parallel, to accommodate process steps performed at different rates, and to facilitate communication between devices operating at different clock speeds.
Vertexes that are shared by more than one primitive may then have the potential to be transformed and lit only once, as opposed to being transformed and lit for each triangle to which they belong. Vertex 7 in FIG. 1, for instance, is a vertex in 9 different triangles. In one embodiment, vertex 7 would be tagged for storage and multiple re-use during geometry compression. Transforming and/or lighting may thus be performed on an individual vertex basis instead of on a geometric primitive basis. The individually transformed and lit vertexes are then assembled into primitives for further processing into pixel data.
In some embodiments, the graphics system may utilize buffers, queues, and/or caches to store transformed and lit vertexes. Each time a particular vertex is needed to form a geometric primitive, the vertex may be read from the appropriate transformed vertex storage device, which may be identified by using vertex tags assigned by the transforming and lighting processors.
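The reuse of stored vertexes may be sketched as a store keyed by vertex tag. This is a simplified software illustration under stated assumptions, not the hardware arrangement of any embodiment: the class TransformedVertexStore, the placeholder transform_and_light function, and the vertex coordinates are all hypothetical, and here the tag is simply the vertex number.

```python
def transform_and_light(position):
    """Placeholder transform (assumption): translate the vertex by a fixed offset."""
    x, y, z = position
    return (x + 1.0, y + 1.0, z)


class TransformedVertexStore:
    """Holds transformed vertexes so that each tag is transformed only once."""

    def __init__(self, transform):
        self._transform = transform
        self._store = {}
        self.transform_calls = 0

    def get(self, tag, raw_vertex):
        # Transform on first use; later requests read the stored result.
        if tag not in self._store:
            self._store[tag] = self._transform(raw_vertex)
            self.transform_calls += 1
        return self._store[tag]


# Hypothetical positions for vertexes 1, 2, 6, and 7 of FIG. 1.
raw = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 6: (0.0, 1.0, 0.0), 7: (1.0, 1.0, 0.0)}
store = TransformedVertexStore(transform_and_light)

# Triangles 1-6-7 and 1-7-2 share vertexes 1 and 7.
primitives = [
    tuple(store.get(tag, raw[tag]) for tag in triangle)
    for triangle in [(1, 6, 7), (1, 7, 2)]
]
print(store.transform_calls)
```

The two triangles reference six vertexes in total, but only four distinct vertexes are transformed because vertexes 1 and 7 are read back from the store on their second use.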
In some embodiments, separate and independent buffers, queues, and/or caches may be used to store the vertex data at each stage of the vertex assembly process. The stream of compressed vertex data, the transformed and lit vertex data, the corresponding vertex tags, the vertex data that will be reused for another primitive, the vertex data that is part of the next primitive to be assembled, the vertex data for the next vertex needed in assembly, and the assembled primitive are all separately stored in independent buffers, queues, and/or caches. Separately storing the output of each step in the process may allow the various process steps to be performed in parallel and at different rates. In addition, multiple processor units may be utilized for those process steps that may require more time to complete.
In one embodiment, a graphics system may comprise a graphics processor configured to receive compressed three-dimensional (3D) graphics data and generate a series of transformed vertexes, one or more vertex buffers configured to store said transformed vertexes, a primitive assembly buffer, a primitive assembler configured to control transfers of selected ones of said transformed vertexes from the one or more vertex buffers to the primitive assembly buffer (wherein the selected transformed vertexes form a portion of a primitive), a primitive launch buffer configured to receive the selected transformed vertexes from the primitive assembly buffer and a remaining transformed vertex from a vertex buffer completing the primitive, and a primitive launcher configured to control the output of the primitive comprising the selected transformed vertexes and the remaining transformed vertex (wherein the primitive is usable to determine at least a portion of a displayable image).
An assembled primitive may not be visible in the displayed image, and therefore would merit no further processing time. For this reason, an assembled primitive may be tested (a clip test) for inclusion in a specified viewport. A viewport is a portion of the screen space that has been defined as the visible region for a particular group of primitives (a viewport could be defined as all of the visible screen). As each primitive is assembled, a clip test may be performed to determine whether the primitive is completely within, completely outside, or only partly within a viewport. If a primitive is completely within a viewport, it is processed into pixel data for display. If a primitive is completely outside a viewport, it is discarded. If a primitive is partly inside and partly outside a viewport, it is returned to the transforming and lighting processors. The portion of the primitive that is outside the viewport is removed and the new vertexes are processed. It may be necessary to subdivide the truncated primitive into several new primitives.
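The three-way clip test may be sketched with per-vertex outcodes. This is a minimal two-dimensional illustration that assumes a rectangular viewport given as (xmin, ymin, xmax, ymax); the names outcode and clip_test are introduced here and do not come from the described system. The test is conservative: a triangle is reported as "outside" only when all three vertexes lie beyond the same viewport edge, so some triangles that are in fact fully outside may be reported as "partial".

```python
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8


def outcode(x, y, viewport):
    """Bit flags recording which viewport edges the point lies beyond."""
    xmin, ymin, xmax, ymax = viewport
    code = 0
    if x < xmin:
        code |= LEFT
    elif x > xmax:
        code |= RIGHT
    if y < ymin:
        code |= BOTTOM
    elif y > ymax:
        code |= TOP
    return code


def clip_test(triangle, viewport):
    """Classify a triangle as 'inside', 'outside', or 'partial'."""
    codes = [outcode(x, y, viewport) for x, y in triangle]
    if all(code == 0 for code in codes):
        return "inside"    # process into pixel data
    if codes[0] & codes[1] & codes[2]:
        return "outside"   # all vertexes beyond one edge: discard
    return "partial"       # return for clipping and possible subdivision


viewport = (0, 0, 10, 10)
print(clip_test([(1, 1), (5, 1), (3, 4)], viewport))
print(clip_test([(11, 1), (15, 2), (12, 8)], viewport))
print(clip_test([(5, 5), (15, 5), (5, 8)], viewport))
```

Only the "partial" result requires the more expensive step of removing the outside portion and generating new vertexes; the other two outcomes are decided from the vertex positions alone.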
Each of these steps may have the potential to reduce the time required to process a vertex data stream into transformed and lit primitives and thereby increase the efficiency of a graphics processor.