The present invention relates to 3D interactive computer graphics, and more specifically, to arrangements and techniques for efficiently representing and storing vertex information for animation and display processing. Still more particularly, the invention relates to a 3D graphics integrated circuit including a vertex cache for more efficient imaging of 3D polygon data.
Modern 3D computer graphics systems construct animated displays from display primitives, i.e., polygons. Each display object (e.g., a tree, a car, or a person or other character) is typically constructed from a number of individual polygons. Each polygon is represented by its vertices, which together specify the location, orientation and size of the polygon in three-dimensional space, along with other characteristics (e.g., color, surface normals for shading, textures, etc.). Computers can efficiently construct rich animated 3D graphical scenes from such polygon representations.
Low cost, high speed interactive 3D graphics systems such as video game systems are constrained in terms of memory and processing resources. Therefore, in such systems it is important to be able to efficiently represent and process the various polygons representing a display object. For example, it is desirable to make the data representing the display object compact, and to present the data to the 3D graphics system in a way so that all of the data needed for a particular task is conveniently available.
One can characterize data in terms of temporal locality and spatial locality. Temporal locality means the same data is being referenced frequently in a small amount of time. In general, the polygon-representing data for typical 3D interactive graphics applications has a large degree of temporal locality. Spatial locality means that the next data item referenced is stored close in memory to the last one referenced. Efficiency improvements can be realized by increasing the data's spatial locality. In a practical memory system that does not allow unlimited low-latency random access to an unlimited amount of data, performance is increased if all data needed to perform a given task is stored close together in low-latency memory.
To increase the spatial locality of the data, one can sort the polygon data based on the order of processing, assuring that all of the data needed to perform a particular task will be presented at close to the same time so it can be stored together. For example, polygon data making up animations can be sorted in a way that is preferential to the type of animation being performed. As one example, typical complex interactive real-time animation such as surface deformation requires manipulation of all the vertices at the surfaces. To perform such animation efficiently, it is desirable to sort the vertex data in a certain way.
Typical 3D graphical systems perform animation processing and display processing separately, and these separate steps process the data differently. Unfortunately, the optimal order to sort the vertex data for animation processing is generally different from the optimal sort order for display processing. Sorting for animation may tend to add randomness to display ordering. By sorting a data stream to simplify animation processing, we make it harder to efficiently display the data.
Thus, for various reasons, it may not be possible to assume that spatial locality exists when accessing data for display. Difficulty arises from the need to efficiently access an arbitrarily large display object. In addition, for the reasons explained above, there will typically be some amount of randomness (at least for display purposes) in the order the vertex data is presented to the display engine. Furthermore, there may be other data locality above the vertex level that would be useful to implement (e.g., grouping together all polygons that share a certain texture).
One approach to achieving higher efficiency is to provide additional low-latency memory (e.g., the lowest latency memory system affordable). It might also be possible to fit a display object in fast local memory to achieve random access. However, objects can be quite large, and may need to be double-buffered. Therefore, the buffers required for such an approach could be very large. It might also be possible to use a main CPU's data cache to assemble and sort the polygon data in an optimal order for the display engine. However, to do this effectively, there would have to be some way to prevent the polygon data from thrashing the rest of the data cache. In addition, there would be a need to prefetch the data to hide memory latency, since there will probably be some randomness in the way even data sorted for display order is accessed. Additionally, this approach would place additional loading on the CPU, especially since there might be a need in certain implementations to assemble the data in a binary format the display engine can interpret. Using this approach, the main CPU and the display engine would become serial, with the CPU feeding the data directly to the graphics engine. Parallelizing the processing (e.g., to feed the display engine through a DRAM FIFO buffer) would require substantial additional memory access bandwidth as compared to immediate-mode feeding.
Thus, there exists a need for more efficient techniques that can be used to represent, store and deliver polygon data for a 3D graphics display process.
The present invention solves this problem by providing a vertex cache to organize indexed primitive vertex data streams.
In accordance with one aspect provided by the present invention, polygon vertex data is fed to the 3D graphics processor/display engine via a vertex cache. The vertex cache may be a small, low-latency memory that is local to (e.g., part of) the 3D graphics processor/display engine hardware. Flexibility and efficiency result from the cache providing a virtual memory view much larger than the actual cache contents.
The vertex cache may be used to build up the vertex data needed for display processing on the fly on an as-needed basis. Thus, rather than pre-sorting the vertex data for display purposes, the vertex cache can simply fetch the relevant blocks of data on an as-needed basis to make it available to the display processor. Based on the high degree of temporal locality exhibited by the vertex data for interactive video game display and the use of particularly optimal indexed-array data structures (see below), most of the vertex data needed at any given time will be available in even a small set-associative vertex cache having a number of cache lines proportional to the number of vertex data streams. One example optimum arrangement provides a 512×128-bit dual ported RAM to form an 8 set-associative vertex cache.
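The lookup behavior of such a cache can be illustrated with a small software model. The following is a minimal sketch only, not the hardware design: it assumes that "8 set-associative" means 8 ways per set (giving 64 sets of 16-byte lines from the 512 x 128-bit RAM) and that replacement is least-recently-used. Neither assumption is stated above, and the name VertexCacheModel is hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 16               # one 128-bit line, from the 512 x 128-bit RAM figure
NUM_LINES = 512
WAYS = 8                      # assumption: "8 set-associative" read as 8 ways per set
NUM_SETS = NUM_LINES // WAYS  # 64 sets under that assumption


class VertexCacheModel:
    """Toy model of a set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self):
        # One OrderedDict per set, mapping tag -> None, ordered oldest first.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record an access to a byte address; return True on a hit."""
        line = address // LINE_BYTES
        set_index = line % NUM_SETS
        tag = line // NUM_SETS
        ways = self.sets[set_index]
        if tag in ways:
            ways.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        if len(ways) >= WAYS:
            ways.popitem(last=False)  # evict the least recently used line
        ways[tag] = None              # fill the line on a miss
        self.misses += 1
        return False
```

Feeding the model an index stream that revisits recently used vertices, as adjacent triangles typically do, shows how temporal locality turns most repeat references into hits even though nothing was pre-sorted for display.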
Efficiency can be increased by customizing and optimizing the vertex cache and associated tags for the purpose of delivering vertices to the 3D graphics processor/display engine, allowing more efficient prefetching and assembling of vertices than might be possible using a general-purpose cache and tag structure. Because the vertex cache allows data to be fed directly to the display engine, the cost of additional memory access bandwidth is avoided. Direct memory access may be used to efficiently transfer vertex data into the vertex cache.
To further increase the efficiencies afforded by the vertex cache, it is desirable to reduce the need to completely re-specify a particular polygon or set of polygons each time it is (they are) used. In accordance with a further aspect provided by the present invention, polygons can be represented as arrays, e.g., linear lists of data components representing some feature of a vertex (for example, positions, colors, surface normals, or texture coordinates). Each display object may be represented as a collection of such arrays along with various sets of indices. The indices reference the arrays for a particular animation or display purpose. By representing polygon data as indexed component lists, discontinuities are allowed between mappings. Further, separating out individual components allows data to be stored more compactly (e.g., in a fully compressed format). The vertex cache provided by the present invention can accommodate streams of such indexed data up to the index size.
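The indexed component-list representation can be illustrated as follows. This is a sketch under stated assumptions: the array contents, the per-vertex (position index, color index) layout and the function name assemble_triangle are all hypothetical and chosen only for illustration.

```python
# Component arrays: each is a linear list holding one feature of a vertex.
POSITIONS = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
COLORS = [(255, 0, 0), (0, 0, 255)]

# Each vertex is a tuple of indices (position index, color index).
# Positions 0 and 2 are shared by both triangles but carry a different
# color in each, which is the kind of discontinuity between mappings
# that separate per-component indices permit.
TRIANGLES = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (2, 1), (3, 1)],
]


def assemble_triangle(triangle, positions, colors):
    """Resolve the index tuples of one triangle into full vertex data."""
    return [(positions[p], colors[c]) for p, c in triangle]
```

Two triangles here reference six vertices but store only four positions and two colors, so shared components are never re-specified.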
Through use of an indexed vertex representation in conjunction with the vertex cache, there is no need to provide any resorting for display purposes. For example, the vertex data may be presented to the display engine in an order presorted for animation as opposed to display, making animation a more efficient process. The vertex cache uses the indexed vertex data structure representation to efficiently make the vertex data available to the display engine without any need for explicit resorting.
Any vertex component can be index-referenced or directly inlined in the command stream. This enables efficient data processing by the main processor without requiring the main processor's output to conform to the graphics display data structure. For example, lighting operations performed by the main processor may generate only a color array from a list of normals and positions by loop-processing a list of lighting parameters to generate the color array. There is no need for the animation process to follow a triangle list display data structure, nor does the animation process need to reformat the data for display. The display process can naturally consume the data provided by the animation process without adding substantial data reformatting overhead to the animation process.
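The lighting example can be sketched as a loop that reads the position and normal arrays and writes only a color array, independent of any triangle list. The shading model is an assumption made for illustration: simple diffuse point lights, unit-length normals, and colors clamped to 1.0. The function name light_vertices and the (position, rgb) light format are hypothetical.

```python
import math


def light_vertices(positions, normals, lights):
    """Return one RGB color per vertex; lights are (position, rgb) pairs.

    Assumes unit-length normals and a plain diffuse (N dot L) term.
    """
    colors = []
    for pos, normal in zip(positions, normals):
        rgb = [0.0, 0.0, 0.0]
        for light_pos, light_rgb in lights:
            to_light = [lp - p for lp, p in zip(light_pos, pos)]
            dist = math.sqrt(sum(c * c for c in to_light))
            if dist == 0.0:
                continue  # light sits on the vertex; skip to avoid dividing by zero
            n_dot_l = sum(a * b for a, b in zip(normal, to_light)) / dist
            diffuse = max(0.0, n_dot_l)  # surfaces facing away receive no light
            for k in range(3):
                rgb[k] += diffuse * light_rgb[k]
        colors.append(tuple(min(1.0, c) for c in rgb))
    return colors
```

The output is a plain array with one entry per vertex, so it can be referenced by index from the display side without the lighting code knowing how the polygons are assembled.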
On the other hand, there is no penalty for sorting the vertex data in display order; the vertex data is efficiently presented to the display engine in either case, without the vertex cache significantly degrading performance vis-a-vis a vertex presentation structure optimized for presenting data presorted for display.
In accordance with a further aspect provided by this invention, the vertex data includes quantized, compressed data streams in any of several different formats (e.g., 8-bit fixed point, 16-bit fixed point, or floating point). This data can be indexed (i.e., referenced by the vertex data stream) or direct (i.e., contained within the stream itself). These various data formats can all be stored in the common vertex cache, and subsequently decompressed and converted into a common format for the graphics display pipeline. Such hardware support of flexible types, formats and numbers of attributes as either immediate or indexed input data avoids complex and time-consuming software data conversion.
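The conversion of differently quantized components into a common format can be sketched in software as follows. The details are assumptions for illustration only: signed fixed-point values, big-endian byte order, and a caller-supplied count of fractional bits. The format names and the function decode_components are hypothetical, and the hardware described above performs this step itself.

```python
import struct

# Format name -> (struct type code, bytes per component).
# Signedness and big-endian byte order are assumptions of this sketch.
FORMATS = {"s8": ("b", 1), "s16": ("h", 2), "f32": ("f", 4)}


def decode_components(raw, fmt, count, frac_bits=0):
    """Convert `count` packed components from `raw` into Python floats."""
    code, size = FORMATS[fmt]
    values = struct.unpack(">" + str(count) + code, raw[:count * size])
    if fmt == "f32":
        return [float(v) for v in values]
    scale = 1.0 / (1 << frac_bits)  # fixed point: value = integer / 2**frac_bits
    return [v * scale for v in values]
```

With 6 fractional bits, for example, the 8-bit value 0x40 decodes to 1.0, so a position in the range of roughly -2 to 2 occupies one byte per component instead of four.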