In graphics processing, much data is managed in order to provide a resultant image on a computer display. One form of such data includes vertex data that comprises information for displaying triangles, lines, points or any other type of portions of an image on the computer display. Prior Art Table 1 includes an example of typical vertex data.
Prior Art Table 1
position (X Y Z W)
diffuse (R G B A)
specular (R G B F)
texture0 (S T R Q)
texture1 (S T R Q)
Together, multiple sets of such vertex data are used to represent one of the portions of the image. In order to accomplish this, each vertex, on average, requires 40 bytes of memory storage space. During conventional system operation, over 10 million vertices are typically exchanged every second during processing. This results in a data transfer rate of 400 MB/s.
During the processing of vertex data, various components of a system come into play. Prior Art FIG. 1 illustrates an example of a system 10 that processes vertex data. As shown, included are a processor 12, system memory 14, a graphics accelerator module 16, and a bus 18 for allowing communication among the various components.
In use, the processor 12 locates the vertex data in the system memory 14. The vertex data is then routed to the processor 12, after which the vertex data is copied for later use by the graphics accelerator module 16 by the module 16 directly accessing the system memory 14. The graphics accelerator module 16 may perform various operations such as transform and/or lighting operations on the vertex data.
As mentioned earlier, a typical data transfer rate of 400 MB/s is required in current systems to process the vertex data. During the transfer of such data in the system 10 of Prior Art FIG. 1, the bus 18 connecting the processor 12 and the graphics accelerator module 16 is required to handle an input data transfer rate of 400 MB/s along with an output data transfer rate of 400 MB/s.
As such, the foregoing bus 18 must accommodate a data transfer rate of 800 MB/s while handling the vertex data. Conventionally, such bus 18 is 64 bits wide and the processor 12 runs at about 100 MHz. Therefore, the bus 18 is often strained during use in the system 10 of Prior Art FIG. 1. Further, with data transfer rates constantly rising, it will soon no longer be practical to use processors to copy vertex data.
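The transfer rates quoted above follow from simple arithmetic, which the following Python sketch reproduces. The variable names are illustrative only, and "over 10 million" vertices per second is taken as exactly 10 million, so the results are lower bounds.

```python
# Figures taken from the text above; names are illustrative only.
BYTES_PER_VERTEX = 40              # average storage per vertex
VERTICES_PER_SECOND = 10_000_000   # "over 10 million", taken as a floor

MB = 1_000_000  # decimal megabyte, which reproduces the 400 MB/s figure

# Rate in one direction (into the processor, or out of it).
one_way_rate = BYTES_PER_VERTEX * VERTICES_PER_SECOND / MB

# The bus carries the input transfer and the output transfer.
bus_rate = 2 * one_way_rate

print(f"one-way: {one_way_rate:.0f} MB/s, bus total: {bus_rate:.0f} MB/s")
```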
A conventional graphics accelerator module 16 has the ability to read vertex data by one of two means. As a first means, the graphics accelerator module 16 can be configured to read vertex data supplied in-band in its command stream.
In-band vertex data may be presented in an “immediate mode” as illustrated below:
    SetDiffuseColor0
        R0
        G0
        B0
    Vertex0
        X0
        Y0
        Z0
    SetDiffuseColor1
        R1
        G1
        B1
    Vertex1
        X1
        Y1
        Z1
The singly indented lines (SetDiffuseColor, Vertex) represent, in a preferred embodiment, 32-bit command tokens. The doubly indented lines (RGB, XYZ) following each command token are, in a preferred embodiment, 32-bit words of data for the preceding command. The above example involves sending data for two immediate mode vertices.
The in-band vertex data may also be packed together based on an “inline” vertex array format. First, the format for the vertex data is defined by commands in the command stream. For example:
SetDiffuseColorArrayFormat                expect 3 RGB floats        
SetVertexArrayFormat                expect 3 XYZ floats, stride is six floats        
Vertex data can then be sent more efficiently, with minimal command token overhead, because the format of the vertex data is pre-established by the inline format. For example:
InlineArray                R0        G0        B0        X0        Y0        Z0        R1        G1        B1        X1        Y1        Z1        
In this example, the extra overhead from SetDiffuseColor and Vertex command tokens is eliminated when using inline data.
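The saving can be illustrated by counting 32-bit words in each form of the command stream. The following Python sketch is a simplified model rather than an actual command encoding: each list entry stands for one 32-bit word, the function names are hypothetical, and the one-time format commands are not counted.

```python
# Each list entry stands for one 32-bit word in the command stream.
# Token names mirror the listings above; function names are hypothetical.

def immediate_mode_stream(vertices):
    """Each vertex costs two command tokens plus six data words."""
    words = []
    for (r, g, b), (x, y, z) in vertices:
        words += ["SetDiffuseColor", r, g, b, "Vertex", x, y, z]
    return words

def inline_array_stream(vertices):
    """One InlineArray token, then data packed in the pre-established order."""
    words = ["InlineArray"]
    for (r, g, b), (x, y, z) in vertices:
        words += [r, g, b, x, y, z]
    return words

two_vertices = [((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
                ((0.0, 1.0, 0.0), (1.0, 0.0, 0.0))]

print(len(immediate_mode_stream(two_vertices)))  # 16 words
print(len(inline_array_stream(two_vertices)))    # 13 words
```

In this model the immediate mode cost grows by eight words per vertex, while the inline cost grows by six, so the gap widens as more vertices are sent.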
In a copending application entitled “System, Method and Article of Manufacture for Allowing Direct Memory Access to Graphics Vertex Data While Bypassing a Processor”, filed on Dec. 16, 1999 and assigned to the assignee of this application, a second means for reading vertex data is disclosed. In that application, the graphics accelerator module is supplied with the vertex array offsets and strides in addition to the vertex array formats. The offsets are relative to the beginning of a pre-established region of memory shared between the graphics accelerator module and the CPU. This memory is often high-bandwidth uncached AGP (Accelerated Graphics Port) memory, though it may also be “video memory” within the graphics hardware or cached system memory. The CPU is responsible for writing vertex data into this memory region. This shared memory region is called a “vertex array range”.
Accordingly, rather than passing the vertex data in-band through the command stream, the command stream contains only vertex indices that indicate where the graphics accelerator module should read the corresponding vertex data for the given vertex index. The vertex array format, stride, and offset provide the information necessary to read the data for a specified vertex index out of the current vertex array range.
In this approach, the format, offset, and stride are first defined by commands in the command stream. For example:
SetDiffuseColorArrayFormat                expect 3 RGB floats, stride is 6 floats        
SetDiffuseColorArrayOffset                200 bytes from the shared memory region beginning        
SetVertexArrayFormat                expect 3 XYZ floats        
SetVertexArrayOffset                212 bytes from the shared memory region beginning        
Then the vertex data is written into the vertex array range. For example:
200        R0
204        G0
208        B0
212        X0
216        Y0
220        Z0
224        R1
228        G1
232        B1
236        X1
240        Y1
244        Z1
where the given vertex data components are written at the indicated byte offsets from the beginning of the vertex array range.
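The relationship among offset, stride, and vertex index can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, a bytearray stands in for the shared memory region, and little-endian 32-bit floats are assumed. The byte address of a component is its array offset plus the vertex index multiplied by the stride.

```python
import struct

FLOAT_SIZE = 4               # bytes in a 32-bit float
STRIDE = 6 * FLOAT_SIZE      # "stride is 6 floats" -> 24 bytes
COLOR_OFFSET = 200           # from SetDiffuseColorArrayOffset
POSITION_OFFSET = 212        # from SetVertexArrayOffset

def write_vertex(array_range, index, rgb, xyz):
    """CPU side: store one vertex's color and position at computed offsets."""
    struct.pack_into("<3f", array_range, COLOR_OFFSET + index * STRIDE, *rgb)
    struct.pack_into("<3f", array_range, POSITION_OFFSET + index * STRIDE, *xyz)

def read_vertex(array_range, index):
    """Accelerator side: locate a vertex from its index, offset, and stride."""
    rgb = struct.unpack_from("<3f", array_range, COLOR_OFFSET + index * STRIDE)
    xyz = struct.unpack_from("<3f", array_range, POSITION_OFFSET + index * STRIDE)
    return xyz + rgb  # (X, Y, Z, R, G, B)

vertex_array_range = bytearray(256)  # stand-in for the shared memory region
write_vertex(vertex_array_range, 0, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
write_vertex(vertex_array_range, 1, (0.0, 1.0, 0.0), (1.0, 2.0, 3.0))

# Vertex 1 starts one stride after vertex 0, matching the offsets listed above.
print(COLOR_OFFSET + 1 * STRIDE, POSITION_OFFSET + 1 * STRIDE)  # 224 236
print(read_vertex(vertex_array_range, 1))
```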
Once the vertex array range is set up in the manner described, the graphics accelerator module can much more efficiently generate vertices by reading the required vertex data from the vertex array range via an ArrayElement command token as necessary rather than reading all the vertex data in-band through the command stream.
For example:
ArrayElement                0        1        
This simple command would instruct the graphics accelerator module to read the vertices (X0, Y0, Z0, R0, G0, B0) and (X1, Y1, Z1, R1, G1, B1) from the vertex array range. Sending vertex indices in the graphics accelerator module's command stream is substantially more efficient than sending the vertex data inline, for several reasons.
Firstly, three-dimensional (3D) models are represented as meshes of vertices, where the triangles making up the models tend to share vertices, and therefore share the same vertex data. Consider a cube with six faces. Each square face is formed by two triangles. Each triangle has three vertices. If the cube is drawn as twelve (12) independent triangles, the vertex data must be supplied thirty-six (36) times even though a cube has only eight (8) unique vertices.
Using a vertex array range as described above, the data for the 8 unique vertices can be copied into the vertex array range once. Then the 36 vertex indices can be sent to the graphics accelerator module via its command stream. If each vertex comprises six 32-bit floats and each vertex index is a 16-bit value, this is a 12-fold reduction in the data that must be written by the CPU and read by the graphics accelerator module through the graphics accelerator module's command stream.
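The 12-fold figure can be checked with the short Python calculation below. The variable names are illustrative only. Note that the figure compares command stream traffic; the one-time copy of the 8 unique vertices into the vertex array range is computed separately.

```python
FLOATS_PER_VERTEX = 6    # six 32-bit floats, e.g. R, G, B, X, Y, Z
BYTES_PER_FLOAT = 4
BYTES_PER_INDEX = 2      # 16-bit vertex index

triangles = 6 * 2                  # six faces, two triangles per face
vertices_issued = triangles * 3    # 36 vertices when drawn independently
unique_vertices = 8

# Command stream bytes when all vertex data is sent inline.
inline_bytes = vertices_issued * FLOATS_PER_VERTEX * BYTES_PER_FLOAT

# Command stream bytes when only indices are sent.
index_bytes = vertices_issued * BYTES_PER_INDEX

# One-time copy of the unique vertices into the vertex array range.
array_range_bytes = unique_vertices * FLOATS_PER_VERTEX * BYTES_PER_FLOAT

print(inline_bytes, index_bytes, array_range_bytes)  # 864 72 192
print(inline_bytes // index_bytes)                   # 12
```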
The second advantage of using a vertex array range is that the graphics accelerator module can cache the reads it performs to the vertex array range, so if vertex data from the vertex array range is already in the graphics accelerator module's vertex cache, the data does not have to be read again. Once vertex data is read into the cache, other data from the same cache line is often read subsequently for different vertices.
The third advantage of using a vertex array range is that if a vertex is transformed, its transformed results can be cached. If the same vertex index is issued again and the vertex index's transformed results are in the post-transform vertex cache (and no transform state settings have changed in the meantime), the transformed vertex can be fetched from the cache rather than transforming the vertex again.
The cache in the second stated advantage is typically a memory-based cache of pre-transformed vertex data. The cache in the third stated advantage is a vertex index-based cache of post-transformed vertex data. Both caches provide substantial gains in the efficiency of vertex transformation and reduce the bandwidth required for vertex data.
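The behavior of an index-based post-transform cache can be sketched in Python as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, the cache is unbounded (actual hardware caches hold only a limited number of entries), and a uniform scale stands in for the transform state.

```python
class PostTransformCache:
    """Unbounded index-keyed cache; real hardware caches are far smaller."""

    def __init__(self, scale):
        self.scale = scale         # stand-in for the current transform state
        self.results = {}          # vertex index -> transformed vertex
        self.transform_count = 0   # number of transforms actually performed

    def set_scale(self, scale):
        # A transform state change makes every cached result stale.
        self.scale = scale
        self.results.clear()

    def fetch(self, index, vertices):
        if index not in self.results:
            x, y, z = vertices[index]
            self.results[index] = (x * self.scale, y * self.scale, z * self.scale)
            self.transform_count += 1
        return self.results[index]

# Eight cube corners; the bits of the index select 0 or 1 on each axis.
cube = [(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]

# Six faces as corner quadruples, each split into two triangles.
faces = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
         (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
indices = [i for a, b, c, d in faces for i in (a, b, c, a, c, d)]

cache = PostTransformCache(scale=2)
for i in indices:
    cache.fetch(i, cube)

print(len(indices), cache.transform_count)  # 36 8
```

With the cube example, 36 vertex indices are issued but only 8 transforms are performed, since each repeated index is served from the cache.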
If a three-dimensional (3D) application is only required to draw static objects, the vertex data for all the static objects to be rendered can be written into the vertex array range. At this point, the objects can be rendered by configuring the vertex array range offsets, strides, and formats and sending vertex indices through the graphics accelerator module's command stream.
While some applications involve rendering static objects, games and other interactive 3D applications often render dynamic geometry such as animated characters in expansive virtual worlds. The vertex data is dynamic either because the 3D objects represented are animating in ways that require the vertex data to be updated repeatedly, or because the virtual world is so expansive that the entire world cannot be statically contained in the vertex array range. In these cases, the CPU is responsible for copying vertex data into the vertex array range on a continuous basis. The problem is that vertex data is not immediately read from the vertex array range when the ArrayElement command tokens are written into the graphics accelerator module's command stream. The command stream is a queue, so any previous commands must be processed before the ArrayElement command tokens are processed. Only then is the vertex data read from the vertex array range and transformed.
In practice, there is typically a substantial delay between the time the vertex data is written to the vertex array range and the ArrayElement command tokens are written in the command stream, and the time the vertex data is read from the vertex array range. The CPU is responsible for not modifying the sections of the vertex array range corresponding to vertex indices placed in the graphics accelerator module's command stream until the graphics accelerator module is finished reading the vertex data for the indices.
If the CPU fails to synchronize its writes to sections of the vertex array range that contain vertex data for pending vertex indices not yet read, the result is non-deterministic corruption of the vertex data for the vertices being rendered. While this is not a fatal error, the result is incorrect rendering that is typically extremely corrupted and unacceptable. Correct rendering therefore requires proper synchronization between the CPU and the graphics accelerator module.
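The hazard can be illustrated with the following Python sketch. It is a deliberately simplified, single-threaded model with hypothetical names: a one-element list stands in for the vertex array range and a deque stands in for the command stream. The outcome is deterministic here only because the model has no real timing; in an actual system the result depends on how far the graphics accelerator module has progressed through its queue.

```python
from collections import deque

def render(overwrite_while_pending):
    """Return the vertex data the simulated accelerator ends up reading."""
    vertex_array_range = ["old"]   # one-slot stand-in for the shared region
    command_stream = deque()       # FIFO of pending ArrayElement indices

    # CPU: write vertex data, then queue the index that refers to it.
    vertex_array_range[0] = "vertex A"
    command_stream.append(0)

    if overwrite_while_pending:
        # Unsynchronized CPU write: index 0 has not been consumed yet.
        vertex_array_range[0] = "vertex B"

    # Accelerator: data is read only when each queued index is processed.
    rendered = [vertex_array_range[i] for i in command_stream]
    command_stream.clear()

    if not overwrite_while_pending:
        # Safe reuse: the queue has drained, so the slot may now be rewritten.
        vertex_array_range[0] = "vertex B"
    return rendered

print(render(overwrite_while_pending=False))  # ['vertex A']
print(render(overwrite_while_pending=True))   # ['vertex B']
```

In the second call the accelerator renders data that was never intended for the queued index, which corresponds to the corrupted rendering described above.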
What is desired is an efficient synchronization mechanism so that the CPU can know when it is safe to re-write sections of the vertex array range that correspond to vertex indices that have been written into the graphics accelerator module's command stream.
The present invention addresses such a need.