1. Field of the Invention
The present invention generally relates to graphics processing and more specifically to the parallel detection of duplicate vertex indices and batching of the vertex indices for primitive processing.
2. Description of the Related Art
Conventional graphics processors have processed primitives at a rate of one triangle per clock while maintaining the primitive order specified by the application programming interface (API). Graphics primitives, such as triangles, are received in an ordered stream for processing. The DirectX and OpenGL APIs require that the primitive order be maintained, that is to say, the primitives must be rendered in the order presented by the application program. This requirement ensures that intersecting primitives will be rendered deterministically to avoid the introduction of visual artifacts. Additionally, the primitive order is maintained since the results of rendering a later primitive may be dependent on the state of the frame buffer resulting from rendering an earlier primitive, particularly when blending is used.
While some graphics processors have used parallelism to increase processing throughput, these systems typically do not maintain primitive ordering at full speed. Alternatively, these systems operate at slower rates, well below one triangle per clock, when primitive ordering is maintained. In order to process more than one triangle per clock, the system needs to organize multiple vertices, each referenced by an index, into primitives at a rate of more than three vertices per clock. The vertex data is stored in a cache that is accessed using the indices. Typically, a content addressable memory (CAM) is used to compare the incoming indices with the existing indices of the vertex data that is resident in the cache. Performing a comparison of three or more indices in a single clock is difficult since each subsequent index must be compared not only with the existing indices, but also with the previous incoming indices. The critical timing path for determining whether the last incoming index is a cache hit or a miss limits either the clock rate or the number of indices that may be compared in parallel. Consequently, the number of triangles (specified by the vertex indices) that can be processed in parallel may be limited by the parallel vertex index comparison operation.
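The comparison burden described above can be illustrated with a minimal software sketch. It is not a description of any particular hardware; the function names `classify_batch` and `comparator_count`, the result labels, and the list-based cache model are assumptions made purely for illustration. Each incoming index is checked against the indices resident in the cache and, if it is not found there, against the indices that arrived earlier in the same clock.

```python
def classify_batch(resident, incoming):
    """Classify each index of one incoming batch (illustrative model only).

    resident: indices whose vertex data is already in the cache.
    incoming: indices arriving in the same clock, in API order.
    Returns one (kind, position) tuple per incoming index:
      ("hit", slot)   the index matches a resident cache entry
      ("dup", pos)    the index matches an earlier index in this batch
      ("miss", None)  the index matches nothing and needs a new entry
    """
    results = []
    for pos, idx in enumerate(incoming):
        if idx in resident:
            results.append(("hit", resident.index(idx)))
            continue
        # Not resident: the index must also be compared with every
        # index that precedes it in the same batch.
        earlier = incoming[:pos]
        if idx in earlier:
            results.append(("dup", earlier.index(idx)))
        else:
            results.append(("miss", None))
    return results


def comparator_count(cache_entries, batch_size):
    """Pairwise comparisons needed if all checks are done in parallel:
    every incoming index against every cache entry, plus every incoming
    index against each earlier incoming index."""
    return batch_size * cache_entries + batch_size * (batch_size - 1) // 2


if __name__ == "__main__":
    # Two triangles (six indices) arriving together, three resident entries.
    print(classify_batch([10, 11, 12], [11, 20, 20, 12, 21, 20]))
    print(comparator_count(32, 3), comparator_count(32, 6))
```

In this model the last index of a batch depends on the outcome for every index before it, which corresponds to the critical timing path noted above, and the intra-batch term of the comparison count grows quadratically with the batch size.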
Accordingly, what is needed in the art is a system and method for exceeding a processing rate of one triangle per clock while maintaining API primitive ordering. In order to exceed the primitive processing rate of one triangle per clock, multiple primitives must be assembled for processing in a single clock cycle.