Field of the Invention
The present invention generally relates to GPU drivers and, more specifically, to batching vertices of a primitive prior to routing the vertices to a GPU.
Description of the Related Art
In conventional graphics processing, the OpenGL application programming interface (API) includes an explicit API-visible Begin command and End command that encompass primitive draw commands that cause a graphics processor to render primitives. A driver receives the Begin command and subsequently receives a stream of vertices that comprise the primitives to be rendered followed by the End command. The driver may want to arrange the vertices specified by the application into regular batches for optimal processing by parallel graphics processing units (GPUs) and for other performance optimizations. The driver is unaware, however, of the length of the stream of vertices, which causes a number of issues.
For example, a driver in the current art might store the vertex data specified by the application in a vertex buffer that can be directly accessed by graphics hardware. Rather than passing the vertex data directly to the graphics processor, the driver passes a single index per vertex, which is used to identify the location of that vertex's data in the vertex buffer. While building these batches, there are several good reasons for the driver to want to limit the batch size. Such a limit permits smaller allocations for the vertex buffer and reduces data transfer by allowing compact indices to be passed to the GPU. In the current art, if each vertex is indexed by sixteen bits, and the stream eventually exceeds 65,536 vertices, then each index of the vertices already received in the stream must be updated to thirty-two bits so that the driver may properly index the remaining vertices included in the stream. Such increases require an increased amount of storage space and also reduce the effectiveness of hardware that is optimally configured to interact with sixteen-bit indices.
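By way of illustration only, the index-widening problem described above may be sketched as follows. The class name `IndexedStream`, its fields, and the use of Python's `array` module to stand in for a hardware-visible index buffer are assumptions made for this sketch and do not correspond to any particular driver.

```python
from array import array

# A sixteen-bit index can name 65,536 distinct vertices (indices 0..65535).
MAX_U16_VERTICES = 1 << 16


class IndexedStream:
    """Hypothetical driver-side collector for a vertex stream of unknown length."""

    def __init__(self):
        self.vertex_buffer = []       # stands in for the hardware-visible vertex buffer
        self.indices = array("H")     # unsigned sixteen-bit indices
        self.widened_indices = 0      # number of earlier indices that had to be rewritten

    def add_vertex(self, vertex):
        index = len(self.vertex_buffer)
        self.vertex_buffer.append(vertex)
        if index >= MAX_U16_VERTICES and self.indices.typecode == "H":
            # The new index no longer fits in sixteen bits, so every index
            # received so far is rewritten into a wider unsigned type.
            self.widened_indices = len(self.indices)
            self.indices = array("I", self.indices.tolist())
        self.indices.append(index)

    def index_bytes(self):
        """Storage currently consumed by the index data, in bytes."""
        return len(self.indices) * self.indices.itemsize
```

In this sketch, the 65,537th call to `add_vertex` triggers a rewrite of all 65,536 earlier indices, and the per-index storage grows from that point on. A driver that bounded each batch at 65,536 vertices would never reach that branch.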
Even in cases where the vertices of a primitive are specified in regular form and the vertex count is known when the primitive is first specified, splitting large primitives into batches may still be desirable. Batching permits optimizations that skip processing of portions of the primitive that are not visible to the end-user, and also allows for state changes in the middle of a primitive.
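As a minimal sketch of splitting a primitive whose vertex count is known in advance, the following assumes the simplest case of independent triangles, where every three consecutive vertices form one triangle and a batch boundary may therefore fall on any multiple of three. The function name `split_triangles` and its parameters are hypothetical. Connected primitives such as strips and fans would require additional handling that is not shown here.

```python
def split_triangles(vertex_count, max_batch_vertices):
    """Return (first_vertex, vertex_count) ranges for independent triangles.

    Each range holds a whole number of triangles and at most
    max_batch_vertices vertices. Trailing vertices that do not form a
    complete triangle are dropped.
    """
    per_batch = max_batch_vertices - max_batch_vertices % 3
    if per_batch < 3:
        raise ValueError("a batch must hold at least one triangle")
    usable = vertex_count - vertex_count % 3
    return [
        (first, min(per_batch, usable - first))
        for first in range(0, usable, per_batch)
    ]
```

For example, `split_triangles(12, 7)` returns `[(0, 6), (6, 6)]`: the seven-vertex limit is rounded down to six so that no triangle straddles a batch boundary, and the triangles are emitted in their original order.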
Accordingly, what is needed in the art is a technique for transforming a set of primitives into a collection of batches with a reduced number of vertices in each batch, while still preserving the semantics of the original API command stream.