1. Field of the Invention
The present invention generally relates to graphics processing and, more specifically, to a technique for improving the performance of a tessellation pipeline.
2. Description of the Related Art
A conventional graphics processing unit (GPU) includes a plurality of different processing engines configured to operate in parallel with one another to process graphics data. The graphics data could be, for example, vertex data and associated vertex attributes, among other types of graphics data. Each processing engine may implement various processing stages within a graphics processing pipeline to process the graphics data. When a given processing engine finishes processing graphics data, that processing engine may cause a fixed-function copy-out unit to copy the processed graphics data from local memory to a memory that is shared between the different processing engines. Other processing engines may then access the processed graphics data and perform additional processing operations on that data.
One problem with the approach described above is that the overall throughput of the graphics processing pipeline is limited by the number of copy-out units configured to copy processed graphics data to shared memory for further processing. One solution to this problem is to incorporate additional copy-out units into the GPU. However, due to space constraints associated with GPU fabrication, this solution is usually undesirable.
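By way of illustration only, the throughput limitation described above can be approximated with a simple bottleneck model. The sketch below is not drawn from any particular GPU; the function name, the parameters, and the cycle counts are hypothetical, and the model assumes that every processed batch of graphics data must pass through a copy-out unit before another processing engine can use it.

```python
def pipeline_throughput(num_engines, process_cycles,
                        num_copy_out_units, copy_cycles):
    """Return steady-state batches per cycle under a simple bottleneck model.

    All parameters are illustrative assumptions:
      num_engines         -- processing engines working in parallel
      process_cycles      -- cycles one engine needs to process one batch
      num_copy_out_units  -- fixed-function copy-out units available
      copy_cycles         -- cycles one copy-out unit needs to copy one batch
    """
    # Rate at which the engines can produce processed batches.
    compute_rate = num_engines / process_cycles
    # Rate at which copy-out units can move batches to shared memory.
    copy_rate = num_copy_out_units / copy_cycles
    # The slower of the two stages bounds the whole pipeline.
    return min(compute_rate, copy_rate)


if __name__ == "__main__":
    # With one copy-out unit, adding engines stops helping once the
    # copy-out rate becomes the limiting factor.
    for engines in (2, 4, 8, 16):
        rate = pipeline_throughput(engines, process_cycles=100,
                                   num_copy_out_units=1, copy_cycles=25)
        print(engines, "engines:", rate, "batches/cycle")
```

Under these assumed numbers, throughput stops increasing beyond four processing engines unless a further copy-out unit is added, which is the trade-off against GPU space constraints noted above.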
As the foregoing illustrates, what is needed in the art is an improved technique for sharing data across processing engines in a graphics processing pipeline.