Subject matter described herein relates generally to graphics processing.
As graphics processors scale to increasingly large die sizes, it may be useful to integrate multiple silicon dies into a single cohesive system capable of running a single 3D context, in order to address manufacturability, scalability, and power delivery problems. Such a multi-die system can provide the resources needed to address multiple classes of scalability and interconnect challenges and to deliver high performance for a single 3D application running on multiple dies/tiles.
One issue faced by multi-die/tile graphics processing units (GPUs) is the distribution of geometry work when a single API-level draw operation accounts for a very large amount of work. For example, some workstation applications submit single draws encompassing 32 million vertices. If processed in a traditional manner on a single GPU, this amount of work becomes a performance bottleneck that can prevent the application from fully utilizing all GPU tiles and can thus prevent effective performance scaling.
Accordingly, techniques to improve distribution of geometry work of a single draw across multiple GPU tiles may find utility, e.g., in graphics processing applications.