Delivering a competitive processor graphics subsystem relies on an efficient and scalable graphics architecture. Scalability is needed to support a range of graphics performance points using a single architecture and limited design resources. Parallelizing the graphics workloads across multiple identical graphics cores typically provides such scalability.
One problem inherent in the architecture of parallel graphics pipelines is efficiently parallelizing both the geometry processing and rasterization stages of the pipeline. Parallelizing both stages is desirable given that either stage can become the performance bottleneck at different times during the processing of a single frame. Unfortunately, parallelizing geometry processing most naturally requires distribution of geometric objects across the graphics cores, while parallelizing rasterization most naturally requires distribution of image space (e.g., the frame buffer) across the graphics cores. However, the correlation between geometric objects and their image space footprint is not known a priori. In addition, geometric objects need to be rasterized in the same temporal order as they are submitted for geometry processing. Therefore, a mechanism is needed to assemble, reorder and distribute the results of parallel geometry processing across the parallel rasterization pipelines with high performance/power and performance/area characteristics, and in a fashion that does not require significant software intervention.
A conventional graphics system that includes multiple parallel graphics cores and is capable of distributed rasterization through the use of CheckerBoard Rendering (CBR) schemes may subdivide a target surface (e.g., the frame buffer) into small rectangular regions. Non-overlapping regular grid subsets of these regions may then be assigned to each graphics core such that all target surface pixels are assigned. Rasterization pipelines in the graphics cores may then operate in parallel such that, for each rasterized object, a rasterization pipeline will render only those pixels contained within its subset of the target surface pixels. In current CBR schemes, each graphics core performs geometry processing for all submitted geometric objects, passing the results to only its internal rasterization pipeline. Because geometry processing is replicated across the graphics cores in such conventional systems, the geometry processing rate does not scale positively with the number of graphics cores.
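By way of illustration only, the checkerboard assignment described above may be sketched as follows. The sketch assumes a simple interleaved mapping in which rectangular regions are assigned to cores by taking region coordinates modulo a core grid; the function names, the 8x8 region size, and the 2x2 core grid are hypothetical choices made for this example and are not drawn from any particular CBR implementation.

```python
from collections import Counter


def owner_core(x, y, tile_w, tile_h, cores_x, cores_y):
    """Return the index of the core that owns pixel (x, y).

    Assumption: regions are interleaved across a cores_x by cores_y
    grid of cores, so ownership repeats in a checkerboard pattern.
    """
    tile_x = x // tile_w
    tile_y = y // tile_h
    return (tile_y % cores_y) * cores_x + (tile_x % cores_x)


def pixels_for_core(core, bbox, tile_w, tile_h, cores_x, cores_y):
    """List the pixels of an object's footprint that one core renders.

    bbox is a half-open rectangle (x0, y0, x1, y1) standing in for the
    rasterized footprint of a geometric object.
    """
    x0, y0, x1, y1 = bbox
    return [
        (x, y)
        for y in range(y0, y1)
        for x in range(x0, x1)
        if owner_core(x, y, tile_w, tile_h, cores_x, cores_y) == core
    ]


# Hypothetical configuration: 32x32 surface, 8x8 regions, 2x2 cores.
TILE_W, TILE_H, CORES_X, CORES_Y = 8, 8, 2, 2
NUM_CORES = CORES_X * CORES_Y

# Every pixel is assigned to exactly one core, and the load is even.
coverage = Counter(
    owner_core(x, y, TILE_W, TILE_H, CORES_X, CORES_Y)
    for y in range(32)
    for x in range(32)
)

# An object straddling four regions is split among all four cores.
footprint = (4, 4, 12, 12)
split = [
    len(pixels_for_core(c, footprint, TILE_W, TILE_H, CORES_X, CORES_Y))
    for c in range(NUM_CORES)
]

# In the conventional scheme every core still runs geometry processing
# for every submitted object, so per-core geometry work is independent
# of the number of cores.
submitted_objects = 100
geometry_work_per_core = [submitted_objects] * NUM_CORES
```

In this sketch the rasterization work for the footprint is divided among the cores, whereas the per-core geometry work remains equal to the total number of submitted objects, which reflects the lack of geometry scaling noted above.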