A typical computing system includes a central processing unit (CPU), a graphics processing unit (GPU), a high-capacity memory subsystem, and a set of interface subsystems. To achieve generational increases in system performance, successive generations of GPU devices require increasing degrees of performance and integration. Conventional GPU devices typically achieve higher degrees of performance and integration by implementing an increasing number of graphics processing cluster (GPC) partitions and associated frame buffer (FB) partitions on a single die or “chip.” The GPC partitions are typically coupled to the FB partitions through a crossbar circuit. Cache memory may also be added to each chip.
Die area for GPU devices has increased over time as more GPC partitions and more FB partitions, each of increasing complexity, are integrated into a single GPU chip. One advantage of integrating multiple partitions and other subsystems onto a single die is that high performance may be achieved by scaling conventional design techniques and by leveraging advances in fabrication technology that enable greater circuit density.
However, one disadvantage of simply integrating more circuitry onto a single chip is that the manufacturing cost of the chip typically increases disproportionately with die area, which raises the marginal cost of each additional GPC or FB partition. More specifically, the manufacturing cost of a given chip is typically a strong function of its die area. In many cases, the die area of highly integrated GPU devices is well above a characteristic cost knee, leading to disproportionate cost inefficiencies in fabricating advanced GPU chips.
Thus, there is a need for improving GPU architecture and/or addressing other issues associated with the prior art.