Successive generations of computing systems typically require higher performance and, in many cases, reduced size and reduced overall power consumption. A typical computing system includes a central processing unit (CPU), a graphics processing unit (GPU), and a high-capacity memory subsystem, such as one or more dynamic random access memory (DRAM) devices. To achieve a high level of integration and miniaturization, conventional computing systems integrate one or more general-purpose CPU cores and one or more GPU cores on a single processor system chip that is coupled to one or more DRAM chips. One or more hierarchical tiers of high-speed cache memory are typically implemented to reduce the relatively long average latencies associated with accessing data stored in DRAM. A first level cache is typically disposed in close physical proximity to each core within the processor system chip to provide relatively fast access to cached data. Additional cache memory levels may be integrated in the processor system chip at increasing physical distances from each core, providing larger but typically slower cache memory pools between each first level cache and DRAM.
Conventional on-chip interconnect signaling is characterized by relatively slow propagation velocity, even at the higher metal levels. This slow propagation velocity becomes increasingly significant for the longer on-chip traces required to interconnect processor cores with cache memories. Consequently, increasing the physical distance between a cache memory and its associated processor cores also increases access latency, which can reduce overall system performance.
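As background for why on-chip propagation is slow: long on-chip traces are RC-dominated rather than behaving as transmission lines, so the delay of an unrepeated wire grows quadratically with its length. Under the Elmore delay model,

\[
t_{\mathrm{wire}} \approx 0.38\, r\, c\, L^{2},
\]

where \(r\) and \(c\) are the per-unit-length resistance and capacitance of the trace and \(L\) is its length. With illustrative values (assumed here for the sake of example, not taken from this document) of \(r \approx 1\ \Omega/\mu\mathrm{m}\) and \(c \approx 0.2\ \mathrm{fF}/\mu\mathrm{m}\), a 1 mm trace incurs roughly

\[
t_{\mathrm{wire}} \approx 0.38 \times 1 \times 0.2\times 10^{-15} \times (10^{3})^{2}\ \mathrm{s} \approx 0.08\ \mathrm{ns},
\]

while a 5 mm trace incurs about \(25\times\) that, roughly 1.9 ns, which is several clock cycles at multi-gigahertz frequencies. Inserting repeaters restores linear scaling with length, but the resulting effective velocity remains far below the speed of light, consistent with the latency penalty described above.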
Thus, there is a need for addressing these issues and/or other issues associated with the prior art.