The present invention relates in general to data processing, and in particular, to decreasing data access latency in a data processing system by providing early data from a lower level cache memory.
In high-performance computer systems, the design trend over many years has been to scale systems to ever larger numbers of processor chips, each having an ever-increasing number of processor cores. Increasing the number of processor cores increases the volume of data the cores consume during execution, and accordingly places pressure on external data storage devices (e.g., dynamic random access memory (DRAM), magnetic and optical disks, flash drives, storage area networks (SANs), etc.) and the associated interconnects to supply the required volume of data.
In particular, DRAM access latency, while continuing to improve slowly over recent years, has not kept pace with increases in processor core clock rates. Consequently, external memory access latency, as measured in processor clock cycles, has actually degraded. The conventional technique for compensating for external memory access latency has been to implement larger and deeper on-chip cache hierarchies to buffer frequently used data closer to the consuming processor cores. However, limits on overall chip size force a tradeoff between the number of processor cores and the amount of cache memory on the chip. Consequently, the opportunity to improve effective memory access latency simply by increasing on-chip cache capacity is limited.