Die-stacking technology enables multiple layers of Dynamic Random Access Memory (DRAM) to be integrated with single-core or multicore processors. More generally, die-stacking technologies provide a way to tightly integrate multiple disparate silicon die with high-bandwidth, low-latency interconnects. The implementation could involve vertical stacking as illustrated in FIG. 1A, in which one or more DRAM layers 100 are stacked above a multicore processor 102. Alternatively, as illustrated in FIG. 1B, a horizontal stacking of the DRAM 100 and the processor 102 can be achieved on an interposer 104. In either case, the processor 102 (or each core thereof) is provided with a high-bandwidth, low-latency path to the stacked memory 100.
Computer systems typically include a processing unit and one or more cache memories. A cache memory is a high-speed memory that acts as a buffer between the processor and main memory. Although smaller than the main memory, the cache memory typically has an appreciably faster access time than the main memory. Memory subsystem performance can be increased by storing the most frequently used data in smaller but faster cache memories.
When the processor accesses a memory address, the cache memory determines whether the data associated with the memory address is stored in the cache memory. If the data is stored in the cache memory, a cache “hit” results and the data is provided to the processor from the cache memory. If the data is not in the cache memory, a cache “miss” results and a lower level in the memory hierarchy must be accessed. Due to the additional access time for lower-level memory, data cache misses can account for a significant portion of an application program's execution time.
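By way of illustration only, the hit and miss behavior described above can be sketched with a toy direct-mapped cache model. The class name `DirectMappedCache`, the four-line capacity, and the 64-byte line size are assumptions chosen for this example and do not describe any particular implementation; the model tracks only tags, not data.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (hypothetical model)."""

    def __init__(self, num_lines=4, line_size=64):
        self.num_lines = num_lines      # assumed capacity, in cache lines
        self.line_size = line_size      # assumed line size, in bytes
        self.tags = [None] * num_lines  # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = address // self.line_size  # which memory block holds the address
        index = block % self.num_lines     # which cache line the block maps to
        tag = block // self.num_lines      # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: a lower level of the hierarchy would be accessed here,
        # and the returned block replaces whatever the line held before.
        self.misses += 1
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0, 8, 64, 0, 256, 0)]
print(results, cache.hits, cache.misses)
```

In this sketch, addresses 0 and 256 map to the same cache line, so the final access to address 0 misses again after address 256 has displaced it.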
In order to reduce cache miss rates, various prefetching techniques have been developed. Prefetching involves fetching data from lower levels in the memory hierarchy into the cache memory before the processor would ordinarily request that the data be fetched. By anticipating processor access patterns, prefetching helps to reduce cache miss rates. However, when die-stacked DRAM memory is used as a large last-level cache with row-based access, high access latency may result due to the activation, read and precharge command sequences that are typically required. Such high latency causes techniques like prefetching to become less effective, since data is not considered to be cached until it physically resides in the cache.
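The effect of fill latency on prefetching can be sketched with a simplified next-line prefetcher. The function name `count_misses`, the cycle counts, and the 64-byte line size are assumptions made for this example, not measured values. The model also ignores processor stalls and cache capacity: it only counts demand accesses that arrive before the requested block resides in the cache.

```python
def count_misses(addresses, fill_latency, access_interval=10,
                 line_size=64, prefetch=True):
    """Count demand accesses that find their block not yet resident.

    fill_latency: assumed cycles for a prefetched block to arrive in the cache.
    access_interval: assumed cycles between successive demand accesses.
    """
    ready_at = {}  # block number -> cycle at which the block is resident
    misses = 0
    for i, address in enumerate(addresses):
        now = i * access_interval
        block = address // line_size
        # A block still in flight does not count as cached.
        if ready_at.get(block, float("inf")) > now:
            misses += 1
            ready_at[block] = now  # simplification: demand fill, stall ignored
        # Next-line prefetch: request the following block if not yet requested.
        if prefetch and (block + 1) not in ready_at:
            ready_at[block + 1] = now + fill_latency
    return misses


stream = [n * 64 for n in range(8)]  # sequential walk over eight blocks
fast = count_misses(stream, fill_latency=5)
slow = count_misses(stream, fill_latency=50)
no_prefetch = count_misses(stream, fill_latency=5, prefetch=False)
print(fast, slow, no_prefetch)
```

Under these assumed numbers, a short fill latency lets the prefetcher hide every miss after the first, whereas a fill latency longer than the interval between accesses leaves every access waiting on a block that has been requested but is not yet resident, which is no better than not prefetching at all.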