Rapid advances in semiconductor technology allow processor speed, or the aggregate processor speed of chips with multicore/manycore architectures, to grow quickly and steadily. Memory speed, or data load/store performance, on the other hand, has been increasing at a snail's pace for decades. This trend is predicted to continue in the next decade. This unbalanced performance improvement leads to one of the significant performance bottlenecks in computer architectures, known as the “memory wall” problem. Memory hierarchies have been the primary solution for bridging the processor-memory performance gap. However, due to limited cache capacity and the highly associative cache structure, a large number of off-chip accesses and long memory access latency still largely limit performance. Data prefetching has been widely recognized as a companion technique to the memory hierarchy for overcoming the memory-wall problem.
Data prefetching is a technique for fetching data from the memory system before the microprocessor requests it. A data prefetcher is an on-chip hardware component that carries out data prefetching. Data prefetchers are widely adopted in microprocessor architectures to hide memory fetch latency, to overlap memory access with computation, and thereby to bridge the growing performance gap between processor and memory. Numerous prefetching techniques have been proposed to exploit data patterns and correlations in the miss address stream. In general, the miss addresses are grouped by some common characteristic, such as the program counter or the memory region they belong to, into localized streams to improve prefetch accuracy and coverage. However, the existing stream localization technique lacks timing information about the misses. This drawback can lead to a large fraction of untimely prefetches, which in turn limits the effectiveness of prefetching, wastes precious bandwidth, and can lead to high cache pollution.
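As a minimal illustration of stream localization, the following Python sketch groups a miss address stream by program counter and computes the address deltas within each localized stream. The function names `localize_by_pc` and `strides`, and the sample addresses, are hypothetical and chosen only for illustration; the sketch does not describe any particular prefetcher.

```python
from collections import defaultdict

def localize_by_pc(misses):
    """Group (pc, address) miss records into per-PC localized streams.

    Assumption: each miss is a (pc, address) tuple in global miss order.
    """
    streams = defaultdict(list)
    for pc, addr in misses:
        streams[pc].append(addr)
    return dict(streams)

def strides(addrs):
    """Return the deltas between consecutive addresses in a stream."""
    return [b - a for a, b in zip(addrs, addrs[1:])]

# Hypothetical miss stream in which two instructions interleave.
misses = [(0x400, 1000), (0x500, 8000), (0x400, 1064),
          (0x500, 7936), (0x400, 1128)]

global_strides = strides([addr for _, addr in misses])
local_strides = {pc: strides(a) for pc, a in localize_by_pc(misses).items()}
```

In this example the global stream shows no regular delta, while each localized stream shows a constant stride of 64 or -64 bytes. The localized streams no longer record how the misses of the two instructions were interleaved in time, which is the missing timing information noted above.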
A large number of untimely prefetches, that is, prefetches that do not arrive within a proper time window, can result in cache pollution, wasted bandwidth, and a negative impact on overall performance. In general, untimely prefetches can be categorized into two types: early prefetches and late prefetches. A prefetch is defined to be late if the prefetched data are still on the way back to the cache when an instruction requests the data. In this case, the late prefetch might not contribute much to performance even though it is an accurate prefetch. A prefetch is defined to be early if, due to the limited cache capacity, the prefetched data are evicted by other blocks before they are accessed by the processor. An early prefetch is not merely useless; it also imposes negative effects by causing cache pollution and wasting bandwidth. It is critical to keep the number of untimely prefetches within an acceptable range to lessen the adverse impact and to exploit the benefits of data prefetching.
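The definitions of late and early prefetches above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the record fields (`fill_time`, `demand_time`, `evict_time`), the cycle-count timestamps, and the extra "unused" label for blocks that are never demanded are hypothetical and are not taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PrefetchRecord:
    # All fields are hypothetical cycle counts used for illustration.
    fill_time: int                     # cycle the prefetched data reached the cache
    demand_time: Optional[int] = None  # cycle of the first demand access, if any
    evict_time: Optional[int] = None   # cycle the block was evicted, if any

def classify(p: PrefetchRecord) -> str:
    """Label a prefetch as 'late', 'early', 'timely', or 'unused'."""
    if p.demand_time is None:
        # Never requested by the processor (assumed extra category).
        return "unused"
    if p.demand_time < p.fill_time:
        # Data were still on the way back when the instruction requested them.
        return "late"
    if p.evict_time is not None and p.evict_time < p.demand_time:
        # Data were evicted before the processor accessed them.
        return "early"
    return "timely"

examples = [
    PrefetchRecord(fill_time=100, demand_time=90),                  # late
    PrefetchRecord(fill_time=100, demand_time=500, evict_time=300), # early
    PrefetchRecord(fill_time=100, demand_time=150),                 # timely
    PrefetchRecord(fill_time=100, evict_time=300),                  # unused
]
labels = [classify(p) for p in examples]
```

Counting the labels over a trace gives the fraction of untimely prefetches, which is the quantity that needs to be kept within an acceptable range.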
A principle of data prefetching is that the prefetcher fetches data from a lower level of the memory hierarchy to a higher level closer to the processor, both in advance and in a timely manner. This principle requires consideration of two critical aspects of a data prefetching strategy: what to prefetch and when to prefetch. Existing data prefetching technology has focused on the problem of what to prefetch. The other critical issue, when to prefetch, has long been neglected. Ignoring the timing of prefetches can considerably reduce prefetching effectiveness. There is a continuing need for improved prefetching.