As computer devices and systems continue to advance and become more complex, effective and efficient data transfer between the various components in computer systems has become increasingly critical in system design and implementation. In particular, considerable effort and research have been focused on various mechanisms of information prefetching in computer systems to improve system performance. Prefetching is a technique used to hide memory latency in computer systems. For example, instructions and data can be prefetched from long-latency memory devices (e.g., main memory or external memory devices) to short-latency memory devices (e.g., cache memory devices that reside on the processors) so that instructions and data that are needed by execution units of the processors are readily available from the memory devices that are closer to the execution units. Therefore, prefetching can substantially reduce memory access time, which in turn improves the overall system performance. In general, hardware prefetching can be an effective technique to hide memory latency in various computing devices such as microprocessors (e.g., to improve performance of both integer and floating point applications). The advantages of having the prefetch hardware inside the processor include: (1) data can be brought closer to the execution units (also called the execution engine herein); (2) the prefetch predictor can obtain detailed information about a stride based on the linear address, instruction pointer, branching information, and thread information; and (3) large cache arrays inside the processor are a cost-effective and logical place to store the prefetched data.
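For illustration only, the following is a minimal software sketch of how a processor-side predictor might derive a stride from the instruction pointer and the linear address of successive accesses. The class name StridePredictor, the table layout, and the rule of confirming a stride after two matching observations are assumptions made for this example; they are not taken from any particular processor design.

```python
class StridePredictor:
    """Hypothetical per-instruction-pointer stride table (illustrative only)."""

    def __init__(self):
        # Maps instruction pointer -> (last linear address, last observed stride).
        self.table = {}

    def observe(self, ip, addr):
        """Record one access; return an address to prefetch, or None."""
        entry = self.table.get(ip)
        if entry is None:
            # First access from this instruction: nothing to compare against yet.
            self.table[ip] = (addr, None)
            return None
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.table[ip] = (addr, stride)
        # Assumed policy: predict only once the same nonzero stride is seen twice.
        if stride != 0 and stride == last_stride:
            return addr + stride
        return None


predictor = StridePredictor()
predictions = [predictor.observe(0x400, a) for a in (1000, 1256, 1512)]
```

In this sketch the first two accesses only train the table, and the third access, which repeats the 256-byte stride, yields a prediction of the next address in the sequence. A predictor outside the processor would not have the instruction pointer available as a key.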
Recently, prefetching hardware has also been added to memory controllers in various chipset devices that are designed and implemented by Intel Corporation of Santa Clara, Calif. One of the advantages of having a prefetcher in a memory controller is that the memory controller can monitor the main memory bandwidth closely and utilize the unused memory bandwidth to prefetch information without significantly impacting the system performance. Conversely, the memory controller can throttle the prefetch traffic if memory bandwidth utilization for demand fetches is high. However, the prefetch logic or mechanism (also called the prefetcher) in the memory controller has almost no visibility into the CPU, where the actual execution of programs takes place. As a result, the prefetch logic in the memory controller is not very effective at predicting the next request address. For example, when the stride is not a unit cacheline (i.e., the next access is not to the next cacheline), the prefetcher in the memory controller does not help in reducing memory access time. While the prefetcher of the memory controller does not hurt the performance of the system, it can have a negative impact with respect to power management of a power-conscious system. In addition, with increasing cacheline sizes in the processor, the next-line prefetching in the memory controller can become less effective.
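By way of a hedged example, the behavior described above can be modeled in software as a next-line prefetcher that issues a prefetch only when demand-fetch bandwidth utilization is below a limit. The 64-byte cacheline size, the 0.75 utilization limit, and the names NextLinePrefetcher and hit_count are assumptions chosen for this sketch and do not describe any actual memory controller.

```python
CACHELINE_BYTES = 64  # assumed cacheline size for this sketch


class NextLinePrefetcher:
    """Hypothetical memory-controller prefetcher with bandwidth throttling."""

    def __init__(self, utilization_limit=0.75):
        self.utilization_limit = utilization_limit  # assumed throttle threshold
        self.prefetched = set()  # cacheline numbers currently held by the prefetcher

    def on_demand_fetch(self, addr, demand_utilization):
        """Handle a demand fetch; return True if it hits a prefetched line."""
        line = addr // CACHELINE_BYTES
        hit = line in self.prefetched
        self.prefetched.discard(line)
        # Use spare bandwidth only: throttle when demand utilization is high.
        if demand_utilization < self.utilization_limit:
            self.prefetched.add(line + 1)
        return hit


def hit_count(stride_bytes, accesses=8, utilization=0.2):
    """Count prefetch hits for a fixed-stride stream of demand fetches."""
    pf = NextLinePrefetcher()
    return sum(pf.on_demand_fetch(i * stride_bytes, utilization)
               for i in range(accesses))


unit_stride_hits = hit_count(64)    # every access after the first hits
two_line_hits = hit_count(128)      # prefetched lines are never requested
throttled_hits = hit_count(64, utilization=0.9)  # no prefetches are issued
```

Under these assumptions, a unit-cacheline stride hits on every access after the first, whereas a two-cacheline stride never hits even though a prefetch is still issued on each access. Those unused prefetches do not delay demand fetches in this model, but they represent memory activity that a power-conscious system would prefer to avoid.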