A primary factor in the utility of a computer system is the speed at which it executes application programs. It is therefore important to provide software instructions and data to a processor (e.g., a central processing unit, or CPU) at least as fast as the CPU can consume them. Failure to provide the needed instructions/data causes the CPU to idle, or stall, while it waits for them. Modern integrated circuit fabrication technology has enabled the production of CPUs that operate at very high clock speeds (e.g., 2 gigahertz and above). Consequently, it has become challenging for system designers to ensure that the needed instructions/data are provided from system memory to a modern high-speed CPU without imposing substantial CPU idle time penalties.
A widely used solution for reducing CPU stall time involves the incorporation of highly optimized memory caches within the CPU die. Memory caches are well known and widely used to speed up instruction execution and data retrieval. These caches serve as staging areas and are optimized to provide lower data access latency than system memory. In addition to the incorporation of caches, various prior art memory prefetch schemes have been implemented to further reduce data access latency. However, modern high-speed CPUs are rendering even the most elaborate prior art caching/prefetching schemes inadequate.