Many computer processors now include cache systems physically integrated into the processor. When data is requested from a memory, the processor first checks whether the data is already stored in the cache. When the data is not in the cache, resulting in a “cache miss,” the processor typically accesses a central memory, retrieves the desired data, and stores that data in the cache for future use. Retrieving information from the central memory generally requires accessing a transfer bus linking the processor to other system components. It therefore typically takes longer to retrieve data from the memory than from the cache, and this additional time increases the latency experienced by the processor. Higher latency means lower processor performance, since the processor waits to receive the data from the memory rather than performing other functions.
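By way of illustration only, the check-then-fill behavior described above may be sketched as follows. The class name, the counters, and the two latency figures are assumptions chosen for the example and are not taken from any particular processor; cache capacity is ignored for brevity.

```python
# Illustrative model only: both latencies are assumed values.
CACHE_LATENCY = 1      # assumed cycles to check the cache
MEMORY_LATENCY = 100   # assumed cycles for a central-memory access over the transfer bus


class SimpleCache:
    """Toy cache keyed by address (hypothetical; no capacity limit)."""

    def __init__(self, memory):
        self.memory = memory   # backing central memory: dict of address -> value
        self.lines = {}        # cached copies: address -> value
        self.cycles = 0        # total latency accumulated so far
        self.misses = 0        # number of cache misses so far

    def read(self, address):
        self.cycles += CACHE_LATENCY       # the cache is always checked first
        if address in self.lines:
            return self.lines[address]     # cache hit: no memory access needed
        # Cache miss: access central memory, then keep a copy for future use.
        self.misses += 1
        self.cycles += MEMORY_LATENCY
        value = self.memory[address]
        self.lines[address] = value
        return value
```

Under these assumed figures, the first read of an address costs the cache check plus the memory access, while a repeated read of the same address costs only the cache check.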
One approach to reducing latency is to prefetch a block of data from the memory and store it in a cache residing in the central processing unit. When a processor prefetches data, the processor retrieves from the memory not only the requested data, but also data that has not yet been requested but might be requested in the future. All of the retrieved data is then stored in the cache residing in the central processing unit. If and when the processor actually requests the additional data that was prefetched from the memory, the processor retrieves the data from the cache rather than from the memory.
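A minimal sketch of such prefetching is given below. It assumes, purely for illustration, that the data likely to be requested next is the data at the addresses immediately following the requested address; the block size, class name, and request counter are hypothetical and not drawn from any specific design.

```python
BLOCK_SIZE = 4  # assumed number of consecutive addresses retrieved on a miss


class PrefetchingCache:
    """Toy cache that prefetches the addresses following a missed address."""

    def __init__(self, memory):
        self.memory = memory        # backing central memory: dict of address -> value
        self.lines = {}             # cached copies: address -> value
        self.memory_requests = 0    # memory transfer requests issued so far

    def read(self, address):
        if address in self.lines:
            return self.lines[address]   # served from the cache, no memory access
        # Miss: retrieve the requested data plus data not yet requested.
        for addr in range(address, address + BLOCK_SIZE):
            if addr in self.memory and addr not in self.lines:
                self.lines[addr] = self.memory[addr]
                self.memory_requests += 1   # one transfer request per item
        return self.lines[address]          # raises KeyError for an unknown address
```

In this sketch, reading address 0 brings addresses 0 through 3 into the cache, so later reads of addresses 1, 2, and 3 are served from the cache without any further memory transfer request.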
Several problems exist with current cache systems. One problem is that caches physically located on the processor chip are typically very shallow because other physical features of the processor compete for chip space, so only a limited amount of data can be stored in the cache. The small size of the on-chip cache often leads to the use of intelligence, that is, logic that controls the behavior of the cache in order to maximize the cache's performance. For example, the intelligence may determine the data that is most likely to be requested in the future and prefetch that data accordingly. However, the intelligence also adds complexity and expense to the design of the processor.
Another disadvantage of locating the cache in the central processing unit is the added traffic created on the central processing unit transfer bus. In order to prefetch multiple pieces of data from the memory, the processor makes multiple memory transfer requests to retrieve the desired data. The processor is unable to carry out other instructions while it is issuing the memory transfer requests, so the time taken by these requests further reduces the performance of the processor. Furthermore, the multiple transfer requests put a heavy load on the transfer bus.
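The cost of processor-issued prefetching can be estimated with simple arithmetic, sketched below. The function name and the per-request cycle count are assumptions made for the example; the sketch further assumes one transfer request per prefetched item and a processor that is fully stalled while each request is issued.

```python
REQUEST_CYCLES = 100  # assumed cycles the processor is occupied per transfer request


def prefetch_cost(items_to_prefetch, request_cycles=REQUEST_CYCLES):
    """Return (bus_requests, stalled_cycles) for a processor-issued prefetch.

    Assumes one memory transfer request per item and that no other
    instructions execute while a request is being issued.
    """
    bus_requests = items_to_prefetch
    stalled_cycles = bus_requests * request_cycles
    return bus_requests, stalled_cycles
```

For example, under these assumptions `prefetch_cost(1)` returns `(1, 100)` and `prefetch_cost(8)` returns `(8, 800)`: both the load on the transfer bus and the time the processor spends issuing requests grow in proportion to the amount of data prefetched.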