Typical processor system designs use various cache techniques to minimize the effects of memory access latency on processor performance. A cache is a block of memory that is smaller than main memory and can be accessed faster than the memory levels organized beneath it. When a block of memory is accessed from a lower level such as main memory, it can be copied into the cache. Future accesses to that memory can retrieve the data more quickly from the cache than from main memory, making it much less likely that the processor will stall while waiting for data to be fetched from memory.
Prefetching of data or instructions prior to explicit requests for that data from the processor is a technique that is sometimes used in conjunction with a cache, in an attempt to improve cache effectiveness. Prefetching obtains data from memory and makes it available to the processor in the cache before the processor, in executing instructions, potentially accesses that data, thus reducing the memory latency seen by the processor. Cache lines (contiguous blocks of data in a cache, each fetched as a unit) can be brought into the cache preemptively, before a demand miss occurs for those lines. A demand miss is an access in which the data requested by the processor is not in the cache.
Requests for prefetching data are typically based on previous explicit requests. For example, if an explicit request from the processor is for a particular block in memory, then a prefetch request following that explicit request can issue a read command to memory for the next sequential block of data after the explicitly requested block. The prefetch request is typically for a block the size of a cache line.
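The next-sequential-block request described above can be sketched as a simple address calculation. This is a minimal illustration only: the 64-byte line size and the function name `next_line_prefetch_address` are assumptions made for the example and do not come from the text.

```python
LINE_SIZE = 64  # bytes per cache line; an assumed value for illustration


def next_line_prefetch_address(demand_address: int, line_size: int = LINE_SIZE) -> int:
    """Return the base address of the cache line that follows the demanded one."""
    # Align the explicitly requested address down to the start of its cache line.
    line_base = demand_address - (demand_address % line_size)
    # The prefetch targets the next sequential line, one full line further on.
    return line_base + line_size
```

Under these assumptions, an explicit request for address 0x1004 falls in the line starting at 0x1000, so the prefetch request would read the line starting at 0x1040.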
Prefetching can be very effective for some scenarios, such as sequential code execution, sequential data access, or stack operations. However, prefetching may actually be detrimental to performance for other scenarios, such as a function call to a short procedure, non-sequential or random data access, linked list processing, or a regular stride (distance in bytes between accesses) greater than one cache line through a large data structure. Thus, it is useful to be able to distinguish the scenarios in which prefetching is beneficial from those in which it is detrimental.
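The contrast between a favorable and an unfavorable scenario can be illustrated with a toy model. The sketch below assumes an unbounded cache, a 64-byte line, and a policy that prefetches the next sequential line on every access; the function `simulate` and all of these parameters are hypothetical choices for the example, not a description of any particular hardware.

```python
LINE_SIZE = 64  # bytes per cache line; an assumed value for illustration


def simulate(addresses, prefetch=True, line_size=LINE_SIZE):
    """Return (demand_misses, unused_prefetches) for a sequence of byte addresses."""
    cached = set()   # line numbers currently held in the (unbounded) cache
    unused = set()   # prefetched lines the processor has not yet referenced
    demand_misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cached:
            unused.discard(line)      # a prefetched line turned out to be useful
        else:
            demand_misses += 1        # the data was not in the cache when requested
            cached.add(line)
        if prefetch and (line + 1) not in cached:
            cached.add(line + 1)      # next-sequential-line prefetch
            unused.add(line + 1)
    return demand_misses, len(unused)


sequential = list(range(0, 1024, 8))   # 8-byte steps through 16 lines
strided = list(range(0, 2048, 128))    # stride of two cache lines, 16 accesses
```

In this model, the sequential walk incurs 16 demand misses without prefetching and only 1 with it. The strided walk incurs 16 demand misses either way, and with prefetching enabled it additionally fetches 16 lines that are never used, which is the kind of wasted memory traffic that can make prefetching detrimental.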
Prior solutions to determine the value of prefetching for particular data involve a hardware prefetch mechanism that examines a number of accesses by the processor to memory and looks for access patterns. From these patterns, the prefetch mechanism can determine which data was sequentially accessed and/or is likely to be accessed in the future, and prefetch that data in the detected sequence. However, a problem with this method is that the prefetch mechanism cannot prefetch any data for the several initial accesses that occur before a pattern is detected. Thus, several opportunities for prefetching data are missed, resulting in suboptimal performance. In addition, prior mechanisms may have difficulty in detecting a stride in the processor accesses other than one cache line. Strides of greater than one cache line might be used, but the prefetch mechanism might have to examine several accesses before an unusual stride is detected; only at that point can it prefetch data at the correct stride addresses.
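The warm-up cost of pattern-based detection can be sketched as follows. This is a simplified illustration, not the mechanism of any specific prior design: the class `StrideDetector` and its `confirmations` threshold are hypothetical, and a real hardware detector would typically track many access streams at once.

```python
class StrideDetector:
    """Toy stride detector that predicts only after a stride repeats."""

    def __init__(self, confirmations=2):
        # Assumed threshold: how many consecutive times the same stride
        # must be observed before a prefetch is issued.
        self.confirmations = confirmations
        self.last_address = None
        self.last_stride = None
        self.run = 0

    def observe(self, address):
        """Record one access; return an address to prefetch, or None."""
        prediction = None
        if self.last_address is not None:
            stride = address - self.last_address
            if stride == self.last_stride:
                self.run += 1             # same stride seen again
            else:
                self.last_stride = stride  # new candidate stride
                self.run = 1
            if stride != 0 and self.run >= self.confirmations:
                prediction = address + stride
        self.last_address = address
        return prediction


detector = StrideDetector()
predictions = [detector.observe(a) for a in (0, 256, 512, 768, 1024)]
```

With a stride of 256 bytes and the assumed threshold of two confirmations, `predictions` is `[None, None, 768, 1024, 1280]`: the first two accesses produce no prefetch at all, which corresponds to the missed opportunities described above.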
Cache touch instructions can potentially be useful for some of the scenarios that are not handled well with prefetching. Cache touch instructions, when executed by the processor, can prefetch data that will be needed after a few iterations, e.g., the touch prefetches data from the next cache block. However, compilers are seldom effective at using cache touch instructions.
Accordingly, what is needed is an apparatus and method for providing prefetching of data and instructions that is more reliable and efficient than the prior prefetching techniques. The present invention addresses such a need.