The introduction of caches in processors has been an important step in alleviating the problem of ensuring a sufficient supply of data to the processor. However, with ever-increasing processor speeds and the use of massive instruction-level parallelism within processors, performance may still be hindered by data fetching. While the cache is configured to retain data, cache misses (i.e., cases in which requested data is not available in the cache and needs to be fetched from a different source) may be associated with considerable cache miss latency. In some cases, when a hierarchical cache is provided, different latencies may be associated with different cache misses, depending on the hierarchical distance of the data to be fetched.
This well-known problem has attracted much attention from the computer systems research community. Many hardware, software and hybrid schemes to alleviate the problem have been proposed. One example of such a scheme is data prefetching. By fetching the data before it is required, the fetch may overlap with other work of the processor, so that the parallelism characteristics are exploited to reduce the cache miss latency overhead.
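The effect of fetching data ahead of its use can be illustrated with a toy cycle-count model. The following is a minimal sketch under stated assumptions, not a description of any particular processor: the function name `loop_cycles` and its parameters are hypothetical, every iteration is assumed to miss the cache exactly once, and an unlimited number of fetches may be outstanding at the same time.

```python
def loop_cycles(iterations, compute_cycles, miss_latency, prefetch_distance=0):
    """Toy cycle count for a loop in which every iteration loads one item
    that misses the cache and then computes on it.

    prefetch_distance == 0 models plain demand loads. Otherwise the data for
    iteration i is requested at the start of iteration i - prefetch_distance;
    the first prefetch_distance iterations are requested at cycle 0
    (an assumed prologue).
    """
    start = []  # start[i] is the cycle at which iteration i begins
    clock = 0
    for i in range(iterations):
        start.append(clock)
        if prefetch_distance == 0:
            issued = clock                          # demand load: requested when needed
        elif i < prefetch_distance:
            issued = 0                              # requested by the prologue
        else:
            issued = start[i - prefetch_distance]   # requested by an earlier iteration
        ready = issued + miss_latency
        # Stall until the data has arrived, then do the computation.
        clock = max(clock, ready) + compute_cycles
    return clock


if __name__ == "__main__":
    for distance in (0, 1, 3):
        print(distance, loop_cycles(4, 10, 30, distance))
```

With four iterations, 10 compute cycles and a 30-cycle miss latency, this model yields 160 cycles without prefetching, 90 cycles with a prefetch distance of one iteration, and 70 cycles with a distance of three, at which point only the first miss remains exposed.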
As prefetching may be associated with some overhead, only a small portion of the fetch commands are usually handled by this mechanism. In some cases, delinquent loads of a program are detected and their associated overhead may be reduced using prefetching. “Delinquent loads” are instructions which require loading of data and which are associated with a considerable portion of the overhead of the program that is attributable to cache misses. In some cases, delinquent loads are loads that often cause cache misses. Additionally or alternatively, delinquent loads may be associated with a high average cache miss latency (e.g., data often needs to be loaded from remote cache levels or from outside of the cache altogether). It will be noted that “delinquent load” is a relative term: one instruction may be a delinquent load with respect to a first program, while another instruction having similar associated cache miss latencies may not be considered a delinquent load in a second program.
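One possible way to make the relative nature of the term concrete is to rank the load instructions of a program by their share of the total cache miss latency. The following is a minimal sketch, assuming a profile is available as a list of (load identifier, miss latency) pairs; the function name `find_delinquent_loads`, the `coverage` threshold and the sample identifiers are hypothetical and are not taken from any specific detection scheme.

```python
from collections import defaultdict


def find_delinquent_loads(miss_events, coverage=0.8):
    """Return the loads that together account for at least `coverage`
    of the total cache miss latency of one program.

    miss_events: iterable of (load_id, miss_latency_cycles), one per miss.
    Loads are picked greedily, largest total miss latency first.
    """
    per_load = defaultdict(int)
    for load_id, latency in miss_events:
        per_load[load_id] += latency

    total = sum(per_load.values())
    if total == 0:
        return []

    ranked = sorted(per_load.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for load_id, cycles in ranked:
        if covered >= coverage * total:
            break
        selected.append(load_id)
        covered += cycles
    return selected


# The same load (0x10, five misses of 200 cycles) appears in two programs.
program_a = [(0x10, 200)] * 5 + [(0x20, 10)] * 10
program_b = [(0x10, 200)] * 5 + [(0x30, 200)] * 100

if __name__ == "__main__":
    print([hex(pc) for pc in find_delinquent_loads(program_a)])
    print([hex(pc) for pc in find_delinquent_loads(program_b)])
```

In this sketch the load at `0x10` is selected for `program_a`, where it dominates the miss overhead, but not for `program_b`, where another load with the same per-miss latency misses far more often.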
However, data prefetching is useful only if the prefetched data is still relevant when the delinquent load instruction is performed. In some cases, prefetched data might become irrelevant if the value at the data address used in the prefetch is changed by an access to the same memory address or to the same cache line. Such data changes may be performed by the same processing entity or by a different processing entity.
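The relevance condition can be sketched as a check on the writes that occur between the prefetch and the load. This is a simplified illustration only: the 64-byte line size, the function names and the writer identifiers are assumptions, and no actual cache coherence protocol is modeled.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def line_of(address):
    """Index of the cache line that contains the given byte address."""
    return address // CACHE_LINE_BYTES


def prefetch_still_relevant(prefetch_address, writes_before_load):
    """True if no write issued between the prefetch and the load touched
    the prefetched cache line.

    writes_before_load: iterable of (writer_id, address). The writer may be
    the same processing entity or a different one; in this sketch either
    kind of write makes the prefetched data irrelevant.
    """
    target = line_of(prefetch_address)
    return all(line_of(address) != target for _writer, address in writes_before_load)


if __name__ == "__main__":
    # A write by another entity to a different byte of the same line.
    print(prefetch_still_relevant(0x1000, [("entity1", 0x1008)]))
    # A write to the next cache line leaves the prefetched data intact.
    print(prefetch_still_relevant(0x1000, [("entity0", 0x1040)]))
```

Comparing line indices rather than exact addresses reflects the point made above that a change to any location in the same cache line, and not only to the prefetched address itself, may affect the prefetched data.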