Many portable products, such as cell phones, laptop computers, personal digital assistants (PDAs) and the like, utilize a processing system that executes programs, such as communication and multimedia programs. A processing system for such products may include multiple processors, complex memory systems including multiple levels of caches for storing instructions and data, controllers, peripheral devices such as communication interfaces, and fixed function logic blocks configured, for example, on a single chip. At the same time, portable products have a limited energy source in the form of batteries that are often required to support high performance operations by the processing system. To increase battery life, it is desirable to perform these operations as efficiently as possible. Many personal computers are also being developed with efficient designs to operate with reduced overall energy consumption.
To provide high performance in the execution of programs, data prefetching, which is based on the concept of spatial locality of memory references, may be used. By prefetching multiple data elements from a cache at addresses that are near a fetched data element, or that are related to it by a stride address delta or an indirect pointer, and that are likely to be used in future accesses, cache miss rates may be reduced. Cache designs generally implement a form of prefetching by fetching an entire cache line of data for an individual data element fetch. Hardware prefetchers may expand on this by speculatively prefetching one or more additional cache lines of data, where the prefetch address may be formed based on sequential, stride, or pointer information. For memory intensive workloads, such as processing a large array of data, such hardware prefetcher operation may significantly reduce memory latency. However, data prefetching is not without its drawbacks. For example, in a software loop used to process an array of data, a data prefetcher circuit prefetches data to be used in future iterations of the loop, and it continues to do so during the last iteration of the loop. The data prefetched during the last iteration will not be used, and storing this unused data in the cache causes cache pollution. The cache pollution problem is compounded when loops are unrolled.
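The end-of-loop pollution described above can be illustrated with a small simulation. The following is a minimal sketch, not a model of any particular hardware: the function name `simulate_loop_prefetch`, the 64-byte line size, the 4-byte element size, and the simple "prefetch the next lines on first touch" policy are all assumptions chosen for illustration.

```python
def simulate_loop_prefetch(num_elements, element_size=4, line_size=64,
                           prefetch_distance=1):
    """Simulate a sequential loop over an array with a hypothetical
    next-line prefetcher.

    On the first access to each cache line, the prefetcher requests the
    following `prefetch_distance` lines. Returns a tuple of
    (number of lines the loop actually touched,
     number of distinct lines prefetched,
     sorted list of prefetched lines the loop never touched).
    """
    demanded = set()    # cache lines the loop really reads
    prefetched = set()  # cache lines requested speculatively

    for i in range(num_elements):
        line = (i * element_size) // line_size
        if line not in demanded:
            demanded.add(line)
            # Assumed policy: prefetch the next lines regardless of
            # whether the loop is about to terminate.
            for d in range(1, prefetch_distance + 1):
                prefetched.add(line + d)

    unused = sorted(prefetched - demanded)  # lines that pollute the cache
    return len(demanded), len(prefetched), unused


if __name__ == "__main__":
    # 64 four-byte elements span cache lines 0..3.
    print(simulate_loop_prefetch(64))                       # (4, 4, [4])
    print(simulate_loop_prefetch(64, prefetch_distance=2))  # (4, 5, [4, 5])
```

In this sketch the loop touches lines 0 through 3, but the prefetcher also brings in line 4 (and line 5 at a distance of two), which lie beyond the end of the array. A more aggressive prefetch distance fetches more lines past the end of the loop, so the amount of unused data grows with it.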