1. Field of the Present Invention
The present invention is in the field of microprocessors and, more particularly, processors that employ data prefetching.
2. History of Related Art
Hardware data prefetchers have been used in recent microprocessors to anticipate and mitigate the substantial latency associated with retrieving data from distant caches and system memory. This latency, which is the total number of processor cycles required to retrieve data from memory, has been growing rapidly as processor frequencies have increased faster than system memory access times.
Stream hardware data prefetchers have been used to detect data streams. A stream may be defined as any sequence of storage accesses that reference a contiguous set of cache lines in a monotonically increasing or decreasing manner. In response to detecting a data stream, hardware prefetchers are configured to begin prefetching data up to a predetermined number of cache lines ahead of the data currently being processed.
Prior art stream prefetch mechanisms include support for software instructions to direct or control certain aspects of the prefetch hardware including instructions to define the beginning and the end of a software stream, when prefetching should be started, and the total number of outstanding L2 prefetches allowed at any time. While these instructions are useful, the most effective depth of prefetching in a high latency multi-processor system depends upon a number of factors such as the number of other streams currently being prefetched and the rate of consumption of each of those streams by the executing software programs. For example, the optimal prefetch depth in an environment where multiple code sequence are interleaving the access to ten streams at equal consumption rates would be smaller than the optimal depth for code that is accessing only one data stream, but with a much higher consumption rate. For the latter case, if the prefetch request rate is too low (i.e., the prefetch depth is too low), the performance of the code will be sub-optimal due to the exposed latency of not prefetching far enough ahead. As another example, a code sequence that includes two streams where one stream has a much higher consumption rate than the other stream will be difficult to optimize in conventional prefetching hardware that does not permit dynamic and stream-by-stream prefetch control. It would be desirable, therefore, to implement a microprocessor that included stream dependent prefetch control.