Embodiments of the inventive subject matter generally relate to the field of computer architecture, and, more particularly, to persistent prefetch settings for data streams.
The available memory bandwidth in a multiprocessor system is shared among all of the processors on a chip, and is a limiting factor for the performance of data-intensive applications. Ensuring that the available memory bandwidth is conserved for useful work helps maximize the total performance of the chip. The available memory bandwidth can be conserved by altering the state of cache lines that are anticipated to be used only once, such that those cache lines are replaced sooner than they would be under a default replacement policy. Existing techniques include defining a data stream and assigning a transient property to the cache lines of the data stream.
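By way of illustration only, the effect of a transient property on replacement can be sketched as follows. The sketch assumes a single cache set with a least-recently-used default policy, and it models a transient line by placing it at the next-victim position instead of the most-recently-used position. The names `CacheSet`, `access`, and `transient` are hypothetical and are not drawn from any particular hardware implementation.

```python
class CacheSet:
    """One set of a set-associative cache with LRU replacement (illustrative)."""

    def __init__(self, ways):
        self.ways = ways
        # Index 0 holds the next victim; the last index is most recently used.
        self.lines = []

    def access(self, tag, transient=False):
        """Access a cache line by tag and return True on a hit."""
        hit = tag in self.lines
        if hit:
            self.lines.remove(tag)
        elif len(self.lines) == self.ways:
            # Miss in a full set: evict the line at the victim position.
            self.lines.pop(0)
        if transient:
            # Assumption: a transient line is placed at the victim position,
            # so it is replaced sooner than under the default policy.
            self.lines.insert(0, tag)
        else:
            self.lines.append(tag)
        return hit
```

In this sketch, a sequence of transient stream lines passing through a full set displaces only the previous transient line, so lines that are likely to be reused remain resident and need not be fetched from memory again.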
In addition to conservation of available memory bandwidth, the performance of a particular thread running on any one processor is also significant. The performance of a thread can be improved by employing aggressive hardware-based data prefetching. In aggressive hardware-based data prefetching, a hardware prefetcher detects a data stream and begins prefetching data for the detected stream, starting with the next cache line and running up to a predetermined number of cache lines ahead of the data currently being processed. However, aggressive data prefetching can lead to overshoot (i.e., prefetching more cache lines than required), which results in wasted memory bandwidth.
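The overshoot described above can be illustrated with a minimal simulation. The sketch assumes a purely sequential stream that is detected on its first access and a prefetcher that always runs a fixed number of lines ahead; real prefetchers typically require several accesses to confirm a stream and ramp up gradually. The function name `simulate_stream_prefetch` and its parameters are hypothetical.

```python
def simulate_stream_prefetch(stream_length, depth):
    """Return (useful, wasted) prefetch counts for a sequential stream.

    stream_length: number of consecutive cache lines the program reads,
                   numbered 0 .. stream_length - 1.
    depth: predetermined number of lines the prefetcher runs ahead.
    """
    prefetched = set()
    for current in range(stream_length):
        # Assumption: the stream is detected immediately, so on every access
        # the prefetcher covers the next `depth` lines after the current one.
        for line in range(current + 1, current + depth + 1):
            prefetched.add(line)
    # Lines inside the stream are useful; lines past its end are overshoot.
    useful = sum(1 for line in prefetched if line < stream_length)
    wasted = len(prefetched) - useful
    return useful, wasted


print(simulate_stream_prefetch(10, 4))  # prints (9, 4)
```

Under these assumptions the number of wasted prefetches equals the prefetch depth regardless of the stream length, so short streams lose a proportionally larger share of memory bandwidth to overshoot.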