1. Technical Field
The present invention relates to a system and method for improving the page crossing performance of a data prefetcher. More particularly, the present invention relates to a system and method for identifying a time at which a data stream's prefetched cache lines approach the end of a real page, and aggressively prefetching the data stream's subsequent cache lines that resume on a different real page of data.
2. Description of the Related Art
Microprocessors use a prefetch engine to anticipate upcoming program requirements for data located in distant caches and system memory. By using a prefetch engine, a microprocessor prefetches data so that the data already resides in local cache when the program calls for it. This mitigates the substantial latency that would be incurred if, instead, the microprocessor waited until the program called for the data before retrieving it.
A microprocessor may use a data stream prefetch engine to detect data streams, which may be defined as a sequence of storage accesses that reference a contiguous set of cache lines in an increasing or decreasing manner. In response to detecting a data stream, a prefetch engine is configured to begin prefetching data up to a predetermined number of cache lines ahead of the data currently in process. Data stream prefetch engines, however, are often implemented such that they prefetch only up to the end of a real page of data. This greatly simplifies their implementation because the prefetch engines are not required to track effective addresses of the data streams or retranslate addresses within a data stream.
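By way of illustration only, the following Python sketch models the behavior described above: detecting an ascending or descending stream from two accesses to adjacent cache lines, and selecting prefetch candidates that stop at the end of the real page. The page size, line size, and the function names detect_direction and prefetch_candidates are assumptions chosen for this sketch and do not describe any particular hardware implementation.

```python
PAGE_SIZE = 4096   # assumed real page size in bytes
LINE_SIZE = 128    # assumed cache line size in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


def detect_direction(line_numbers):
    """Return +1 or -1 if the last two accesses touch adjacent lines, else 0."""
    if len(line_numbers) < 2:
        return 0
    delta = line_numbers[-1] - line_numbers[-2]
    return delta if delta in (1, -1) else 0


def prefetch_candidates(current_line, direction, depth):
    """List up to `depth` lines ahead of current_line, stopping at the page end."""
    page = current_line // LINES_PER_PAGE
    candidates = []
    for step in range(1, depth + 1):
        line = current_line + direction * step
        # Stop rather than cross into a different real page.
        if line < 0 or line // LINES_PER_PAGE != page:
            break
        candidates.append(line)
    return candidates
```

Under these assumed sizes, a stream at line 29 with a requested depth of five yields only lines 30 and 31, because line 32 lies in the next real page.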
A challenge found is that when a data stream crosses a page boundary, the continuation of the stream in a new real page appears to be a new data stream altogether. In current implementations, a new stream startup profile is typically “conservative” in nature. A conservative startup profile does not start prefetching cache lines until the application loads data that causes at least two consecutive cache lines to miss the cache. It then begins prefetching only slightly ahead of the line currently being loaded by the program, and gradually prefetches farther ahead as the application loads more lines along the stream, until it is prefetching the desired number of lines ahead of the demand loads.
For example, when an existing prefetch engine is in a conservative, or “normal,” startup profile and a program loads line “I,” the prefetch engine speculates that the program might next load line I+1 and records the address for line I+1 in a prefetch request queue. When the program sometime later loads an address in line I+1, the prefetch logic detects the load and might send an L1 prefetch for line I+2 while also setting the address in the prefetch request queue to line I+2 (advancing the address in the request queue to the next line it expects the program to load). When the program later loads an address from line I+2, the prefetcher might send an L1 prefetch for line I+3, and L2 prefetches for lines I+4 and I+5, etc. Note, however, that the prefetch engine does not prefetch across a page boundary. As such, once the prefetch engine reaches a page boundary, the prefetch engine terminates its prefetching.
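The ramp in the foregoing example may be sketched, under stated assumptions, as the following Python model. The class name ConservativeStream, the l2_frontier bookkeeping, the 32-line page, and the steady-state behavior after the third load (one L1 prefetch plus any not-yet-requested L2 lines) are hypothetical choices made for illustration; the example above specifies only the first three steps.

```python
LINES_PER_PAGE = 32  # assumed: 4 KiB real page with 128-byte cache lines


class ConservativeStream:
    """Toy model of a conservative startup profile for an ascending stream."""

    def __init__(self, first_line):
        self.page = first_line // LINES_PER_PAGE
        self.expected = first_line + 1   # line I+1 recorded in the request queue
        self.confirmations = 0           # loads that matched the expected line
        self.l2_frontier = first_line    # highest line already sent as L2 prefetch

    def _in_page(self, line):
        return line // LINES_PER_PAGE == self.page

    def on_load(self, line):
        """Return (l1_prefetches, l2_prefetches) issued in response to a load."""
        if line != self.expected:
            return [], []
        self.confirmations += 1
        self.expected = line + 1         # advance the request queue entry
        l1 = [line + 1]
        l2 = []
        if self.confirmations >= 2:
            # Assumed ramp: reach two further lines, skipping any already sent.
            l2 = [x for x in (line + 2, line + 3) if x > self.l2_frontier]
        # The engine never prefetches across the real page boundary.
        l1 = [x for x in l1 if self._in_page(x)]
        l2 = [x for x in l2 if self._in_page(x)]
        if l2:
            self.l2_frontier = l2[-1]
        return l1, l2
```

With I = 0, loading line 1 yields an L1 prefetch for line 2, and loading line 2 yields an L1 prefetch for line 3 and L2 prefetches for lines 4 and 5, matching the example. A stream started at line 29 issues nothing on the load of line 31, because every candidate lies beyond the page boundary.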
A challenge found is that for a long stream that crosses one or more page boundaries, the data stream interruption and the process of reacquiring the data stream impair performance by adding stalls equal to multiples of the memory latency, due to added cache misses at the beginning of each page and the re-ramping of the prefetches.
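A rough sense of this cost may be obtained from the following back-of-the-envelope Python sketch. It counts the page boundaries a stream crosses and multiplies by an assumed number of demand misses per restart and an assumed memory latency. The function names and the example figures (two misses per restart, 300 cycles of latency) are hypothetical and serve only to illustrate the arithmetic; the model ignores the additional cost of re-ramping.

```python
def page_crossings(start_addr, length_bytes, page_size=4096):
    """Number of real page boundaries crossed by a contiguous stream."""
    if length_bytes <= 0:
        return 0
    first_page = start_addr // page_size
    last_page = (start_addr + length_bytes - 1) // page_size
    return last_page - first_page


def added_stall_cycles(crossings, misses_per_restart, memory_latency_cycles):
    """Lower-bound estimate of stall cycles added by stream restarts."""
    return crossings * misses_per_restart * memory_latency_cycles


# Hypothetical figures: a 16 KiB stream starting at address 0,
# two demand misses per restart, 300 cycles of memory latency.
crossings = page_crossings(0, 16 * 1024)
stall = added_stall_cycles(crossings, 2, 300)
```

Under these assumed figures the stream crosses three page boundaries and incurs 1,800 added stall cycles.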
What is needed, therefore, is a system and method that improves the page crossing performance of a data prefetch engine.