This invention relates to methods and architecture of a processor (or microprocessor) for prefetching from memory.
Presently, hardware-initiated stride prefetching is used in microprocessors to detect memory accesses that exhibit a striding pattern, and then to prefetch cache lines into the caches by predicting future memory accesses from that pattern. Most of the algorithms used by the hardware rely on detecting repeated accesses to memory addresses. For example, accesses to memory addresses may show a striding pattern of X, X+y, X+2y, where y is the stride distance. The algorithm is then employed to prefetch X+3y, and so on. Some microprocessors implement aggressive algorithms to prefetch a considerable amount of data from memory. For example, where the aggressive algorithm detects a consistently repeating pattern, the algorithm may provide for prefetching the addresses predicted by the stride pattern until the end of a page. Usually, for software to make use of such page-level prefetching, the software requires information regarding the actual hardware implementation of the prefetch engine. This information may be used to train the prefetch engine.
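The striding pattern described above (X, X+y, X+2y, followed by a prefetch of X+3y and, in the aggressive case, of further addresses up to the end of the page) can be illustrated with a minimal software sketch. This is only an illustrative model and not an actual hardware implementation. The function names, the 4096-byte page size, and the rule that three accesses confirm a stride are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical, for illustration)


def detect_stride(addresses):
    """Return the stride y if the last three accesses are X, X+y, X+2y.

    Returns None if fewer than three accesses are known, the stride is
    zero, or the last two deltas disagree.
    """
    if len(addresses) < 3:
        return None
    a, b, c = addresses[-3:]
    stride = b - a
    if stride != 0 and c - b == stride:
        return stride
    return None


def predict_prefetches(addresses, page_size=PAGE_SIZE):
    """Predict addresses to prefetch, stopping at the page boundary.

    Models the aggressive case: once a stride is confirmed, every
    predicted address within the current page is returned.
    """
    stride = detect_stride(addresses)
    if stride is None:
        return []
    last = addresses[-1]
    page_start = (last // page_size) * page_size
    page_end = page_start + page_size
    predictions = []
    nxt = last + stride
    # Works for positive and negative strides: stop on leaving the page.
    while page_start <= nxt < page_end:
        predictions.append(nxt)
        nxt += stride
    return predictions
```

For instance, with accesses at 0, 1024, and 2048, the sketch confirms a stride of 1024 and predicts only address 3072, because the next candidate (4096) falls on the following page.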
One problem with this traditional design for prefetching is that it requires the software team to understand the microarchitecture of the training algorithm of a specific hardware prefetch engine. This may require different code generation for different processor designs, even when the architecture of the prefetch engine is unchanged. Further, this may restrict the flexibility and aggressiveness of a prefetch engine design. That is, if the prefetch engine is not well matched to the processor design, the desired performance benefit might not be obtained.
What are needed are techniques for performing reliable prefetching in a processor, while maintaining flexibility of design and providing reliable performance.