Prefetching data from memory into a buffer is a common approach for reducing the effects of memory latency during load operations in processing systems. Common prefetching techniques are broadly classified into two types: prediction prefetching and precomputation prefetching. Prediction prefetching techniques rely on the context of the data accesses to predict which data will be needed and to prefetch that data. These techniques are particularly advantageous when prefetching data that has regular access patterns, as is frequently found in numerical and scientific applications. One exemplary prediction prefetching technique is stride-based prefetching, which utilizes a stride value (the constant address offset between successive accesses) that defines the identified access pattern.
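The stride-based approach can be illustrated with a small software model. The following is a minimal sketch only, not an implementation taken from any particular processor: the names `StridePredictor`, `observe`, and `simulate` are hypothetical, and the rule that a stride is trusted once the same nonzero address delta has been seen twice in a row is an assumption chosen for simplicity.

```python
from typing import List, Optional


class StridePredictor:
    """Toy model of a stride-based predictor tracking one load instruction.

    Hypothetical illustration: real hardware prefetchers differ in table
    organization, confidence counters, and prefetch distance.
    """

    def __init__(self) -> None:
        self.last_addr: Optional[int] = None  # most recent address observed
        self.stride: Optional[int] = None     # most recent address delta
        self.confirmed: bool = False          # True once the same delta repeats

    def observe(self, addr: int) -> Optional[int]:
        """Record one access and return an address to prefetch, or None."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            # Assumption: trust the pattern only when the same nonzero delta
            # has been seen on two consecutive accesses.
            self.confirmed = (delta == self.stride and delta != 0)
            self.stride = delta
        self.last_addr = addr
        if self.confirmed:
            return addr + self.stride  # predicted next address
        return None


def simulate(addresses: List[int]) -> List[Optional[int]]:
    """Feed an address trace to a fresh predictor and collect its predictions."""
    predictor = StridePredictor()
    return [predictor.observe(a) for a in addresses]
```

Under these assumptions, a regular trace such as `simulate([100, 108, 116, 124])` yields `[None, None, 124, 132]`: the stride of 8 is confirmed on the third access, after which each prediction matches the next address. An irregular trace such as `simulate([100, 108, 300, 116])` yields no predictions at all, which reflects why prediction prefetching favors regular access patterns.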
In contrast, conventional precomputation prefetching techniques rely on the execution of a version of the main program at a separate hardware engine, so that this version runs ahead of the execution of the main program at the main processing engine. Precomputation prefetching techniques are grouped into two types: coupled techniques and decoupled techniques. Coupled precomputation prefetching techniques rely on the execution of a pre-marked instruction in the main program to trigger the precomputation execution. As a result, coupled techniques typically cannot prefetch in time for programs that have little time between the trigger and the point at which the prefetched data is needed. Such instances are common in processing systems that utilize register renaming and out-of-order execution, which shorten the time between the loading of values and their use in the program. Conventional decoupled precomputation techniques were designed in an attempt to overcome the timeliness problem present in coupled techniques. These techniques allow a prefetch engine to execute several iterations ahead of the program at the main processor. While conventional decoupled precomputation prefetching techniques can be relatively effective for programs that have a static traversal order along data structures, they fail to account for instances where the traversal path changes between access iterations. Accordingly, improved techniques for prefetching data in a processing system would be advantageous.