To improve the performance of processor systems, many different prefetching techniques (i.e., anticipating the need for data and requesting it in advance) are used to remove or “hide” latency (i.e., delay) in those systems.
Prefetching addresses the memory latency problem by fetching data into processor caches before the data is used. To prefetch in a timely manner, the processor must materialize a prefetch address early enough to overlap the prefetch latency with other computations and/or latencies. For both hardware-based and software-based strategies, prefetching for linked data structures (LDSs) remains a major challenge because serial data dependencies between elements in an LDS preclude timely materialization of prefetch addresses: the address of the next element is typically not known until the current element has been loaded. In contrast, when accessing a data array structure, the address of each subsequent object may be calculated from the base of the data array structure, so loops may be unrolled and techniques such as stride prefetching may be performed to avoid cache misses while iterating through the data array structure. Most LDSs, however, do not have layout properties that may be exploited by stride prefetching techniques. Further, the gap between processor and memory speeds continues to increase, which makes each cache miss relatively more costly. As a result, managed runtime environments (MRTEs) may encounter difficulties when attempting to insert prefetch instructions properly to reduce latencies while traversing LDSs.
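The contrast between array traversal and LDS traversal may be illustrated with a minimal C sketch. It is offered only as an illustration under stated assumptions, not as any particular implementation: the node layout, the function names, and the value of PREFETCH_DISTANCE are hypothetical, and __builtin_prefetch is the prefetch hint provided by the GCC and Clang compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; a real value would be tuned. */
#define PREFETCH_DISTANCE 16

/* Hypothetical list node, used only for this illustration. */
struct node {
    long value;
    struct node *next;
};

/* Array traversal: the address of element i + PREFETCH_DISTANCE is computed
 * from the base pointer alone, so the hint can be issued many iterations
 * before the element is actually read. */
long sum_array(const long *a, size_t n)
{
    long sum = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        sum += a[i];
    }
    return sum;
}

/* Linked-list traversal: the address of the next node is known only after
 * the current node has been loaded, so the hint reaches just one node ahead
 * and leaves little time to overlap the memory latency. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        if (p->next != NULL)
            __builtin_prefetch(p->next, 0, 1);
        sum += p->value;
    }
    return sum;
}
```

In sum_array the prefetch address depends only on the base pointer and the loop index, whereas in sum_list each prefetch address must itself be read from memory, which is the serial data dependency described above.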