Field
The disclosed embodiments generally relate to techniques for improving the performance of computer systems. More specifically, the disclosed embodiments relate to the design of a processor that includes a mechanism to facilitate efficient prefetching for scatter/gather operations.
Related Art
As the gap between processor speeds and memory performance continues to grow, prefetching is becoming an increasingly important technique for improving computer system performance. Prefetching involves retrieving cache lines from memory and placing them in cache before the cache lines are actually accessed by an application. This prevents the application from having to wait for a cache line to be retrieved from memory and thereby improves computer system performance.
Prefetching tends to work well for workloads that exhibit predictable access patterns. For these workloads, stride-based prefetching techniques can typically be used to predict which data items will be accessed next. However, other types of applications, for example applications associated with database operations, perform scatter/gather type memory operations that do not exhibit such predictable access patterns and instead require the computer system to follow pointers to access relevant data items. (These scatter/gather memory operations are also referred to as “vector-indirect memory operations,” and the associated prefetching instructions are referred to as “vector-indirect prefetch instructions” in this specification and the appended claims.)
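For illustration only, the difference between a strided access pattern and a scatter/gather access pattern can be sketched as follows. This is a minimal sketch, not part of the disclosed embodiments; the function names, the base address, the element size, and the index values are all hypothetical and chosen only to make the contrast visible.

```python
# Illustrative sketch only: all names and values below are hypothetical.

def strided_addresses(base, stride, count):
    """Addresses touched by a regular, strided access pattern."""
    return [base + i * stride for i in range(count)]


def gather_addresses(base, indices, elem_size):
    """Addresses touched by a gather: each address depends on an index value."""
    return [base + idx * elem_size for idx in indices]


def is_constant_stride(addresses):
    """True if every consecutive pair of addresses differs by the same amount,
    which is the property a stride-based prefetcher relies on."""
    deltas = {b - a for a, b in zip(addresses, addresses[1:])}
    return len(deltas) <= 1


strided = strided_addresses(base=0x1000, stride=64, count=8)
gathered = gather_addresses(base=0x1000, indices=[7, 2, 9, 4], elem_size=8)

print(is_constant_stride(strided))   # the next address is predictable
print(is_constant_stride(gathered))  # the next address depends on index data
```

In the strided case the next address can be computed from the previous two, whereas in the gather case the next address cannot be known without first reading the index values.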
Prefetching can also be used to improve the performance of these scatter/gather operations. However, performing prefetching for scatter/gather operations involves performing a large number of lookups in a translation-lookaside buffer (TLB) to translate virtual addresses into corresponding physical addresses. This can create performance problems because the numerous TLB lookups performed for prefetching operations can interfere with other, non-prefetch-related accesses to the TLB. Moreover, many TLB lookups for scatter/gather operations are unnecessary because target operands for scatter/gather operations tend to be located on the same virtual memory page, so many of these TLB lookups will simply access the same TLB entry.
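The redundancy described above can be made concrete with a small counting example. The sketch below is illustrative only and does not describe any particular mechanism; it assumes a hypothetical 4 KB page size and a hypothetical set of target addresses, and it simply compares one lookup per target address against one lookup per distinct virtual page.

```python
# Illustrative sketch only: page size and addresses are assumed values.

PAGE_SIZE = 4096  # assumed page size in bytes


def virtual_page(addr, page_size=PAGE_SIZE):
    """Virtual page number that contains the given virtual address."""
    return addr // page_size


def count_tlb_lookups(addresses, page_size=PAGE_SIZE):
    """Return (lookups if every address is translated separately,
    number of distinct virtual pages actually touched)."""
    per_address = len(addresses)
    distinct_pages = len({virtual_page(a, page_size) for a in addresses})
    return per_address, distinct_pages


# Five hypothetical gather targets; the first four share one virtual page.
targets = [0x10000 + off for off in (0x008, 0x7F0, 0x120, 0xFF8, 0x1004)]

per_address, distinct_pages = count_tlb_lookups(targets)
print(per_address, distinct_pages)
```

With these assumed addresses, five separate translations would touch only two distinct pages, so three of the five lookups would return a TLB entry that had already been accessed.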
Hence, what is needed is a technique for facilitating prefetching operations for scatter/gather operations without performing unnecessary TLB accesses.