Processing systems typically implement a hierarchical cache complex, e.g., a cache complex that includes an L2 cache and one or more L1 caches. For example, in a processing system that implements multiple processor cores, each processor core may have an associated L1 instruction (L1-I) cache and an L1 data (L1-D) cache. The L1-I and L1-D caches may be associated with a higher-level L2 cache. When an instruction is scheduled for processing by the processor core, the processor core first attempts to fetch the instruction from the L1-I cache, which returns the requested instruction if it is resident in a cache line of the L1-I cache. However, if the request misses in the L1-I cache (because the requested instruction is not stored there), the request is forwarded to the L2 cache. If the request hits in the L2 cache (because the requested instruction is stored there), the L2 cache returns the requested line to the L1-I cache. Otherwise, the L2 cache may request the line from a higher-level cache or main memory. Similarly, the processor core may attempt to fetch data used by the instruction from the L1-D cache, which returns the requested data if it is resident in a cache line of the L1-D cache. Otherwise, the data may be requested from a higher-level cache or main memory.
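The miss-forwarding behavior described above can be illustrated with a toy model. This is a minimal sketch, not a description of any particular hardware: it assumes a hypothetical 64-byte line size, ignores capacity, associativity, and eviction, and uses illustrative names (`Cache`, `fetch`, `LINE_SIZE`) that do not come from the source. Each level checks whether the line is resident, forwards the request to its parent on a miss, and installs the returned line.

```python
from dataclasses import dataclass, field
from typing import Optional

LINE_SIZE = 64  # hypothetical cache-line size in bytes (assumption)


@dataclass
class Cache:
    """Toy cache level: tracks resident line addresses only (no capacity limit or eviction)."""
    name: str
    parent: Optional["Cache"] = None  # next higher level; None means main memory backs this cache
    lines: set = field(default_factory=set)

    def fetch(self, addr: int) -> str:
        """Return the name of the level that supplied the line containing `addr`."""
        line = addr // LINE_SIZE
        if line in self.lines:
            return self.name  # hit: the line is resident in this cache
        # Miss: forward the request to the next higher level, or to main memory.
        source = self.parent.fetch(addr) if self.parent is not None else "memory"
        self.lines.add(line)  # install the returned line in this cache
        return source


# One core's L1-I and L1-D caches sharing a higher-level L2 cache.
l2 = Cache("L2")
l1_i = Cache("L1-I", parent=l2)
l1_d = Cache("L1-D", parent=l2)

print(l1_i.fetch(0x1000))  # memory: misses in L1-I and in L2
print(l1_i.fetch(0x1008))  # L1-I: the same 64-byte line is now resident
print(l1_d.fetch(0x1000))  # L2: misses in L1-D, hits in L2
```

Because the first miss also fills the L2 cache, a later request for the same line from the L1-D cache is served by the L2 cache rather than by main memory.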
Many programs that are executed on a processing device issue instructions that reference memory locations in a repeating pattern. For example, a program may include a sequence of load or store instructions that access memory locations separated by the same number of bytes. Performance of the processing device can be improved by predicting one or more future accesses based on patterns in the address stream of previous accesses. Data from the predicted memory locations can be prefetched from main memory (or a higher-level cache) into one or more caches, such as the L1-D cache, so that the data is already in the cache if subsequent instructions access the predicted memory locations.
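A simple way to predict future accesses from the address stream is to check whether recent accesses are separated by a constant number of bytes and, if so, extrapolate. The function below is a minimal sketch under stated assumptions: the name `predict_prefetches`, the prefetch `degree` parameter, and the rule that two equal consecutive strides confirm a pattern are all hypothetical choices for illustration, not a specific prefetcher design.

```python
def predict_prefetches(history, degree=2):
    """Predict the next `degree` addresses if the most recent accesses share a constant stride.

    history: previously accessed byte addresses, oldest first.
    Returns an empty list when no constant, non-zero stride is observed.
    """
    if len(history) < 3:
        return []  # need at least two strides to confirm a pattern (assumed threshold)
    latest = history[-1] - history[-2]
    previous = history[-2] - history[-3]
    if latest != previous or latest == 0:
        return []  # no stable stride, so do not prefetch
    # Extrapolate the confirmed stride `degree` steps past the latest access.
    return [history[-1] + latest * k for k in range(1, degree + 1)]


print(predict_prefetches([0x100, 0x110, 0x120]))  # [304, 320], i.e. 0x130 and 0x140
print(predict_prefetches([0x100, 0x110, 0x118]))  # []: strides +16 and +8 differ
```

The predicted addresses would then be candidates for prefetching into a cache such as the L1-D cache.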
An access pattern can be defined by a stride sequence that indicates the number of bytes (typically referred to as the stride) between the addresses of successive memory accesses in the access pattern. The stride sequence includes only one value when each memory location is separated from the previous memory location by a constant number of bytes. For example, the address stream may access the addresses A, A+16, A+32, A+48, A+64, etc. The stride sequence for this address stream is therefore +16, which has a length of 1. The stride sequence for an address stream may also include more than one stride. For example, the address stream may access the addresses A, A+16, A+24, A+40, A+48, A+64, A+72, A+88, A+96, etc. The strides between successive addresses alternate between +16 and +8, so the stride sequence for this address stream is +16, +8, which has a length of 2 because the pattern repeats every two strides.
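The stride sequence and its length can be recovered from an observed address stream by computing the differences between successive addresses and finding the shortest prefix whose repetition reproduces all of them. The sketch below is illustrative only: the function names and the base address `A = 0x1000` are hypothetical, and it assumes the whole observed stream follows a single repeating pattern.

```python
def strides(addresses):
    """Byte differences between successive addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def stride_sequence(addresses):
    """Shortest stride sequence whose repetition reproduces every observed stride."""
    deltas = strides(addresses)
    for period in range(1, len(deltas) + 1):
        candidate = deltas[:period]
        if all(deltas[i] == candidate[i % period] for i in range(len(deltas))):
            return candidate
    return []  # only reached when fewer than two addresses were observed


def predict_next(addresses, count):
    """Extend the stream by `count` addresses by continuing the stride sequence."""
    seq = stride_sequence(addresses)
    if not seq:
        return []
    observed = len(addresses) - 1  # number of strides seen so far
    predicted, addr = [], addresses[-1]
    for k in range(count):
        addr += seq[(observed + k) % len(seq)]  # resume at the next position in the pattern
        predicted.append(addr)
    return predicted


A = 0x1000  # hypothetical base address
stream1 = [A + off for off in (0, 16, 32, 48, 64)]
stream2 = [A + off for off in (0, 16, 24, 40, 48, 64, 72, 88, 96)]

print(stride_sequence(stream1))                   # [16]
print(stride_sequence(stream2))                   # [16, 8]
print([a - A for a in predict_next(stream2, 2)])  # [112, 120]
```

With very short histories the shortest repeating prefix can fit by coincidence (any two addresses yield a length-1 sequence), so a practical predictor would likely require the pattern to repeat several times before issuing prefetches.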