To reduce the latency associated with accessing data stored in main memory, processors typically use a memory hierarchy which comprises one or more caches. There are typically two or three levels of cache, denoted L1, L2 and L3. In some examples the first two caches (L1 and L2) may be on-chip caches, which are usually implemented in SRAM (static random access memory), and the third level of cache (L3) may be an off-chip cache. In other examples, such as in a System on Chip (SoC), all the memory may be implemented in the same piece of silicon. The caches are smaller than the main memory, which may be implemented in DRAM (dynamic random access memory), but the latency involved with accessing a cache is much shorter than for main memory, and gets shorter at lower levels within the hierarchy (with the L1 cache being considered the lowest level cache). As the latency is related, at least approximately, to the size of the cache, a lower level cache (e.g. L1) is typically smaller than a higher level cache (e.g. L2).
When a processor, or more particularly the MEM stage of the processor operation, accesses a piece of data or an instruction, the piece of data or instruction is accessed from the lowest level in the hierarchy where it is available (where the lowest level is the level closest to the processor). For example, a look-up will be performed in the L1 cache and if the item (i.e. data/instruction) is in the L1 cache, this is referred to as a cache hit. If, however, the item is not in the L1 cache, this is a cache miss and the next levels in the hierarchy are checked in turn until the item is found (e.g. the L2 cache, followed by the L3 cache if the item is also not in the L2 cache). In the event of a cache miss, the item is brought into the cache.
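The look-up sequence described above can be illustrated with a minimal sketch. The names CacheLevel and access are hypothetical and do not come from any particular processor design. The least-recently-used replacement policy, the capacities, and the choice to fill every level that missed are assumptions made only to keep the example small.

```python
from collections import OrderedDict


class CacheLevel:
    """One level of the hierarchy (hypothetical model, LRU replacement assumed)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> item, least recently used first

    def lookup(self, address):
        """Return (hit, item); a hit also marks the line as recently used."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True, self.lines[address]
        return False, None

    def fill(self, address, item):
        """Bring an item into this level, evicting the oldest line if full."""
        self.lines[address] = item
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def access(levels, main_memory, address):
    """Check each level in turn, lowest first, falling back to main memory."""
    missed = []
    for level in levels:
        hit, item = level.lookup(address)
        if hit:
            source = level.name
            break
        missed.append(level)
    else:
        # No cache level held the item, so it is read from main memory.
        item = main_memory[address]
        source = "main memory"
    # Assumption: the item is brought into every level that missed.
    for level in missed:
        level.fill(address, item)
    return item, source


levels = [CacheLevel("L1", 2), CacheLevel("L2", 4)]
main_memory = {address: address * 10 for address in range(16)}
first = access(levels, main_memory, 3)   # misses in L1 and L2
second = access(levels, main_memory, 3)  # now hits in L1
```

In this sketch the first access to an address traverses the whole hierarchy, while a repeated access is served by the L1 level, which is the behaviour the hit and miss terminology describes.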
Traversing the memory hierarchy following a cache miss in the lowest level cache (e.g. the L1 cache) introduces latency. To overcome this, processors may fetch data and/or instructions ahead of when they are required, a process referred to as ‘pre-fetching’. The pre-fetching may be of items (i.e. data/instructions) which are definitely going to be required by the processor in the future, items which may be required by the processor if a particular branch is taken in a program and/or items which are pre-fetched based on an alternative prediction method. Branch prediction may be used to predict which branch is likely to be taken and so reduce the amount of wasted pre-fetching (i.e. where an item is pre-fetched but is not actually used by the processor).
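The effect of pre-fetching, including wasted pre-fetches, can be illustrated with a minimal sketch. The class PrefetchingCache is hypothetical. The prediction method used here (fetching the next sequential address after every read) is an assumption chosen for simplicity and is only one possible prediction method. The cache is also assumed to be unbounded, so no eviction is modelled.

```python
class PrefetchingCache:
    """Single cache with a next-address pre-fetcher (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory             # backing store: address -> item
        self.lines = {}                  # items currently held in the cache
        self.unused_prefetches = set()   # pre-fetched but not yet used
        self.demand_misses = 0           # reads that had to go to memory

    def prefetch(self, address):
        """Fetch an item ahead of use if it exists and is not already cached."""
        if address in self.memory and address not in self.lines:
            self.lines[address] = self.memory[address]
            self.unused_prefetches.add(address)

    def read(self, address):
        """Serve a demand read, then pre-fetch the next sequential address."""
        if address in self.lines:
            # Hit: if the line came from a pre-fetch, it was not wasted.
            self.unused_prefetches.discard(address)
        else:
            self.demand_misses += 1
            self.lines[address] = self.memory[address]
        item = self.lines[address]
        self.prefetch(address + 1)  # assumed sequential prediction
        return item


memory = {address: address * 10 for address in range(8)}
cache = PrefetchingCache(memory)
for address in (0, 1, 2, 3):
    cache.read(address)
# Only the first read missed; address 4 was pre-fetched but not (yet) used.
```

Any address still in unused_prefetches when the program finishes corresponds to a wasted pre-fetch in the sense described above; a better predictor would leave fewer such entries.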
Out-of-order processors, for example, use branch prediction and speculative pre-fetching to allow the instructions in the predicted branch to be speculatively executed out of order.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known methods of managing pre-fetch traffic.