In order to reduce the latency associated with accessing data stored in main memory, processors typically have one or more caches, as shown in the example memory hierarchy 100 in FIG. 1. There are typically two levels of on-chip cache, L1 102 and L2 104, which are usually implemented in SRAM (static random access memory), and one level of off-chip cache, L3 106. The caches are smaller than the main memory 108, which may be implemented in DRAM (dynamic random access memory), but the latency involved in accessing a cache is much shorter than for main memory, and becomes shorter at lower levels within the hierarchy (i.e. closer to the processor). As the latency is related, at least approximately, to the size of the cache, a lower level cache (e.g. L1) is smaller than a higher level cache (e.g. L2).
When a processor, or more particularly an ALU (arithmetic logic unit) within a processor, accesses a data item, the data item is accessed from the lowest level in the hierarchy where it is available. For example, a look-up will be performed in the L1 cache 102 and if the data is in the L1 cache, this is referred to as a cache hit. If, however, the data is not in the L1 cache (the lowest level cache), this is a cache miss and the next levels in the hierarchy are checked in turn until the data is found (e.g. the L2 cache 104, followed by the L3 cache 106 if the data is also not in the L2 cache). In the event of a cache miss, the data is brought into the cache (e.g. the L1 cache 102) and, if the cache is already full, a replacement algorithm may be used to decide which existing data will be evicted (i.e. removed) in order that the new data can be stored. Typically, this replacement algorithm selects the least recently used (LRU) line within the cache.
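By way of illustration only, the look-up and replacement behaviour described above may be sketched in software as follows. This is a minimal model rather than a description of any particular hardware: the names `LRUCache`, `lookup`, `fill` and `access` are hypothetical, each cache line is keyed by a single address, and it is assumed that, on a miss, the data is brought into every cache level that missed.

```python
from collections import OrderedDict


class LRUCache:
    """One cache level (hypothetical model) holding at most `capacity` lines."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, least recently used first

    def lookup(self, address):
        """Return (hit, data); a hit marks the line as most recently used."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True, self.lines[address]
        return False, None

    def fill(self, address, data):
        """Store a line, evicting the LRU line if the cache is already full.

        Returns the evicted address, or None if nothing was evicted.
        """
        evicted = None
        if address not in self.lines and len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)
        self.lines[address] = data
        self.lines.move_to_end(address)
        return evicted


def access(levels, main_memory, address):
    """Check each level in turn (lowest first); return (data, source name)."""
    for index, cache in enumerate(levels):
        hit, data = cache.lookup(address)
        if hit:
            source, missed = cache.name, levels[:index]
            break
    else:
        # Missed in every cache level, so fall back to main memory.
        data, source, missed = main_memory[address], "main memory", levels
    # Assumption: the data is brought into every level that missed.
    for cache in missed:
        cache.fill(address, data)
    return data, source
```

For example, with a two-line L1 cache and a four-line L2 cache, accessing addresses 0, 0, 1 and 2 in that order leaves address 0 evicted from the L1 cache but still present in the L2 cache, so that a further access to address 0 is served from the L2 cache.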
In a multi-threaded processor, some of the resources within the processor are replicated (such that there is an instance of the resource for each thread) and some of the resources are shared between threads. Typically the cache resources are shared between threads, but this can lead to conflicts where one thread fills the cache with its own data: as described above, adding data to an already full cache results in the eviction of existing data, which may be data that is being used by other threads. One solution is to partition the cache between threads, so that each thread has a separate, dedicated portion of the cache which is not visible to other threads.
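The difference between a shared cache and a partitioned cache may be illustrated by the following sketch. It is an example only: the names `LRULines`, `SharedCache`, `PartitionedCache` and `run_conflict` are hypothetical, LRU replacement is assumed, and the partitioned cache is assumed to be divided statically into equal portions, one per thread.

```python
from collections import OrderedDict


class LRULines:
    """A set of cache lines with LRU replacement (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> None, least recently used first

    def access(self, address):
        """Return True on a hit; on a miss, store the line, evicting if full."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[address] = None
        return False


class SharedCache:
    """All threads compete for the same lines."""

    def __init__(self, capacity):
        self._lines = LRULines(capacity)

    def access(self, thread_id, address):
        return self._lines.access(address)


class PartitionedCache:
    """Each thread has a dedicated portion that other threads cannot touch."""

    def __init__(self, capacity, thread_ids):
        # Assumption: an equal, static split of the capacity between threads.
        share = capacity // len(thread_ids)
        self._portions = {tid: LRULines(share) for tid in thread_ids}

    def access(self, thread_id, address):
        return self._portions[thread_id].access(address)


def run_conflict(cache):
    """Thread 0 loads one line, thread 1 streams four; does thread 0 still hit?"""
    cache.access(0, 100)
    for address in range(200, 204):
        cache.access(1, address)
    return cache.access(0, 100)
```

With a capacity of four lines, `run_conflict(SharedCache(4))` reports a miss for thread 0, because the data streamed by thread 1 has evicted its line, whereas `run_conflict(PartitionedCache(4, [0, 1]))` reports a hit. The cost of this isolation is that each thread can use only its own portion, even when the other portion is idle.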
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known methods of managing access to memory.