1. Field
This disclosure generally relates to techniques for pre-fetching data into a cache in a computer system. More specifically, this disclosure relates to a pre-fetch mechanism that pre-fetches data into a cache different from the cache that issues the pre-fetching request.
2. Related Art
Modern processor architectures are often composed of two or more independent processor cores. Such multi-core processor architectures may include one or more caches that are shared among the multiple cores. For instance, a level one (L1) cache may be shared by multiple threads executing on different cores.
When two threads share a common cache, one thread can “help” the other thread by pre-fetching data into the shared cache. For example, in one pre-fetching technique (called “software scouting”), a separate software scout thread can speed up another (“main”) thread by pre-fetching data needed by the main thread into a shared L1 cache. In such scenarios, the scout thread does not directly contribute to the actual computational results, but instead determines which memory addresses the main thread will need in the near future and sends out pre-fetch requests for those addresses. Hence, when the main thread attempts to access such data, the needed data has already been pre-fetched into the shared cache by the scout thread, thereby improving the performance of the main thread.
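The effect of software scouting on the main thread's miss count can be illustrated with a toy simulation. The sketch below is illustrative only and is not taken from this disclosure. It assumes a small LRU model of the shared cache, interleaves the scout and main steps deterministically in place of real threads, and ignores pre-fetch latency. The names `SharedCache`, `count_main_misses`, `capacity`, and `scout_lead` are hypothetical.

```python
from collections import OrderedDict


class SharedCache:
    """Toy LRU model of a cache shared by the scout and main threads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # cache line -> True, oldest entry first

    def touch(self, line):
        """Access a line. Return True on a hit; on a miss, fill the line."""
        if line in self.lines:
            self.lines.move_to_end(line)  # refresh recency on a hit
            return True
        self.lines[line] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def count_main_misses(addresses, capacity=16, scout_lead=0):
    """Count main-thread misses over a sequence of cache-line addresses.

    scout_lead is an assumed parameter: how many accesses ahead of the
    main thread the simulated scout pre-fetches (0 disables the scout).
    """
    cache = SharedCache(capacity)
    misses = 0
    for i, line in enumerate(addresses):
        # Scout step: pre-fetch the line the main thread will need soon.
        if scout_lead > 0 and i + scout_lead < len(addresses):
            cache.touch(addresses[i + scout_lead])
        # Main step: the access whose hit or miss is being measured.
        if not cache.touch(line):
            misses += 1
    return misses


if __name__ == "__main__":
    stream = list(range(100))  # 100 distinct cache lines, each used once
    print("misses without scout:", count_main_misses(stream, scout_lead=0))
    print("misses with scout:   ", count_main_misses(stream, scout_lead=4))
```

In this model the main thread misses only on the first few accesses, which the scout had no chance to cover. The benefit depends on the modeled cache being large enough to hold the pre-fetched lines until the main thread reaches them.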
While pre-fetching into a shared cache is beneficial in many situations, executing two threads in a shared context can also introduce limitations. For instance, two threads that share an L1 cache can suffer from pipeline resource contention that reduces the performance of both threads.
Hence, what is needed are techniques for pre-fetching cache data without the above-described problems of existing pre-fetching techniques.