This disclosure relates generally to a data processing system and, more specifically, to managing multiple speculative assist threads for differing cache levels for data pre-fetching within a data processing system.
When an assist thread is used to pre-fetch data for a main thread executing within a program, two issues typically arise: controlling the pace of execution between the main thread and the assist thread, and selecting a binding of the main thread and the assist thread to processor cores. Thread performance may depend on control of the pace between the assist thread and the main thread. On the one hand, the assist thread needs to run far enough ahead of the main thread that the delay of a cache miss can be hidden by the pre-fetch instructions. It is desirable to have the pre-fetched data arrive just before the main thread needs to use the data. In such cases, the latency of the cache miss is fully hidden. On the other hand, the assist thread should not run too far ahead of the main thread because the pre-fetched data may cause useful data to be evicted from the cache. The eviction of useful data is known as cache pollution by pre-fetch.
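The two pacing constraints described above can be sketched as a simple gate that an assist thread might consult before issuing its next pre-fetch. This is a minimal illustration only: the names pace_window, pace_action, and pace_decide, the use of element positions as the unit of progress, and the catch-up action for an assist thread that is too close are all assumptions made for the example and are not taken from any particular system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pacing window, measured in elements of the main thread's loop. */
typedef struct {
    size_t min_ahead;  /* below this lead, pre-fetched data would arrive too late */
    size_t max_ahead;  /* above this lead, pre-fetched data risks evicting useful data */
} pace_window;

typedef enum {
    PACE_CATCH_UP,  /* assist thread is too close to (or behind) the main thread */
    PACE_PREFETCH,  /* lead is inside the window: issue the pre-fetch */
    PACE_WAIT       /* assist thread is too far ahead: pause until the main thread advances */
} pace_action;

/* Decide what the assist thread should do given both threads' current positions. */
pace_action pace_decide(const pace_window *w, size_t main_pos, size_t assist_pos)
{
    assert(w->min_ahead <= w->max_ahead);
    if (assist_pos < main_pos + w->min_ahead)
        return PACE_CATCH_UP;
    if (assist_pos > main_pos + w->max_ahead)
        return PACE_WAIT;
    return PACE_PREFETCH;
}
```

With an assumed window of 8 to 32 elements and the main thread at position 100, an assist thread at position 104 would be told to catch up, one at position 120 would pre-fetch, and one at position 140 would wait.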
Current solutions attempt to determine a parameter and use the parameter to control the pace distance between the assist thread and the main thread. The solutions attempt to select an ideal value of the parameter for minimizing both latency and cache pollution. A difficulty of such solutions is that the selected pace distance may not be optimal for reducing latency and cache pollution at the same time. In some cases, a smaller pace distance chosen to avoid cache pollution cannot fully hide the latency of memory accesses, while a larger pace distance chosen to fully hide the latency introduces cache pollution as a consequence.
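The difficulty of choosing a single pace distance can be illustrated with a deliberately simplified model. Every quantity below is an assumption made for the example: one item occupies one cache line, the main thread spends a fixed number of cycles per item, a cache miss has a fixed latency, and the cache has a fixed number of spare lines that can hold pre-fetched data without evicting useful data. The names toy_machine, exposed_latency, polluting_lines, and find_clean_distance are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy machine description; all values are illustrative, not measured. */
typedef struct {
    size_t miss_latency;   /* cycles needed to service one cache miss */
    size_t work_per_item;  /* cycles the main thread spends on each item */
    size_t spare_lines;    /* lines available for pre-fetched data without eviction */
} toy_machine;

/* Cycles of miss latency the main thread still waits for at a given pace distance. */
size_t exposed_latency(const toy_machine *m, size_t distance)
{
    size_t lead = distance * m->work_per_item;  /* head start of the assist thread, in cycles */
    return lead >= m->miss_latency ? 0 : m->miss_latency - lead;
}

/* Pre-fetched lines beyond the spare capacity; each is assumed to evict one useful line. */
size_t polluting_lines(const toy_machine *m, size_t distance)
{
    return distance > m->spare_lines ? distance - m->spare_lines : 0;
}

/* Search [0, max_distance] for the smallest distance with no exposed latency and
 * no pollution. Returns 1 and stores it in *best if one exists, otherwise 0. */
int find_clean_distance(const toy_machine *m, size_t max_distance, size_t *best)
{
    assert(m->work_per_item > 0);
    for (size_t d = 0; d <= max_distance; d++) {
        if (exposed_latency(m, d) == 0 && polluting_lines(m, d) == 0) {
            *best = d;
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, a machine with a 300-cycle miss latency, 10 cycles of work per item, and 64 spare lines admits a distance that satisfies both goals. With only 16 spare lines, no single distance does: any distance large enough to hide the latency exceeds the spare capacity, which mirrors the conflict described above.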
With respect to binding of the main thread and the assist thread to processor cores, the two threads can be bound as chip multiprocessor (CMP) threads or as simultaneous multithreaded (SMT) threads. When simultaneous multithreaded threads are used, resource contention may occur. When chip multiprocessor threads are used, the data can only be pre-fetched into a cache level shared by the chip multiprocessor threads. In some systems, the shared cache is quite far from the processor, and the benefit of pre-fetching is not fully realized. Current solutions force a choice between chip multiprocessor threads and simultaneous multithreaded threads.
The same processor that will consume the pre-fetched data typically issues the pre-fetch instructions, thereby adding to the load on the processor. Therefore, the location in the memory hierarchy to which the data should be pre-fetched is assumed to have affinity to the processor or processor core from which the pre-fetch instruction is issued. Usually, the data is brought into the memory component closest to the processor core, which is typically the level 1 cache.
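As an illustration of the conventional arrangement described above, in which the consuming processor issues its own pre-fetch instructions, the following sketch uses the GCC and Clang builtin __builtin_prefetch inside the consuming loop. The function name sum_with_prefetch and the lookahead value PREFETCH_DISTANCE are assumptions chosen for the example. Whether the hint actually places data in the level 1 cache depends on the target processor, and the hint may be ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 16  /* illustrative lookahead, in array elements */
#endif

/* Sum an array while the same core issues pre-fetch hints for upcoming elements. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 requests a read pre-fetch; third argument 3
             * signals high temporal locality, asking that the data be kept
             * in all cache levels, including the one closest to the core. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}
```

The pre-fetch hints are extra instructions executed by the consuming core, which is the added load noted above. For brevity, this sketch issues one hint per element rather than one per cache line.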