1. Field of the Invention
The invention relates to computing systems and, more particularly, to multithreaded processing systems.
2. Description of the Related Art
With the widening gap between processor and memory speeds, various techniques have arisen to improve application performance. One such technique uses “helper” threads. Generally speaking, a helper thread is a thread that assists, or improves the performance of, a main thread. For example, a helper thread may be used to prefetch data into a cache. Such approaches are described in Yonghong Song, Spiros Kalogeropulos, Partha Tirumalai, “Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors,” pp. 99-109, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05), 2005, the content of which is incorporated herein by reference. Currently, prefetching is generally most effective for memory access streams whose future memory addresses can be easily predicted, such as by using loop index values. For such access streams, software prefetch instructions may be inserted into the program to bring data into the cache before the data is required. A prefetching scheme in which prefetches are interleaved with the main computation is also called interleaved prefetching.
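By way of illustration only, interleaved prefetching over a predictable access stream may be sketched as follows. The sketch assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin. The function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical and are not taken from the cited reference.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. A real value would be tuned
 * to the memory latency and the amount of work per iteration. */
#define PREFETCH_DISTANCE 16

/* Sums a[0..n-1]. Because the future address &a[i + PREFETCH_DISTANCE] is a
 * simple function of the loop index, the prefetch can be interleaved with
 * the main computation at very low cost. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

In this sketch the only added work per iteration is one index addition, one comparison, and the prefetch itself, which is why streams of this kind are generally well suited to the technique.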
Although such prefetching may be successful in many cases, it may be less effective for two kinds of code. The first kind is code with complex array subscripts, for which memory access strides are often unknown at compile time. One such example is an indexed array access. Prefetching in such code tends to incur excessive overhead because significant computation is required to compute future addresses. The complexity and overhead may increase further if the subscript evaluation involves loads that themselves must be prefetched and made speculative. If the prefetched data is already in the cache, this overhead is wasted and can cause a significant slowdown. To avoid risking large penalties, modern production compilers often ignore such cases by default, or prefetch data speculatively only one or two cache lines ahead.
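The overhead described above may be illustrated with a minimal sketch of an indexed array access, again assuming a compiler that provides `__builtin_prefetch`. The access pattern `a[b[i]]`, the function name `sum_indexed`, and the constant `PREFETCH_DISTANCE` are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. */
#define PREFETCH_DISTANCE 16

/* Sums a[b[i]] for i in [0, n). Every b[i] must be a valid index into a. */
double sum_indexed(const double *a, const int *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Forming the future address requires an extra load of
             * b[i + PREFETCH_DISTANCE]. That load may itself miss in the
             * cache, and all of this work is wasted whenever the target
             * element of a[] is already cached. */
            int future_index = b[i + PREFETCH_DISTANCE];
            __builtin_prefetch(&a[future_index], 0, 3);
        }
        sum += a[b[i]];
    }
    return sum;
}
```

Unlike a loop whose addresses follow directly from the loop index, this loop performs an additional memory access per iteration solely to compute the prefetch address.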
A second class of difficult code involves pointer-chasing. In this type of code, at least one memory access is needed to obtain the memory address used in the next loop iteration. Interleaved prefetching is generally not able to handle such cases. While a variety of approaches have been proposed to address pointer-chasing, none has been entirely successful.
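The dependence that characterizes pointer-chasing may be seen in a minimal linked-list sketch, assuming a compiler that provides `__builtin_prefetch`. The type `struct node` and the function `sum_list` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    long value;
    struct node *next;
};

/* Sums the values in a NULL-terminated list. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        /* The address of the next node becomes known only after the
         * current node has been loaded, so an interleaved prefetch can
         * run at most one node ahead and hides very little latency.
         * Prefetching a null pointer is harmless here because the
         * builtin does not fault on invalid addresses. */
        __builtin_prefetch(p->next, 0, 3);
        sum += p->value;
    }
    return sum;
}
```

Looking further ahead would require dereferencing `p->next` itself, which reintroduces the very cache miss that the prefetch is meant to hide.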
In view of the above, effective methods and mechanisms for improving application performance using helper threads are desired.