When threads of execution share data in a computer system, a thread may have to stall while the data is retrieved from the location where it was last accessed. On a multiprocessor system, chances are high that the threads sharing the data are executing on different processors. Thus, each time a thread accesses the shared data, it may have to retrieve that data from another processor. This operation can take several cycles and may cause the requesting thread to stall while the data is being retrieved. Performance is reduced as threads stall waiting for shared data. Contention can also increase if these extra stalls occur during execution of critical sections of code, because a stalled thread holds the critical section longer while other threads wait to enter it.
Prior attempts at co-locating threads have been narrow in scope and have required manual intervention. For example, gang scheduling techniques have been applied to schedule all threads in a specified “gang” at the same time on separate processors, but an administrator has to determine which threads belong to a gang. Other techniques bind a group of threads to a given area of a computer system (a processor, a locality, etc.), but the binding must be specified by an operator. Still other prior attempts have noted a “home” locality for a thread, based on where most of the memory accessed by the thread resides, and have tried to keep the thread in that home locality. These techniques rely on operator intervention to indicate which threads to co-locate and how to perform the co-location. Often, the access patterns between threads change over time, or the operator has no idea how the threads should be organized, resulting in suboptimal allocation and performance.
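By way of illustration only, the manual binding approach described above might look like the following sketch. It is not taken from any particular prior system. It assumes a Linux host, where Python's `os.sched_setaffinity` with an id of 0 is understood to apply to the calling thread; the helper names `bind_current_thread` and `run_group` and the choice of CPU 0 are hypothetical. The point of the sketch is that the CPU list is fixed by a person in advance and does not adapt if the threads' sharing patterns change.

```python
import os
import threading


def bind_current_thread(cpus):
    """Pin the calling thread to the operator-chosen CPUs.

    Returns the resulting affinity set, or None where the platform
    does not expose sched_setaffinity (assumption: Linux-only API).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    # Keep only CPUs this thread is permitted to use; fall back to the
    # current set if the operator's choice is not available.
    target = set(cpus) & allowed or allowed
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


def run_group(cpus, n_threads=2):
    """Start a group of threads that all bind themselves to the same CPUs."""
    results = []
    lock = threading.Lock()

    def worker():
        mask = bind_current_thread(cpus)
        with lock:
            results.append(mask)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # The operator decides, by hand, that these two threads share data
    # and should run on CPU 0 (a hypothetical choice).
    print(run_group([0], n_threads=2))
```

Every decision in this sketch (which threads form the group, and which CPUs they are bound to) is supplied by the operator, which is the limitation noted above.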