This invention relates to multiprocessor computer systems and to methods and apparatus for managing shared resources in such systems. One central challenge of implementing scalable multi-threaded programs is efficiently managing shared resources, such as memory. The traditional way to manage shared resources is to use a blocking synchronization operation provided by the operating system, such as a mutex. Blocking synchronization allows one thread at a time to safely operate on a shared resource, while blocking any other threads that attempt synchronized access to the same resource. However, if the shared resource is frequently used by many threads, the use of blocking synchronization can quickly become a bottleneck. Another solution on uniprocessors is to use kernel-assisted non-blocking synchronization, such as restartable atomic sequences. These schemes do not prevent several threads from starting a transaction on a shared resource at the same time, but they detect contention and cause interrupted transactions to either roll forward or roll back to a consistent state.
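By way of illustration only, blocking synchronization of the kind described above can be sketched as follows. The class name `SharedPool`, the unit counts, and the number of threads are hypothetical and are not taken from any particular system; the sketch merely shows every allocation passing through a single mutex.

```python
import threading


class SharedPool:
    """Hypothetical shared pool of resource units guarded by one mutex."""

    def __init__(self, units):
        self._free = units
        self._lock = threading.Lock()

    def allocate(self, n):
        # Only one thread at a time may execute this block; any other
        # thread calling allocate() blocks until the lock is released.
        with self._lock:
            if self._free < n:
                return False
            self._free -= n
            return True

    def free_units(self):
        with self._lock:
            return self._free


def worker(pool, count):
    # Each allocation acquires the same lock, so a frequently used pool
    # serializes all of its callers.
    for _ in range(count):
        pool.allocate(1)


pool = SharedPool(units=4000)
threads = [threading.Thread(target=worker, args=(pool, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every call to `allocate` takes the same lock, the lock itself becomes the point of contention as the number of allocating threads grows.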
One conventional solution to the bottleneck caused by blocking synchronization is to partition the resources among threads into resource “pools”, so each thread has a resource pool that is dedicated to that thread. The thread can then access and manipulate its local resource pool without using blocking synchronization because only that thread can access the pool. However, when local resource pools are used, it is important to efficiently partition resources among the pools, so that resources are available to the threads that need them, and not wasted on the threads that do not need them.
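A minimal sketch of such thread-dedicated pools is given below, under stated assumptions: the names `GlobalPool`, `ThreadLocalPools`, and `CHUNK`, and all of the sizes, are hypothetical. Each thread refills a private pool from a lock-protected global pool in chunks and then allocates from the private pool without taking any lock.

```python
import threading

CHUNK = 64  # hypothetical refill size, in resource units


class GlobalPool:
    """Hypothetical lock-protected source from which local pools refill."""

    def __init__(self, units):
        self.free = units
        self.lock_acquisitions = 0
        self._lock = threading.Lock()

    def take_chunk(self, n):
        with self._lock:
            self.lock_acquisitions += 1
            granted = min(n, self.free)
            self.free -= granted
            return granted


class ThreadLocalPools:
    """One private pool per thread; only the owning thread touches it."""

    def __init__(self, global_pool):
        self._global = global_pool
        self._local = threading.local()

    def allocate(self):
        free = getattr(self._local, "free", 0)
        if free == 0:
            # Slow path: the only place where blocking synchronization occurs.
            free = self._global.take_chunk(CHUNK)
            if free == 0:
                return False
        # Fast path: no lock, because no other thread can see this pool.
        self._local.free = free - 1
        return True

    def local_free(self):
        return getattr(self._local, "free", 0)


global_pool = GlobalPool(units=1024)
pools = ThreadLocalPools(global_pool)
leftover = [0] * 4  # units left unused in each thread's private pool


def worker(index):
    for _ in range(100):
        pools.allocate()
    leftover[index] = pools.local_free()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch, 400 allocations require only 8 lock acquisitions, but each thread finishes with 28 units that no other thread can use, which illustrates why the partitioning of resources among the pools matters.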
Per-thread resource pools work well for many applications, such as those applications that have relatively few threads or whose threads are compute-bound. However, when the number of threads greatly exceeds the number of processors in the multiprocessor system, the ability of threads to make use of the pools is diminished, because most threads will be suspended for long periods of time with no chance to run. In such cases, the suspended threads may hold partially unused resource pools, which ties up those resources and decreases the efficiency of pool usage.
Accordingly, another conventional solution is to partition the resources among processors into resource “pools”, so each processor has a resource pool that is dedicated to that processor. Using a technique called “multi-processor restartable critical sections”, a thread can access a per-processor resource in a critical section. If the thread is preempted while in the critical section, it will be notified when it attempts to complete the transaction, and can retry access to the resource. In this way, multiple threads can safely share a per-processor resource without using blocking synchronization. This solution has the advantage that a resource pool is available to any thread running on the processor to which the resource pool is dedicated. Such an arrangement implementing processor-local allocation buffers for a garbage collection system is discussed in detail in an article entitled “Supporting Per-processor Local-allocation Buffers Using Multi-processor Restartable Critical Sections”, D. Dice, A. Garthwaite and D. White, available at website: research.sun.com/technical-reports/2004/smli_tr-2004-126.pdf.
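The retry behavior described above can be illustrated with the following single-threaded simulation. It is not the mechanism of the cited article, which relies on kernel support; the names `ProcessorPool`, `epoch`, and `interrupt_first_attempts` are hypothetical, and the preemption is injected by hand so that the control flow is visible.

```python
class ProcessorPool:
    """Hypothetical per-processor bump allocator."""

    def __init__(self, size):
        self.top = 0      # offset of the next free unit
        self.size = size
        self.epoch = 0    # bumped by the simulated kernel on each preemption

    def preempt(self):
        self.epoch += 1


def allocate(pool, n, interrupt_first_attempts=0):
    """Try to allocate n units; return (offset, attempts).

    The offset is None if the pool cannot satisfy the request.
    """
    attempts = 0
    while True:
        attempts += 1
        start_epoch = pool.epoch          # enter the critical section
        old_top = pool.top
        new_top = old_top + n
        if new_top > pool.size:
            return None, attempts
        if attempts <= interrupt_first_attempts:
            # Simulated preemption: another thread runs on this processor
            # and allocates 8 units before the first thread resumes.
            pool.preempt()
            pool.top += 8
        if pool.epoch != start_epoch:
            # The transaction was interrupted: discard it and retry.
            continue
        pool.top = new_top                # commit
        return old_top, attempts


pool = ProcessorPool(size=64)
first = allocate(pool, 16)
second = allocate(pool, 16, interrupt_first_attempts=1)
```

In this simulation the second allocation is interrupted once, observes that the pool changed while it was preempted, and succeeds on its second attempt at a fresh offset, without any lock being held.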
However, there are conditions when the use of per-processor resource pools also leads to poor resource utilization. For example, when the number of allocating threads is less than the number of processors, or when threads are entirely compute-bound, threads using resources from processor-dedicated resource pools may be preempted and migrate to other processors, leaving partially used resource pools tied to idle processors. Although the amount of memory wasted with processor-dedicated pools is bounded by the number of processors, rather than by the number of threads as with thread-dedicated pools, it is still a concern.
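As a rough illustration of the two bounds, the following sketch computes the worst-case amount of stranded memory when every pool is left almost entirely unused. The pool size, thread count, and processor count are hypothetical values chosen only to show the difference in scale.

```python
def worst_case_stranded(pool_count, pool_size):
    """Upper bound on stranded units: every pool is held but unused."""
    return pool_count * pool_size


POOL_SIZE_KIB = 64   # hypothetical size of each local pool
THREADS = 1000       # hypothetical number of threads
PROCESSORS = 8       # hypothetical number of processors

# Thread-dedicated pools: the bound grows with the number of threads.
per_thread_bound = worst_case_stranded(THREADS, POOL_SIZE_KIB)

# Processor-dedicated pools: the bound grows with the number of processors.
per_processor_bound = worst_case_stranded(PROCESSORS, POOL_SIZE_KIB)
```

Under these assumed values the thread-dedicated bound is 64000 KiB and the processor-dedicated bound is 512 KiB; the latter is far smaller but still nonzero.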