With proper hardware support, multi-threading can dramatically increase computational performance. For example, multiple threads may execute on multiple Central Processing Units (CPUs) and/or CPU cores to carry out tasks in parallel for one or more applications. However, as processor performance continues to increase, synchronization between threads or processes occupies a larger fraction of overall execution time. As multi-threaded applications begin to use more threads, this synchronization overhead can become the dominant factor in limiting application performance.
From a software standpoint, synchronization is typically accomplished using locks. A lock is usually acquired before a thread enters a critical section of code, and is released after the thread exits the critical section. If a given thread cannot acquire the lock because another thread already holds it, the thread must wait until the holding thread releases the lock.
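The acquire/release pattern described above may be sketched as follows. This is a minimal illustration only, assuming Python's standard `threading` module; the function name `run_counter` and its parameters are hypothetical and are not taken from any particular implementation.

```python
import threading


def run_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads under one lock."""
    lock = threading.Lock()
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # Acquire the lock before entering the critical section.
            # A thread that finds the lock held blocks here until it is released.
            with lock:
                total += 1  # critical section: read-modify-write of shared state
            # Leaving the `with` block releases the lock.

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Because every increment is performed while the lock is held, the returned total is expected to equal `num_threads * increments`. Each increment also pays for one acquire and one release.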
Unfortunately, acquiring and releasing a lock can be very time-consuming on modern microprocessors. Both operations typically involve atomic operations, which flush load and store buffers, and can consequently require hundreds, if not thousands, of processor cycles to complete. Moreover, the number of locks may increase as the number of threads and/or CPU cores used by applications increases, which may increase the likelihood of deadlock and other issues.
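The deadlock risk mentioned above may be illustrated with two locks taken in opposite orders by two threads. The following is a sketch under stated assumptions, not a description of any particular system: it uses Python's standard `threading` module, the name `opposite_order_attempt` is hypothetical, and the second acquisition is given a timeout so that the example terminates instead of actually deadlocking.

```python
import threading


def opposite_order_attempt(timeout: float = 0.1) -> list:
    """Two threads each hold one lock and then try to take the other one."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    # Barriers force the problematic interleaving so the outcome is repeatable.
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    acquired_second = [False, False]

    def worker(index: int, first: threading.Lock, second: threading.Lock) -> None:
        with first:
            # Wait until the other thread also holds its first lock.
            both_hold_first.wait()
            # Without the timeout, this call would block forever (deadlock).
            got = second.acquire(timeout=timeout)
            acquired_second[index] = got
            if got:
                second.release()
            # Keep holding the first lock until the other thread has tried too.
            both_tried_second.wait()

    t0 = threading.Thread(target=worker, args=(0, lock_a, lock_b))
    t1 = threading.Thread(target=worker, args=(1, lock_b, lock_a))
    t0.start()
    t1.start()
    t0.join()
    t1.join()
    return acquired_second
```

Under this forced interleaving, neither thread is expected to obtain its second lock. With a plain blocking acquire, both threads would wait on each other indefinitely.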
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.