1. Field of the Invention
The present invention relates in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, it relates to a method, system and computer-usable medium for a lock-spin-wait operation for managing multi-threaded applications in a multi-core computing environment.
2. Description of the Related Art
Computing environments that include a multi-core processor system are becoming increasingly common, as are multi-threaded applications that exploit this hardware opportunity. An important performance consideration for a multi-threaded application is its scalability. Scalability relates to achieving a performance gain that grows approximately linearly with the number of cores and the number of threads used in the parallel execution of the application. To improve the scalability of the application, it is desirable to provide the processor system with an efficient locking mechanism. Often the locking mechanism is provided by a system library, usually supported by hardware in the form of atomic update primitives. A spin-wait mechanism, in which software threads spin-wait to acquire a lock before entering a critical section for exclusive access to shared data, is a common option for implementing this important function because of its simplicity and the relatively short response time of lock acquisition.
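By way of illustration only, the spin-wait behavior described above may be sketched as follows. The names SpinLock and run_workers are hypothetical and do not come from any particular system library, and a non-blocking acquire on Python's threading.Lock is assumed here as a stand-in for a hardware atomic update primitive.

```python
import threading


class SpinLock:
    """Illustrative spin-wait lock (hypothetical name, not from the source)."""

    def __init__(self):
        # Assumption: threading.Lock stands in for an atomic test-and-set flag.
        self._flag = threading.Lock()

    def acquire(self):
        # Spin: retry the non-blocking attempt until it succeeds.
        # Every failed pass through this loop consumes processor time.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()


def run_workers(num_threads=4, increments=1000):
    """Have several threads update one shared counter under the spin lock."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                counter[0] += 1  # critical section: exclusive access to shared data
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

In this sketch a thread never gives up its processor while waiting, which keeps lock hand-off latency short but spends cycles in the retry loop.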
However, spin-wait mechanisms may present certain challenges. For example, processor cycles may be wasted by threads spin-waiting for their turns to acquire the lock. Certain techniques have been developed to address some of the issues associated with spin-wait mechanisms. For example, some spin-wait mechanisms provide a non-blocking lock access option. With this type of mechanism, an application can be restructured such that a thread first checks the status of its associated lock upon arriving at a predetermined section, which may be critical to the operation. The thread acquires the lock and enters the predetermined section if the lock is available. If the lock is not available (i.e., the lock is already taken by some other thread), the thread retreats to do other productive work and then checks back later. However, one potential issue with this method is that the opportunity for such restructuring is usually very limited. For example, the predetermined section may be the only place to get the next work item. Furthermore, commonly accepted software design practice may be contrary to this approach, as software is usually structured in such a way that threads are respectively assigned individual, specialized tasks. As a result, a thread dedicated to one task is not allowed to switch to a different task. Such a software design methodology has the virtue of simplicity, and software built this way is therefore typically more reliable, easier to maintain, easier to expand and, most of the time, higher performing.
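The non-blocking lock access option described above may be sketched, purely as an illustration, as follows. The function names enter_or_do_other_work and demo are hypothetical, and the demonstration simulates the lock holder by releasing the lock from inside the fallback work, which is an assumption made only to keep the example deterministic.

```python
import threading


def enter_or_do_other_work(lock, critical_section, other_work):
    """Try the lock without blocking; do other work whenever it is taken."""
    while not lock.acquire(blocking=False):
        # Lock is held elsewhere: retreat to other productive work,
        # then loop around to check the lock again.
        other_work()
    try:
        return critical_section()
    finally:
        lock.release()


def demo():
    lock = threading.Lock()
    log = []
    lock.acquire()  # simulate another thread already holding the lock

    def other_work():
        log.append("other")
        if len(log) == 3:
            lock.release()  # simulated holder releases after three rounds

    def critical_section():
        log.append("critical")
        return "done"

    result = enter_or_do_other_work(lock, critical_section, other_work)
    return result, log
```

The sketch only helps when other_work actually has something useful to do, which reflects the limitation noted above: if the predetermined section is the only source of the next work item, there is nothing to retreat to.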
As another example, threads waiting for a lock may be suspended, thus preventing them from running on a processor. A thread can also choose to relinquish the processor that it is running on after spinning for a short period of time without acquiring the lock. The operating system (OS) then puts these threads in a block queue. Threads in a block queue are not scheduled to run on a processor. Instead, they wait to be unblocked by a certain hardware event, which in this case would be a lock release. In turn, the OS monitors lock release events and wakes up (i.e., makes runnable) the thread in the block queue that is associated with the lock being released. The advantage of this approach is that a thread waiting for a lock does not consume any processor cycles. Therefore, the saved cycles can be used by other threads.
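The approach of spinning briefly and then relinquishing the processor may be sketched, for illustration only, as follows. The name spin_then_block_acquire and the max_spins parameter are hypothetical, and the blocking acquire of Python's threading.Lock is assumed here as a stand-in for the OS placing the thread in a block queue until the lock is released.

```python
import threading


def spin_then_block_acquire(lock, max_spins=100):
    """Acquire lock; return "spin" if obtained while spinning, else "block"."""
    for _ in range(max_spins):
        # Short spin phase: non-blocking attempts only.
        if lock.acquire(blocking=False):
            return "spin"
    # Spin budget exhausted: give up the processor and wait to be woken
    # when the holder releases the lock.
    lock.acquire()
    return "block"
```

A caller that obtains the lock through either path releases it with lock.release() as usual. The value chosen for max_spins trades wasted cycles during the spin phase against the suspend and wake-up cost of the blocking path.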
Unfortunately, suspending and subsequently unblocking a thread are operating system (OS) kernel functions. The overhead of these functions, plus the context switching, imposes a high cost in getting a lock. In the worst case, which is not uncommon, a high percentage of processor cycles is consumed by OS activity in managing these block-waiting threads. A more serious drawback of this block-waiting strategy is that the lock latency becomes significantly higher when passing a lock to a suspended thread. In other words, the lock throughput is low. Accordingly, it would be desirable to preserve the high-performance lock response time of a spin-wait mechanism while providing an efficient mechanism to minimize processor cycles lost due to spinning within the spin-wait mechanism.