1. Technical Field of the Invention
The present invention relates generally to data processing and in particular to shared-memory multiprocessors. Still more particularly, the present invention relates to scalable interruptible queue locks for a shared-memory multiprocessor and a method of operation thereof.
2. Description of the Related Art
Operating system kernels require efficient locking primitives to enforce serialization. Spin locks and queue locks are two common serialization mechanisms. In addition to scalability and efficiency, interruptibility is a desired trait. Because of atomicity requirements, a thread may have to raise its priority level before entering a critical section that manipulates memory. Additionally, enabling the thread to be interrupted while it is waiting for the lock increases the responsiveness of the system to interrupts.
A spin lock is a simple construct that uses the cache coherence mechanism in a multiprocessor system to control access to a critical section. A typical spin lock implementation has two phases. In the spin phase, the waiting computation agents, or threads, spin on a cached copy of a single global lock variable. It should be noted that in the context of operating system (OS) kernels, there is generally a one-to-one correspondence between computation agents, or threads, and processors. In the compete phase, the waiting computation agents all try to atomically modify the lock variable from the available to the held state. The one computation agent that succeeds in this phase has control of the lock; the others go back to the spin phase. The transition from the spin to the compete phase is initiated when the lock holder releases the lock by marking the lock variable as available.
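By way of illustration only, the two phases described above may be sketched in C using C11 atomics. The type and function names (spin_lock_t, spin_try_compete, spin_acquire, spin_release) and the encoding of the lock variable are assumptions made for this sketch and do not appear in the original disclosure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding of the single global lock variable. */
enum { LOCK_AVAILABLE = 0, LOCK_HELD = 1 };

typedef struct { atomic_int state; } spin_lock_t;   /* hypothetical type */

static void spin_lock_init(spin_lock_t *l)
{
    atomic_init(&l->state, LOCK_AVAILABLE);
}

/* One attempt at the compete phase: atomically change available -> held. */
static bool spin_try_compete(spin_lock_t *l)
{
    int expected = LOCK_AVAILABLE;
    return atomic_compare_exchange_strong(&l->state, &expected, LOCK_HELD);
}

static void spin_acquire(spin_lock_t *l)
{
    for (;;) {
        /* Spin phase: read-only loop on the (cached) lock variable. */
        while (atomic_load_explicit(&l->state, memory_order_relaxed)
               != LOCK_AVAILABLE)
            ;
        /* Compete phase: one agent succeeds; the others resume spinning. */
        if (spin_try_compete(l))
            return;
    }
}

/* Marking the variable available moves all waiters to the compete phase. */
static void spin_release(spin_lock_t *l)
{
    assert(atomic_load(&l->state) == LOCK_HELD);
    atomic_store_explicit(&l->state, LOCK_AVAILABLE, memory_order_release);
}
```

In this sketch every waiter reads the same variable, so a release is observed by all waiters at once, which is the source of the contention discussed next.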
Spin locks have two main advantages: they require only a few instructions to implement and they are easily designed to be interruptible. The main disadvantage of spin locks is that they do not scale well. The compete phase can cause significant contention on the system buses when a large number of computation agents simultaneously attempt to acquire the lock. Spin locks are thus suitable only for lightly contended locks. In addition, since the lock is not necessarily granted in first in first out (FIFO) order, spin locks are typically not fair.
In a queue lock, computation agents queue up to acquire the lock. The lock holder releases the lock by granting it to the computation agent at the head of the queue. If there are no computation agents in the queue, the lock is simply marked available to the next computation agent that tries to acquire it. Queue lock implementations typically involve two phases. In the enqueue phase, a computation agent joins the queue by atomically updating the queue data structure. In the spin phase, queued computation agents spin waiting for the lock to be granted. In contrast to spin locks, the computation agents in a queue lock spin on distinct memory locations that typically map to distinct cache lines. A lock holder signals that the lock is available by updating a single computation agent's spin location. Since the computation agents spin on distinct memory locations, the lock holder wakes up only one computation agent when it releases the lock.
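As a hedged illustration of the enqueue and spin phases, the following C sketch uses an array-based queue in which each waiter spins on its own cache-line-aligned slot. This is one well-known way of building a queue lock and is not necessarily the structure used by the invention. The names, MAX_AGENTS and the 64-byte CACHE_LINE are assumptions, and the sketch assumes that no more than MAX_AGENTS agents are queued at once.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_AGENTS 8    /* assumed bound on queued agents (power of two) */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Each slot is a distinct spin location on its own cache line. */
typedef struct {
    alignas(CACHE_LINE) atomic_bool granted;
} spin_slot_t;

typedef struct {
    spin_slot_t slot[MAX_AGENTS];
    atomic_uint tail;               /* next free position in the queue */
} queue_lock_t;

static void queue_lock_init(queue_lock_t *q)
{
    for (int i = 0; i < MAX_AGENTS; i++)
        atomic_init(&q->slot[i].granted, i == 0);  /* slot 0 starts granted */
    atomic_init(&q->tail, 0u);
}

/* Enqueue phase: one atomic update of the queue data structure. */
static unsigned queue_lock_enqueue(queue_lock_t *q)
{
    return atomic_fetch_add(&q->tail, 1u) % MAX_AGENTS;
}

static bool queue_lock_is_granted(queue_lock_t *q, unsigned my_slot)
{
    return atomic_load_explicit(&q->slot[my_slot].granted,
                                memory_order_acquire);
}

/* Spin phase: wait on this agent's own slot only. */
static unsigned queue_lock_acquire(queue_lock_t *q)
{
    unsigned my_slot = queue_lock_enqueue(q);
    while (!queue_lock_is_granted(q, my_slot))
        ;
    return my_slot;
}

/* Release: update exactly one spin location, that of the next agent. */
static void queue_lock_release(queue_lock_t *q, unsigned my_slot)
{
    assert(my_slot < MAX_AGENTS);
    atomic_store_explicit(&q->slot[my_slot].granted, false,
                          memory_order_relaxed);
    atomic_store_explicit(&q->slot[(my_slot + 1) % MAX_AGENTS].granted, true,
                          memory_order_release);
}
```

If no agent is queued at release time, the next slot is still marked granted, so the next agent to enqueue obtains the lock immediately.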
Queue locks are generally more complicated to implement than spin locks. Their main advantage is that they scale well. Unlike spin locks where a lock release causes a free-for-all among the waiting computation agents, at most one computation agent is woken up in a queue lock. This makes them particularly suited for heavily contended locks. Queue locks can enforce fairness by having the queue data structures preserve the order in which computation agents enqueue themselves. The main disadvantage of queue locks over spin locks is the increased number of memory operations caused by the enqueue-spin-wakeup cycle. For lightly contended locks, these extra operations can significantly increase the time it takes to acquire a lock.
An interruptible lock is one that can handle interrupts between the time when a computation agent expresses a desire to acquire a lock and the time that it actually acquires it. Depending on the contention for a lock and the time spent inside the critical section it controls, a significant period of time may elapse between when a computation agent begins the process of acquiring the lock and when it finally gets control. In order to preserve atomicity of accesses to key data structures, the system may enforce the restriction that the critical section be entered only at a high priority level. The priority level is typically an attribute of the operating environment wherein only interrupts above a certain priority will be serviced. On many systems, this is referred to as the interrupt request level (IRQL). The ideal procedure for handling this situation is to have a computation agent wait for the lock at a lower IRQL and raise the IRQL only after the computation agent gets control of the lock.
Consider the spin lock algorithm shown above in Table 1. Table 2 below illustrates a mechanism for incorporating interruptibility in this algorithm. The idea is to raise the IRQL just prior to entering the compete phase. If the attempt is successful, the lock will have been acquired at the higher IRQL. If the attempt fails, i.e., another computation agent has acquired the lock, the IRQL is restored to its original level. This ensures that any interrupt that would have been serviced at the original IRQL will continue to be serviced while the computation agent waits for the lock in the spin phase.
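The raise-then-compete idea described in this paragraph may be sketched as follows. This is an illustrative sketch only and is not a reproduction of the referenced tables. The IRQL is simulated with a per-thread integer, and raise_irql, restore_irql and the irq_spin_* names are hypothetical stand-ins for whatever facility the operating environment provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated per-agent IRQL (assumption: a real kernel supplies this). */
static _Thread_local int current_irql = 0;

static int raise_irql(int new_level)        /* hypothetical helper */
{
    int old = current_irql;
    assert(new_level >= old);
    current_irql = new_level;
    return old;
}

static void restore_irql(int old_level)     /* hypothetical helper */
{
    current_irql = old_level;
}

typedef struct { atomic_int held; } irq_spin_lock_t;

static void irq_spin_init(irq_spin_lock_t *l) { atomic_init(&l->held, 0); }

/* Compete phase: raise the IRQL first, then try to take the lock.
 * On failure the IRQL is restored so lower interrupts stay serviceable. */
static bool irq_spin_try_compete(irq_spin_lock_t *l, int lock_irql)
{
    int old = raise_irql(lock_irql);
    int expected = 0;
    if (atomic_compare_exchange_strong(&l->held, &expected, 1))
        return true;                /* lock acquired at the higher IRQL */
    restore_irql(old);
    return false;
}

/* Returns the original IRQL, which the caller passes to the release. */
static int irq_spin_acquire(irq_spin_lock_t *l, int lock_irql)
{
    int original = current_irql;
    for (;;) {
        /* Spin phase runs at the original IRQL and is interruptible. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;
        if (irq_spin_try_compete(l, lock_irql))
            return original;
    }
}

static void irq_spin_release(irq_spin_lock_t *l, int original_irql)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
    restore_irql(original_irql);
}
```

Because the IRQL is raised before the atomic attempt, there is no interval in which the lock is held at the lower level.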
Making queue locks interruptible is not as simple as in the case of spin locks. The problem lies in the fact that in a queue lock, a computation agent can be granted the lock at any time after it joins the queue. Contrast this with a spin lock, where a computation agent knows that it will not get control of the lock unless it initiates the compete phase. The straightforward approach of raising the IRQL after a queue lock has been acquired creates a window of vulnerability between the time the lock is acquired and the time the IRQL is raised. During this window, a deadlock condition may occur. For instance, consider a low priority level external interrupt that occurs within this window, and suppose that servicing this interrupt requires acquiring the queue lock under consideration. The interrupt service handler cannot obtain the lock until the lock is released. However, the lock cannot be released until the interrupt service handler finishes and allows the original lock acquire-release cycle to complete. Thus, a more sophisticated mechanism is required.
It is therefore an object of the present invention to provide an improved multiprocessor system.
It is another object of the present invention to provide scalable interruptible queue locks for shared-memory multiprocessors and a method of operation thereof.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein, a method for a computation agent to acquire a queue lock in a multiprocessor system that prevents deadlock between the computation agent and external interrupts is disclosed. The method provides for the computation agent to join a queue to acquire a lock. Next, upon receiving ownership of the lock, the computation agent raises its priority level from a first priority level to a higher second priority level. In response to receipt of an external interrupt having a higher priority level that occurs before the computation agent has raised its priority level to the second priority level, the computation agent relinquishes ownership of the lock. Subsequent to raising its priority level to the second priority level, the computation agent determines if it still has ownership of the lock. If the computation agent determines that it no longer has ownership of the lock after raising its priority level, the computation agent rejoins the queue to reacquire the lock.
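The control flow of the method summarized above may be illustrated by the following deliberately single-threaded C model. It is a sketch of the sequence of steps only. A real implementation would require atomic operations, and every name here (iq_lock_t, agent_t, iq_on_interrupt, iq_finish_acquire and so on), as well as the way the interrupt path detects the window, is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define N_AGENTS 4
#define NO_OWNER (-1)

typedef struct {
    int  owner;                 /* agent that has been granted the lock */
    bool waiting[N_AGENTS];     /* agents queued for the lock */
} iq_lock_t;

typedef struct {
    int id;
    int irql;                   /* simulated priority level */
} agent_t;

static void iq_init(iq_lock_t *l)
{
    l->owner = NO_OWNER;
    for (int i = 0; i < N_AGENTS; i++)
        l->waiting[i] = false;
}

/* Join the queue; a free lock is granted immediately. */
static void iq_join(iq_lock_t *l, const agent_t *a)
{
    if (l->owner == NO_OWNER)
        l->owner = a->id;
    else
        l->waiting[a->id] = true;
}

/* Pass ownership to the next waiter after 'from', or mark the lock free. */
static void iq_grant_next(iq_lock_t *l, int from)
{
    l->owner = NO_OWNER;
    for (int k = 1; k < N_AGENTS; k++) {
        int i = (from + k) % N_AGENTS;
        if (l->waiting[i]) {
            l->waiting[i] = false;
            l->owner = i;
            return;
        }
    }
}

/* Interrupt arrives: if the agent was granted the lock but has not yet
 * raised its priority level, it relinquishes ownership. */
static void iq_on_interrupt(iq_lock_t *l, const agent_t *a, int lock_irql)
{
    if (l->owner == a->id && a->irql < lock_irql)
        iq_grant_next(l, a->id);
}

/* Raise the priority level, then check whether ownership survived.
 * If not, restore the original level and rejoin the queue. */
static bool iq_finish_acquire(iq_lock_t *l, agent_t *a, int lock_irql)
{
    int original = a->irql;
    a->irql = lock_irql;
    if (l->owner == a->id)
        return true;
    a->irql = original;
    iq_join(l, a);
    return false;
}

static void iq_release(iq_lock_t *l, agent_t *a, int original_irql)
{
    assert(l->owner == a->id);
    iq_grant_next(l, a->id);
    a->irql = original_irql;
}
```

In this model an interrupt that lands inside the window hands the lock onward, so an interrupt handler that needs the same lock is never blocked behind the interrupted agent.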
The present invention introduces a novel methodology of implementing queue locks that allows for interruptibility from external interrupts while eliminating any deadlock conditions. The present invention accomplishes this by permitting the computation agent that has been given ownership of a lock to relinquish ownership to another waiting computation agent when an intervening interrupt is encountered during the transition period when it is raising its priority level.
In one embodiment of the present invention, the computation agent's priority level is restored to its original, i.e., first, priority level when it rejoins the queue to reacquire the lock. In a related embodiment, the second priority level of the computation agent is higher than the priority level of the external interrupt.
In another embodiment of the present invention, the queue is implemented using a global bitmask wherein the number of bits in the global bitmask is equal to the number of computation agents in the multiprocessor system. It should be noted that, in other advantageous embodiments, the number of bits in the global bitmask is greater than the number of processors in the multiprocessor system allowing for scalability in the system.
In yet another embodiment of the present invention, the computation agent relinquishes ownership of the lock by releasing the lock to a second computation agent. In a related embodiment, the second computation agent is the next computation agent that wants ownership of the lock and whose bit position, in the global bitmask that tracks requests for the lock, is to the left of the bit position of the first computation agent, which has ownership of the lock. It should be readily apparent to those skilled in the art that, in other advantageous embodiments, the ownership of the lock may be passed to the next computation agent on the right of the first computation agent that indicates a desire to acquire the lock. The present invention does not intend to limit its practice to any one particular direction but requires that whichever direction (right or left) is selected be maintained, so as to prevent starvation by eventually granting the lock to every waiting computation agent. Furthermore, in a non-uniform memory access (NUMA) system, the lock holder may give precedence to computation agents that reside on the same node as itself, taking care to avoid starvation by not doing this too many times in a row.
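The selection of the next computation agent from the global bitmask may be sketched as follows. The sketch assumes a 32-bit mask in which "left" means the more significant bit positions and in which the scan wraps around to bit 0. The function name next_waiter_left and the return value of -1 when no agent is waiting are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define N_BITS 32   /* assumed width of the global request bitmask */

/* Return the bit position of the next requesting agent to the left of
 * 'owner', wrapping around, or -1 if no other agent has requested the
 * lock. Scanning in one fixed direction means every waiter is reached. */
static int next_waiter_left(uint32_t requests, unsigned owner)
{
    assert(owner < N_BITS);
    for (unsigned step = 1; step < N_BITS; step++) {
        unsigned pos = (owner + step) % N_BITS;
        if (requests & (UINT32_C(1) << pos))
            return (int)pos;
    }
    return -1;
}
```

For example, with request bits 2 and 4 set and the agent at bit 2 holding the lock, the scan in this sketch selects the agent at bit 4.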
The foregoing description has outlined, rather broadly, preferred and alternative features of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features of the invention will be described hereinafter that form the subject matter of the claims of the invention. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.