1. Field of the Invention
The present invention relates to inter-process synchronization mechanisms in computer systems. More specifically, the present invention relates to a method and an apparatus for implementing inter-process locks that provide for robust recovery in the event a process fails while holding a lock.
2. Related Art
Computer systems often support multiple processes that can work together on a single computational task. One of the challenges in using multiple processes is to synchronize the processes so that they do not interfere with each other. This is typically accomplished through mutual exclusion locks (mutex locks), which are used to ensure that only one process at a time performs a particular task or has access to specific items of shared data.
A process typically attempts to "acquire" a lock before executing a critical section of code or accessing specific items of shared data. If no other process presently holds the lock, the process acquires the lock by setting the lock to a locked state. After acquiring the lock, the process is free to execute the critical section of code or manipulate the items of shared data without interference from other processes. While the process holds the lock, other processes attempting to acquire the lock will "block" waiting for the lock, and will not be able to proceed until the lock is released. After the process completes the task, it releases the lock, thereby allowing other processes to acquire the lock.
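The acquire and release sequence described above can be sketched as follows. This is an illustrative sketch only, not part of any particular system: threads stand in for processes so that the example is self-contained, and the names `shared`, `worker`, and `run` are hypothetical.

```python
import threading

# Assumption: threads stand in for processes to keep the sketch self-contained.
lock = threading.Lock()
shared = {"count": 0}  # hypothetical shared data protected by the lock


def worker(iterations):
    for _ in range(iterations):
        lock.acquire()  # blocks while another worker holds the lock
        try:
            # Critical section: a read-modify-write on the shared data.
            current = shared["count"]
            shared["count"] = current + 1
        finally:
            lock.release()  # lets a blocked worker acquire the lock


def run(workers=4, iterations=1000):
    shared["count"] = 0
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

Because every update happens while the lock is held, no increment is lost, and `run(4, 1000)` returns 4000.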
Mutual exclusion locks create complications for computer systems that are designed to operate robustly. Such "robust" computer systems continue processing even if some of the processes or processors involved in a computational task fail during program execution. If a process fails while holding a mutual exclusion lock, other processes attempting to acquire the lock will "hang," waiting for the failed process to release the lock. The computational task will consequently come to a halt.
To remedy this problem, some computer systems simply release a lock if a process holding the lock fails. This allows other processes that are blocked on the lock to continue processing. However, the process that failed may have left the data protected by the lock in an inconsistent state. This may cause the remaining processes to produce an incorrect result or to fail at some time in the future.
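The hazard of simply releasing the lock can be illustrated with a small model. The sketch below is hypothetical: a simulated failure interrupts a two-step update, and Python's `with` statement plays the role of a system that frees the lock on behalf of the failed holder. The names `accounts`, `transfer`, and `SimulatedCrash` are assumptions made for illustration.

```python
import threading

lock = threading.Lock()
accounts = {"a": 100, "b": 0}  # hypothetical invariant: a + b == 100


class SimulatedCrash(Exception):
    """Stands in for a process failing while it holds the lock."""


def transfer(amount, crash_midway=False):
    # The with-statement releases the lock even if the body raises,
    # modelling a system that frees the lock when its holder fails.
    with lock:
        accounts["a"] -= amount
        if crash_midway:
            raise SimulatedCrash("holder failed mid-update")
        accounts["b"] += amount


def total():
    with lock:  # does not hang, because the lock was released
        return accounts["a"] + accounts["b"]


def demo():
    try:
        transfer(30, crash_midway=True)
    except SimulatedCrash:
        pass
    return total()
```

Here `demo()` returns 70 rather than 100. The surviving caller acquires the lock without hanging, but nothing tells it that the invariant on the protected data no longer holds.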
Other systems notify the remaining processes that the state protected by the lock was left inconsistent when the process holding the lock died. This allows the remaining processes to take action to make the state consistent again. However, the remaining processes may not succeed in making the state consistent again. In this case, the remaining processes will ultimately produce an incorrect result or will fail at some time in the future.
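The notification approach can be modelled with a lock that reports, on acquisition, whether its previous holder failed. The sketch below is a hypothetical model rather than the interface of any particular system; `NotifyingLock`, `release_after_failure`, `mark_consistent`, and `acquire_with_recovery` are names assumed for illustration, and threads again stand in for processes.

```python
import threading


class NotifyingLock:
    """Hypothetical lock that reports when its previous holder failed."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner_died = False

    def acquire(self):
        self._lock.acquire()
        # True means the protected state may be inconsistent.
        return self.owner_died

    def release(self):
        self._lock.release()

    def release_after_failure(self):
        # Assumed to be called by the system on behalf of a failed holder.
        self.owner_died = True
        self._lock.release()

    def mark_consistent(self):
        self.owner_died = False


def acquire_with_recovery(lock, state, repair):
    """Acquire the lock and attempt a repair if the last holder failed."""
    suspect = lock.acquire()
    try:
        if not suspect:
            return "ok"
        if repair(state):
            lock.mark_consistent()
            return "repaired"
        # The flag stays set, so later acquirers are still notified.
        return "unrecoverable"
    finally:
        lock.release()
```

In this model a surviving caller learns that the state is suspect and can try to repair it. When `repair` returns `False`, the model has no further recourse: the lock is released with the state still inconsistent, which is the shortcoming described above.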
What is needed is a method or an apparatus that provides robust recoverable locks that do not let other processes hang if a process holding a lock fails, and that allow the other processes to restore the state protected by the lock to a consistent state.