Computer processes running in a multi-process or multi-node system often compete for “shared resources” available in that system. Examples of shared resources may include volatile or non-volatile storage media, shared printers, shared scanners, or other input/output devices.
Requests from the processes for shared resources may be coordinated using “locks.” Once an appropriate lock is granted to a process, the process may access the shared resource and use it until, for example, the lock is relinquished, revoked, or otherwise terminated.
A “lock-based” system may include two types of locks: locks permitting concurrent access to a shared resource, and locks permitting exclusive access to the shared resource. For example, a “shared read lock” may be granted to one or more processes to allow them to read data concurrently from persistent storage, whereas an “exclusive write lock” may be granted to only one process at a time, allowing only that process to write data to, for example, a persistent data storage medium.
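The distinction between shared and exclusive locks can be sketched in Python as a simple reader-writer lock. This is a minimal single-machine illustration under stated assumptions, not a distributed lock: the class name `SharedExclusiveLock` and its methods are hypothetical, and the sketch applies no fairness policy, so a continuous stream of readers could starve a waiting writer.

```python
import threading


class SharedExclusiveLock:
    """Hypothetical lock with a shared (read) mode and an exclusive (write) mode.

    Any number of holders may share the lock; an exclusive holder excludes
    everyone else. Illustrative sketch only: no fairness between modes.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of current shared holders
        self._writer = False   # True while an exclusive holder exists

    def acquire_shared(self, blocking: bool = True) -> bool:
        with self._cond:
            if not blocking and self._writer:
                return False
            while self._writer:            # readers wait only for a writer
                self._cond.wait()
            self._readers += 1
            return True

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:         # last reader out wakes waiting writers
                self._cond.notify_all()

    def acquire_exclusive(self, blocking: bool = True) -> bool:
        with self._cond:
            if not blocking and (self._writer or self._readers):
                return False
            while self._writer or self._readers:  # writer needs sole access
                self._cond.wait()
            self._writer = True
            return True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()        # wake both readers and writers


lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()                          # a second reader shares the lock
print(lock.acquire_exclusive(blocking=False))  # False: readers still hold it
lock.release_shared()
lock.release_shared()
```

Using one condition variable for both modes keeps the sketch short; a production lock would typically add writer preference or queuing to avoid starvation.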
Once granted a lock, a process may hold the lock until the process decides to release it. On occasion, the process may hold on to the lock even after it no longer needs the lock.
The process may continue holding on to the lock for various reasons. For example, it may keep the lock to avoid the time-consuming procedure of reapplying for it. When the process anticipates that it may need the lock again in the near future, it may simply keep the lock instead of relinquishing it and later reapplying. However, that anticipated need may never materialize, and the process may end up holding the lock with no actual need for it.
A process may also hold on to a lock simply because it “forgot” to release the lock. This may occur with poorly written process handlers or in complex multi-process applications.
When one process holds on to a lock, other processes may have to wait until the lock is released before they can acquire it. This dependency may be particularly problematic for processes in multi-node systems that compete for an exclusive lock.
In current implementations, a distributed lock manager (DLM) may act as a mediator between the process holding a lock and the processes requesting that lock. A DLM may operate in a number of modes. For example, in an “exclusive lock” mode, a DLM may use a blocking asynchronous system trap (BAST) function to ask the process holding the lock to release it because other processes are waiting for the same lock. This function provides a mechanism for sending a message to the lock holder requesting that the lock be released.
However, even upon receiving such a request, the process holding the lock still decides whether to release it. For example, the holding process may ignore BAST requests altogether and keep the lock no matter how many requests it receives.
Other processes holding a lock may take BAST requests into account but still keep their locks for the optimization reasons described above.
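The point that a blocking notification is only a request can be illustrated with a toy simulation. The sketch assumes a single-threaded, synchronous stand-in for a DLM; the class `ToyLockManager` and its `request`/`release` methods are hypothetical and do not model any real DLM API (real BAST callbacks are delivered asynchronously). A cooperative holder releases the lock from its callback, while an uncooperative holder ignores the notification.

```python
from typing import Callable, Dict, List


class ToyLockManager:
    """Hypothetical, single-threaded stand-in for a DLM managing exclusive locks.

    When a resource is already held, the manager calls the holder's blocking
    callback (loosely analogous to a BAST). Releasing is left to the holder.
    """

    def __init__(self) -> None:
        self.holder: Dict[str, str] = {}         # resource -> process holding it
        self.waiters: Dict[str, List[str]] = {}  # resource -> FIFO queue of requesters
        self.callbacks: Dict[str, Callable[[str], None]] = {}  # process -> callback

    def request(self, resource: str, process: str,
                on_blocking: Callable[[str], None]) -> bool:
        """Request an exclusive lock; return True if `process` now holds it."""
        self.callbacks[process] = on_blocking
        owner = self.holder.get(resource)
        if owner is None:
            self.holder[resource] = process
            return True
        if owner == process:
            return True  # already held by the requester
        self.waiters.setdefault(resource, []).append(process)
        # Notify the current holder that another process is waiting.
        self.callbacks[owner](resource)
        return self.holder.get(resource) == process

    def release(self, resource: str, process: str) -> None:
        """Release the lock if `process` holds it and grant it to the next waiter."""
        if self.holder.get(resource) != process:
            return
        del self.holder[resource]
        queue = self.waiters.get(resource, [])
        if queue:
            self.holder[resource] = queue.pop(0)


mgr = ToyLockManager()
# Process A cooperates: its callback releases the lock when asked.
mgr.request("disk", "A", on_blocking=lambda r: mgr.release(r, "A"))
# B's request triggers A's callback, so B is granted the lock.
print(mgr.request("disk", "B", on_blocking=lambda r: None))  # True
# B ignores blocking notifications, so C stays queued behind it.
print(mgr.request("disk", "C", on_blocking=lambda r: None))  # False
```

In this model the manager has no way to force B to give up the lock; C remains queued for as long as B chooses to keep it.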
In some situations, even if a process holding a first lock receives a BAST request, the process may be unable to release the first lock because releasing it depends on first obtaining a second lock on a shared resource. If that second lock is held by a second process that is itself waiting for the first lock, neither process can proceed. Such circular waits may lead to system deadlocks, system “hanging,” and other system execution errors.
When a requesting process requests a lock held by another process, the requesting process may set up a timer that determines how long it will wait for the lock. Upon expiration of the timer, the requesting process may simply stop waiting. In such a case, the requesting process may never obtain the lock and may never finish its task.
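A requester's timed wait can be sketched with Python's `threading.Lock.acquire(timeout=...)`. The helper `try_acquire_with_timer` is hypothetical; a local thread lock stands in for the shared-resource lock, and the sketch assumes that giving up simply means returning `False` to the caller.

```python
import threading


def try_acquire_with_timer(lock: threading.Lock, timeout_s: float) -> bool:
    """Wait at most timeout_s seconds for the lock; return False if the timer expires."""
    if not lock.acquire(timeout=timeout_s):
        # The requester gives up; its task cannot proceed without the lock.
        return False
    try:
        # ... access the shared resource here ...
        return True
    finally:
        lock.release()


resource_lock = threading.Lock()
resource_lock.acquire()  # simulate a holder that never releases the lock
print(try_acquire_with_timer(resource_lock, timeout_s=0.1))  # False after ~0.1 s
resource_lock.release()
print(try_acquire_with_timer(resource_lock, timeout_s=0.1))  # True
```

The timer bounds how long the requester blocks, but it does nothing to make the holder release the lock.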
Mechanisms exist for detecting deadlocks caused by processes that are unwilling to release their locks. However, these mechanisms detect a deadlock only after it has actually occurred; they are much less helpful in preventing deadlocks from happening.
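One common detection technique, shown here only as an illustration and not as the specific mechanism referred to in the text, searches a wait-for graph for a cycle. The sketch assumes exclusive locks, so each blocked process waits on exactly one holder and the graph can be stored as a dictionary mapping each waiter to the process it waits for; `find_deadlock` is a hypothetical name.

```python
from typing import Dict, List, Optional


def find_deadlock(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is no cycle.

    waits_for maps each blocked process to the process holding the lock it
    wants. Assumes exclusive locks, so each node has at most one outgoing edge.
    """
    for start in waits_for:
        path: List[str] = []
        seen: Dict[str, int] = {}  # process -> its index in path
        node: Optional[str] = start
        # Follow wait-for edges until reaching a running process or a repeat.
        while node is not None and node not in seen:
            seen[node] = len(path)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[seen[node]:]  # the repeated node closes a cycle
    return None


# P1 waits for P2's lock while P2 waits for P1's lock.
print(find_deadlock({"P1": "P2", "P2": "P1"}))  # ['P1', 'P2']
print(find_deadlock({"P1": "P2"}))              # None
```

Because the cycle only exists once both processes are already blocked, such a check can report the deadlock but cannot prevent it.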
The inability to coordinate lock access among multiple processes may degrade overall system performance, because processes often hold their locks longer than necessary, causing problems such as timeouts, deadlocks, and system “hanging.” Such problems are usually difficult to detect and debug. Moreover, they may trigger a sequence of complicated “chain” reactions within multi-node systems.
Clearly, techniques are needed for managing how a process may hold on to a lock in a multi-process system.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.