In distributed data processing systems, data objects or resources such as database tables, indexes, files, and other data structures are often shared by multiple processes. If a data object or resource is accessed by two or more processes during the same time interval, problems may arise depending on the nature of the access. For example, if one process attempts to write a data object while another process is reading the data object, an inconsistent set of data may be obtained by the reading process. Similarly, if two processes attempt to write to the same data object during the same time interval, data corruption may result. In both cases, the accessing processes are said to have a “resource conflict,” the resource being the shared data object.
A long-standing challenge in computing is the detection of deadlocks. A deadlock is a state assumed by a set of entities wherein each entity in the set is waiting for the release of at least one resource owned by another entity in the set. Entities capable of owning a resource may be referred to as possessory entities. In the context of operating systems, for example, possessory entities may include processes and applications. In the context of a database system, for example, possessory entities may include processes and transactions. A transaction is an atomic unit of work.
For example, a transaction T1 may seek exclusive ownership of resources R1 and R2. If R1 is available and R2 is currently exclusively owned by another transaction T2, transaction T1 may acquire exclusive ownership of R1 but must wait for R2 to become available. A deadlock will occur if transaction T2 exclusively owns R2 but seeks ownership of R1, and T2 is suspended to wait for R1 without releasing R2. Because both T1 and T2 are waiting for each other, and neither can proceed until the resource needed by each is released by the other, they are deadlocked.
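The T1/T2 scenario above may be illustrated with a minimal Python sketch. The two dictionaries and the function names `blocker_of` and `is_deadlocked` are hypothetical and are introduced only for illustration; each transaction is assumed to wait for at most one resource, and each resource is assumed to have at most one exclusive owner.

```python
# Hypothetical lock table: resource -> transaction that exclusively owns it.
owners = {"R1": "T1", "R2": "T2"}

# Hypothetical wait table: transaction -> resource it is suspended waiting for.
waiting_for = {"T1": "R2", "T2": "R1"}


def blocker_of(txn):
    """Return the transaction owning the resource that txn waits for, or None."""
    resource = waiting_for.get(txn)
    if resource is None:
        return None
    return owners.get(resource)


def is_deadlocked(txn):
    """Follow the chain of blockers; returning to txn means txn is deadlocked."""
    seen = set()
    current = blocker_of(txn)
    while current is not None and current not in seen:
        if current == txn:
            return True
        seen.add(current)
        current = blocker_of(current)
    return False


print(is_deadlocked("T1"))
print(is_deadlocked("T2"))
```

In this sketch, T1 is blocked by T2 (the owner of R2) and T2 is blocked by T1 (the owner of R1), so following the chain of blockers from either transaction leads back to that transaction.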
Computer systems employ a variety of deadlock handling mechanisms (deadlock handlers) that detect deadlocks. Some deadlock handlers employ a “cycle” technique to detect deadlocks. In the cycle technique, after a process waits a threshold period of time for a resource, a wait-for graph may be generated and examined for cycles. If any cycles are identified, then the deadlock detection mechanism has identified a potential deadlock.
FIG. 7 depicts an exemplary wait-for graph 700, which includes entity vertices 711 and 712 and resource vertices 721 and 722. Wait-for graph 700 was generated when a deadlock handler detected that a process represented by entity vertex 711 had waited a predetermined threshold period of time for the resource represented by resource vertex 721. Arc 731 indicates that the process represented by entity vertex 711 is waiting for the release of the resource represented by resource vertex 721. Arc 732 indicates that the resource represented by resource vertex 721 is owned by the process represented by entity vertex 712. Arc 733 indicates that the process represented by entity vertex 712 is waiting for the release of the resource represented by resource vertex 722. Arc 734 indicates that the resource represented by resource vertex 722 is owned by the process represented by entity vertex 711.
Arcs 731, 732, 733, and 734 form a loop that both extends from and leads to entity vertex 711, and thus indicate a cycle. The processes represented by the entity vertices of wait-for graph 700 are therefore potentially deadlocked.
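By way of illustration only, wait-for graph 700 may be encoded as an adjacency mapping and searched for a cycle that returns to entity vertex 711. The following Python sketch is one possible approach and is not asserted to be the technique used by any particular deadlock handler; the names `arcs` and `find_cycle` are hypothetical, and the integer keys simply reuse the reference numerals of FIG. 7.

```python
# Arcs of wait-for graph 700, keyed by source vertex.
arcs = {
    711: [721],  # arc 731: entity 711 waits for resource 721
    721: [712],  # arc 732: resource 721 is owned by entity 712
    712: [722],  # arc 733: entity 712 waits for resource 722
    722: [711],  # arc 734: resource 722 is owned by entity 711
}


def find_cycle(graph, start):
    """Return the vertices of a cycle through start, in order, or None."""
    path = []         # vertices on the current depth-first path
    on_path = set()   # same vertices, for fast membership tests
    visited = set()   # vertices fully explored without reaching start

    def visit(vertex):
        path.append(vertex)
        on_path.add(vertex)
        for successor in graph.get(vertex, []):
            if successor == start:
                return True  # the path loops back to the starting vertex
            if successor not in visited and successor not in on_path:
                if visit(successor):
                    return True
        on_path.discard(vertex)
        visited.add(vertex)
        path.pop()
        return False

    return list(path) if visit(start) else None


print(find_cycle(arcs, 711))
```

Starting the search at entity vertex 711 mirrors the description above, in which the graph is generated because the process represented by that vertex has waited for the threshold period. Removing arc 734 from the mapping leaves no path back to vertex 711, in which case the sketch returns `None`.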
In a distributed computer system, a deadlock may involve entities and resources that reside on multiple nodes. Potential deadlocks may be detected through the generation of distributed wait-for graphs. The vertices of a distributed wait-for graph may thus represent entities (“distributed entities”) and resources (“distributed resources”) that reside on different nodes. To generate a distributed wait-for graph, each node generates a portion of the wait-for graph that corresponds to entities that reside on the node. In particular, each node examines lock data structures that reside on the node, and generates wait-for graph elements that represent the node's portion of the distributed wait-for graph.
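The per-node generation described above may be sketched as follows. This Python example rests on several assumptions that are not taken from the description: the shape of the local lock data (a mapping from a resource name to an owner and a list of waiters), the function names `local_portion`, `merge_portions`, and `has_cycle`, and the step of merging all portions in one place, which is a simplification for illustration, since the elements of a distributed wait-for graph may remain distributed over the nodes.

```python
def local_portion(local_locks):
    """Build one node's wait-for graph elements from its local lock data.

    local_locks maps a resource name to (owner, [waiters]) (hypothetical shape).
    """
    edges = []
    for resource, (owner, waiters) in local_locks.items():
        for waiter in waiters:
            edges.append((waiter, resource))  # waiter waits for resource
        if owner is not None:
            edges.append((resource, owner))   # resource is owned by owner
    return edges


def merge_portions(portions):
    """Combine per-node edge lists into a single adjacency mapping."""
    graph = {}
    for edges in portions:
        for source, target in edges:
            graph.setdefault(source, set()).add(target)
    return graph


def has_cycle(graph):
    """Detect any cycle using a three-color depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(vertex):
        color[vertex] = GRAY
        for successor in graph.get(vertex, ()):
            state = color.get(successor, WHITE)
            if state == GRAY:
                return True  # back edge to a vertex on the current path
            if state == WHITE and visit(successor):
                return True
        color[vertex] = BLACK
        return False

    return any(color.get(v, WHITE) == WHITE and visit(v) for v in list(graph))


# Hypothetical lock data on two nodes.
node_a_locks = {"R1": ("P1", ["P2"])}  # on node A: P1 owns R1, P2 waits for it
node_b_locks = {"R2": ("P2", ["P1"])}  # on node B: P2 owns R2, P1 waits for it

portions = [local_portion(node_a_locks), local_portion(node_b_locks)]
print(has_cycle(merge_portions(portions[:1])))  # node A's portion alone
print(has_cycle(merge_portions(portions)))      # both portions combined
```

In this sketch neither node's portion contains a cycle on its own; the cycle appears only when the portions from both nodes are considered together, which is why a potential deadlock spanning multiple nodes cannot be found from one node's lock data alone.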
A cycle in a distributed wait-for graph may thus include vertices that represent entities and resources distributed over multiple nodes. In addition, the cycle may be represented by wait-for graph elements that are distributed over multiple nodes and that refer to lock structures residing on those nodes. An exemplary deadlock detection technique employing cycles is described in commonly-owned U.S. Pat. No. 6,304,938, entitled “Detecting a State Change in a Lock Structure to Validate a Potential Deadlock” to Srivastava.