A deadlock condition occurs in a multi-threaded computer program when each of two or more threads has a pending action that cannot proceed until an action of another of the threads finishes. The occurrence of a deadlock condition causes problems for users. Unfortunately, due to non-determinism in multi-threaded programs, effectively detecting potential deadlocks before they happen remains a great challenge in software testing.
A deadlock is one type of concurrency bug caused by a synchronization error. Mutually exclusive (mutex) locks are pervasively used to protect shared data, which requires synchronizing accesses to that data from different threads. Mistakes made in using mutex locks can cause multiple threads to enter a deadlock situation, where each thread is blocked when requesting a lock held by another blocked thread.
For purposes of background explanation, referring now to FIG. 1A, deadlock-free operation occurring between two threads T1 and T2 is visually presented. Here, where time is moving from left to right, thread T1 first obtains a lock on resource A, +L(A), then obtains a lock on resource B, +L(B). Thread T1 then releases, −L(B), −L(A), these locks before thread T2 needs access to the resources A and B.
As shown in FIG. 1B, however, a deadlock occurs where thread T1, for example, has obtained the lock on resource A, +L(A), and thread T2 has obtained the lock on resource B, +L(B). Subsequently, thread T1 cannot obtain a lock on resource B, +L(B), because thread T2 has not yet released resource B. Likewise, thread T2 cannot obtain a lock on resource A, +L(A), because thread T1 has not yet released resource A. As neither of threads T1 and T2 can continue without, respectively, +L(B) and +L(A), they are deadlocked with respect to one another and no processing takes place.
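For illustration only, the interleaving of FIG. 1B can be replayed in a short Python sketch. The function name `run_fig_1b`, the two barriers, and the timeout are assumptions introduced here and are not part of the figures. The barriers force the problematic ordering, and the timeout replaces what would otherwise be an indefinite block so that the example terminates and reports the outcome.

```python
import threading

def run_fig_1b(timeout=0.2):
    """Replay the FIG. 1B interleaving and report whether each thread
    obtained its second lock. All names here are illustrative."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)    # both threads hold their first lock
    both_tried_second = threading.Barrier(2)  # both threads have attempted the second
    got_second = {}

    def worker(name, first, second):
        with first:                                # +L(first)
            both_hold_first.wait()
            # A plain second.acquire() would block forever at this point.
            ok = second.acquire(timeout=timeout)   # attempted +L(second)
            got_second[name] = ok
            if ok:
                second.release()
            both_tried_second.wait()               # keep 'first' held until both have tried

    t1 = threading.Thread(target=worker, args=("T1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("T2", lock_b, lock_a))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return got_second

if __name__ == "__main__":
    print(run_fig_1b())
```

Under this forced interleaving neither thread obtains its second lock. Without the timeout, both threads would remain blocked, which is the deadlock of FIG. 1B.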
In practice, deadlock is generally hard to predict or detect, mainly due to an inherently imperfect knowledge of synchronization within a multi-threaded environment. Sometimes acquiring multiple locks in different orders is deadlock-prone, while sometimes the acquisition order does not matter.
Deadlock can cause severe problems for end users. A software system in deadlock usually stops responding to client requests, which can directly cause customer dissatisfaction. In addition, a deadlock in one software component can cause other related components to hang and thereby lead to a chain reaction of hanging. Detecting and diagnosing deadlocks is usually non-trivial in software deployed for a production run. In complex execution environments, there is often no good way to determine whether a system is deadlocked or just running very slowly. Even if it is determined that the system is deadlocked, it is still very difficult to identify the root cause, given that many components interact with each other.
Although it is possible to develop deadlock-free code, it is quite challenging to do so in practice. Applying a set of strict coding rules for avoiding deadlocks, for example, may cause other complexities and inefficiencies in the design and may not be favorable in many situations. In addition, a large number of developers in different teams must all follow the coding rules to make them effective. Furthermore, when third party libraries are involved, allowing some exceptions to the coding rules may be the only choice for making things work together.
Therefore, in order to avoid the deadlock situation, it is imperative for software vendors to be equipped with techniques that not only detect as many potential deadlocks as possible in testing, but also provide helpful information to developers for fixing them.
Relying on traditional software testing to detect deadlock bugs dynamically, however, is not effective due to limited test coverage. Although a deadlock detection tool based on a dependency graph can accurately detect a deadlock situation, it is only useful when the test case actually triggers the deadlock. For concurrent software, due to thread interleaving, the number of possible execution paths is astronomically large. As a result, test cases usually cover only a small percentage of the possible conditions and often fail to expose the specific event sequence that triggers a hidden deadlock. Furthermore, even if a test triggers the deadlock by chance, it is still very difficult for programmers to debug because the bug may be hard to reproduce consistently.
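As a minimal sketch of the dependency-graph idea, assuming that each lock has at most one holder and each blocked thread waits on exactly one lock, a detector can follow the chain "thread waits for lock, lock is held by thread" until it either ends or returns to a thread already visited. The function name `find_deadlock` and the two dictionaries are hypothetical and do not describe any particular tool.

```python
def find_deadlock(holder, waiting_for):
    """Return a list of threads forming a wait-for cycle, or None.

    holder:      maps each held lock to the thread holding it
    waiting_for: maps each blocked thread to the lock it is requesting
    """
    for start in waiting_for:
        path, seen = [], set()
        thread = start
        # Follow: thread -> lock it waits for -> thread holding that lock.
        while thread in waiting_for and thread not in seen:
            seen.add(thread)
            path.append(thread)
            thread = holder.get(waiting_for[thread])
        if thread in seen:
            # The chain came back to a visited thread: a cycle exists.
            return path[path.index(thread):]
    return None

if __name__ == "__main__":
    # State of FIG. 1B: T1 holds A and waits for B; T2 holds B and waits for A.
    print(find_deadlock({"A": "T1", "B": "T2"}, {"T1": "B", "T2": "A"}))
```

Such a check reports a cycle only for the state it is given, which is why it helps only when a test run actually reaches the deadlocked state.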
Another way to detect deadlock is to perform stricter checking and conservatively report potential deadlocks. One way to check for potential deadlocks is to detect lock order violations, i.e., cases where multiple threads acquire the same set of locks in different orders. This can be done either through static analysis of source code or through dynamic checkers running with the test cases. Although this approach can increase the test coverage, it is likely to generate a large number of false alarms, i.e., cases where a lock order violation cannot cause a deadlock due to other synchronization mechanisms. Verifying each warning manually could be a very tedious task for programmers.
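A dynamic lock order check of this kind might be sketched as follows, assuming that each thread's lock events have been recorded as a list of ('+', lock) and ('-', lock) pairs. The trace format, the function names, and the gate lock G in the second example are assumptions made for illustration only.

```python
from itertools import combinations

def lock_order_edges(trace):
    """Return the set of (held, acquired) lock pairs observed in one thread's trace."""
    held, edges = [], set()
    for op, lock in trace:
        if op == "+":
            edges.update((h, lock) for h in held)  # 'lock' taken while holding h
            held.append(lock)
        else:
            held.remove(lock)
    return edges

def lock_order_violations(traces):
    """Return the lock pairs that two different threads acquired in opposite orders."""
    edges = {thread: lock_order_edges(tr) for thread, tr in traces.items()}
    violations = set()
    for t1, t2 in combinations(edges, 2):
        for x, y in edges[t1]:
            if (y, x) in edges[t2]:
                violations.add(frozenset((x, y)))
    return violations

if __name__ == "__main__":
    # Lock orders of FIG. 1B, recorded from a run that happened not to deadlock.
    fig_1b = {
        "T1": [("+", "A"), ("+", "B"), ("-", "B"), ("-", "A")],
        "T2": [("+", "B"), ("+", "A"), ("-", "A"), ("-", "B")],
    }
    # Same orders, but both threads first take a common lock G.
    gated = {
        "T1": [("+", "G")] + fig_1b["T1"] + [("-", "G")],
        "T2": [("+", "G")] + fig_1b["T2"] + [("-", "G")],
    }
    print(lock_order_violations(fig_1b))
    print(lock_order_violations(gated))
```

Both calls report the pair of locks A and B. In the second case the report is a false alarm of the kind described above, because the common lock G prevents the two threads from holding A and B at the same time.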