While a software application is running, users may find it unresponsive for several reasons: threads caught in a classic deadlock; threads caught in an infinite loop; threads performing blocking I/O calls; threads waiting for an event to be posted; and threads waiting for the release of locks held by other blocked threads.
Identifying the culprit threads among all the other threads in the system can be difficult and time-consuming. One existing diagnostic method involves collecting multiple system or thread dumps at different points in time and comparing the thread states and stack traces across the dumps to identify threads that have not progressed. Threads that show no progress across all of the dumps are considered potentially hung.
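The dump-comparison approach can be sketched in Python, assuming the threads of interest live in a single Python process and can be sampled in-process with `sys._current_frames()` (a CPython function that the standard documentation describes as intended for debugging). The names `take_dump`, `collect_dumps` and `unprogressed_threads` are hypothetical and not part of any existing tool. A thread is flagged only if its full stack is identical in every dump.

```python
import sys
import threading
import time
import traceback
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a dump maps a thread id to its stack, stored
# as a tuple of (filename, line number, function name) entries.
Stack = Tuple[Tuple[str, int, str], ...]
Dump = Dict[int, Stack]


def take_dump() -> Dump:
    """Capture the current stack of every live Python thread."""
    dump: Dump = {}
    for thread_id, frame in sys._current_frames().items():
        frames = traceback.extract_stack(frame)
        dump[thread_id] = tuple((f.filename, f.lineno, f.name) for f in frames)
    return dump


def collect_dumps(count: int, interval_s: float) -> List[Dump]:
    """Take `count` dumps spaced `interval_s` seconds apart."""
    dumps: List[Dump] = []
    for i in range(count):
        dumps.append(take_dump())
        if i < count - 1:
            time.sleep(interval_s)
    return dumps


def unprogressed_threads(dumps: List[Dump]) -> Set[int]:
    """Return ids of threads whose stack is identical in every dump.

    These are only candidates for being hung: a thread that repeats the
    same work, or performs an intended blocking call, also appears here.
    """
    if not dumps:
        return set()
    first = dumps[0]
    candidates = set(first)
    for dump in dumps[1:]:
        # A thread absent from a later dump has exited, so it is dropped.
        candidates &= set(dump)
    return {t for t in candidates if all(d[t] == first[t] for d in dumps[1:])}
```

Because every dump is taken from the same call site, the thread running `collect_dumps` typically has an identical stack in each dump and is flagged as well, which is a small instance of why an unchanged stack alone does not prove a hang.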
This method of diagnosis is manual and not fool-proof. During normal execution, a thread may perform the same task or procedure over and over, so there is a high chance that it will be seen in the same state or procedure in every dump. Similarly, a thread performing an intended blocking operation will show the same stack trace in every dump. Hence, an unchanged state across dumps does not provide conclusive evidence that the thread is the cause of a problem.
Basic tracking and profiling methods also exist that record the status of threads and the time spent in each method or function currently on a thread's stack. Many implementations store timing information for threads and for the methods those threads call. For example, some methods use thread-local data to generate timing values for methods and functions and publish them to tools. Alternatively, debuggers or the trace engines of virtual machines (such as the Java VM) can record method/function events and timing information, and a number of tools use this information to provide profiling data showing the elapsed time of methods within a thread. However, each of these techniques requires some form of external monitoring of the thread and does not make the timing information available directly within the thread.
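A minimal Python sketch of the thread-local timing idea, offered as an illustration rather than any particular tool's implementation: each decorated call adds its elapsed time to totals held in the calling thread's own `threading.local()` storage, and `publish()` copies those totals into a shared dictionary that stands in for a profiling tool's input. `timed`, `publish` and `published` are hypothetical names. The data only becomes useful once something outside the thread reads `published`, and a call that never returns (such as a hung one) is never recorded.

```python
import threading
import time
from collections import defaultdict
from functools import wraps

# Per-thread storage: each thread accumulates its own totals without locking.
_local = threading.local()

# Hypothetical sink read by an external tool: thread id -> {name: seconds}.
published = {}
_published_lock = threading.Lock()


def timed(func):
    """Add the elapsed time of each completed call to thread-local totals."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs only when the call returns or raises, never while it hangs.
            totals = getattr(_local, "totals", None)
            if totals is None:
                totals = _local.totals = defaultdict(float)
            totals[func.__name__] += time.monotonic() - start
    return wrapper


def publish():
    """Copy the calling thread's accumulated totals to the shared sink."""
    totals = getattr(_local, "totals", {})
    with _published_lock:
        published[threading.get_ident()] = dict(totals)
```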
What is needed is an efficient and effective method of diagnosing thread hangs within a software application without the use of external monitoring tools and without the need to recreate the hanging condition after tools or settings have been applied.