1. Field of the Invention
The present invention generally relates to computer processing systems, and more particularly to tools, techniques and processes, such as debugging tools and visualization tools, for deterministically replaying the execution of distributed, multithreaded programs (e.g., JAVA™) on such computer processing systems in a multiprocessor environment.
2. Description of the Related Art
Modern operating system platforms support multiple concurrent threads of execution. Some of these operating system platforms support multiple concurrent threads without complete scheduling control by the user. For example, operating system platforms that support the Java™ Virtual Machine Specification fall into this category. In such systems, each time a Java™ program runs, the time allocated to each unsynchronized thread may change, as these times are under the control of an external operating system. The apparently random time allocation to threads introduces non-deterministic program behavior. Other events, such as windowing events, network events/messages and general input/output operations, may also introduce non-deterministic program behavior. Such program execution is referred to below as a non-deterministic execution instance.
Such non-deterministic program behavior undermines the benefits of re-executing the program for debugging, performance monitoring, visualization, or other similar tasks.
For example, repeated execution of a program is common while debugging a program, and non-determinism may prevent a bug that appeared in one execution instance of the program from appearing in another execution instance of the same program. Non-determinism also affects visualization of the execution behavior of a program, since execution behaviors can be different for different execution instances of the program. Current cyclic debugging tools (i.e., tools that rely on repeated execution for debugging) and monitoring-based tools such as visualizers do not support deterministic re-execution.
Previously, techniques were developed to deterministically replay a multithreaded distributed (e.g., Java™) program, and were implemented on Sun Microsystems' hardware/operating system platform (e.g., Sun's JDK 1.2). The replay mechanism developed was platform-independent. That is, it did not rely on the underlying operating system or thread scheduler.
However, such techniques were implemented on a single virtual machine. Prior to the present invention, the techniques were not used in the context of shared memory multiprocessors. Moreover, while it is believed that such techniques could be implemented in a shared memory environment, the cost of the increased instrumentation overhead would be prohibitive absent some innovative techniques, because such techniques rely on capturing and replaying thread schedule information.
On a uniprocessor system, when a thread or an application runs, all the other threads and applications “sleep”, and replaying exact thread schedule information with synchronization information is sufficient for replaying execution behavior.
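By way of illustration only, the uniprocessor case can be sketched as follows. This is a minimal, simulated sketch and not the previously developed mechanism itself: the class name ScheduleReplaySketch, its methods, and the two-step (read, then write) increment are hypothetical, the "threads" are simulated workers advanced one step at a time, and the trace logs every step rather than only thread switches. It assumes that exactly one worker runs at any time, which is the property that makes the schedule alone sufficient for replay.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

// Hypothetical sketch: all names here are illustrative assumptions.
final class ScheduleReplaySketch {
    static final int WORKERS = 2;
    static final int INCREMENTS = 3;

    // Runs simulated workers one step at a time on a single "processor".
    // Each increment of the shared counter takes two steps (read, then write),
    // so the final value depends on how the steps are interleaved.
    static int execute(Function<List<Integer>, Integer> chooser) {
        int shared = 0;
        int[] local = new int[WORKERS];               // value each worker last read
        boolean[] holdsValue = new boolean[WORKERS];  // true between read and write
        int[] remaining = new int[WORKERS];
        Arrays.fill(remaining, INCREMENTS);
        while (true) {
            List<Integer> runnable = new ArrayList<>();
            for (int i = 0; i < WORKERS; i++) {
                if (remaining[i] > 0) runnable.add(i);
            }
            if (runnable.isEmpty()) return shared;
            int t = chooser.apply(runnable);          // scheduling decision
            if (!holdsValue[t]) {
                local[t] = shared;                    // step 1: read shared counter
                holdsValue[t] = true;
            } else {
                shared = local[t] + 1;                // step 2: write back
                holdsValue[t] = false;
                remaining[t]--;
            }
        }
    }

    // Record mode: a seeded Random stands in for the external scheduler,
    // and every scheduling decision is appended to the trace.
    static int record(long seed, List<Integer> trace) {
        Random os = new Random(seed);
        return execute(runnable -> {
            int t = runnable.get(os.nextInt(runnable.size()));
            trace.add(t);
            return t;
        });
    }

    // Replay mode: scheduling decisions are read back from the trace.
    static int replay(List<Integer> trace) {
        Iterator<Integer> it = trace.iterator();
        return execute(runnable -> it.next());
    }

    public static void main(String[] args) {
        List<Integer> trace = new ArrayList<>();
        int recorded = record(System.nanoTime(), trace);
        int replayed = replay(trace);
        System.out.println("recorded=" + recorded
                + " replayed=" + replayed + " trace=" + trace);
    }
}
```

In this sketch, replaying a strictly alternating schedule (0, 1, 0, 1, ...) yields a final counter value of 3, while a schedule that runs one worker to completion before the other yields 6. In both cases the result is fully determined by the recorded schedule.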
On a multiprocessor system, however, multiple threads and applications are running concurrently at any point in time. Therefore, capturing thread schedule and synchronization information is not sufficient for replay, and any memory operation of each thread that might affect other threads, or that might have been affected by other threads, needs to be captured. Since such memory operations far outnumber thread switches (e.g., by a factor of thousands in general), applying previous techniques to multiprocessors can incur prohibitive runtime costs.
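The cost described above can be made concrete with a naive sketch in which every shared memory access is logged and later replayed in the logged order. This is an illustrative assumption about how such per-access capture might look, not a technique taken from any of the systems discussed. The class SharedAccessReplaySketch and its methods are hypothetical, and a single monitor is used to order accesses, which serializes the threads and shows why this approach scales poorly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all names here are illustrative assumptions.
final class SharedAccessReplaySketch {
    private final List<Integer> order;  // thread ids, one entry per shared access
    private final boolean replaying;
    private int next = 0;               // index of the next access allowed in replay
    private int shared = 0;

    SharedAccessReplaySketch(List<Integer> order, boolean replaying) {
        this.order = order;
        this.replaying = replaying;
    }

    // One shared memory access. In record mode the accessing thread is logged;
    // in replay mode the thread blocks until the log says it is its turn.
    synchronized void access(int tid) throws InterruptedException {
        if (replaying) {
            while (order.get(next) != tid) {
                wait();
            }
            next++;
        } else {
            order.add(tid);
        }
        shared = shared * 31 + tid;     // order-sensitive update of shared state
        notifyAll();                    // wake threads waiting for their turn
    }

    synchronized int value() {
        return shared;
    }

    // Runs the given number of real threads, each performing the same number
    // of shared accesses, and returns the final shared value.
    static int run(List<Integer> order, boolean replaying,
                   int threads, int accessesPerThread) {
        SharedAccessReplaySketch s = new SharedAccessReplaySketch(order, replaying);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int tid = i;
            ts[i] = new Thread(() -> {
                try {
                    for (int k = 0; k < accessesPerThread; k++) {
                        s.access(tid);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.value();
    }

    public static void main(String[] args) {
        List<Integer> order = new ArrayList<>();
        int recorded = run(order, false, 4, 1000);
        int replayed = run(order, true, 4, 1000);
        System.out.println("log entries=" + order.size()
                + " recorded=" + recorded + " replayed=" + replayed);
    }
}
```

Note that the log in this sketch grows by one entry per shared access, whereas a schedule-only trace grows with the number of thread switches. That difference is the source of the overhead discussed above.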
Innovative techniques that avoid such costs have been absent from previous systems and mechanisms. Further, most previous approaches for re-execution of non-deterministic applications have focused on replaying multiprocessor applications running on shared memory multiprocessor systems. Like threads, processes of an application can affect the execution behavior of other processes via accesses to shared variables, synchronization operations and communications. Replaying multiprocessor applications requires capturing interactions among processes (e.g., critical events) and generating traces for them. A major drawback of these approaches is the large overhead in time and space required to generate these traces.
To reduce the trace size, one approach assumes that applications use a correct, coarse-grained concurrent-read-exclusive-write (CREW) operation to access shared objects, and generates traces only for these coarse-grained operations. However, this approach fails if critical events within a CREW operation are non-deterministic.
Another approach is similar in that it also generates traces only for coarse-grained critical events, assuming shared variables are well guarded within well-defined critical-sections.
Finally, another approach (optimal tracing) reduces the trace size further by applying an execution-time algorithm to find the minimum set of traces needed to replay the execution. This approach may significantly reduce the trace size (e.g., by one or two orders of magnitude), but at the cost of substantially increasing the execution time of the application.