Tracing is an approach for logging the state of a computer application at different points during the course of its execution. Tracing is normally implemented by inserting statements in the application code that output status/state messages (“traces”) as the statements are encountered during execution. These statements are purposely placed in the code to generate traces corresponding to activities of interest performed by specific sections of the code. The generated trace messages can be collected and stored during the execution of the application to form a trace log.
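As an illustration only, the following minimal Python sketch shows trace statements placed at points of interest in application code, with the resulting messages collected into a trace log. The function names `trace` and `transfer`, the message layout, and the in-memory list used as the log are all assumptions made for this example and are not taken from any particular system.

```python
import threading
import time

# In-memory trace log (assumption: a real system would typically write to a file).
trace_log = []

def trace(message):
    """Append one trace message, tagged with a timestamp and the current thread name."""
    thread_name = threading.current_thread().name
    trace_log.append(f"{time.time():.6f} [{thread_name}] {message}")

def transfer(balance, amount):
    """Hypothetical application routine instrumented with trace statements."""
    trace(f"transfer: enter balance={balance} amount={amount}")
    if amount > balance:
        trace("transfer: rejected, insufficient funds")
        return balance
    balance -= amount
    trace(f"transfer: exit balance={balance}")
    return balance

transfer(100, 30)   # emits an "enter" trace and an "exit" trace
transfer(70, 500)   # emits an "enter" trace and a "rejected" trace

for entry in trace_log:
    print(entry)
```

Reading the log from top to bottom recovers the order in which the instrumented statements were reached, which is the information a programmer later correlates with the code.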
Programmers often use tracing and trace logs to diagnose problems or errors that arise during the execution of a computer application. When such a problem or error is encountered, trace logs are analyzed to correlate trace messages with the application code to determine the sequence, origin, and effects of different events in the system and how they impact each other. This process allows analysis and diagnosis of unexpected behavior or programming errors that cause problems in the application code.
In a parallel or distributed environment, there are potentially a number of distributed network nodes, with each node running a number of distinct execution entities such as threads, tasks, or processes, which may comprise a plurality of threads. In many modern computer applications, these threads perform complex interactions with each other, even across the network with threads on other nodes. Often, each of the distributed nodes maintains a separate log file to store traces for its respective threads. Each distributed node may also maintain multiple trace logs corresponding to separate threads on that node.
Diagnosing problems using multiple trace logs often involves a manual process of repeatedly inspecting different sets of the trace logs in various orders to map the sequence and execution of events in the application code. This manual process attempts to correlate events in the system(s) with the application code to construct likely execution scenarios that identify root causes of actual or potential execution problems. Even in a modestly distributed system of a few nodes, this manual process is a significantly complex task, limited by the capacity of a human mind to comprehend and concurrently analyze many event scenarios across multiple threads on multiple nodes. Therefore, analyzing traces to diagnose applications in parallel and/or distributed systems and/or single-node systems is often a time-consuming and difficult exercise in which human limitations can render the diagnosis unsuccessful. In many cases, the complexity of manual trace analysis causes the programmer to overlook or misdiagnose the real significance of events captured in the trace logs. With the increasing proliferation of more powerful computer systems capable of greater execution loads across more nodes, the scope of this problem can only increase.
An improved approach to diagnosing computer systems and applications uses trace messages that are materialized in a markup language syntax. Hyperlinks can be placed in the trace messages to facilitate navigation between sets of related traces. One method to generate trace messages having markup language syntax is to first generate, from an application, trace strings having a known set of fixed formats; the process for extracting information to create a new version of the trace in a markup language syntax is then driven by knowledge of the position and existence of specific data in the trace strings. This type of approach is described in more detail in co-pending U.S. patent application Ser. No. 09/872,647, entitled “Method and Mechanism for Diagnosing Computer Applications Using Traces,” filed on May 31, 2001, which is hereby incorporated by reference in its entirety.
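The fixed-format approach can be sketched roughly as follows. This is not the method of the referenced application, only a minimal Python illustration under stated assumptions: the pipe-delimited layout `seq|node|thread|message`, the HTML-like output, and the names `parse_fixed` and `to_markup` are all hypothetical. Fields are extracted purely by position, and each trace receives a hyperlink to the next trace from the same thread.

```python
from html import escape

def parse_fixed(line):
    """Split a trace string by position. Assumed layout: seq|node|thread|message."""
    seq, node, thread, message = line.split("|", 3)
    return {"seq": seq, "node": node, "thread": thread, "message": message}

def to_markup(lines):
    """Render each trace as a markup element with a link to the next trace of its thread."""
    records = [parse_fixed(line) for line in lines]
    rendered = []
    for i, rec in enumerate(records):
        # Find the next later trace that belongs to the same thread, if any.
        following = next((r for r in records[i + 1:] if r["thread"] == rec["thread"]), None)
        link = ""
        if following is not None:
            target = escape(following["seq"])
            link = f' <a href="#t{target}">next in thread</a>'
        anchor = escape(rec["seq"])
        origin = escape(rec["node"] + "/" + rec["thread"])
        body = escape(rec["message"])
        rendered.append(f'<p id="t{anchor}">[{origin}] {body}{link}</p>')
    return rendered

sample = [
    "1|nodeA|T1|open file",
    "2|nodeA|T2|send <req>",
    "3|nodeA|T1|close file",
]
for element in to_markup(sample):
    print(element)
```

Because extraction depends on field position, any change to the trace layout requires changing `parse_fixed` itself, which is the limitation that motivates a grammar-driven alternative.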
Further, traces with markup language syntax may also be generated using non-fixed format traces. In this approach, each set of traces may correspond to a defined trace format grammar (TFG), wherein the process for extracting information to create a new version of the trace in a markup language syntax is driven by the corresponding TFG. Thus, if a change to the trace format is desired, then an additional TFG may be defined instead of having to change the code of the corresponding tools used to navigate through the traces. This type of approach is described in more detail in co-pending U.S. patent application Ser. No. 09/872,590, entitled “Method and Mechanism for Using a Meta-Language to Define and Analyze Traces,” filed on May 31, 2001, which is hereby incorporated by reference in its entirety.
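The benefit of a grammar-driven conversion can be illustrated with a small Python sketch. The actual TFG meta-language is defined in the referenced application and is not reproduced here; as a stand-in, this example assumes that each trace format is described by a regular expression with named groups. The format names `v1` and `v2`, the table `TRACE_FORMATS`, and the function `to_markup` are hypothetical.

```python
import re
from html import escape

# Hypothetical stand-ins for trace format grammars: one named-group pattern per format.
# Supporting a new trace layout means adding an entry here, not changing to_markup().
TRACE_FORMATS = {
    "v1": re.compile(r"(?P<seq>\d+)\|(?P<thread>\w+)\|(?P<message>.*)"),
    "v2": re.compile(r"\[(?P<thread>\w+)\] #(?P<seq>\d+): (?P<message>.*)"),
}

def to_markup(line, format_name):
    """Convert one trace line to markup, driven only by the named format description."""
    match = TRACE_FORMATS[format_name].fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match format {format_name!r}: {line!r}")
    fields = match.groupdict()
    seq = escape(fields["seq"])
    thread = escape(fields["thread"])
    message = escape(fields["message"])
    return f'<trace seq="{seq}" thread="{thread}">{message}</trace>'

print(to_markup("7|T1|commit", "v1"))
print(to_markup("[T1] #7: commit", "v2"))
```

Both calls produce the same markup element even though the input layouts differ, because the converter reads field names from the format description rather than relying on field positions.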
Embodiments of the present invention provide methods and mechanisms for debugging a series of related events within a computer system. According to an embodiment, when tracing a series of related events that span a plurality of threads, a token may be passed from one thread to another, thereby allowing a link between the threads to be marked within the one or more traces. The threads may reside on a single node and/or process or on a plurality of nodes and/or processes.
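The token-passing idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the token format `tok-N`, the use of a `queue.Queue` as the channel between threads, and the names `trace`, `sender`, and `receiver` are all hypothetical, and both threads run in one process on one node.

```python
import itertools
import queue
import threading

trace_log = []                    # shared log of (thread name, token, message) tuples
log_lock = threading.Lock()       # serializes appends from concurrent threads
token_ids = itertools.count(1)    # source of unique token numbers (assumed scheme)

def trace(message, token=None):
    """Record a trace entry, tagged with the emitting thread and an optional token."""
    with log_lock:
        trace_log.append((threading.current_thread().name, token, message))

def sender(channel):
    token = f"tok-{next(token_ids)}"
    trace("sending request", token)             # mark the link on the sending side
    channel.put((token, "request payload"))     # the token travels with the work item

def receiver(channel):
    token, payload = channel.get()
    trace(f"received {payload}", token)         # mark the same link on the receiving side

channel = queue.Queue()
workers = [
    threading.Thread(target=sender, args=(channel,), name="sender"),
    threading.Thread(target=receiver, args=(channel,), name="receiver"),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Entries that share a token belong to the same series of related events.
linked = [entry for entry in trace_log if entry[1] == "tok-1"]
for entry in linked:
    print(entry)
```

Filtering the log by token gathers the related entries from both threads without manually matching timestamps or message text. Across nodes, the same idea would require carrying the token inside whatever message is sent over the network.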
With this aspect of the invention, sufficient information will be provided within the one or more traces to allow all the related trace data to be linked together. Further aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.