Tracing is an approach for logging the state of a computer application at different points during its course of execution. Tracing is normally implemented by inserting statements in the computer application code that output status/state messages (“traces”) as the statements are encountered during the execution of the code. Statements to generate traces are purposely placed in the computer application code to generate traces corresponding to activities of interest performed by specific sections of the code. The generated trace messages can be collected and stored during the execution of the application to form a trace log.
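As a minimal sketch of the tracing approach described above, the following Python fragment places trace statements at points of interest in a function and collects the resulting messages in a trace log. The names `trace`, `trace_log`, and `transfer` are hypothetical and chosen only for illustration; an actual system would typically write to a file rather than to an in-memory list.

```python
import time

# Hypothetical in-memory trace log; each entry is (timestamp, message).
trace_log = []


def trace(message):
    """Record a trace message together with the time it was generated."""
    trace_log.append((time.time(), message))


def transfer(balance, amount):
    """Example application code with trace statements at points of interest."""
    trace(f"transfer: enter balance={balance} amount={amount}")
    if amount > balance:
        # Trace the unusual path so it can be found during later diagnosis.
        trace("transfer: insufficient funds")
        return balance
    balance -= amount
    trace(f"transfer: exit balance={balance}")
    return balance
```

Calling `transfer(100, 30)` appends one trace on entry and one on exit, so the trace log records the order in which those sections of code were executed.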
Programmers often use tracing and trace logs to diagnose problems or errors that arise during the execution of a computer application. When such a problem or error is encountered, trace logs are analyzed to correlate trace messages with the application code to determine the sequence, origin, and effects of different events in the system and how they affect each other. This process allows analysis and diagnosis of unexpected behavior or programming errors that cause problems in the application code.
In a parallel or distributed environment, there are potentially a number of distributed network nodes, with each node running a number of distinct execution entities such as threads, tasks, or processes (hereinafter referred to as “threads”). In many modern computer applications, these threads perform complex interactions with each other, even across the network with threads on other nodes. Often, each of the distributed nodes maintains a separate log file to store traces for its respective threads. Each distributed node may also maintain multiple trace logs corresponding to separate threads on that node.
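The arrangement of separate trace logs per thread on a node can be sketched as follows. This is an illustrative assumption rather than a description of any particular system: the node identifier `NODE_ID`, the in-memory `thread_logs` mapping, and the `worker` function are hypothetical, and a real node would normally write each log to its own file.

```python
import threading
from collections import defaultdict

NODE_ID = "node-1"  # hypothetical identifier for the local node

# One trace log per (node, thread) pair, held in memory for illustration.
thread_logs = defaultdict(list)
_logs_lock = threading.Lock()


def trace(message):
    """Append a trace message to the log of the calling thread."""
    key = (NODE_ID, threading.current_thread().name)
    with _logs_lock:
        thread_logs[key].append(message)


def worker(steps):
    """Example thread body that emits one trace per step."""
    for i in range(steps):
        trace(f"step {i}")


def run_workers(count, steps):
    """Start `count` named threads and wait for all of them to finish."""
    threads = [
        threading.Thread(target=worker, args=(steps,), name=f"worker-{n}")
        for n in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

After `run_workers(2, 3)`, `thread_logs` holds two separate logs, one for each thread. Events within a single log are ordered, but nothing in the logs themselves relates an event in one thread to an event in another.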
Diagnosing problems using multiple trace logs often involves a manual process of repeatedly inspecting different sets of the trace logs in various orders to map the sequence and execution of events in the application code. This manual process attempts to correlate events in the system(s) with the application code to construct likely execution scenarios that identify root causes of actual or potential execution problems. Even in a modestly distributed system of a few nodes, this manual process is a highly complex task, limited by the capacity of a human mind to comprehend and concurrently analyze many event scenarios across multiple threads on multiple nodes. Therefore, analyzing traces to diagnose applications in parallel and/or distributed systems is often a time-consuming and difficult exercise in which human limitations can render the diagnosis process unsuccessful. In many cases, the complexity of manual trace analysis causes the programmer to overlook or misdiagnose the real significance of events captured in the trace logs. With the increasing proliferation of more powerful computer systems capable of greater execution loads across more nodes, the scope of this problem can only increase.
The present invention is directed to a method and mechanism for improved diagnosis of computer systems and applications using tracing. According to an aspect of one embodiment of the invention, trace messages are materialized using a markup language syntax. Hyperlinks can be placed in the trace messages to facilitate navigation between sets of related traces. Specific traces or portions of traces can be emphasized using markup language tools to highlight text. Another aspect of an embodiment of the invention pertains to a method and mechanism for generating trace messages in a markup language syntax. Further aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.
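By way of illustration only, a trace message materialized in a markup language syntax might be generated as in the following sketch. HTML is assumed here as the markup language, which the text above does not specify; the function `markup_trace`, its parameters, and the identifier scheme are hypothetical and are not taken from the embodiments described below.

```python
from html import escape


def markup_trace(trace_id, text, related_id=None, emphasize=False):
    """Return one trace message as an HTML paragraph.

    trace_id   -- anchor name for this trace, so other traces can link to it
    related_id -- optional anchor of a related trace to hyperlink to
    emphasize  -- if True, render the message text in bold
    """
    body = escape(text)  # keep characters such as < and > from breaking the markup
    if emphasize:
        body = f"<b>{body}</b>"
    link = ""
    if related_id is not None:
        link = f' <a href="#{escape(related_id)}">related trace</a>'
    return f'<p id="{escape(trace_id)}">{body}{link}</p>'
```

For example, `markup_trace("t1", "lock wait > 5s", related_id="t7", emphasize=True)` yields a paragraph whose text is emphasized and which carries a hyperlink to the trace anchored at `t7`, so that a reader viewing the log in a browser can navigate between the two related traces.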