Tracing is an approach for logging the state of a computer application at different points during the course of its execution. Tracing is normally implemented by inserting statements into the application code that output status or state messages (“traces”) as the statements are encountered during execution. These statements are deliberately placed so that the traces they generate correspond to activities of interest performed by specific sections of the code. The generated trace messages can be collected and stored during the execution of the application to form a trace log.
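As a minimal sketch of the instrumentation described above, the following Python fragment places trace statements at points of interest in a small function and collects the resulting messages in a trace log. The names `trace`, `trace_log`, and `transfer` are hypothetical and chosen only for illustration; a practical system would more likely write traces to a file than to an in-memory list.

```python
import time

# In-memory trace log (an assumption for illustration; real systems
# would typically persist traces to a file).
trace_log = []

def trace(message):
    """Record a trace message together with the time it was emitted."""
    trace_log.append((time.time(), message))

def transfer(balance, amount):
    """Example application code instrumented with trace statements."""
    trace(f"transfer: enter balance={balance} amount={amount}")
    if amount > balance:
        trace("transfer: rejected, insufficient funds")
        return balance
    balance -= amount
    trace(f"transfer: exit balance={balance}")
    return balance
```

Calling `transfer(100, 30)` appends an "enter" trace and an "exit" trace to `trace_log`, so the log records which path the code took as well as the state at each point.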
Programmers often use tracing and trace logs to diagnose problems or errors that arise during the execution of a computer application. When such a problem or error is encountered, trace logs are analyzed to correlate trace messages with the application code and thereby determine the sequence, origin, and effects of different events in the systems and how they impact each other. This process allows analysis and diagnosis of unexpected behavior or programming errors that cause problems in the application code.
In a parallel or distributed environment, there are potentially a number of distributed network nodes, with each node running a number of distinct execution entities such as threads, tasks, or processes (hereinafter referred to as “threads”). In many modern computer applications, these threads perform complex interactions with each other, including interactions across the network with threads on other nodes. Often, each of the distributed nodes maintains a separate log file to store traces for its respective threads. Each distributed node may also maintain multiple trace logs corresponding to separate threads on that node.
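The arrangement described above can be pictured with a small sketch. The following Python fragment models hypothetical per-thread trace logs on two nodes and interleaves them into a single timeline. The log contents, the `(timestamp, message)` entry shape, and the function name `merged_timeline` are assumptions made for illustration. The sketch also assumes that timestamps are comparable across nodes, which generally does not hold in a distributed system without clock synchronization.

```python
import heapq

# Hypothetical per-thread trace logs, keyed by (node, thread).
# Each entry is (timestamp, message); each log is already in time order.
logs = {
    ("node1", "t1"): [(1, "send request 42"), (4, "got reply 42")],
    ("node2", "t7"): [(2, "recv request 42"), (3, "send reply 42")],
}

def merged_timeline(logs):
    """Interleave all logs by timestamp, tagging each entry with its origin."""
    tagged = (
        [(ts, node, thread, msg) for ts, msg in entries]
        for (node, thread), entries in logs.items()
    )
    # heapq.merge performs a sorted merge of already-sorted inputs.
    return list(heapq.merge(*tagged, key=lambda entry: entry[0]))
```

Even in this two-thread case, the request and its reply can only be followed by switching between logs, which is the kind of cross-log correlation that becomes difficult as the number of nodes and threads grows.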
Diagnosing problems using multiple trace logs often involves a manual process of repeatedly inspecting different sets of the trace logs in various orders to map the sequence and execution of events in the application code. This manual process attempts to correlate events in the system(s) with the application code to construct likely execution scenarios that identify root causes of actual or potential execution problems. Even in a modestly distributed system of a few nodes, this manual process is a highly complex task, limited by the capacity of the human mind to comprehend and concurrently analyze many event scenarios across multiple threads on multiple nodes. Therefore, analyzing traces to diagnose applications in parallel and/or distributed systems is often a time-consuming and difficult exercise in which human limitations can render the diagnostic process unsuccessful. In many cases, the complexity of manual trace analysis causes the programmer to overlook or misdiagnose the real significance of events captured in the trace logs. With the increasing proliferation of more powerful computer systems capable of greater execution loads across more nodes, the scope of this problem can only increase.
An improved approach to diagnosing computer systems and applications uses trace messages that are materialized in a markup language syntax. Hyperlinks can be placed in the trace messages to facilitate navigation between sets of related traces. One method of generating trace messages having markup language syntax is to first have the application generate trace strings in a known set of fixed formats. The process of extracting information to create a new version of each trace in a markup language syntax is then driven by knowledge of the position and existence of specific data in the trace strings. This type of approach is described in more detail in co-pending U.S. patent application Ser. No. 09/872,647, entitled “Method and Mechanism for Diagnosing Computer Applications Using Traces,” filed on even date herewith, which is hereby incorporated by reference in its entirety.
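The position-driven conversion described above might be sketched as follows. This is an illustration only and is not taken from the referenced application: the `timestamp|node|thread|message` trace format, the choice of HTML as the markup language, and the function name `trace_to_markup` are all assumptions.

```python
from html import escape

def trace_to_markup(line):
    """Convert a 'timestamp|node|thread|message' trace string to an HTML fragment."""
    # Position-driven extraction: each field's meaning is fixed by its index.
    timestamp, node, thread, message = line.split("|", 3)
    thread_id = f"{node}-{thread}"
    # The hyperlink targets an anchor that would group traces of the same thread.
    return (
        f'<p class="trace">{escape(timestamp)} '
        f'<a href="#{escape(thread_id)}">{escape(thread_id)}</a> '
        f"{escape(message)}</p>"
    )
```

Because the thread identifier becomes a hyperlink, a reader viewing one trace could jump directly to the related traces for that thread rather than searching the log by hand.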
Trace tools that access fixed format traces expect information in the trace string to appear in a predetermined sequence. Information in the trace string may therefore not be properly recognized if the string deviates from the exact requirements of the fixed format for the trace. With the fixed format trace approach, changes to the trace string format may require changes in the corresponding tools used to parse and tokenize the trace strings, and these changes could involve significant modifications or rewrites to the underlying programming code for the trace tools. Yet it may be highly desirable to allow customization of trace string formats without requiring the burden of modifying or rewriting corresponding trace tools.
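The fragility described above can be illustrated with a short sketch. The `timestamp|thread|message` layout and the function name `parse_fixed` are hypothetical. The point is only that a parser with the field order built into its code misreads a trace whose fields arrive in a different order.

```python
def parse_fixed(line):
    """Parse a trace whose fields must appear as timestamp, thread, message."""
    # The field order is hard-coded in the tool itself.
    timestamp, thread, message = line.split("|", 2)
    return {"timestamp": timestamp, "thread": thread, "message": message}

ok = parse_fixed("1001|t7|recv request 42")
# Same information, but the producer emitted the first two fields in swapped order.
bad = parse_fixed("t7|1001|recv request 42")
```

In the second case the parser raises no error. It simply labels `1001` as the thread and `t7` as the timestamp, so the only remedy is to modify the parsing code to match the new format.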
The present invention provides a method and mechanism for utilizing a meta-language to define and analyze traces. According to an embodiment, non-fixed format traces are used to generate and materialize traces that incorporate markup language syntax. With this aspect of the invention, changes to a trace format do not necessitate code changes in the corresponding tools for navigating through traces. Further aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.
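As a rough illustration of why a format description can remove the need for tool changes, the following sketch moves the trace layout out of the parsing code and into data. The notation shown (a dictionary with `delimiter` and `fields` entries) is a hypothetical stand-in and is not the meta-language of the invention; the function name `parse_with_format` is likewise an assumption.

```python
def parse_with_format(line, fmt):
    """Parse a trace string according to a declarative format description."""
    fields = fmt["fields"]
    # Split into at most len(fields) parts so the last field may contain the delimiter.
    values = line.split(fmt["delimiter"], len(fields) - 1)
    if len(values) != len(fields):
        raise ValueError(f"trace does not match format: {line!r}")
    return dict(zip(fields, values))

# Two different trace layouts, described as data rather than code.
format_v1 = {"delimiter": "|", "fields": ["timestamp", "thread", "message"]}
format_v2 = {"delimiter": ";", "fields": ["thread", "timestamp", "node", "message"]}

a = parse_with_format("1001|t7|recv request 42", format_v1)
b = parse_with_format("t7;1001;node2;recv request 42", format_v2)
```

Both layouts are handled by the same unmodified function. Supporting a new trace format under this sketch means supplying a new description, not rewriting the tool.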