No software application, no matter how well designed or implemented, can operate completely maintenance-free. At some point during the operating life of a software application, a human support person (i.e., an “operator”) may need to analyze the software, or the hardware environment in which it is running, in order to track down the source of a problem and make appropriate modifications. The software generally facilitates this analysis by emitting “events” as it executes. These events may alert an operator to a potential problem with the software, or may provide a trail that records what the software has done during the course of its execution.
While an application's generation of event information can be effective in helping an operator to learn of a problem or track down its source, current systems supporting such event generation have drawbacks. One drawback is that there is little structure governing the types of events that applications generate or the manner in which those events are generated. Typically, it is left to the individual application developer to determine the types of events that an application will emit, and the ways in which those events will be communicated to the operator. Additionally, many applications do not use any structured technique (e.g., a well-defined set of methods) to generate event information, and thus are prone to errors and inconsistencies in the manner in which they generate events. Moreover, there is little uniformity among applications as to the form that the events will take.
Another drawback particularly concerns “trace events,” i.e., fine-grained events that describe key steps in the operation of the program, which are often used by operators to track down the source of a known problem. The generation of trace events by a program is often subject only to per-process “on/off” controls. This level of control over tracing is generally too coarse for complex applications that are distributed across a large number of processes and machines. For example, an application might process orders for a retail web site. An operator may need to trace a given task, such as the task of receiving an order from a particular customer. This task might be performed by several processes executing on several different machines, where the processes are also performing various operations that are unrelated to the task that the operator is interested in analyzing. However, in order to trace the execution of the customer's order in a conventional system, tracing must be turned on for each of these processes. Such a trace consumes computing resources to generate an enormous amount of event information, much of which is irrelevant to the task that the operator needs to trace. Preferably, the operator should be able to turn on tracing for just one task. Even more preferably, the operator should be able to define what type of trace information (e.g., the value of a particular variable) is relevant, and should be able to receive only the defined type of trace information from the software.
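By way of illustration only, the difference between per-process and per-task trace control may be sketched as follows. The sketch is a minimal one and does not represent any particular system: the class names, the `task_id` field, and the sample process and task names are all hypothetical, and a single in-memory list stands in for events that would in practice be emitted by processes on several machines.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    process: str   # process that emitted the event (hypothetical name)
    task_id: str   # task the process was working on (hypothetical field)
    message: str


@dataclass
class PerProcessTracer:
    """Conventional control: one on/off switch per process."""
    enabled: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def emit(self, event):
        # Every event from an enabled process is recorded, relevant or not.
        if event.process in self.enabled:
            self.log.append(event)


@dataclass
class PerTaskTracer:
    """Finer control: record only events that belong to selected tasks."""
    traced_tasks: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def emit(self, event):
        if event.task_id in self.traced_tasks:
            self.log.append(event)


def sample_events():
    # Three processes handle one customer's order alongside unrelated work.
    return [
        TraceEvent("web", "order-17", "received order"),
        TraceEvent("web", "order-18", "received order"),
        TraceEvent("billing", "order-17", "charged card"),
        TraceEvent("billing", "refund-3", "issued refund"),
        TraceEvent("shipping", "order-17", "label printed"),
        TraceEvent("shipping", "order-18", "label printed"),
    ]


def run(tracer):
    for event in sample_events():
        tracer.emit(event)
    return tracer.log


# Tracing order-17 with per-process switches requires enabling all three
# processes, which also records the unrelated work they perform.
coarse = run(PerProcessTracer(enabled={"web", "billing", "shipping"}))

# A per-task control records only the events for the task of interest.
fine = run(PerTaskTracer(traced_tasks={"order-17"}))

print(len(coarse), len(fine))  # 6 3
```

In this sketch, the per-process control records all six events in order to capture the three that concern the order of interest, whereas the per-task control records only those three.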
In view of the foregoing, there is a need for a system that overcomes the drawbacks of the prior art.