The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not admitted to be prior art to the claims in this application merely by inclusion in this section.
Computer programs are sets of instructions for controlling processors in computing devices. Computer programs typically are written in one or more high-level languages that are human readable. These source language statements then typically are compiled by a compiler program and converted to coded instructions, which often correspond to the actual operations performed by a processor in a computing device.
Frequently the coded instructions are not identical to the native instructions for the processor, but, instead, are instructions for a particular virtual machine. A virtual machine is a process that interprets coded instructions and executes them. The virtual machine itself is an executable sequence of instructions in the native language of the processor. Virtual machines sometimes act as source language interpreters. As used herein, the term “machines” includes virtual machines interpreting virtual machine instructions, operating systems interpreting operating system instructions, and processors executing native instructions.
In general, coded instructions from one or more modules can be linked to form an executable program. Modules can consist of source language statements, coded instructions for a particular virtual machine, runtime executables, or some combination of these, with or without associated data. Modules in a high-level source language may be compiled by a runtime compiler to produce corresponding modules in coded instructions.
It is becoming more common to create programs that include heterogeneous modules. For example, a program may include a machine-executable module, a module executable by a first type of virtual machine, and a module executable by a second type of virtual machine. During execution of the program, a routine in the machine-executable module may call a routine in the module running in a first type of virtual machine, and that routine may call another routine in the module running in a second type of virtual machine.
In addition, it is common for programs executing on different processors to interact with each other. For example, an application program, such as a network management program, runs as a first process on one processor of a first device on a network. While running, the first process may make a request of a device management server program (the “server”), a second process executing separately on a second device on the network, to provide management information about the second device. In turn, the second process may launch a local configuration program as a third process executing separately on the second device, or may invoke a user authentication program executing on a separate third device. The local configuration program determines properties of the local device by getting the current status of the local device, by changing that status, or both. In many situations, the interacting programs are written by different programmers in different organizations for different types of virtual machines on the different processors. For example, one may be a Java program while another is a C language program. The operations described in this paragraph represent just one example context in which similar problems may be encountered.
Some host computers and operating systems divide a program into multiple threads that can be executed separately. Multiple threads may be swapped, in turn, between memory and each processor of a host computer. In host computers with several processors, several threads may be executed simultaneously, each one on a different one of the several processors.
As used herein, a “process” refers to any program or any portion of a program that can run independently, including one or more threads, one or more modules, one or more clients, one or more servers, and combinations of one or more clients and servers and other program portions.
Each process that runs can log certain events in a log file written specifically for that process. Logged events might include, for example, reaching certain milestones in the processing of data, encountering particular user actions, invoking separate processes, receiving results from separate processes, being invoked by separate processes, returning results to separate processes, and encountering an error or interruption in processing. The log files are often important in determining what an instance of a process spawned from a module actually does when a particular user under particular circumstances executes the instructions in the module. Sometimes what an instance of a process does differs from what the process was designed to do because of equipment failures, external problems, or circumstances that were unanticipated by the programmers of the process. The log files help developers of each process determine the actual performance of instances of processes. Typically, the log files are specific to a process; the events are identified and event properties are represented in ways specifically designed by the developers of the process.
A problem arises when it becomes useful to correlate data across log files written by interacting processes spawned from separately developed modules, often executing on different processors or in different threads. For example, an error encountered by the third process causes an error event to be entered in the log file for the third process, along with data about the error of concern to developers of the third module (the local configuration program or the user authentication program). The error is identified in the third log file by some error code, X, determined by the developers of the third module. The third process may then enter other information into the third log file.
As a result of the error, the third process sometimes returns control to the calling process, the second process, with some indication of error. The error indication returned by the third process to the second process causes an error event to be entered in the log file for the second process along with data about the error of concern to developers of the second module, the device management server. The error is identified in the second log file by some error code, Y, determined by the developers of the device management server and may be unrelated to X, the code that identifies the error in the third log file. The second process may then enter other information into the second log file.
As a result of the error, the second process sometimes returns control to the calling process, the first process, with some indication of error. The error indication returned by the second process to the first process causes an error event to be entered in the log file for the first process along with data about the error of concern to developers of the first module, the network management program. The error is identified in the first log file by some error code, Z, determined by the developers of the network management program and may be unrelated to the way the error is identified in the second log file.
A user of the first process will find it difficult to retrieve information about the error event stored in the log file for the third process. The error identification, X, used in the third log file is unknown to the user of the first process.
In one approach, each process passes its own error identification, determined by the developers of the module for that process, to the process that called it. Each process also stores the error code returned from a called process in its log file along with its own error identification. For example, the second process stores the value X along with the value Y in the second log file and passes the value Y to the first process. Then, the first process stores the value Y along with the value Z in the first log file. A user of the first process can retrieve information about the error event stored in the third log file by tracking backwards through the second log file. For example, the user finds in the first log file that the error Z is associated with the error Y; then searches the second log file to find that the error Y is associated with the error X; then searches the third log file to find the error X and the information about the error X.
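The backward tracking described above may be illustrated by the following minimal sketch. The in-memory lists standing in for the three log files, the record fields ("code", "cause", "detail"), and the function name trace_error are assumptions made for illustration only and are not taken from any particular logging system.

```python
# Hypothetical stand-ins for the three log files. Each entry records the
# error code chosen by that module's developers and, where applicable, the
# code returned by the called process ("cause").
first_log = [{"code": "Z", "cause": "Y", "detail": "management request failed"}]
second_log = [{"code": "Y", "cause": "X", "detail": "configuration call failed"}]
third_log = [{"code": "X", "cause": None, "detail": "device status unavailable"}]


def trace_error(code, logs):
    """Follow an error code backwards through the logs, in calling order.

    Returns one matching entry per log, ending at the log of the process
    that first encountered the error.
    """
    chain = []
    for log in logs:
        # Take the first entry with a matching code. If the same code occurs
        # more than once in a log, this choice may pick the wrong entry.
        entry = next((e for e in log if e["code"] == code), None)
        if entry is None:
            break
        chain.append(entry)
        code = entry["cause"]
        if code is None:
            break
    return chain


chain = trace_error("Z", [first_log, second_log, third_log])
print([e["code"] for e in chain])  # prints ['Z', 'Y', 'X']
```

In this sketch the user must have access to, and search, every intervening log in order to reach the entry for X.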
One problem with this approach is that a user must track the error through all the log files of the intervening processes. Another problem is that the error codes may not be unique. For example, the error code X might appear several times in the third log file, every time the third process encounters the same problem; or, the error code Y might appear several times in the second log file; or both error codes might appear several times in their respective log files. There might not be sufficient information to determine which error identification is associated with the event of interest to the user.
Another approach is to require each current process to identify an error in the current process by the error identification returned from a called process, to use the same error identification in the log file, and to pass the same error identification to the calling process. For example, the second process stores the value X in the second log file, and passes the value X to the first process. Then, the first process stores the value X in the first log file. A user of the first process can retrieve information about the error event stored in the third log file by going directly to the third log file. For example, the user finds in the first log file that the error identification is X, then searches the third log file to find the error X and the information about the error X. The user does not have to search the log files of all the intervening processes.
One problem that remains is that the error codes may not be unique. For example, the error code X might appear several times in the third log file, every time the third process encounters the same problem. There might not be sufficient information in the third log file to determine which error identification is associated with the event of interest.
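The remaining ambiguity may be illustrated by the following minimal sketch, in which the same error code X is propagated unchanged to the first log file. The log contents, the field names, and the function name find_by_code are hypothetical and serve only to illustrate the point.

```python
# Hypothetical first log file: the first process records the propagated code X.
first_log = [{"line": 1, "code": "X", "detail": "management request failed"}]

# Hypothetical third log file in which the same error code X recurs.
third_log = [
    {"line": 1, "code": "X", "detail": "timeout reading interface A"},
    {"line": 2, "code": "W", "detail": "retrying"},
    {"line": 3, "code": "X", "detail": "timeout reading interface B"},
]


def find_by_code(log, code):
    """Return every entry in the log that carries the given error code."""
    return [entry for entry in log if entry["code"] == code]


# The user can go directly to the third log file, but the lookup returns
# more than one entry, and the code alone does not say which one applies.
matches = find_by_code(third_log, first_log[0]["code"])
print(len(matches))  # prints 2
```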
Timestamps can be used to associate events in the different log files. However, use of timestamps does not definitively determine which instance of error code X in the third log file is associated with the event in the first log file. If the processes that write the log files are on different devices, the clocks on the different devices must be synchronized, and a time window must be defined within which events are considered correlated. The timestamp approach suffers from the disadvantages that synchronization involves management of clock drift among various processors, requires definition of a time window, consumes system resources, and consumes software development resources. Furthermore, it is often problematic to determine a duration for the time window and to determine whether events are correlated if one of the events falls on or near the boundary of the time window.
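The difficulties with a time window may be illustrated by the following minimal sketch. The timestamps, the two-second window, the one-second clock skew, and the function names are all assumed values chosen for illustration.

```python
def correlated(t_a, t_b, window_seconds):
    """Treat two events as correlated when their timestamps differ by at most the window."""
    return abs(t_a - t_b) <= window_seconds


def candidate_indices(first_event, third_events, window_seconds, skew=0.0):
    """Return indices of third-log events that fall inside the window.

    skew models an unsynchronized clock on the device writing the third log.
    """
    return [i for i, t in enumerate(third_events)
            if correlated(first_event, t + skew, window_seconds)]


first_event = 100.0                 # timestamp of the error in the first log
third_events = [98.5, 99.5, 104.0]  # timestamps of three X entries in the third log
window = 2.0

# With synchronized clocks, two entries still qualify: the window does not
# single out one instance of X.
in_sync = candidate_indices(first_event, third_events, window)

# With the third device's clock one second behind, a different set qualifies.
skewed = candidate_indices(first_event, third_events, window, skew=-1.0)

# An event exactly on the boundary is accepted or rejected depending only on
# whether the comparison is written as <= or <.
boundary = correlated(100.0, 98.0, window)

print(in_sync, skewed, boundary)  # prints [0, 1] [1] True
```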
Another approach is to generate a unique context identification (“context ID”) for the error or event encountered in the third process, to include this unique context ID in the log file, and to pass this unique context ID to the calling program. The calling program then also includes the unique context ID with any data written to its log file. In this way, data entered in all log files related to the same event include the same context ID.
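The context ID approach may be illustrated by the following minimal sketch. The use of a random UUID as the context ID, the in-memory lists standing in for log files, and all function and field names are assumptions made for illustration; the approach described above does not prescribe how the unique value is generated.

```python
import uuid

first_log, second_log, third_log = [], [], []


def third_process():
    """Encounter an error, log it with a fresh context ID, and return the ID."""
    context_id = str(uuid.uuid4())  # assumed generator of a unique value
    third_log.append({"context_id": context_id, "code": "X"})
    return context_id


def second_process():
    """Call the third process and log its own error code with the returned ID."""
    context_id = third_process()
    second_log.append({"context_id": context_id, "code": "Y"})
    return context_id


def first_process():
    """Call the second process and log its own error code with the returned ID."""
    context_id = second_process()
    first_log.append({"context_id": context_id, "code": "Z"})
    return context_id


def find_by_context(log, context_id):
    """Return every entry in the log that carries the given context ID."""
    return [e for e in log if e["context_id"] == context_id]


# Two separate failures produce two X entries in the third log file.
first_id = first_process()
second_id = first_process()

# Searching by context ID, rather than by error code, isolates one entry per log.
for log in (first_log, second_log, third_log):
    print(find_by_context(log, first_id))
```

Each log file keeps its own error code (Z, Y, or X), while the shared context ID ties the entries for one event together even though X appears twice in the third log file.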
One disadvantage of this approach is that the context ID does not appear in all log files related to an event unless and until the context ID is returned to all calling processes up to the top process in the hierarchy. Therefore there is inherently a delay in correlating data in one log file to data in another log file, until the context ID is propagated back up to the root calling process. Also, because of the nature of the error or the types of interactions, it might not be possible to pass the context ID back through all the calling processes. For example, a client process may send a request message to an independent server process that only sends a return message upon completion of a service. In that case, the context ID will not appear in all log files of processes that are affected by the event, no matter how long a user waits.
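The case in which the context ID cannot be passed back may be illustrated by the following minimal sketch. The server is modeled as a function that replies only upon successful completion; that behavior, the UUID-based context ID, and all names are assumptions made for illustration.

```python
import uuid

client_log, server_log = [], []


def server_handle(request):
    """Hypothetical server that replies only when the service completes.

    Here the service fails, so the context ID generated for the error is
    written to the server log but never sent back to the client.
    """
    context_id = str(uuid.uuid4())
    server_log.append({"context_id": context_id, "code": "X", "request": request})
    return None  # no return message on failure


def client_send(request):
    """Send a request and log what the client can observe."""
    client_log.append({"context_id": None, "event": "request sent"})
    reply = server_handle(request)
    if reply is None:
        # The client knows only that no reply arrived; it has no context ID
        # to record alongside its own error code.
        client_log.append({"context_id": None, "code": "Z", "event": "no reply"})


client_send("get status")

server_id = server_log[0]["context_id"]
shared = [e for e in client_log if e["context_id"] == server_id]
print(len(shared))  # prints 0
```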
Based on the foregoing, there is a clear need for correlating data through multiple log files, each log file written specifically for one process among multiple processes employed for an instance of an application, that does not suffer the deficiencies described above. More generally, there is a clear need for correlating output associated with multiple processes employed for an instance of an application, that does not suffer the deficiencies described above. Output associated with a process includes log files written by the process, other data (such as messages, web pages, and configuration files) output by the process, and entities, such as threads and other processes, generated from the process.