Tracing is an approach for logging the state of a computer application at different points during its execution. Tracing is normally implemented by inserting statements into the application code that output status or state messages (“traces”) as the statements are encountered during execution. These statements are purposely placed so that the generated traces correspond to activities of interest performed by specific sections of the code. The generated trace messages can be collected and stored during the execution of the application to form a trace log.
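As a minimal sketch of the tracing approach described above, the following Python fragment places trace statements at points of interest in a function and collects the messages in a trace log. The names `trace`, `trace_log`, and `transfer` are hypothetical and chosen only for illustration; an in-memory list stands in for whatever storage a real system would use.

```python
import time

# Hypothetical in-memory trace log; a real system might write to a file instead.
trace_log = []

def trace(message):
    """Append a timestamped trace message to the trace log."""
    trace_log.append((time.time(), message))

def transfer(balance, amount):
    """Illustrative application code with trace statements at points of interest."""
    trace(f"transfer: enter balance={balance} amount={amount}")
    if amount > balance:
        trace("transfer: insufficient funds")
        return balance
    balance -= amount
    trace(f"transfer: exit balance={balance}")
    return balance

result = transfer(100, 30)
```

After the call, `trace_log` holds one entry for the function entry and one for the exit, in the order in which the statements were encountered.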
Programmers often use tracing and trace logs to diagnose problems or errors that arise during the execution of a computer application. When such a problem or error is encountered, trace logs are analyzed to correlate trace messages with the application code to determine the sequence, origin, and effects of different events in the system and how they affect each other. This process allows analysis and diagnosis of unexpected behavior or programming errors that cause problems in the application code.
For this process to be effective, the traces must capture, and the trace logs must reflect, the problems or the events that cause the problems. If they do not, it may be necessary to reproduce the problems with more aggressive tracing until all the necessary information is collected in the trace logs. A problem with this approach is that repeated attempts to reproduce the problems can be time-consuming. Frequently, the problems are difficult to reproduce, and a large number of attempts are required. To add to the difficulty, the problems often occur on a computer system at a remote site. This may require having a programmer travel to that site, which can be expensive and even more time-consuming. Alternatively, one may turn to a system administrator or user at the remote site for help, but this may also add to the difficulty because they may be unwilling, or may not be adept enough to reproduce the problems and control the tracing efficiently.
Another approach is to enable detailed tracing at all times in anticipation of problems or failures, but this may consume excessive resources, which can degrade the performance of the computer system, and may fill the trace logs with information that is not useful.
The present invention provides a method and mechanism for managing diagnostic traces within a computer system. According to an embodiment, a status indicator for a resource within the computer system is used to determine whether to start a trace. If the status indicator reaches a failure-prone threshold, then the trace may be automatically started. If the status indicator falls outside the failure-prone threshold, then the trace may be automatically stopped.
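The threshold-driven start and stop behavior of this embodiment can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the class name `ThresholdTrace`, the numeric threshold of 90, and the convention that the failure-prone range is any value at or above the threshold are all hypothetical.

```python
class ThresholdTrace:
    """Starts or stops a trace based on one resource's status indicator.

    Assumption: the indicator is numeric and the failure-prone range is
    any value greater than or equal to the threshold.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.active = False   # whether the trace is currently running
        self.events = []      # record of automatic start/stop decisions

    def update(self, value):
        """Feed a new status indicator reading; return whether tracing is active."""
        if value >= self.threshold and not self.active:
            self.active = True
            self.events.append(("start", value))
        elif value < self.threshold and self.active:
            self.active = False
            self.events.append(("stop", value))
        return self.active

# Hypothetical readings, e.g. percent utilization of some resource.
monitor = ThresholdTrace(threshold=90)
for usage in [50, 92, 95, 80]:
    monitor.update(usage)
```

With these readings, the trace starts when the indicator first reaches 92 and stops when it drops back to 80, so detailed tracing is active only while the resource is in its failure-prone range.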
According to another embodiment, status indicators for a plurality of resources within a computer system are used to determine whether to start a trace action. If a particular combination of status indicators reaches the respective failure-prone threshold values, then a corresponding trace action may be automatically invoked.
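One way to picture the multi-resource embodiment is a table of rules, each mapping a combination of resources to a trace action. The sketch below is illustrative only: the function `select_trace_actions`, the resource names, the threshold values, and the action names are hypothetical, and it assumes that an indicator reaches its failure-prone threshold when its value is greater than or equal to that threshold.

```python
def select_trace_actions(indicators, thresholds, rules):
    """Return the trace actions whose required resource combination is met.

    indicators: mapping of resource name -> current status indicator value
    thresholds: mapping of resource name -> failure-prone threshold value
    rules: list of (required resource names, trace action name) pairs
    """
    # Resources whose indicators have reached their respective thresholds.
    failure_prone = {
        name for name, value in indicators.items()
        if value >= thresholds[name]
    }
    # A rule fires only if every resource it requires is failure-prone.
    return [action for required, action in rules if required <= failure_prone]

# Hypothetical thresholds and rules for illustration.
thresholds = {"cpu": 90, "memory": 85, "disk": 95}
rules = [
    (frozenset({"cpu", "memory"}), "trace_scheduler"),
    (frozenset({"disk"}), "trace_io"),
]

actions = select_trace_actions(
    {"cpu": 93, "memory": 88, "disk": 40}, thresholds, rules
)
```

Here both the CPU and memory indicators have reached their thresholds while the disk indicator has not, so only the action associated with the CPU and memory combination is selected.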
With these aspects of the invention, traces can be automatically managed to capture diagnostic data about problems occurring within a computing device. Further aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.