The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section may not be prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Business organizations spend large amounts of money, resources and time analyzing computer software errors to determine why the errors occurred and how to correct them. Analyzing computer software errors can be particularly problematic in complex software deployments, such as multi-processor/multi-threaded systems, where it is difficult, if not impossible, to reproduce the exact environment and conditions that existed at the time the errors occurred. As a result, several approaches and tools have been used to analyze computer software errors.
One approach has been to capture and store snapshot data at the time of a software error and then use the snapshot data to analyze the software error. Snapshot data is data that reflects one or more attributes of the state of a system at a particular point in time. For example, snapshot data may specify the contents of memory locations and variables and information about particular errors that occurred. Snapshot data may also specify particular functions that were executing at the time an error occurred. Snapshot data is relatively easy to capture and generally does not require much storage space. One drawback to using snapshot data, however, is that it only provides information relating to a particular point in time and does not provide information about events that occurred prior to the particular point in time. As a result, snapshot data has limited value in debugging software, particularly in complex computing environments.
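The snapshot approach described above can be illustrated with a minimal sketch. Note that `capture_snapshot` and `divide` are hypothetical names used only for illustration, not part of any particular system; the sketch simply shows how a point-in-time record of an error might be assembled, and why it carries no history of earlier events.

```python
import time
import traceback

def capture_snapshot(error, variables):
    """Hypothetical helper: record selected state attributes at the
    moment an error occurs. The snapshot holds only a point-in-time
    view (timestamp, error details, current function, line number,
    selected variable values); it records nothing about the events
    that preceded the error.
    """
    tb = error.__traceback__
    frame = traceback.extract_tb(tb)[-1] if tb else None
    return {
        "timestamp": time.time(),
        "error": repr(error),
        "function": frame.name if frame else None,
        "line": frame.lineno if frame else None,
        "variables": dict(variables),  # copy of selected variable values
    }

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # On error, capture a snapshot instead of propagating.
        return capture_snapshot(e, {"a": a, "b": b})

snapshot = divide(1, 0)
```

The snapshot tells a developer what the state was when the division failed, but not how the program arrived at that state.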
Another approach has been to use software debugging tools to analyze software errors. Software debugging tools allow a software developer to analyze the behavior of computer software over time in response to various inputs. For example, some software debugging tools provide a sophisticated debugging environment in which a software developer can initialize values of variables and then observe how the values change over time as individual lines of code are selectively executed. Thus, unlike the snapshot data approach, software debugging tools allow a software developer to analyze the behavior of computer software over time, in response to events leading up to an error. This often provides more useful information to a software developer than individual snapshots and allows problems to be identified and addressed in a shorter amount of time. Because of the manual nature of software debugging tools, however, it is difficult, if not impossible, to simulate actual conditions under which computer software operates. For example, it is difficult to reproduce the timing and order in which multiple functions were invoked in a multi-processor/multi-threaded network management environment.
Yet another approach involves the use of a log to record information about the operation of a computer system prior to an error. Software systems configured to use a log include a logging function which, when invoked, creates a data file, i.e., a log, on persistent storage and begins writing log entries to the log. Log entries may contain a wide variety of information at different levels of detail, depending upon the requirements of a particular implementation. For example, log entries may include data that specifies attributes of functions being executed within a software system, such as a function name, source file, line number, and timestamp, as well as key data values. Log entries may be created automatically on a periodic basis or asynchronously in response to events. For example, a particular event might cause a particular function to be initiated; also in response to the event, a corresponding log entry is made that describes details about the particular function. Since logs are created and maintained on non-volatile storage, such as one or more disks, a large amount of detailed information can be included in them. Thus, log entries may contain a high level of detail that is very useful to a software developer in analyzing an error. Despite the benefits provided by logs, creating and maintaining logs on non-volatile storage can consume a significant amount of system resources. One consequence is that logging functions are often used only selectively and are sometimes turned off completely until after an error has already occurred. Administrative personnel are then faced with the task of trying to reproduce the error under the same conditions that existed when it originally occurred, which can be difficult and can require a significant amount of time.
Based on the foregoing, there is a need for an approach for facilitating the analysis of computer software errors that does not suffer from the limitations of the prior approaches.