A very common way to debug a software program is by tracing. Tracing of software programs is described in, for example, column 1, lines 52-61 of U.S. Pat. No. 7,284,153 granted to Oakbay et al. on Oct. 16, 2007 and entitled “Apparatus, Method and System For Logging Diagnostic Information” which is assigned to International Business Machines Corporation. The just-described patent is incorporated by reference herein in its entirety, as background.
Traces output by a software program are diagnostic information written to some storage medium, e.g. memory or, more commonly, disk. Tracing is normally used to capture state transitions or state changes within a program, such as database software, as it executes normally in a computer. For example, traces may be written to identify a transition between regions in a software program, such as from one region to another region in a function of the software program, e.g. when entering and exiting the function and/or when entering and exiting a loop in the function and/or when a decision is made to take a branch in the function (rather than one or more other branches). Note that writing of such traces may be either independent of, or triggered by, external or internal events. Instructions to write traces are typically included in a software program to show a flow of execution through the software program (also called “program flow”).
As another example, traces may also be written to document transitions between various states of an object, such as a transaction object and/or a SQL cursor in a database. Most commonly, tracing involves a developer writing, within the source code of a software program, a mix of one-liners to be output as traces (like “Entering function XXX( ): arg1=YYY arg2=ZZZ”) and/or statements to dump data, such as an explain plan dump at the end of SQL compilation.
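The one-liner style of tracing described above can be illustrated with a minimal sketch. It is written in Python purely for illustration; the names `traced`, `trace_sink` and `compile_sql` are hypothetical and do not come from any tracing mechanism discussed herein.

```python
import functools
import io


def traced(sink):
    """Return a decorator that writes entry/exit one-liners to `sink`."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Format positional arguments as arg1=..., arg2=..., and so on.
            shown = " ".join(f"arg{i}={a!r}" for i, a in enumerate(args, start=1))
            sink.write(f"Entering function {func.__name__}(): {shown}\n")
            try:
                return func(*args)
            finally:
                # Written even if the function raises, so the exit is recorded.
                sink.write(f"Exiting function {func.__name__}()\n")
        return wrapper
    return decorate


# Stands in for a trace file on disk or a buffer in memory.
trace_sink = io.StringIO()


@traced(trace_sink)
def compile_sql(text, flags):
    # Placeholder body; a real function would do actual work here.
    return len(text) + flags


result = compile_sql("select 1", 2)
```

After the call, `trace_sink` holds one entry line and one exit line for `compile_sql`, which is the kind of program-flow record a developer would later read to reconstruct execution.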
When a problem arises in a software program that is tracing its execution, the traces being output can help in several ways: traces allow developers to reconstruct the events that led to an error, helping developers to hypothesize root causes of a problem; in some cases, tracing can be used to isolate a bug, by process of elimination, to a smaller region of the software program responsible for its root cause. Tracing, especially in-memory tracing, can also help to resolve bugs related to concurrency/timing issues. Finally, pinpointing the root cause of a performance problem can be greatly simplified by using timed traces.
Many varieties of tracing mechanisms have been implemented by various applications in the prior art. Some tracing infrastructure, like the ANSI C library function fprintf, writes directly to an output file without any additional structure and requires the software which calls this function to do trace output control checking before invoking the tracing API. Tracing mechanisms that are structured typically have code-layer-specific structures which cannot be shared across multiple code layers. Some tracing mechanisms write traces to disk, and do not support in-memory tracing. Others do support in-memory tracing; however, the implementation often suffers from the problem of evicting important traces. For example, if one component executed by a process is more verbose, i.e. generates traces at a much higher rate than other components executed by that process, then the more verbose component's traces can evict the traces of less verbose components, which makes it difficult to debug the less verbose components.
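The eviction problem described above can be reproduced with a minimal sketch, assuming a single fixed-size in-memory buffer shared by all components. The buffer size and the component names `quiet` and `verbose` are hypothetical and chosen only to make the effect visible.

```python
from collections import deque

# One small in-memory trace buffer shared by every component.
# When full, appending a new trace silently drops the oldest one.
shared_buffer = deque(maxlen=4)

# A less verbose component records a single, potentially important trace.
shared_buffer.append(("quiet", "state change A"))

# A more verbose component then writes traces at a much higher rate.
for i in range(5):
    shared_buffer.append(("verbose", f"loop iteration {i}"))

# Which components still have any trace left in the buffer?
surviving_components = {component for component, _ in shared_buffer}
```

In this sketch only traces from the verbose component remain in `shared_buffer`; the single trace from the quiet component has been evicted, so nothing is left to help debug it.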
On-disk tracing can handle the issue of eviction noted above, but typically it cannot be enabled by default because it has no in-memory component. Hence, it cannot be used to diagnose a first failure (described in the next paragraph). A trace file resulting from use of on-disk tracing is free-form and has no defined structure. Also, on-disk tracing has no built-in control mechanism, so control is ad hoc and each component has its own mechanism to enable and disable tracing.
Even though tracing is very useful in debugging a software program, it is usually too expensive to be enabled by default. Proactive tracing may be neither efficient nor effective. For this reason, diagnostic information that is generally available to perform first-failure diagnosis is very limited. First-failure diagnosis is the ability, out of the box, to diagnose an unexpected error, using diagnostic data that is dumped when the error first occurs. Because this information is limited, it is sometimes necessary to repeat prior execution of a software program with a run-time flag for tracing enabled, to generate traces and resolve a bug. In some cases tracing must be enabled via a compile-time flag, which requires compilation of a special binary of the software program. The binary is patched for diagnostics and must be installed on a customer's computer. The situation is even worse if a bug is not reproducible.
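The need to repeat execution with a run-time flag can be illustrated with a minimal sketch. The function `run` and its `trace_enabled` parameter are hypothetical; they merely model a program whose tracing is off by default and guarded by a caller-side check.

```python
import io


def run(trace_enabled, sink):
    """Do some work, writing traces only when the run-time flag is set."""
    total = 0
    for step in range(3):
        total += step
        if trace_enabled:  # caller-side check before each trace write
            sink.write(f"step={step} total={total}\n")
    return total


# Default run: tracing is off, so no diagnostic data is captured.
first_sink = io.StringIO()
first_result = run(False, first_sink)

# Repeated run with the flag enabled, needed only to obtain the traces.
second_sink = io.StringIO()
second_result = run(True, second_sink)
```

Both runs compute the same result, but only the second leaves any traces behind. If the failure seen in the first run does not recur in the second, the traces obtained are of little use, which is the difficulty with non-reproducible bugs noted above.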
Accordingly, the inventors of the current patent application believe that there is a need to improve prior art tracing.