The present invention relates to the field of program debuggers. More specifically, one embodiment of the invention provides for a method and apparatus for generating debugging information where source code is not available.
Program debugging is the process of analyzing a program in a test environment to uncover details about the program's operation. Those details can be used to correct errors in program operation or to understand more about the way a program operates. If source code is available for the program being debugged, the source code can be used in debugging. A typical source code debugger presents a user with a listing of the source code of the program being debugged and the debugger indicates the current line of source code. With a source code debugger, a user can "trace" through a program, i.e., execute one source code statement at a time, to see the line-by-line effects of the program. Typically, the effects include program output and changes to program variables. Many debugging systems include variable displays that display the current values of program variables. Using a variable display, the user can see the effects of the program on program variables as the user traces through a program being debugged.
Source code is a form of a program that is easily read and edited by humans. Well-written source code is an unambiguous expression of the instructions making up a program, but source code is often not the most efficient way for a computer to process these instructions. Because of this, source code is often "compiled" into "executable" code by a compiler. With a good compiler, executable code is optimized for performance and/or memory usage. Executable code may be readable by humans, but it is usually not as easily understood as source code and it is usually not editable except for very simple programs or by very complex editing processes.
Another reason for compiling source code into executable code has to do with program distribution. Where an author of a program wants to distribute a program for execution by others, but does not want them to know about details of the program and/or does not want them to be editing the program, the author will compile the program's source code and only distribute the executable code output by the compiler.
When a recipient of the executable wants to understand the program's operation or wants to edit the program to create a modified version of the program, the recipient might be able to run a decompiler on the executable code to generate an approximation of the source code. A decompiler typically cannot regenerate the original source code exactly, because some information from the source code is not carried over to the executable code and because the compiler may perform optimizations that discard information as it makes the executable code more efficient. The lost information includes variable names and source-to-executable line correspondences. Variable names are lost when the source code includes descriptive variable names and the compiler replaces them with more concise variable references, such as consecutive numbers or pointers.
Variable names and source-to-executable line correspondences are not necessary to execute the program (only the executable code is necessary, by definition), but are useful when debugging the executable code. The line correspondences allow a debugging system to indicate, using highlighting or other well-known methods, which line of source is being executed, i.e., which line of source corresponds to the executable instruction being executed. Variable names are used by the debugging system as labels to identify variables being watched.
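By way of illustration only, the two lookups described above might be sketched as follows. The table contents, the offsets, and the names `LINE_MAP`, `SYMBOL_TABLE`, `source_line_for` and `label_for` are hypothetical and are not taken from any particular compiler or debugging system; the sketch merely assumes that a line number map pairs the starting offset of a group of executable instructions with a source line, and that a symbol table pairs a concise variable reference with a descriptive label.

```python
import bisect

# Hypothetical line number map: (starting executable offset, source line),
# sorted by offset. Each entry covers the offsets up to the next entry.
LINE_MAP = [(0, 1), (4, 2), (10, 3), (18, 5)]

# Hypothetical symbol table: concise variable reference -> descriptive label.
SYMBOL_TABLE = {0: "total", 1: "count"}


def source_line_for(offset, line_map=LINE_MAP):
    """Return the source line whose instructions contain the given offset."""
    starts = [start for start, _ in line_map]
    index = bisect.bisect_right(starts, offset) - 1
    if index < 0:
        raise ValueError("offset precedes the first mapped instruction")
    return line_map[index][1]


def label_for(slot, symbol_table=SYMBOL_TABLE):
    """Return a label for a watched variable, or a placeholder if none is known."""
    return symbol_table.get(slot, f"var_{slot}")
```

With these assumed tables, a debugging system halted at offset 12 would highlight source line 3, and a watched variable in slot 1 would be labeled `count`. Where the symbol table is unavailable, only a placeholder label such as `var_1` can be shown.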
FIGS. 1-2 illustrate two systems of debugging that have been used in the past. As shown in FIG. 1, a source code file 10 containing source code is passed to a compiler 12 which generates an executable code file 14, a line number map file 16 and a symbol table file 18 for the source code in file 10. Files 14, 16 and 18 are passed to a debugging system 20, which a user uses to debug the program represented by the source code and the executable code. As should be apparent, the scheme of FIG. 1 requires that line number map file 16 and symbol table file 18 be accessible by debugging system 20. As those two files are not needed for execution, they are generally not provided with the executable code provided to end users.
FIG. 2 shows a system that allows debugging without having access to the original line number map file and symbol table file. As shown there, a source file S is passed to a compiler X, which generates an executable file E, a line number map M and a symbol table T. Source file S, compiler X, line number map M and symbol table T are shown with dotted lines to indicate that they are not available to the operator of debugging system 20. To overcome the lack of these files, a decompiler Y is used to generate source file S' from executable file E. Source file S' is then passed to a compiler Z which generates executable file E', line number map M' and symbol table T'. Debugging system 20 then uses source file S', executable file E', line number map M' and symbol table T' in its debugging process.
In FIG. 2, similar elements are noted with primes (e.g., S, S') to point out where the similar items are not identical. Source S' is not identical to source S because some information is lost, but also because decompiler Y is not an exact inverse of compiler X. In addition to converting source code into executable code, a compiler will often rearrange instructions to optimize the program. For example, if a compiler encounters a loop with an instruction to set a variable to a constant value, the compiler might move that instruction to a point before the loop so that the value does not get set on every pass through the loop. When a decompiler then generates source code from that executable code, the instruction to set the variable will appear before the source code for the loop. It may be possible to design a decompiler to be the exact inverse of a compiler if the compiler does not perform irreversible optimizations, but in practice, the user of the debugging system does not even know which compiler the program distributor used for compiler X.
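The loop rearrangement described above can be sketched, under stated assumptions, as a pair of functions. The function names, the variable names and the constant 3 are hypothetical; the first function stands in for the source as written, and the second for what a decompiler might plausibly reconstruct after a compiler has moved the constant assignment out of the loop.

```python
def as_written(values):
    """Source as authored: the constant is assigned on every pass."""
    total = 0
    for value in values:
        scale = 3  # invariant assignment inside the loop
        total += value * scale
    return total


def as_decompiled(values):
    """Possible decompiler output: the assignment now precedes the loop."""
    total = 0
    scale = 3  # moved ahead of the loop by the optimizing compiler
    for value in values:
        total += value * scale
    return total
```

Both functions return the same result for the same input, yet their line-by-line structure differs, so a user tracing the second form sees the assignment executed once before the loop rather than on each pass.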
As an additional complication to the scheme shown in FIG. 2, debugging system 20 operates not on executable file E, but on executable file E', so the differences between the two executable files might cause bugs to disappear only during debugging or cause bugs to appear in executable file E' that were not in the original executable file E.
From the above it is seen that an improved method and apparatus for debugging executable code is needed.