1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and an apparatus for providing a debugging environment. Still more particularly, the present invention provides a method to generate optimized object code and a mapping to debugging information in a separate data file, thus allowing for efficient debugging of optimized "released" object code.
2. Description of Related Art
A compiler provides a translation from source code, written in a high level programming language such as C++, into machine level code that can be executed on a particular microprocessor. The compilation process is generally viewed as a sequence of six major steps: (1) lexical analysis of the source code into a sequence of tokens with attributes, (2) syntactic analysis and (3) semantic analysis of the token stream to form an abstract syntax tree, (4) generation of intermediate code that is independent of a particular processor, (5) optimization of the intermediate code, and (6) generation of machine level code for the target processor. Sometimes the final step produces assembly level code that is automatically assembled to create the machine level code. In most applications, the source program is split into many different files that are compiled separately. Each compiled file produces an object file, but this code cannot be executed by itself. The object files are then linked to produce a standalone executable file.
The process of debugging a program can be long and complex. Many bugs are found before a software product is released, but other bugs are discovered after release, which requires installing either program patches or a more recent program release, e.g., moving from version 1.1.6 to version 1.1.7. Having sophisticated debugging tools is important for producing reliable software. However, in order for these tools to be user friendly and highly functional, it is necessary to embed a large amount of additional information in the executable code. For example, a programmer in a high level programming language uses a variable name meaningful in the context of the program, e.g., customer_name, that the compiler translates into a memory address. In a similar manner, a subroutine is given a meaningful name, e.g., calculate_balance, that becomes a memory address for the first instruction of the subroutine. When a developer wants to debug a program, the developer wants to be able to use the symbolic names in the source code to control the actions of the debugger, e.g., retrieve the current value stored in customer_name or execute the program until the subroutine calculate_balance is called. Embedding all of this additional information in the executable code makes the executable much larger and much slower to execute.
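The translation of symbolic names into addresses described above can be pictured as a small lookup table. The following Python sketch is illustrative only: the table layout, the `resolve` helper, and the addresses are assumptions made for the example and are not taken from any particular compiler.

```python
# Hypothetical symbol table: names come from the text above,
# the addresses and the layout are invented for illustration.
SYMBOLS = {
    "customer_name": {"kind": "variable", "address": 0x1000},
    "calculate_balance": {"kind": "subroutine", "address": 0x4000},
}


def resolve(name):
    """Return the address recorded for a symbolic name."""
    try:
        return SYMBOLS[name]["address"]
    except KeyError:
        raise KeyError(f"no debugging information for symbol {name!r}") from None


# A debugger would use such a lookup to turn "break at calculate_balance"
# into "break at the first instruction of the subroutine".
breakpoint_address = resolve("calculate_balance")
```

Without a table of this kind, a debugger can only refer to raw addresses, which is why the symbolic information must be kept somewhere, whether inside the executable or alongside it.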
One approach to solving this problem is to provide a compiler switch that tells the system to generate either debug-level code or production-level code. This approach has several drawbacks. Developers tend to work with debug-level code, which may mask critical timing problems in the production-level code. When errors occur after the code is released in production-level form, there is no embedded support for debugging. Even if the code is recompiled in debug mode, it may be very difficult to recreate the exact conditions that caused an error to occur. The size and speed penalty of the less optimal debugging version of the code is yet another drawback and, in general, prohibits widespread use of object code compiled with the debugging information.
Therefore, it would be advantageous to have a method and system in which the compiler produces highly optimized production-level code and yet, if a bug is detected, the symbolic information is available to allow immediate invocation of a symbolic debugger to discover the source of the bug without having to recompile the source code.
This system sets up a framework that allows for separating debug information from executable code in a way that enables efficient, symbolic debugging of optimized production-level executables.
The system contains three major components: a compiler, a linker, and a debugger. The compiler produces highly optimized production-level object code. Optimization includes techniques such as memory sharing, loop unrolling, constant folding, peephole optimization, and parallelization. The compiler also produces a debugging information file that contains the names of variables and their locations, the names of subroutines and their locations, and program statements and their locations. The file further records information about code optimizations, such as memory sharing between variables, iteration labeling for loop unrolling, and register locations used by variables during program execution.
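One way to picture the contents of such a debugging information file is as a set of records written to a file separate from the object code. The Python sketch below is a minimal illustration under stated assumptions: the record names, the field names, the location encoding, and the choice of JSON are all hypothetical and are not specified by the description above.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class VariableRecord:
    name: str
    location: str  # hypothetical encoding, e.g. "mem:0x1000" or "reg:r3"
    shares_storage_with: List[str] = field(default_factory=list)


@dataclass
class SubroutineRecord:
    name: str
    address: int


@dataclass
class StatementRecord:
    source_line: int
    address: int
    # Iteration label for a statement duplicated by loop unrolling, if any.
    unrolled_iteration: Optional[int] = None


@dataclass
class DebugInfo:
    module: str
    variables: List[VariableRecord] = field(default_factory=list)
    subroutines: List[SubroutineRecord] = field(default_factory=list)
    statements: List[StatementRecord] = field(default_factory=list)

    def to_json(self):
        """Serialize the records for storage apart from the object code."""
        return json.dumps(asdict(self), indent=2)


# Invented sample data: two variables sharing one memory location,
# and one source statement that appears twice after loop unrolling.
info = DebugInfo(
    module="billing",
    variables=[
        VariableRecord("subtotal", "mem:0x1000", shares_storage_with=["tax"]),
        VariableRecord("tax", "mem:0x1000", shares_storage_with=["subtotal"]),
    ],
    subroutines=[SubroutineRecord("calculate_balance", 0x4000)],
    statements=[
        StatementRecord(source_line=12, address=0x4010, unrolled_iteration=0),
        StatementRecord(source_line=12, address=0x4024, unrolled_iteration=1),
    ],
)
serialized = info.to_json()
```

Because the records live in their own file, the optimized object code carries none of this data and is unaffected in size or speed.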
The linker merges the optimized object files to produce an optimized executable file. The linker also merges the debugging information files and adds additional information to produce a composite debugging information file. This debugging information file can be further refined when the program is executed to include runtime dependencies.
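The merging of debugging information files can be sketched as follows. This Python example assumes, purely for illustration, that each per-module file records symbol offsets relative to its own object file and that the "additional information" added by the linker includes the base address chosen for each module. The function name `merge_debug_info` and the data layout are hypothetical.

```python
def merge_debug_info(per_module, base_addresses):
    """Combine per-module symbol offsets into one composite table.

    per_module:     module name -> {symbol name -> offset within that module}
    base_addresses: module name -> base address assigned at link time
    """
    composite = {}
    for module, symbols in per_module.items():
        base = base_addresses[module]
        for name, offset in symbols.items():
            if name in composite:
                raise ValueError(f"symbol {name!r} defined in more than one module")
            # Rebase the module-relative offset to a final address.
            composite[name] = {"module": module, "address": base + offset}
    return composite


# Invented sample input for two object files.
per_module = {
    "billing.o": {"calculate_balance": 0x40},
    "customer.o": {"customer_name": 0x10},
}
base_addresses = {"billing.o": 0x2000, "customer.o": 0x3000}
composite = merge_debug_info(per_module, base_addresses)
```

The resulting table refers to addresses in the linked executable, so a debugger can consult it without reading the individual object files.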
The debugger provides a source code debugging environment and a core dump debugging environment for the optimized code. The debugger allows monitoring of variable values, placement of watches on changes in variable values, setting of breakpoints by location or by name, and performing program debugging at a source code level. It can undo memory sharing optimization by providing separate copies of each variable location as the optimized code is executed. Other optimization techniques are also "undone" by the debugger. The debugger can be used for symbolic analysis of core files from optimized code dumps. It is also possible to verify the correctness of code optimizations by using the debugger.
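The idea of undoing memory sharing by keeping separate copies can be illustrated with a small sketch. The Python example below assumes, hypothetically, that the debugging information identifies which source variable each store instruction writes; the class name `SharedSlotView`, its methods, and the addresses are invented for the example and do not describe the actual mechanism.

```python
class SharedSlotView:
    """Debugger-side view of one memory slot shared by several variables."""

    def __init__(self, owner_by_store):
        # owner_by_store: address of a store instruction -> name of the
        # source variable that store writes (assumed to come from the
        # debugging information file).
        self.owner_by_store = dict(owner_by_store)
        self.copies = {}  # variable name -> last value written for it

    def on_store(self, store_address, value):
        """Record a store to the shared slot under the owning variable."""
        owner = self.owner_by_store[store_address]
        self.copies[owner] = value

    def read(self, name):
        """Return the separate copy kept for a variable, or None if unset."""
        return self.copies.get(name)


# In the optimized code, "subtotal" and "tax" occupy the same slot,
# so the second store overwrites the first in real memory.
view = SharedSlotView({0x4010: "subtotal", 0x4038: "tax"})
view.on_store(0x4010, 120)
view.on_store(0x4038, 9)
```

After both stores, the real slot holds only the last value written, while `view.read("subtotal")` still yields the value stored for `subtotal`, which is the behavior a source-level user would expect.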