The present invention relates to program debugging and, more specifically, to a method and a system for dynamic code switching in a debugging process.
In programming, most applications are debugged at the source code level. Source code is generally written in a high-level language. A high-level language is so called in contrast to assembly language; it is a form of programming closer to natural language and mathematical formulas. A high-level language is largely independent of the machine's hardware and is used to write programs in a way that people understand more readily. Therefore, all of a programmer's debugging operations, such as stepping through code and setting breakpoints, are based on a view of the high-level source code. What a debugger actually receives, runs, and operates on, however, is a binary of the program (a compiled version of the source code), and it is the binary that produces the result of the run. The debugger is responsible for mapping source code to binary, or binary to source code, with the help of debug information generated by the compiler. Debug information provides, for example, the source line number of each instruction in the binary, the data type of each memory location in the binary, and so on.
FIGS. 2-5 show a simple debugging process. In FIG. 2, the debugger handles debugging operations on a source code view. Specifically, a breakpoint is set at line 8 of the source code. In FIG. 3, the debugger searches the debug information for the related binary instructions. To make the binary instructions readable for the purpose of understanding the present invention, FIGS. 3-4 show an assembly language view using mnemonic symbols, which correspond directly to binary instructions. In reality, such an intermediate assembly language view does not necessarily exist. For example, in FIG. 3, the three instructions labeled with line number 8 correspond to the source code "a--;" at line 8 in FIG. 2. FIG. 4 shows the case where the debugger runs the binary code and hits the breakpoint. In FIG. 5, the breakpoint is reflected back onto the source code view and shown to the user. Throughout the debugging process, debug information is critical for accurate and smooth debugging. More specifically, this method rests on accurately mapping each instruction in the binary to a source code line.
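The two directions of mapping described above can be illustrated with a minimal sketch in C. It is not the format of any real debug information standard: the `line_entry` structure, the function names, and all addresses are hypothetical, and only the fact that three instructions belong to source line 8 is taken from the discussion of FIG. 3.

```c
#include <stddef.h>

/* One row of a hypothetical line table: the instruction at `addr`
 * was generated from source line `line`. */
struct line_entry {
    unsigned long addr;
    int line;
};

/* Illustrative table. The addresses are invented; the three entries
 * for line 8 mirror the three instructions discussed for FIG. 3. */
static const struct line_entry line_table[] = {
    {0x1000UL, 7},
    {0x1004UL, 8},
    {0x1008UL, 8},
    {0x100cUL, 8},
    {0x1010UL, 9},
};

#define LINE_TABLE_LEN (sizeof line_table / sizeof line_table[0])

/* Source -> binary: copy up to max_out addresses generated from `line`
 * into `out` and return how many were copied. A debugger would plant
 * the breakpoint at the first of them. */
size_t addrs_for_line(int line, unsigned long *out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < LINE_TABLE_LEN && found < max_out; i++) {
        if (line_table[i].line == line)
            out[found++] = line_table[i].addr;
    }
    return found;
}

/* Binary -> source: return the source line for the instruction at
 * `addr`, or -1 if the table has no entry for that address. */
int line_for_addr(unsigned long addr)
{
    for (size_t i = 0; i < LINE_TABLE_LEN; i++) {
        if (line_table[i].addr == addr)
            return line_table[i].line;
    }
    return -1;
}
```

In this sketch, setting a breakpoint at line 8 corresponds to calling `addrs_for_line(8, ...)`, and reporting a stop to the user corresponds to calling `line_for_addr` on the address where execution halted. The scheme works only while each instruction belongs to exactly one source line.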
The process of transforming source code into binary is called compiling. To make the binary execute faster, a compiler usually reduces the execution time and the size of the binary as much as possible while preserving the logic of the program; this is called compiler optimization. However, optimization creates many problems for debugging: it moves, changes, splits, merges, or eliminates code, so that the binary no longer follows the order of the source code.
For example, there are the following optimization techniques: a scheduler may reorder instructions to avoid hardware pipeline bubbles, and its side effect on debugging is that continuous stepping appears to jump randomly in the source code view; loop-invariant code motion moves computations that do not depend on the loop variable out of a loop, and its side effect on debugging is that stepping moves in and out of the loop body seemingly at random.
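Loop-invariant code motion can be illustrated with a hand-written before-and-after pair in C. This is only a sketch: the function names are hypothetical, and the second function is a manual rewrite that approximates what an optimizer might emit, not the output of any particular compiler.

```c
/* As written by the programmer: `scale * offset` does not depend on i,
 * yet its source line sits inside the loop body. */
long sum_as_written(const int *v, int n, int scale, int offset)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        int k = scale * offset;   /* loop-invariant computation */
        sum += v[i] + k;
    }
    return sum;
}

/* After hoisting: the multiplication runs once, before the loop.
 * The instruction still carries the line number of the statement
 * inside the loop body, so a debugger stepping through the binary
 * shows the loop body before the loop has started. */
long sum_hoisted(const int *v, int n, int scale, int offset)
{
    long sum = 0;
    int k = scale * offset;       /* moved out of the loop */
    for (int i = 0; i < n; i++)
        sum += v[i] + k;
    return sum;
}
```

Both functions return the same value for the same arguments, which is the logical identity the optimizer must preserve. Only the order in which the source lines appear to execute differs.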
Both performance and debuggability are important, but they conflict with each other. Computer professionals have tried many ways to make debugging of optimized code possible, for example by creating new debugging information standards. Whichever debugging information standard is used, the common approach is to maintain source code information in the binary and feed it to the debugger. However, this does not resolve the problem, because after optimization the binary by its nature no longer aligns with the source code.
Some techniques have been proposed to provide a more accurate debugging experience. For example, a compiler may have an option, -qoptdebug. Compiling with this option generates pseudo code that shows the optimized program in high-level-language form. For example, for the following function:
    void foo(int x, int y, char* w)
    {
        char* s = w+1;
        char* t = w+1;
        int z = x + y;
        int d = x + y;
        int a = printf("TEST\n");
        for (int i = 0; i < 4; i++)
            printf("%d %d %d %s %s\n", a, z, d, s, t);
    }

the following pseudo codes will be generated:

     1   3 | void foo(long x, long y, char * w)
     2   9 | {
     3         a = printf("TEST/n");
     4  12 |   @CSE0 = x + y;
     5         printf("%d %d %d %s %s/n",a,@CSE0,@CSE0,((char *)w+1),((char *)w+1));
     6         printf("%d %d %d %s %s/n",a,@CSE0,@CSE0,((char *)w+1),((char *)w+1));
     7         printf("%d %d %d %s %s/n",a,@CSE0,@CSE0,((char *)w+1),((char *)w+1));
     8         printf("%d %d %d %s %s/n",a,@CSE0,@CSE0,((char *)w+1),((char *)w+1));
     9  13 |   return;
    10       }
In this case, debugging is based directly on the pseudo code. This method maps the pseudo code to the binary very well, since both are optimized code. However, the mapping from the original code to the pseudo code is not readily understood, and users may still be confused when debugging code they do not recognize.
To achieve a balance between performance and debuggability, in one known method a compiler generates both optimized object code and debuggable object code for some subroutines. A configuration file, a compiler switch, user input, or the like determines the subroutines for which both versions of object code are generated. When a subroutine is selected for debugging, execution can be diverted to the debuggable object code by inserting a jump instruction into the optimized object code of that subroutine. This method has the following problems. First, as more and more subroutines are debugged at some point, the optimized object code of each is redirected in turn to its debuggable object code, until eventually all or most of them jump to debuggable object code; accordingly, the program executes more and more slowly. Second, the method requires both versions of the object code to be loaded into memory, which consumes a large amount of memory.
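The known method and its first drawback can be sketched in C as follows. The sketch makes a simplifying assumption: a function pointer named `entry` stands in for the jump instruction inserted into the optimized object code. All type and function names are hypothetical and do not describe any actual compiler or debugger.

```c
#include <stddef.h>

typedef int (*subroutine_fn)(int);

/* Both versions of one subroutine are resident at the same time,
 * which is the memory cost noted above. */
struct subroutine {
    subroutine_fn optimized;   /* fast version */
    subroutine_fn debuggable;  /* version that maps cleanly to source */
    subroutine_fn entry;       /* stand-in for the inserted jump */
};

/* Two builds of the same logic; the debuggable one keeps each source
 * statement separate so it can be stepped line by line. */
int scale_optimized(int x) { return x * 3 + 1; }

int scale_debuggable(int x)
{
    int tripled = x * 3;
    int result = tripled + 1;
    return result;
}

struct subroutine make_subroutine(subroutine_fn optimized,
                                  subroutine_fn debuggable)
{
    struct subroutine s = { optimized, debuggable, optimized };
    return s;
}

/* Models inserting the jump: every later call reaches the debuggable
 * version. The sketch has no way back, mirroring the drawback that
 * diverted subroutines stay slow. */
void select_for_debugging(struct subroutine *s)
{
    s->entry = s->debuggable;
}

int call_subroutine(const struct subroutine *s, int x)
{
    return s->entry(x);
}

/* Counts how many subroutines have been diverted so far. */
size_t count_diverted(const struct subroutine *subs, size_t n)
{
    size_t diverted = 0;
    for (size_t i = 0; i < n; i++) {
        if (subs[i].entry == subs[i].debuggable)
            diverted++;
    }
    return diverted;
}
```

Each call to `select_for_debugging` leaves one more subroutine permanently on its slower path, so `count_diverted` can only grow over a debugging session, and every `struct subroutine` keeps both versions alive whether or not the debuggable one is ever used.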