This invention relates generally to source-code debuggers, and more particularly to the composition of symbol tables and management of breakpoints.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 1999, Microsoft Corporation, All Rights Reserved.
Computer program debuggers are used by programmers to find problems that occur during the execution of a program. Debuggers can be used to control the execution of a program using breakpoints to stop execution of the program at desired points. This allows the programmer to examine variables and a call stack during execution in such a manner that the user of the debugger can view snapshots of the execution of a program and determine if the program is behaving as expected. Users of debuggers can also browse source files, set breakpoints, watch variables, and examine data structures.
Symbol tables, also known as debugging tables, are used by program debuggers to provide detailed information during the execution of the program. A symbol table is generated or emitted by the compiler and linker when the program source code is compiled and linked. Symbol tables are associated with a file that contains the generated executable code of the program. The symbol table maps source statements to byte addresses of executable instructions, which provide guidance in setting breakpoints and examining data during execution. More specifically, debugger symbol tables contain information describing the source code, such as line numbers, the types and scopes of variables, and function names, parameters, function scopes and name/attribute bindings specified by the declarations in a program. Debugger symbol tables also contain information describing the generated executable code. The symbol table enables the debuggers to map source-level variables and data structures to a specific location in the memory of the program being debugged. Debugger symbol tables are not the same as a symbol table that is used internally by the compiler during compilation.
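The mapping from source statements to byte addresses described above can be illustrated with a minimal sketch in C. The names LineEntry and address_for_line are hypothetical and are not taken from any particular symbol-table format; the sketch assumes the table is sorted by line number and that an address of 0 never denotes real code.

```c
#include <stddef.h>
#include <stdint.h>

/* One hypothetical line-table entry: a source line and the byte address
   of the first executable instruction generated for that line. */
typedef struct {
    unsigned line;
    uintptr_t address;
} LineEntry;

/* Return the address recorded for `line`, or 0 if the line has no entry.
   Assumes `table` is sorted by ascending line number (binary search). */
uintptr_t address_for_line(const LineEntry *table, size_t n, unsigned line)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].line < line)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && table[lo].line == line) ? table[lo].address : 0;
}
```

A debugger could use such a lookup to decide where to plant a breakpoint for a line the user selects; lines without generated code (comments, declarations) simply have no entry.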
Conventionally, debuggers have been considered notoriously machine-dependent programs. Many conventional debuggers, such as the GNU debugger, gdb, described in R. M. Stallman and R. H. Pesch, ‘Using GDB: A guide to the GNU source-level debugger, GDB version 4.0’, Technical Report, Free Software Foundation, Cambridge, Mass., Jul. 1991, do indeed depend heavily on a specific operating system or on a specific platform or compiler. In conventional debuggers, symbol tables are encoded ad hoc, so that at least a portion of the information in the symbol table is unique to, or characteristic of, a particular computer environment. More specifically, the machine-dependence pertains to machine architectures, operating systems, compilers, and linkers, whose specific or unique features cannot be easily used, if at all, in a different environment. Beyond the direct consequence of a lack of portability of the symbol table between platforms, a machine-dependent ad hoc symbol table also makes the debugger machine-dependent, because the debugger must be able to parse and process the machine-dependent information in the symbol table, which in turn requires that the debugger be revised or at least re-compiled for each specific computer type. While most debuggers are notoriously machine-dependent, recent research prototypes have achieved varying degrees of machine-independence with novel designs, such as by embedding symbol tables and debugging code in the target program. However, embedding symbol tables and debugging code in the target program results in relatively slow execution and a larger symbol table.
Two nearly machine-independent debuggers, ldb and cdb, are source-level debuggers for C. However, neither ldb nor cdb is completely machine-independent. Ldb is described in N. Ramsey and D. R. Hanson, ‘A retargetable debugger’, Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, SIGPLAN Notices, 27(7), 22-31 (1992). Ldb is easier to port to a different architecture, but it uses its own symbol-table format and thus requires cooperation from compilers. Cdb is described in D. R. Hanson and M. Raghavachari, ‘A machine-independent debugger’, Software-Practice and Experience, 26(11), 1277-1299 (1996). Cdb explores perhaps the extreme reaches of this design space: It is nearly completely independent of architectures and operating systems, but it achieves this independence by loading a small amount of code with the target program and by having the compiler emit a non-standard, but machine-independent, symbol table. Furthermore, cdb embeds symbol tables and debugging code in the target program. Cdb does illustrate how focusing on retargetability can simplify a debugger dramatically.
Furthermore, conventional symbol tables are designed as file formats and are documented in torturously detailed specifications. Symbol table file formats are also difficult to change. For example, conventional debuggers can set breakpoints only on discrete lines of code, because the symbol-table format provides information only about lines even though the syntax of most languages is not line-oriented and includes operations that have embedded flow of control. Java's class files are described as a file format, and class files include metadata that map locations to line numbers as described in T. Lindholm and F. Yellin, The Java Virtual Machine Specification, Addison Wesley, Reading, Mass., 1997.
FIG. 1 shows a diagram of a debugger nub 110 in a conventional scheme. A nub 110 is the central feature of conventional designs. The nub 110 enables a debugger 130 to debug a target program 120 that is running on the same computer as the nub 110 or on another computer. The nub 110 is a small program that controls the target program 120, and is responsible for actions such as setting breakpoints and stepping through code. The nub 110 provides a layer between the main debugger application 130 and the low-level system operations. The nub 110 also provides debugging primitives, as well as facilities for communicating with the debugger 130 and controlling the target 120. Low-level operations of the debugger 130 are performed by communicating with the nub 110, which is a small set of machine-dependent functions that are embedded in the target program 120 at compile-time.
As depicted in FIG. 1, all communication between the target program 120 and the debugger 130 goes through the nub 110. The nub 110 is a program loaded into memory with the target program 120. The debugger 130 can be either in the same memory address space as the target program 120, or in a separate memory address space. The latter configuration is a common one, because it protects the debugger 130 from corruption by the target program 120. Furthermore, the debugger 130 and the target program 120 can execute in the same computer in which the debugger 130 and the target program 120 communicate through a system bus. The debugger 130 and the target program 120 can also execute in different computers in which the debugger 130 and the target program 120 communicate through a relatively slow communication link, such as a Remote Procedure Call (RPC) channel.
Furthermore, in a conventional debugger 130, the management of user breakpoint information is performed by the debugger 130. In implementations where the debugger 130 and the target program 120 are implemented as separate processes, the debugger 130 process and the target program 120 process are burdened by communication overhead. More specifically, the target program 120 communicates to the debugger 130 which statement is being executed at any given point in time, and the target program 120 cannot proceed with execution until the debugger 130 determines that the target program 120 can proceed based on whether or not a breakpoint is set at that statement.
Interaction with the nub 110 is defined by an interface summarized below in Table 1. The interface is minimal because, while the interface itself is machine-independent, an implementation of the interface is not machine-independent. Furthermore, an implementation for a specific platform can be dependent on all aspects of the platform. For example, the nub 110 used with debugger 130 depends only on a compiler, such as lcc, and an operating system, such as Unix variants or Windows NT/95/98, and is a relatively small component. The lcc compiler is described in C. W. Fraser and D. R. Hanson, A Retargetable C Compiler: Design and Implementation, Addison Wesley, Menlo Park, Calif., 1995. The nub 110 has been implemented with other debuggers for other languages, as described in D. R. Hanson and J. L. Korn, ‘A simple and extensible graphical debugger’, Proceedings of the Winter USENIX Technical Conference, Anaheim, Calif., Jan. 1997, pp. 173-184.
The two data types Nub_coord_T and Nub_state_T and the seven functions _Nub_init(), _Nub_src(), _Nub_set(), _Nub_remove(), _Nub_fetch(), _Nub_store(), and _Nub_frame() defined in Table 1 permit a debugger 130 to control a target program 120 and permit a debugger 130 to read and write data from a target program 120. The nub 110 is mainly a conduit for opaque data. For example, the nub 110 has no information on specific symbol-table formats, but the nub 110 does provide simple mechanisms for reading specific symbol-table formats.
Function _Nub_init() is called by the start-up code and initializes the nub 110. The arguments of function _Nub_init() are pointers to callback functions that are called by the nub 110 to initialize the debugger 130 and to trap to the debugger 130 when a fault occurs. As disclosed below, the type Nub_state_T describes the state of a stopped target program 120, which occurs at start-up, breakpoints, and faults. Functions _Nub_set(), _Nub_remove(), and _Nub_src() collaborate to implement breakpoints. Stopping points define program locations at which breakpoints can be set in terms of ‘source coordinates’ specified by the type Nub_coord_T. A coordinate consists of a file name, a line number (y) and a character number in that line (x). The set of allowable stopping points depends on the language and the compiler. Most embodiments of a conventional debugger 130 limit breakpoints to lines, while cdb and lcc permit breakpoints to be set at any expression. Function _Nub_src() enumerates the stopping points, calling an apply() function supplied by the debugger 130 for each point; function _Nub_set() sets a breakpoint; and function _Nub_remove() removes a breakpoint. When a breakpoint occurs, the breakpoint handler passed to function _Nub_set() as onbreak is called with a Nub_state_T value that describes the current state of the target program 120. Here, ‘onbreak’ is a formal parameter name; the actual argument is a pointer to a function that is called when a breakpoint occurs.
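The collaboration among enumerating stopping points, setting a breakpoint, and removing a breakpoint can be sketched as follows. This is an illustrative toy in C, not the interface of Table 1: the names Coord, Handler, StopPoint, nub_src, nub_set, nub_remove, and nub_reached, their signatures, and their return conventions are all assumptions made for the sketch, and the handler here receives only a coordinate rather than a full state value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical source coordinate: file name, character in line (x), line (y). */
typedef struct {
    const char *file;
    int x, y;
} Coord;

/* Handler invoked when execution reaches a point that has a breakpoint. */
typedef void (*Handler)(const Coord *where);

/* One stopping point emitted by the compiler; onbreak is NULL when unset. */
typedef struct {
    Coord src;
    Handler onbreak;
} StopPoint;

static StopPoint *find_point(StopPoint *pts, size_t n, const Coord *src)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (pts[i].src.x == src->x && pts[i].src.y == src->y &&
            strcmp(pts[i].src.file, src->file) == 0)
            return &pts[i];
    return NULL;
}

/* Enumerate every stopping point, calling apply() once per point. */
void nub_src(const StopPoint *pts, size_t n,
             void (*apply)(size_t i, const Coord *src, void *cl), void *cl)
{
    size_t i;
    for (i = 0; i < n; i++)
        apply(i, &pts[i].src, cl);
}

/* Set a breakpoint; returns 1 on success, 0 if src is not a stopping point. */
int nub_set(StopPoint *pts, size_t n, Coord src, Handler onbreak)
{
    StopPoint *p = find_point(pts, n, &src);
    if (p == NULL)
        return 0;
    p->onbreak = onbreak;
    return 1;
}

/* Remove a breakpoint; same return convention as nub_set(). */
int nub_remove(StopPoint *pts, size_t n, Coord src)
{
    return nub_set(pts, n, src, NULL);
}

/* Called by the target at each stopping point; returns 1 if a handler ran. */
int nub_reached(StopPoint *pts, size_t n, Coord src)
{
    StopPoint *p = find_point(pts, n, &src);
    if (p == NULL || p->onbreak == NULL)
        return 0;
    p->onbreak(&p->src);
    return 1;
}

/* Sample callbacks, included only to exercise the sketch. */
int hit_count;
void count_hit(const Coord *where) { (void)where; hit_count++; }
void count_point(size_t i, const Coord *src, void *cl)
{
    (void)i; (void)src;
    ++*(int *)cl;
}
```

Because a coordinate carries a character position as well as a line number, two stopping points on the same line (for example, two subexpressions) remain distinguishable, which is what allows breakpoints at expressions rather than only at lines.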
Function _Nub_fetch() and function _Nub_store() read and write bytes from the address space of target program 120 and return the number of bytes actually read and written. The target program 120 can have many abstract address spaces. For example, one abstract address space refers to the memory of target program 120, while other abstract address spaces refer to metadata about the target program 120, including its symbol table. The implementation of the compiler, the debugger 130, and the nub 110 define the conventions about address spaces. The nub 110 interface specifies only a way to access those spaces.
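The idea of reading and writing bytes in one of several abstract address spaces, and reporting how many bytes were actually transferred, can be sketched in C as follows. The names Space, SPACE_MEMORY, SPACE_SYMBOLS, nub_fetch, and nub_store are hypothetical, and modeling each space as a flat byte buffer addressed by offset is an assumption of the sketch rather than a convention stated in this document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical address-space identifiers. */
enum { SPACE_MEMORY, SPACE_SYMBOLS, SPACE_COUNT };

/* Each abstract address space is modeled as a flat byte buffer. */
typedef struct {
    unsigned char *base;
    size_t size;
} Space;

/* Clamp a request to what the space can satisfy; 0 if out of range. */
static size_t clamp(const Space *spaces, int space, size_t offset, size_t nbytes)
{
    size_t avail;
    if (space < 0 || space >= SPACE_COUNT || offset >= spaces[space].size)
        return 0;
    avail = spaces[space].size - offset;
    return nbytes < avail ? nbytes : avail;
}

/* Read up to nbytes; returns the number of bytes actually read. */
size_t nub_fetch(const Space *spaces, int space, size_t offset,
                 void *buf, size_t nbytes)
{
    size_t n = clamp(spaces, space, offset, nbytes);
    if (n > 0)
        memcpy(buf, spaces[space].base + offset, n);
    return n;
}

/* Write up to nbytes; returns the number of bytes actually written. */
size_t nub_store(Space *spaces, int space, size_t offset,
                 const void *buf, size_t nbytes)
{
    size_t n = clamp(spaces, space, offset, nbytes);
    if (n > 0)
        memcpy(spaces[space].base + offset, buf, n);
    return n;
}
```

A short count tells the caller that the request ran past the end of the space, so the caller never needs to know the size of a space in advance.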
Finally, function _Nub_frame() traverses the call stack of the target program 120. The top stack frame is numbered 0 and increasing numbers identify frames higher up the call chain. Function _Nub_frame() moves to frame n and fills the Nub_state_T value with the state information describing that frame. The fields fp and context in the Nub_state_T value are opaque pointers that describe the state of the target program 120. For example, the pointers are typically passed to function _Nub_fetch() to fetch symbol-table entries and the values of variables.
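The frame numbering described above, with frame 0 at the top of the stack and larger numbers further up the call chain, can be sketched in C as follows. The Frame and State types and the nub_frame function are hypothetical; in particular, representing the stack as a linked list of caller pointers and returning -1 for a missing frame are assumptions of the sketch.

```c
#include <stddef.h>

/* Hypothetical stack frame: function name plus a link to the caller. */
typedef struct Frame {
    const char *name;
    const struct Frame *caller;   /* NULL for the outermost frame */
} Frame;

/* Hypothetical state record filled in for the selected frame. */
typedef struct {
    const char *name;
    const Frame *fp;   /* treated as an opaque pointer by the debugger */
} State;

/* Move to frame n (0 = top of stack) and fill *state.
   Returns n on success, or -1 if the stack has no such frame. */
int nub_frame(const Frame *top, int n, State *state)
{
    const Frame *f = top;
    int i;
    if (n < 0)
        return -1;
    for (i = 0; f != NULL && i < n; i++)
        f = f->caller;
    if (f == NULL)
        return -1;
    state->name = f->name;
    state->fp = f;
    return n;
}
```

A debugger could print a backtrace by calling such a function with n = 0, 1, 2, and so on until it reports that no further frame exists.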
The nub 110 interface does not require a machine-independent implementation. It is possible, for example, to provide an implementation that is specific to one architecture, operating system, and compilation environment.
Conventionally, the debugger 130 and nub 110 execute on the same computer, even when the target 120 is executing on a different computer, such as two different clients in a network. In this case, the nub 110 must communicate with the target 120 over significantly slower communication lines (not shown) than if all components were communicating across a common bus. This results in slow performance.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
An abstract notation, such as a grammar, is used to specify a symbol table. Tools are used to generate computer-readable code for constructing, reading, and writing the symbol table from the abstract notation. More specifically, a first aspect of the present invention is directed to encoding a symbol table in an abstract notation, supported by an abstract notation interface component that generates code that constructs, reads and writes symbol tables in some concrete representation. In one embodiment, the contents of the external symbol table are defined by, or encoded in, a machine-independent grammar. The symbol table is stored separately from the executable target. The abstract notation interface component is used as an interface between a nub and the symbol table. The nub provides an interface between the debugger and the executable target and the abstract notation interface component.
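As a rough illustration of what code generated from an abstract notation might look like, the following C sketch shows a data structure with matching write and read routines for a single, hypothetical grammar rule. The rule, the Symbol type, the function names, and the byte layout (a one-byte name length followed by big-endian 32-bit fields) are all assumptions chosen for the sketch; they are not the notation or the concrete representation used by the invention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical grammar rule this code stands in for:
       symbol = (string name, int line, int offset)            */
typedef struct {
    char name[32];
    uint32_t line;
    uint32_t offset;
} Symbol;

/* Fixed big-endian byte order keeps the encoding machine-independent. */
static void put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Encode *s into out (at least 40 bytes); returns bytes written. */
size_t symbol_write(const Symbol *s, unsigned char *out)
{
    size_t len = strlen(s->name);   /* at most 31 for a valid Symbol */
    size_t n = 0;
    out[n++] = (unsigned char)len;
    memcpy(out + n, s->name, len);
    n += len;
    put_u32(out + n, s->line);
    n += 4;
    put_u32(out + n, s->offset);
    n += 4;
    return n;
}

/* Decode a symbol from in; returns bytes consumed, or 0 on a bad length. */
size_t symbol_read(Symbol *s, const unsigned char *in)
{
    size_t len = in[0], n = 1;
    if (len >= sizeof s->name)
        return 0;
    memcpy(s->name, in + n, len);
    s->name[len] = '\0';
    n += len;
    s->line = get_u32(in + n);
    n += 4;
    s->offset = get_u32(in + n);
    n += 4;
    return n;
}
```

The point of the sketch is that the debugger and compiler work with the Symbol data structure, while the byte-level details are confined to routines that a tool could regenerate whenever the rule changes.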
Using an abstract notation automates implementation of parts of the debugger. Furthermore, the abstract notation documents the symbol table concisely. Using a machine-independent grammar as an abstract notation also yields simplifications to the interface between the debugger and the target program. Furthermore, a machine-independent grammar emphasizes that symbol tables are data structures, not file formats, and many of the pitfalls of working with low-level file formats are avoided by focusing instead on high-level data structures and automating the implementation details. Machine-independent grammars provide debuggers and compilers that require less development time, use less storage space, and have faster performance, and further provide symbol tables that use less storage space.
A second aspect of the invention is directed to dividing the management of breakpoints. Divided management of breakpoints is accomplished by using a split nub: a nub client associated with the executable target and a nub server associated with the debugger. Debugging performance is improved by storing the user breakpoint information in the nub client, so that the debugger does not need to be invoked in the determination of where to break execution. Divided management of breakpoints provides faster execution during debugging and is particularly valuable when communication between the executable target and the debugger is relatively slow, such as through an RPC channel.
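The benefit of keeping breakpoint information on the target side can be illustrated with a minimal C sketch. The NubClient type and the client_set and client_reached functions are hypothetical, breakpoints are identified by line number alone for brevity, and the round_trips counter merely stands in for messages that would be sent to the nub server; none of this is taken from the detailed description.

```c
#include <stddef.h>

#define MAX_BREAKPOINTS 16

/* Hypothetical nub client state, kept in the target's address space. */
typedef struct {
    int lines[MAX_BREAKPOINTS];   /* lines that currently have breakpoints */
    size_t count;
    unsigned long round_trips;    /* messages sent to the debugger side */
} NubClient;

/* Record a breakpoint locally; returns 1 on success, 0 if the table is full. */
int client_set(NubClient *c, int line)
{
    if (c->count == MAX_BREAKPOINTS)
        return 0;
    c->lines[c->count++] = line;
    return 1;
}

/* Called by the target at each stopping point. The check is purely local;
   the debugger side is contacted only when a breakpoint is actually hit.
   Returns 1 if execution should stop at this line, 0 otherwise. */
int client_reached(NubClient *c, int line)
{
    size_t i;
    for (i = 0; i < c->count; i++) {
        if (c->lines[i] == line) {
            c->round_trips++;   /* stands in for a message to the nub server */
            return 1;
        }
    }
    return 0;
}
```

In this sketch, a target that executes 100 statements with one breakpoint set contacts the debugger side once, whereas the conventional arrangement described above would involve an exchange for every statement.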
Systems, clients, servers, methods, and computer-readable media of varying scope are described. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.