1. Field of the Invention
This application relates to the field of computer software and more particularly to the field of computer software for instrumentation of code in order to facilitate debugging.
2. Description of Related Art
Code instrumentation is performed by adding statements to software in order to monitor the performance and operation of the software during run time. Code instrumentation is sometimes used to facilitate debugging of run-time errors relating to memory accesses. Specifically, since many run-time errors result from improperly accessing or using memory (e.g., writing beyond an array's boundaries, not freeing dynamically allocated memory, etc.), instrumentation may be used to supplement the memory-accessing portions of the software with additional software that monitors memory accesses and provides an indication when an improper access appears to have occurred.
Instrumentation may be performed manually by having the programmer insert source code statements that intermittently output or record values related to memory variables, such as array indices and the amount of free space left in the allocation heap. However, such manual instrumentation is often inefficient for a number of reasons. Manual instrumentation requires the programmer to recognize possible sources of error in order to insert the appropriate instrumentation source code. Yet once the programmer has identified possible sources of error, it may be more straightforward to simply examine the potentially errant code and fix the error than to perform the additional steps of adding instrumentation statements. In addition, manually adding instrumentation statements requires repeatedly recompiling the source code before execution, which adds time and effort to the debugging process. Also, the programmer must remember which statements are instrumentation statements in order to remove them once they are no longer needed.
Various systems exist for automating the debugging process. U.S. Pat. No. 5,581,696 to Kolawa et al. (the '696 patent) is directed to a method of using a computer for automatically instrumenting a computer program for dynamic debugging. In the system disclosed in the '696 patent, the instrumentation software examines and supplements a parse tree, which is an intermediate stage produced by the compiler. The parse tree is a tree having nodes corresponding to tokens that represent individual source code statements. The system described in the '696 patent traverses the parse tree to locate tokens of interest (e.g., tokens corresponding to memory accesses) and supplements those tokens with additional tokens corresponding to code that monitors the memory accesses. However, since the contents of the parse tree depend upon the particular source programming language used, the system disclosed in the '696 patent is source language dependent.
U.S. Pat. Nos. 5,193,180, 5,335,344, and 5,535,329, all to Hastings (the Hastings patents), disclose a system for instrumenting computer object code to detect memory access errors. The instrumentation includes providing additional code that maintains the status of each and every program memory location along with supplementing object code instructions that access the program memory with additional code that facilitates maintaining status of the memory locations. To the extent that the object code is independent of the particular source code that is used, the system disclosed in the Hastings patents is also independent of the source code language used.
However, since the system disclosed in the Hastings patents involves modifying object code, the system is target dependent: it may only be configured to work with object code written in a particular target processor's native language. Although it may be desirable to adapt the Hastings system to work with object code for a variety of target processors, such an adaptation would require significant modifications to the system, since object code instructions that access memory may vary significantly between different target processor languages. In addition, monitoring program memory accesses by maintaining the status of program memory locations allows some improper operations to be performed by the software without being detected. For example, reading a memory location beyond an array's boundaries may not be detected if the location that is read has been allocated and initialized in connection with another memory variable.
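The limitation described above can be illustrated with a toy model of status-based checking. This is a sketch, not the Hastings implementation: the class `ShadowMemory`, its bump allocator, and its three status values are hypothetical simplifications. Two arrays are allocated back to back, so an overrun of the first lands on initialized bytes of the second and passes the check.

```java
import java.util.Arrays;

// Toy model of status-based memory checking (hypothetical; not the patented code).
// Every byte of a simulated heap carries a status, and a read is flagged only
// when the byte it touches is unallocated or has never been written.
final class ShadowMemory {
    enum Status { UNALLOCATED, ALLOCATED, INITIALIZED }

    private final Status[] status;
    private int next = 0; // bump allocator: blocks are placed back to back

    ShadowMemory(int size) {
        status = new Status[size];
        Arrays.fill(status, Status.UNALLOCATED);
    }

    // Marks length bytes as allocated (but not yet initialized); returns the base address.
    int allocate(int length) {
        int base = next;
        Arrays.fill(status, base, base + length, Status.ALLOCATED);
        next += length;
        return base;
    }

    // A write initializes the byte; this toy checker does not validate writes.
    void write(int address) {
        status[address] = Status.INITIALIZED;
    }

    // True if the checker would report a read of this address as an error.
    boolean readIsFlagged(int address) {
        return address < 0 || address >= status.length
                || status[address] != Status.INITIALIZED;
    }
}
```

With `a = allocate(4)` and `b = allocate(4)` both fully written, `readIsFlagged(a + 4)` returns false even though the read is one element past the end of `a`, because that byte belongs to `b`; only reads of unallocated or unwritten bytes are caught.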
Other systems for facilitating debugging exist. For example, U.S. Pat. No. 4,667,290 to Goss et al. is directed to compilers that create intermediate representation (IR) code that is both source and target independent. Column 5, lines 57-60 of that patent disclose using the IR code to facilitate debugging by retaining, when debugging is being performed, portions of the IR code that would otherwise be eliminated in the course of optimization. Similarly, U.S. Pat. No. 5,175,856 to Van Dyke et al. discloses a compiler that produces IR code in which debugging is facilitated by passing information through the intermediate code file.
U.S. Pat. Nos. 5,276,881, 5,280,613, and 5,339,419, all to Chan et al., disclose a compiler system that produces an IR code. U.S. Pat. No. 5,276,881 is illustrative of the three patents and discloses symbolic debugging support provided in connection with the compiler system described in the patent. Column 59, lines 15-19 indicate that if the symbolic debug option is specified, ". . . then the Low-level Code Generator 1322 writes additional information to the Low Level CIR 1338." (CIR is an acronym for Compiler Intermediate Representation.) Column 57, lines 59-63 indicate that the Low-Level CIR 1338 is analogous to the compiler intermediate representation 212, but the Low-Level CIR 1338 is not architecturally neutral (i.e., is target dependent). Column 57, lines 63-65 state specifically that the Low-Level CIR 1338 is dependent upon the particular architecture of the target computer platform.
In addition, various systems compile source code into an interpretive language, such as byte code. Ideally, the byte code is machine independent, so that it may be run on any computer that uses an interpreter to perform the operations indicated by the byte code. The interpreter is, of course, machine dependent. However, once an interpreter has been provided for a particular machine, it needs to be modified or updated only when the byte code standard changes.
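To make the division of labor concrete, the sketch below interprets a toy stack-based byte code. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the class `TinyVm` are hypothetical and are neither Java byte code nor P-Code. The byte array is the portable part; in a real system only the interpreter loop would have to be provided for each machine.

```java
// Toy stack-based byte code and its interpreter (hypothetical opcodes).
final class TinyVm {
    static final byte PUSH = 0; // followed by one operand byte
    static final byte ADD = 1;  // pops two values, pushes their sum
    static final byte MUL = 2;  // pops two values, pushes their product
    static final byte HALT = 3; // returns the value on top of the stack

    static int run(byte[] code) {
        int[] stack = new int[64];
        int sp = 0; // number of values currently on the stack
        int pc = 0; // program counter: index of the next byte to execute
        while (true) {
            byte op = code[pc++];
            switch (op) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
                case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
                case HALT: return stack[sp - 1];
                default:
                    throw new IllegalStateException("bad opcode " + op + " at " + (pc - 1));
            }
        }
    }
}
```

For example, running `{PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT}` evaluates (2 + 3) * 4 and returns 20.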
Note that an interpreted byte code is, ideally, independent of the particular underlying source language. Thus, a single byte code could be used for multiple source languages. An example of this is P-Code, an interpreted byte code produced by Fortran, C, and Pascal compilers. Since a byte code may be both machine independent and source language independent, it would be advantageous to be able to instrument the byte code itself, thereby providing instrumentation for a plurality of source languages and a plurality of machines that all make use of the byte code.
According to the present invention, instrumenting a byte code computer program includes examining the byte code, selecting portions of the byte code for instrumentation, and instrumenting the portions to provide instrumented byte code. Selecting the portions may include choosing portions of the byte code corresponding to method entry, method exit, a throw, a method call, or a new line number. Instrumenting a portion of the byte code corresponding to a method call may include instrumenting a local line number of the source code corresponding to the byte code being instrumented. Instrumenting the portions may include adding calls to instrumentation runtime functions that pass parameters indicative of the portions being instrumented. At least one of the parameters that is passed may include a line number of the source code corresponding to the portion being instrumented or a this pointer for the method corresponding to the portion being instrumented. As is known in the art, the this pointer is a pointer to the address where the data for an object is stored.
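A minimal sketch of the selection and instrumentation steps, assuming a deliberately simplified instruction model rather than real JVM byte code: `Insn`, its `Kind` values, and runtime function names such as `rt.methodEntry` are hypothetical. Each inserted probe carries the source line number; a real implementation would also load the this pointer and method parameters as arguments to the runtime function.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified instruction model (hypothetical; not real JVM byte code).
final class Insn {
    enum Kind { LINE, CALL, THROW, RETURN, OTHER, PROBE }

    final Kind kind;
    final String operand; // callee name for CALL, runtime function name for PROBE
    final int line;       // source line number the instruction belongs to

    Insn(Kind kind, String operand, int line) {
        this.kind = kind;
        this.operand = operand;
        this.line = line;
    }

    @Override public String toString() { return kind + "(" + operand + ", line " + line + ")"; }
}

final class Instrumenter {
    // Names of hypothetical instrumentation runtime functions.
    static final String PROBE_ENTRY = "rt.methodEntry";
    static final String PROBE_EXIT = "rt.methodExit";
    static final String PROBE_CALL = "rt.beforeCall";
    static final String PROBE_THROW = "rt.beforeThrow";
    static final String PROBE_LINE = "rt.newLine";

    // Returns a copy of code with probe calls added at the selected portions.
    static List<Insn> instrument(List<Insn> code) {
        List<Insn> out = new ArrayList<>();
        int firstLine = code.isEmpty() ? 0 : code.get(0).line;
        out.add(probe(PROBE_ENTRY, firstLine)); // method entry
        for (Insn insn : code) {
            switch (insn.kind) {
                case LINE:   out.add(insn); out.add(probe(PROBE_LINE, insn.line)); break;  // new line number
                case CALL:   out.add(probe(PROBE_CALL, insn.line)); out.add(insn); break;  // before the call
                case THROW:  out.add(probe(PROBE_THROW, insn.line)); out.add(insn); break; // before the throw
                case RETURN: out.add(probe(PROBE_EXIT, insn.line)); out.add(insn); break;  // method exit
                default:     out.add(insn);
            }
        }
        return out;
    }

    private static Insn probe(String function, int line) {
        return new Insn(Insn.Kind.PROBE, function, line);
    }
}
```

Inserting probes before calls, throws, and returns (rather than after) ensures they run even though control leaves the current instruction sequence at those points.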
At least one of the parameters that is passed may include at least one method parameter provided to a method containing byte code that is being instrumented. The at least one method parameter may be passed in a message buffer from an instrumentation runtime function to at least one viewer routine that displays the data to a user. The message buffer may include scalar data, array data, and object data. An object header or an array header may be placed in the message buffer. The message buffer may be limited to a predetermined size. The data indicative of the parameters may be stored in a message buffer. Data from the message buffer may be passed to at least one viewer routine that displays the data to a user.
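The bounded message buffer can be sketched with `java.nio.ByteBuffer`. The class `MessageBuffer`, the tag values, and the layout (a one-byte tag, then a 4-byte length for an array header, or a class identifier and hash code for an object header) are assumptions chosen for illustration. Once the predetermined capacity is reached, further entries are dropped and counted rather than growing the buffer.

```java
import java.nio.ByteBuffer;

// Fixed-capacity buffer for instrumentation data (hypothetical wire format).
final class MessageBuffer {
    static final byte TAG_INT = 1, TAG_ARRAY_HEADER = 2, TAG_OBJECT_HEADER = 3;

    private final ByteBuffer buf;
    private int dropped = 0; // entries rejected because the buffer was full

    MessageBuffer(int capacity) { buf = ByteBuffer.allocate(capacity); }

    // Scalar entry: tag + 4-byte value.
    boolean putInt(int value) {
        if (buf.remaining() < 5) { dropped++; return false; }
        buf.put(TAG_INT).putInt(value);
        return true;
    }

    // Array entry: header (tag + full length), then as many elements as fit.
    boolean putIntArray(int[] values) {
        if (buf.remaining() < 5) { dropped++; return false; }
        buf.put(TAG_ARRAY_HEADER).putInt(values.length);
        for (int v : values) {
            if (!putInt(v)) return false; // truncated; header still holds the full length
        }
        return true;
    }

    // Object header: tag + class identifier + hash code; fields would follow as entries.
    boolean putObjectHeader(int classId, int hash) {
        if (buf.remaining() < 9) { dropped++; return false; }
        buf.put(TAG_OBJECT_HEADER).putInt(classId).putInt(hash);
        return true;
    }

    int size() { return buf.position(); }
    int dropped() { return dropped; }
}
```

With a 16-byte buffer, an object header (9 bytes) followed by a three-element array leaves room for the array header (5 bytes) but not for its elements, so `size()` is 14 and `dropped()` is 1. Because the header records the array's full length, a viewer routine can tell that the data was truncated.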
A method may be instrumented to provide instrumentation for handling an abort. A native function call may be instrumented by adding a byte code wrapper to the native function and then instrumenting the wrapper. The wrapper may include byte code corresponding to method entry and exit portions. Instrumenting a call to a native function may include providing a native assembly language thunk that captures data passed to and from the native function. The assembly language thunk may be hooked between the virtual machine and the call to the native function. Hooking the assembly language thunk may include intercepting a call that provides an address for a procedure.
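The following sketch shows, at the Java source level, what a byte code wrapper around a native method would do. The class `FileApi`, the renamed method `checksum$original`, and the `Probes` recorder are hypothetical; a plain Java method stands in for the native implementation so the example runs without a native library. The assembly language thunk alternative is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for the instrumentation runtime: records probe events (hypothetical).
final class Probes {
    static final List<String> events = new ArrayList<>();
    static void entry(String method, Object... args) { events.add("entry " + method + " " + Arrays.toString(args)); }
    static void exit(String method, Object result) { events.add("exit " + method + " -> " + result); }
    static void abort(String method, Throwable t) { events.add("abort " + method + " " + t.getClass().getSimpleName()); }
}

final class FileApi {
    // In real code this would be declared native and implemented in C;
    // a plain method stands in here so the example is self-contained.
    private static int checksum$original(int seed) { return seed * 31 + 7; }

    // Wrapper that takes over the original name; this is the part that is instrumented.
    static int checksum(int seed) {
        Probes.entry("checksum", seed); // method entry portion
        try {
            int result = checksum$original(seed);
            Probes.exit("checksum", result); // method exit portion
            return result;
        } catch (RuntimeException | Error e) {
            Probes.abort("checksum", e); // abort: record it, then rethrow unchanged
            throw e;
        }
    }
}
```

The `catch` block also illustrates instrumentation for an abort: when the wrapped call fails, the exit probe is skipped, an abort event is recorded, and the original exception propagates to the caller.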
The data that is generated by instrumentation may be provided to a routine to pass data via a message stream. A data storage may be provided to store data provided via the message stream and/or a viewer may be provided to allow viewing at least a subset of data from the message stream as the data is being generated.
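One way to realize the message stream is a queue drained by a dispatcher thread that forwards each message, in order, to every registered sink, such as a data store and a live viewer. This is a sketch under the assumptions that messages are strings and that sinks are registered before `start()` is called; the class `MessageStream` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical message stream: instrumentation code publishes messages, and a
// dispatcher thread forwards each one, in order, to every registered sink.
final class MessageStream {
    // Optional.empty() is the end-of-stream marker.
    private final BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
    private final List<Consumer<String>> sinks = new ArrayList<>();
    private Thread dispatcher;

    // Register sinks (e.g. a data store, a live viewer) before calling start().
    void addSink(Consumer<String> sink) { sinks.add(sink); }

    void start() {
        dispatcher = new Thread(() -> {
            try {
                for (Optional<String> m = queue.take(); m.isPresent(); m = queue.take()) {
                    for (Consumer<String> sink : sinks) sink.accept(m.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop dispatching if interrupted
            }
        });
        dispatcher.start();
    }

    // Called from instrumentation code; returns without waiting for the sinks.
    void publish(String message) { queue.add(Optional.of(message)); }

    // Marks the end of the stream and waits until every queued message is delivered.
    void close() throws InterruptedException {
        queue.add(Optional.empty());
        dispatcher.join();
    }
}
```

Decoupling publishing from delivery keeps the instrumented program from blocking on a slow viewer, at the cost of an unbounded queue; a production design might bound the queue instead.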
According further to the present invention, instrumenting a computer program includes examining an initial byte code representation of the program, creating a program counter mapping table corresponding to the byte code representation, selecting portions of the byte code representation for instrumentation using the program counter mapping table, instrumenting the portions by adding calls to instrumentation runtime functions in at least some of the portions, and modifying the program counter mapping table according to modifications to the byte code.
According further to the present invention, uniquely identifying an object in an object oriented programming language includes obtaining a unique identifier, such as a hash code, corresponding to the object, creating a data structure having at least a first and a second storage location, storing an identifier for the object's class in the first storage location, and storing the unique identifier in the second storage location.
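Both mechanisms described above can be sketched briefly. The classes `PcMap` and `ObjectId` are hypothetical. `PcMap.build` assumes that instrumentation only inserts probe code before existing instructions, and it maps each original offset to the start of whatever is now emitted for that instruction, so branch targets and line number entries that pointed at an instruction now point at the probe preceding it. It ignores complications such as branch instructions that must be widened once their targets move. `ObjectId` pairs a class name with `System.identityHashCode`; identity hash codes are not guaranteed to be unique, so this reduces, rather than eliminates, the chance of two objects sharing an identifier.

```java
import java.util.Map;
import java.util.TreeMap;

final class PcMap {
    // Maps each original instruction offset to its offset after instrumentation.
    // insertions: original offset -> bytes of probe code inserted just before it.
    // originalOffsets must be in ascending order.
    static TreeMap<Integer, Integer> build(int[] originalOffsets, Map<Integer, Integer> insertions) {
        TreeMap<Integer, Integer> table = new TreeMap<>();
        int shift = 0; // total bytes inserted before the current instruction's probe
        for (int pc : originalOffsets) {
            // Map to the start of the inserted probe, so jumps to pc also run the probe.
            table.put(pc, pc + shift);
            shift += insertions.getOrDefault(pc, 0);
        }
        return table;
    }
}

// Two-slot identifier: class identifier in the first slot, per-object hash in the second.
final class ObjectId {
    final String className;
    final int hash;

    ObjectId(Object o) {
        className = o.getClass().getName();
        hash = System.identityHashCode(o); // not guaranteed unique across live objects
    }

    @Override public boolean equals(Object other) {
        if (!(other instanceof ObjectId)) return false;
        ObjectId id = (ObjectId) other;
        return hash == id.hash && className.equals(id.className);
    }

    @Override public int hashCode() { return 31 * className.hashCode() + hash; }

    @Override public String toString() { return className + "@" + Integer.toHexString(hash); }
}
```

For example, with probes of 6 and 3 bytes inserted before the instructions at original offsets 0 and 5, the offsets {0, 2, 5, 8} map to {0, 8, 11, 17}, so a line number entry that started at offset 5 would be rewritten to start at 11.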