Instrumentation of computer software is well-known and involves insertion, into a particular computer program, of computer instructions which evaluate the computer program during execution. As used herein, a computer program is a series of computer instructions and data stored in a computer-readable memory which collectively define a computer process. A computer processor fetches and executes the computer instructions of a computer program to form a computer process. The computer process includes the computer instructions of the computer program and data representing the execution state of the computer process. Execution of an instrumented computer process causes execution of inserted computer instructions to aid in the evaluation of the instrumented computer process. Both computer programs and computer processes can be instrumented.
Instrumentation of a computer program is generally accomplished by one of three techniques. These techniques are also generally applicable to the instrumentation of computer processes. In the first technique, instrumentation computer instructions are inserted directly into, i.e., between native computer instructions of, the computer program or computer process. As used herein, an instrumentation computer instruction is a computer instruction inserted into a computer program for the purpose of analyzing the computer program, and a native computer instruction is any other computer instruction of a computer program. During development, a computer program typically includes both instrumentation and native computer instructions. However, when the computer program is released as a commercial product, the computer program will typically include only native computer instructions, and all instrumentation computer instructions will typically be removed from the computer program.
As an example of the first technique, instrumentation computer instructions can be inserted before a native computer instruction which accesses computer memory at a particular address. In this example, the instrumentation computer instructions can determine the particular address, compare the particular address to valid memory address ranges, and report an error if the particular address is not within any of the valid memory address ranges.
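By way of illustration only, the address check described in this example can be sketched in Python as follows. The names `VALID_RANGES`, `is_valid_address`, and `checked_load`, as well as the particular address ranges, are hypothetical and are not drawn from any actual instrumentation tool; memory is modeled as a simple dictionary.

```python
# Hypothetical valid address ranges (inclusive start, exclusive end).
# The values are illustrative assumptions only.
VALID_RANGES = [(0x1000, 0x2000), (0x8000, 0x9000)]


def is_valid_address(address, valid_ranges=VALID_RANGES):
    """Return True if the address lies within any valid range."""
    return any(start <= address < end for start, end in valid_ranges)


def checked_load(memory, address, errors):
    """Model an instrumented memory access.

    The instrumentation (the range check) runs before the native
    access, mirroring instructions inserted ahead of a native
    instruction.
    """
    # Instrumentation: validate the address and report an error if invalid.
    if not is_valid_address(address):
        errors.append(f"invalid memory access at {address:#x}")
        return None
    # Native behavior: perform the memory access itself.
    return memory.get(address, 0)
```

In this sketch, an access to an address outside every range appends a message to `errors` rather than performing the access; an actual tool could equally report the error and still perform the access.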
In a second technique, two or more native computer instructions of a computer program are replaced with a call to a separate instrumentation sequence which is located remotely within the computer program. Calls to sequences are well known and are described herein only briefly for completeness. Calling a sequence transfers control of a computer process to the sequence and provides the sequence with data, e.g., by pushing such data onto a stack. Thus, a call to a sequence involves a change in the state of the computer process (by pushing data on a stack) and a transfer of control. The sequence includes one or more computer instructions which are fetched and executed upon calling of the sequence.
When the computer program attempts to fetch and execute one of the replaced native computer instructions, the instrumentation sequence is called instead. The computer instructions of the instrumentation sequence, which can include for example the replaced native computer instructions and a number of instrumentation computer instructions, are executed. Following execution of a number of the computer instructions of the instrumentation sequence, processing transfers back to the computer program at the computer instruction immediately following the call to the instrumentation sequence. For example, two or more native computer instructions which access computer memory at a particular address can be replaced with a call to a sequence which includes a number of instrumentation computer instructions which determine the particular address, compare the particular address to valid memory address ranges, and report an error if the particular address is not within any of the valid memory address ranges.
In a third technique, a single native computer instruction is replaced with a branch to a sequence of a number of instrumentation computer instructions. The sequence can include, among other computer instructions, the replaced native computer instruction. The last computer instruction of the sequence is typically a branch to the computer instruction of the computer program which is ordinarily executed immediately following execution of the replaced native computer instruction. For example, a native computer instruction which accesses computer memory at a particular address can be replaced with a branch instruction which causes processing to transfer to a sequence of computer instructions including a number of instrumentation computer instructions which determine the particular address, compare the particular address to valid memory address ranges, and report an error if the particular address is not within any of the valid memory address ranges.
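By way of illustration only, the third technique can be modeled with a toy instruction list in Python. The instruction names (`LOAD`, `CHECK`, `BRANCH`, `HALT`), the functions `instrument_with_branch` and `run`, and the valid range are all hypothetical. Each instruction occupies one list slot, so the sketch deliberately ignores instruction sizes, and it assumes the program ends with `HALT` so that normal execution never falls into the appended sequence.

```python
VALID_RANGES = [(0x1000, 0x2000)]  # hypothetical valid range, illustrative only


def instrument_with_branch(program, index):
    """Replace program[index] with a branch to a sequence appended at the end."""
    op, addr = program[index]
    sequence_start = len(program)
    sequence = [
        ("CHECK", addr),        # instrumentation: validate the address
        (op, addr),             # the replaced native instruction
        ("BRANCH", index + 1),  # return to the instruction that follows
    ]
    patched = list(program)
    patched[index] = ("BRANCH", sequence_start)
    return patched + sequence


def run(program):
    """Execute the toy program; return (addresses accessed, addresses in error)."""
    pc, accessed, errors = 0, [], []
    while program[pc][0] != "HALT":
        op, arg = program[pc]
        if op == "BRANCH":
            pc = arg
            continue
        if op == "CHECK":
            if not any(lo <= arg < hi for lo, hi in VALID_RANGES):
                errors.append(arg)
        elif op == "LOAD":
            accessed.append(arg)
        pc += 1
    return accessed, errors
```

Running the instrumented program performs the same sequence of `LOAD` accesses as the original while additionally recording any address that fails the check, and no instruction other than the replaced one changes position.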
It is generally advantageous to add instrumentation computer instructions to, and remove instrumentation computer instructions from, a computer program quickly. It is therefore generally preferred in the art to instrument computer programs in the form of object code rather than source code. Source code is a collection of one or more computer instructions in a form which is intelligible to humans, and object code is a collection of one or more computer instructions in a form which is intelligible to a computer processor. A computer program is generally created by configuration and combination of computer instructions in source code form by a human software engineer who then causes the source code to be compiled, i.e., translated from source code to object code. Compilation of a computer program can be quite time-consuming and can require substantial resources of a computer system. If a computer program is instrumented while in the form of source code, the computer program must be compiled again before the computer program as instrumented can be executed in a computer system. Instrumentation computer instructions can be added to a computer program without requiring recompilation of the computer program if the instrumentation computer instructions are added to the computer program while in the form of object code, i.e., after compilation of the program. In this way, instrumentation computer instructions can be added to or removed from a computer program quickly, i.e., generally in substantially less time than required to compile the computer program.
In addition, computer processes generally include computer instructions in an object code format. Therefore, the ability to instrument object code enables instrumentation of computer processes. As a result, a computer process can be instrumented as needed by a debugger during execution of the computer process. A debugger is a computer process which controls and analyzes the execution of another computer process.
Instrumentation computer instructions are generally added to a computer program in object code form in one of three ways. First, instrumentation computer instructions are inserted in the computer program at the point at which the instrumentation computer instructions are to be executed, thereby displacing native computer instructions at subsequent positions in the computer program. This technique has the advantage of the most efficient execution possible of the computer program as instrumented. However, since native computer instructions are displaced, references to the displaced native computer instructions throughout the computer program must be located and modified to refer to the native computer instructions as displaced. Location and modification of such references takes nearly as much processing as recompiling the computer program from source code and adds the risk that new errors are introduced into the computer program. Thus, this mechanism for adding instrumentation computer instructions to a computer program in object code form provides little advantage, if any, over adding instrumentation computer instructions to a computer program in source code form.
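By way of illustration only, the need to locate and modify references after direct insertion can be sketched in Python with a toy program whose branch targets are absolute list indices. The function `insert_and_relocate` and the instruction names are hypothetical. A branch whose target equals the insertion point is deliberately left unchanged so that it also reaches the inserted instrumentation; that is a design choice of this sketch and not something the foregoing description prescribes.

```python
def insert_and_relocate(program, index, inserted):
    """Insert instructions at index and patch absolute branch targets.

    Every branch that refers to an instruction located after the
    insertion point must be adjusted, because that instruction has
    been displaced by len(inserted) positions.
    """
    shift = len(inserted)
    patched = []
    for op, arg in program:
        if op == "BRANCH" and arg > index:
            arg += shift  # the referenced instruction moved; fix the reference
        patched.append((op, arg))
    return patched[:index] + list(inserted) + patched[index:]
```

Even in this toy model the whole program must be scanned for every insertion. In actual object code the references can also take the form of relative offsets, jump tables, and addresses held in data, which is why the cost approaches that of recompilation.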
The second mechanism for adding instrumentation computer instructions to a computer program in object code form is replacing a contiguous block of two or more native computer instructions with a call to a sequence of instrumentation computer instructions. A call to a sequence of computer instructions is typically longer than any single computer instruction. Therefore, to avoid displacing a large number of native computer instructions, the size of the block of replaced native computer instructions is at least the size of the call to the sequence of instrumentation computer instructions. If the size of the call is less than the size of the block of replaced native computer instructions, no-op computer instructions, which have no effect when executed, are inserted before or after the call such that the call and the no-op computer instructions collectively occupy the same amount of address space vacated by the block of replaced native computer instructions. A call to a sequence of computer instructions typically includes computer instructions which place data on a stack which is accessible by the called sequence of computer instructions and a computer instruction which transfers processing to the called sequence of computer instructions.
The called sequence of instrumentation computer instructions typically includes the replaced native computer instructions to preserve the overall behavior of the computer program. References throughout the instrumented computer program to some of the replaced native computer instructions must be located and modified to refer to the call to the sequence of instrumentation computer instructions. While this second mechanism for adding instrumentation computer instructions to the computer program displaces fewer native computer instructions than the first-described mechanism, this second mechanism suffers to a substantial degree from the same disadvantages as the first mechanism described above.
The third mechanism for adding instrumentation computer instructions to a computer program in object code form replaces a single native computer instruction with a branch computer instruction which transfers processing to a sequence of instrumentation computer instructions, which can include the replaced native computer instruction. Since only a single native computer instruction is displaced, any transfer of control to the displaced native computer instruction transfers control to the branch computer instruction which in turn transfers control to the sequence of instrumentation computer instructions, which includes the displaced native computer instruction. Therefore, the overall behavior of the computer program is unchanged as a result of the instrumentation. "Transfer of control" is used herein as the term is generally used in the art to refer to the sequence of execution of computer instructions. In other words, if control is transferred from a first computer instruction to a second computer instruction, the second computer instruction is executed immediately following the first computer instruction.
The last instrumentation computer instruction of the sequence is generally a branch computer instruction which transfers control to the native computer instruction whose execution immediately follows execution of the replaced native computer instruction in the computer program without instrumentation computer instructions. The branch computer instruction is typically the same size as the replaced native computer instruction to avoid displacing other native computer instructions. If the size of the branch computer instruction is less than the size of the replaced native computer instruction, one or more no-op computer instructions are inserted before or after the branch computer instruction such that the branch computer instruction and the no-op computer instructions collectively occupy the same amount of address space vacated by the replaced native computer instruction.
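By way of illustration only, the padding of a branch computer instruction with no-op computer instructions can be sketched in Python as follows. The function `build_patch` is hypothetical. The sketch assumes a one-byte no-op encoding (0x90, as on x86 processors) and, as one of the two placements mentioned above, places the no-ops after the branch.

```python
NOP = b"\x90"  # one-byte no-op; 0x90 is the x86 encoding, assumed here


def build_patch(replaced_size, branch_bytes, nop=NOP):
    """Return bytes that exactly fill the space of the replaced instruction.

    The branch is followed by as many no-ops as are needed so that no
    neighboring native instruction is displaced.
    """
    if len(nop) != 1:
        raise ValueError("this sketch assumes a one-byte no-op")
    if len(branch_bytes) > replaced_size:
        raise ValueError("branch does not fit in the replaced instruction")
    padding = nop * (replaced_size - len(branch_bytes))
    return branch_bytes + padding
```

For instance, `build_patch(5, b"\xeb\x10")` pads a two-byte branch with three no-ops so that the result occupies five bytes, whereas a request to fit that same branch into a one-byte slot raises `ValueError`, which corresponds to the case in which the replaced native computer instruction is smaller than the available branch.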
The sequence of instrumentation computer instructions can be inserted into the computer program at generally any location so long as the sequence of execution of native computer instructions is not changed. If the sequence of instrumentation computer instructions is inserted at a location such that native computer instructions are displaced, this third mechanism for adding instrumentation computer instructions to a computer program suffers from the same disadvantage as the first two mechanisms described above, namely, that references to the displaced native computer instructions must be located and modified. It is therefore generally preferred in the art that such a sequence of instrumentation computer instructions be added to the end of the computer program so that no native computer instructions are displaced. Such is frequently not feasible, however, when the original location of the replaced native computer instruction is too far from the sequence of instrumentation computer instructions to be reached by a branch instruction of the size of the replaced native computer instruction. The following example is illustrative.
Some native computer instructions, e.g., the PUSH computer instruction, are as small as one byte. Such a native computer instruction must generally be replaced with a branching computer instruction whose length is at most one byte. Such a branching computer instruction can typically transfer control to a computer instruction which is displaced from the branching computer instruction by at most 255 bytes. Therefore, a native computer instruction whose length is only one byte can only be efficiently instrumented according to the third technique described above if the native computer instruction is no more than 255 bytes from the last address of the computer program occupied by a native computer instruction. Many computer programs in use today are several orders of magnitude greater than 255 bytes. Thus, many native computer instructions of such computer programs cannot be instrumented by any of the techniques described above without displacing a significant number of other native computer instructions.
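By way of illustration only, the reach limitation in this example can be expressed as simple arithmetic in Python. The names `reachable_by_short_branch` and `instrumentable_fraction` are hypothetical, the 255-byte reach is the figure used in the example above, and the sketch follows the simplification that distance is measured to the last address occupied by a native computer instruction.

```python
MAX_SHORT_BRANCH_REACH = 255  # bytes; the figure used in the example above


def reachable_by_short_branch(instruction_address, last_native_address,
                              reach=MAX_SHORT_BRANCH_REACH):
    """True if the instruction is within reach of the end of the program."""
    return last_native_address - instruction_address <= reach


def instrumentable_fraction(program_size, reach=MAX_SHORT_BRANCH_REACH):
    """Fraction of byte addresses in a program that lie within reach of its end."""
    if program_size <= 0:
        raise ValueError("program_size must be positive")
    # Addresses last-reach .. last are in reach: reach + 1 addresses at most.
    in_reach = min(program_size, reach + 1)
    return in_reach / program_size
```

Under these assumptions, only the final 256 byte addresses of a program are in reach, so for a program of 1,048,576 bytes the fraction is 256/1,048,576, or about 0.02 percent of the program.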
Therefore, no satisfactory mechanism currently exists for adding instrumentation computer instructions to particular large computer programs. Current solutions either (i) require excessive time and resources to add instrumentation computer instructions to, or remove instrumentation computer instructions from, computer programs or (ii) cannot efficiently instrument certain native computer instructions of a computer program.