Code Instrumentation
In the field of computer science, instrumenting code refers to placing additional instructions in code, which can be used, for example, to monitor the code or to add functionality. Instrumenting code allows additional software to take control of the program and monitor it during runtime to determine what the computer is actually doing while the program is executing. The program being monitored may be referred to as the target program, and the additional software may be referred to as the controlling program or meta-program. When code is instrumented, transferring control during execution of the target program to the instrumented code at a particular instruction in the target program is referred to as “trapping” that instruction.
Examples of uses for instrumenting code include, but are not limited to, measuring the level of performance for a piece of software, diagnosing errors, and receiving messages about the execution of an application at run time. Examples of controlling programs include, but are not limited to, tracing infrastructures, debuggers, profilers and virtual machine monitors.
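As a rough source-level analogy only, the relationship between a target program and a controlling program can be sketched in Python. The names instrument, on_enter, on_exit, square, and events are hypothetical and chosen purely for illustration. The sketch wraps a target subroutine so that control passes to the controlling program's hooks before and after the target runs. It does not show instrumentation of compiled code, where no such wrapper is available.

```python
import functools

def instrument(target, on_enter, on_exit):
    """Return a version of `target` with monitoring hooks placed around it."""
    @functools.wraps(target)
    def instrumented(*args, **kwargs):
        on_enter(target.__name__, args)    # control passes to the controlling program
        result = target(*args, **kwargs)   # the target program's own work
        on_exit(target.__name__, result)   # control passes to the controlling program again
        return result
    return instrumented

events = []  # log kept by the hypothetical controlling program

def square(x):
    return x * x

square = instrument(
    square,
    on_enter=lambda name, args: events.append(("enter", name, args)),
    on_exit=lambda name, result: events.append(("exit", name, result)),
)

value = square(7)
```

After the call, events holds one entry record and one exit record for square, which is the kind of information a tracing infrastructure or profiler would collect.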
Subroutines and Subroutine Calls
In the field of computer science, a subroutine (also called procedure, subprogram, method, function, or routine) is a portion of code within a larger program which performs a task and may be relatively independent of the remaining code. To simplify the process of programming a large software system, the system is structured as a set of smaller sub-problems. Subroutines are programmed to solve these sub-problems. Examples of subroutines include, but are not limited to, reading from a file, testing for the presence of an entry in a cache, and computing a particular mathematical function on its inputs.
A subroutine comprises a number of program statements (and optionally data structures) to perform the specific task assigned to the subroutine. Large software systems are organized as collections of subroutines. Subroutines allow for code reuse; i.e., once a solution to a sub-problem has been implemented and made available as a subroutine, the subroutine can be used as a building block for solving many different problems. Because each subroutine contains or “encapsulates” the individual statements comprising it, the entire subroutine may be invoked or “called” from elsewhere in the program. The part of code which calls the subroutine is referred to as the “caller.” When a subroutine has been called, the encapsulated statements are executed, and when the last such statement completes, the program continues executing from the point in the program where the subroutine was invoked, i.e., the instruction in the caller following the instruction to call the subroutine. The address where this next instruction is located in computer memory is referred to as the return address for the subroutine. The return address of a subroutine is a type of instruction pointer. An instruction pointer is the location or “address” in computer memory of an instruction.
Programs are generally written in high-level programming languages, such as C, C++, or Java, which can be easily understood by programmers. The code written in these languages is referred to as source code. All of these languages provide subroutines in some form, and while the details vary in terms of both syntax and semantics, there are many similarities. For example, all languages include a “return from subroutine” statement.
Because of the frequent use of subroutines in computer programs, instruction set architectures provide explicit support for calling a subroutine and returning from the subroutine through specialized call and return instructions. Architectures also impose an application binary interface, or ABI, which establishes conventions for locating the inputs and outputs to subroutines. ABIs enable dynamic linking, i.e., dynamically calling external subroutines during program execution; program development across different languages; and debugging tools. ABIs cover details such as the calling convention, which controls how subroutines' arguments are passed and return values retrieved. The ABIs of general purpose CPU architectures provide a linking convention for specifying the return address of a subroutine, such as reserving a space in memory for the return address.
As discussed above, the return address is the location in the code of the instruction following the instruction to call the subroutine. One example of reserving space in memory for the return address includes placing the return address in a specific register when a subroutine is called, as is done in the MIPS architecture and the DEC Alpha architecture. A “register” is a small amount of storage available on the CPU whose contents can be accessed more quickly than storage available elsewhere. In MIPS the register where the return address is placed is referred to as the $ra register.
FIG. 1A is a diagram illustrating a subroutine call in an architecture that places the return address in a specific register when a subroutine is called. For purposes of illustrating a call to a subroutine, the diagram shows the process of calling a subroutine denoted as “P.” The MIPS implementation is illustrated as an example, but is not meant to be limiting in any way, and it is well understood in the field of computer science how to specify the return address of a subroutine in different architectures using a specific register. FIG. 1B is a diagram illustrating the $ra register used in the MIPS architecture to store the return address for a subroutine. MIPS assembly language includes an instruction for calling a subroutine, the “jal” or jump and link instruction. As illustrated in FIG. 1A, when the jal P instruction is executed at 104, the program jumps to the subroutine 102 named P at 108, i.e., the CPU starts executing the first instruction of P 110, and simultaneously stores the address of the following instruction, “R,” in register $ra as illustrated at 112 in FIG. 1B.
In MIPS, once P has finished executing its instructions, which are illustrated by the dotted lines at 114 of FIG. 1A, the instruction jr $ra is executed at 116. This instruction will cause the CPU to jump to the address in the $ra register 112 of FIG. 1B, which as explained above is the return address, or the address of the instruction after the instruction to call P. Thus, when jr $ra is executed, the program “jumps” to the instruction 106 after the instruction to call P as illustrated at 118 (i.e., the CPU begins executing that instruction), and continues executing the caller 100 of P.
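The register-based linking convention described above can be sketched with a toy interpreter written in Python. This is an illustrative model only and not MIPS itself: the function run, the addresses 0x100 and 0x200, and the "nop" and "halt" pseudo-instructions are assumptions introduced for the example, and the model ignores the branch delay slot of real MIPS hardware so that the saved address is simply that of the instruction following the jal.

```python
def run(program, start):
    """Execute a toy program given as {address: (opcode, operand)}."""
    regs = {"$ra": None}
    pc = start
    trace = []                     # addresses executed, in order
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "jal":
            regs["$ra"] = pc + 4   # link: remember the instruction after the call
            pc = arg               # jump to the first instruction of the subroutine
        elif op == "jr":
            pc = regs[arg]         # return: jump to the address held in the register
        elif op == "halt":
            return trace, regs
        else:                      # "nop" stands in for any ordinary instruction
            pc += 4

P = 0x200                          # hypothetical entry address of subroutine P
R = 0x104                          # hypothetical return address in the caller
program = {
    0x100: ("jal", P),             # caller: call P
    0x104: ("halt", None),         # caller: instruction following the call
    0x200: ("nop", None),          # P: first instruction
    0x204: ("nop", None),          # P: body
    0x208: ("jr", "$ra"),          # P: return to the caller
}

trace, regs = run(program, 0x100)
```

In this model the trace shows control moving from the caller into P and then back to address R, and the $ra register holds R when the program stops.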
Another convention for storing the return address is to place the return address in the architecture's in-memory stack when the subroutine is called. The stack is usually implemented as a contiguous area of memory with a pointer to the top of the stack. In some architectures, the top of the stack is the lowest address in use within the area and the stack will grow downwards in memory. In other architectures, the top of the stack is the highest address in use within this area, and the stack will grow upwards in memory. It is an arbitrary design choice whether the top of the stack is the lowest or highest address within this area, but the common convention is for the stack to grow downwards in memory. Examples of architectures using stacks to store the return address include x86, x86-64, and the Power architectures.
Below is an example illustrating the process of using the stack to specify the return address of a subroutine. The particular architecture illustrated is the x86 architecture. However, the example is provided for illustrative reasons, and it is well understood in the field of computer science how to specify the return address of a subroutine using a stack in different architectures. FIG. 2A is a diagram illustrating the contents of a stack before a call to a subroutine. %esp 202 denotes the top of stack pointer, which indicates the current top of the stack. Each entry labeled wn denotes some word of memory held in the stack. FIG. 2B is a diagram illustrating the contents of the stack after execution of the call instruction.
The x86 architecture includes assembly language instructions for calling a subroutine. An example of calling a subroutine denoted as “P” is illustrated in TABLE 1.
TABLE 1
Call P //call the subroutine that begins at address SR
<some instruction>; //the next instruction following the call is at address R.
When executed, this call instruction will (1) “push,” i.e., place, onto the memory stack the address “R” of the instruction following the instruction to call P, as illustrated at 206 of FIG. 2B. The call instruction will also (2) set the program counter (PC), which on the x86 platform is named %eip, to the address P as illustrated at 212 of FIG. 2C. The program counter is the register containing the address of the instruction in the program that is executing.
Now the subroutine that begins at address P executes. The subroutine may make use of the stack to hold temporary data or make further calls, thereby pushing more items onto the stack. When the subroutine that began at address P has completed and is ready to return, the stack must have returned to the state illustrated in FIG. 2B. To return, the subroutine P executes a return instruction as illustrated below in Table 2. This will “pop” the topmost element from the stack and place it in the program counter, referred to as %eip. Thus, when Ret is executed, it will set the program counter %eip equal to R, the return address, as illustrated at 228 of FIG. 2D. This will cause the CPU to begin executing the instruction at address R, i.e., the instruction following the instruction to call P. The instruction will also update the top of stack pointer register so that the stack contents will return to the pre-call state as illustrated in FIG. 2A.
TABLE 2
Ret: // return to the caller of this subroutine.
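The stack-based convention described above can be sketched as a small Python model. The class Machine, the word size of 4 bytes, and the addresses used for P, R, and the initial top of stack are hypothetical values chosen for illustration. The model assumes a stack that grows downwards in memory and stores only the stack pointer, the program counter, and a sparse memory.

```python
WORD = 4  # assumed bytes per stack slot, as on 32-bit x86

class Machine:
    def __init__(self, esp):
        self.memory = {}   # sparse memory: address -> word
        self.esp = esp     # top of stack pointer
        self.eip = None    # program counter

    def push(self, word):
        self.esp -= WORD               # the stack grows downwards
        self.memory[self.esp] = word

    def pop(self):
        word = self.memory.pop(self.esp)  # the slot is discarded in this model
        self.esp += WORD
        return word

    def call(self, target, return_address):
        self.push(return_address)      # (1) push the return address
        self.eip = target              # (2) jump to the subroutine

    def ret(self):
        self.eip = self.pop()          # pop the return address into the program counter

P, R = 0x4000, 0x1005                  # hypothetical subroutine and return addresses
m = Machine(esp=0x8000)
m.push("w1")                           # some word already on the stack before the call
esp_before_call = m.esp

m.call(P, R)
top_after_call = m.memory[m.esp]       # the return address is now the topmost element

m.push("temporary")                    # the subroutine uses the stack for its own data
m.pop()                                # and restores it before returning
m.ret()
```

After ret, the program counter holds R and the stack pointer has the value it had before the call, matching the pre-call state described above.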
Once source code for a program has been written, the source code is compiled into machine-readable object code, also referred to as machine code, which can be understood by the computer. Object code is in the form of 1's and 0's. Subroutines which are created in human-readable source code remain visible in the machine-readable object code once the source code has been compiled. Thus, information presented in terms of subroutines is both meaningful to humans and machine-visible, which makes monitoring subroutines during runtime useful for a variety of reasons. For example, debuggers may stop the program for inspection at entry to or return from a given subroutine. Additionally, tracing infrastructures can record the value returned from the subroutine. It is well understood in the field of computer science that there are a variety of other reasons that it would be useful to monitor the entry to and return from a subroutine during runtime. Therefore, it would be useful to have a way to instrument the entry to and return from a subroutine so that the subroutine may be monitored during runtime.
As is understood in the field of computer software, instrumenting and trapping the entry into a subroutine is straightforward because in typical high-level languages, subroutines are entered through a single instruction pointer. In other words, there is one instruction that is executed at the start of the subroutine, and the CPU jumps to that instruction every time the subroutine is called. Thus, the entry to the subroutine can be trapped by instrumenting the code to trap that instruction pointer.
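Trapping the single entry point of a subroutine can be sketched with a toy interpreter in Python. The function run_with_traps, the trap table, and the instruction addresses are hypothetical and serve only to illustrate the idea: before each instruction executes, the interpreter checks whether its address has been registered by the controlling program and, if so, transfers control to the registered handler.

```python
def run_with_traps(program, start, traps):
    """Run a toy program, calling traps[address] before a trapped instruction."""
    stack, pc = [], start
    while True:
        if pc in traps:
            traps[pc](pc)              # control passes to the controlling program
        op, arg = program[pc]
        if op == "call":
            stack.append(pc + 1)       # save the return address
            pc = arg
        elif op == "ret":
            pc = stack.pop()
        elif op == "halt":
            return
        else:                          # "nop" stands in for any ordinary instruction
            pc += 1

P_ENTRY = 10                           # hypothetical address of the first instruction of P
program = {
    0: ("call", P_ENTRY),              # first call site
    1: ("call", P_ENTRY),              # second call site
    2: ("halt", None),
    10: ("nop", None),                 # P: single entry instruction
    11: ("ret", None),                 # P: return
}

hits = []                              # record kept by the controlling program
run_with_traps(program, 0, {P_ENTRY: hits.append})
```

Because every call reaches P through the same entry instruction, a single trap at that address fires once per call regardless of the call site.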
Additionally, some architectures, such as the x86 and x86-64 architectures, offer debug registers which can be programmed to trap on execution of a particular physical or virtual address.
Trapping the return from a subroutine, however, presents special problems. For example, it is not feasible to decode forward in the instruction stream from entry to the subroutine to find the subroutine's return instruction, for several reasons. A single subroutine may be compiled so that it contains multiple returns, making it difficult to determine at what instruction the subroutine will return to its caller during runtime. Therefore, it cannot be known before the subroutine is running when to stop scanning. Additionally, subroutines often contain branches, such as if-then statements, that result in the subroutine executing different sections of code at run time. Thus, discovering the actual body of the subroutine that will run at a particular time is impossible before the subroutine is actually running. Further, when code is compiled, the compiler will often include read-only data interspersed with the subroutine's instructions. This data might look like return instructions, and determining at run time what is a return instruction and what is data may be undecidable. Also, the compiler might not use, or the architecture may not provide, a special instruction for returning from subroutines. The compiler might choose to instead implement subroutines with a memory or register indirect branch. For example, as described above, the MIPS architecture implements returns from subroutines with ordinary register-indirect control transfers.
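The difficulty of telling return instructions apart from other bytes can be illustrated with a short Python sketch. On x86, the one-byte near return instruction is encoded as 0xC3. The six bytes below are a hand-assembled, hypothetical subroutine in which the same byte value also appears as part of an immediate operand, and naive_find_returns is a hypothetical scanner introduced only for this illustration. Data interspersed with instructions would mislead such a scanner in the same way.

```python
RET_OPCODE = 0xC3  # encoding of the x86 near return instruction

code = bytes([
    0xB8, 0xC3, 0x00, 0x00, 0x00,  # mov eax, 0xC3  (0xC3 here is operand data)
    0xC3,                          # ret            (the only real return)
])

def naive_find_returns(code):
    """Report every offset whose byte equals the return opcode."""
    return [offset for offset, byte in enumerate(code) if byte == RET_OPCODE]

candidates = naive_find_returns(code)
real_returns = [5]                 # known here only because the bytes were hand-assembled
false_positives = [c for c in candidates if c not in real_returns]
```

The scanner reports offsets 1 and 5, although only offset 5 holds a return instruction. A trap placed at offset 1 would corrupt the operand of the preceding instruction rather than intercept a return.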
It would be useful to have a means for trapping the exit from a subroutine. In particular, it would be useful to have a means for dynamically instrumenting the return from a subroutine in binaries, i.e., after the code has been compiled.