Analyzing the dynamic behavior and performance of a complex software system is difficult. Typically, a software system is analyzed by gathering data at each system call and post-processing that data. The data is gathered by placing a probe at each location of interest in the software (i.e., instrumenting the software to obtain an instrumented program) and collecting data whenever a thread executing the instrumented program encounters the probe.
Probe points are typically implemented in the instrumented code as trap instructions. The location (i.e., address) of each trap instruction is stored in a look-up table and associated with an original instruction (i.e., the instruction that is replaced by a trap when the program is instrumented).
When a thread executing the instrumented program encounters a trap instruction, control is transferred to a trap handler, which calls into the tracing framework and performs the actions associated with the corresponding probe. The trap handler then looks up the original instruction in the look-up table. The trap instruction is then overwritten by the original instruction (i.e., the original instruction is placed back in its original location within the code path, replacing the trap instruction that was just executed). The tracing framework then single-steps the original instruction (i.e., the original instruction is executed and then control is returned to the kernel). The original instruction in the code path is then overwritten again by the trap instruction that the thread originally encountered. The thread then resumes executing the instrumented program.
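The sequence described above can be sketched as a small simulation. This is a minimal illustration only, not the implementation of any particular tracing framework: the class `TracedProgram`, the string `"TRAP"` standing in for a trap opcode, and the four-instruction program are all hypothetical, and "single-stepping" is modeled simply as recording that the original instruction was executed.

```python
TRAP = "TRAP"  # hypothetical stand-in for a trap opcode


class TracedProgram:
    """Toy model of a code path instrumented with trap-based probes."""

    def __init__(self, instructions):
        self.code = list(instructions)  # the (instrumented) code path
        self.originals = {}             # look-up table: address -> original instruction
        self.fired = []                 # addresses at which a probe fired
        self.executed = []              # instructions that were actually executed

    def instrument(self, address):
        """Replace the instruction at `address` with a trap; remember the original."""
        self.originals[address] = self.code[address]
        self.code[address] = TRAP

    def trap_handler(self, address):
        self.fired.append(address)                # perform the probe's actions
        original = self.originals[address]        # look up the original instruction
        self.code[address] = original             # overwrite the trap with the original
        self.executed.append(self.code[address])  # "single-step" the original
        self.code[address] = TRAP                 # write the trap back over the original

    def run(self):
        """Execute the code path once, from the first address to the last."""
        for address in range(len(self.code)):
            if self.code[address] == TRAP:
                self.trap_handler(address)
            else:
                self.executed.append(self.code[address])


program = TracedProgram(["push", "mov", "call", "ret"])
program.instrument(2)
program.run()
```

In this sketch the program still executes every original instruction in order, the probe at address 2 fires once, and the trap is back in place when the run finishes.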
In a system in which more than one thread is executing within a given instrumented program, a particular thread may not trigger a probe (i.e., execute a trap instruction) if another thread has executed the trap and is in the process of single-stepping the original instruction. This situation typically occurs when a first thread encounters the trap instruction and overwrites it with the corresponding original instruction, and, before the trap instruction is written back, a second thread reaches the same location and encounters the original instruction. In this scenario, the first thread calls into the tracing framework to perform the actions associated with the probe, while the second thread executes the original instruction and so neither enters the trap handler nor calls into the tracing framework. The aforementioned method for instrumenting a program is typically referred to as “lossy” (i.e., not all of the requested tracing information is obtained because, in certain scenarios such as the one described above, a probe within a given code path may not be encountered by all executing threads).
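The lost probe can be illustrated by writing out one such interleaving by hand. The following is a deterministic sketch under stated assumptions rather than a real multithreaded trace: the two "threads" are simply labeled steps in a single function, and the function name `lossy_interleaving`, the `"TRAP"` marker, and the sample instructions are hypothetical.

```python
TRAP = "TRAP"  # hypothetical stand-in for a trap opcode


def lossy_interleaving():
    """Replay one interleaving in which a second thread misses the probe.

    Returns (probe_hits, executions) for the instrumented address.
    """
    code = ["push", "mov", "call", "ret"]
    originals = {2: code[2]}  # look-up table: address -> original instruction
    code[2] = TRAP            # instrument address 2

    probe_hits = 0
    executions = 0

    # Thread A reaches address 2 and finds the trap.
    if code[2] == TRAP:
        probe_hits += 1           # A performs the probe's actions
        code[2] = originals[2]    # A restores the original in order to single-step it

        # Thread B reaches address 2 while the original is in place.
        if code[2] == TRAP:
            probe_hits += 1       # not reached in this interleaving
        executions += 1           # B executes the original instruction directly

        executions += 1           # A single-steps the original instruction
        code[2] = TRAP            # A writes the trap back

    return probe_hits, executions


hits, runs = lossy_interleaving()
```

Here the instruction at the probed address is executed twice but the probe fires only once, which is the loss described above.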
Alternatively, the original instructions may be replaced with a reserved trap instruction. When a thread executing the instrumented program encounters the reserved trap instruction, all threads executing in the instrumented program are suspended while the thread that caused the trap single-steps the original instruction, which temporarily overwrites the trap instruction, as described above. Note that by suspending all the executing threads whenever one of them encounters the trap, the execution of the tracing framework is effectively serialized, which can perturb the effects under observation. After the thread has single-stepped the original instruction, the trap instruction that the thread encountered is copied back over the original instruction in the code path. All threads executing in the instrumented program then resume executing the instrumented program. The aforementioned method for instrumenting a program is typically referred to as “lossless” (i.e., all the requested tracing information is obtained because the threads executing the instrumented program encounter all the probes in the code path in which they are executing).
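The serializing effect of this approach can be approximated with a short simulation. The sketch below makes a simplifying assumption: rather than actually suspending threads when a trap is hit, it holds a single lock around every step at the probed address, so that no thread can ever observe the temporarily restored original instruction. The names `LosslessProgram`, `step`, and `run_threads` are hypothetical.

```python
import threading

TRAP = "TRAP"  # hypothetical stand-in for a reserved trap opcode


class LosslessProgram:
    """Toy model in which handling a trap excludes all other threads."""

    def __init__(self, instructions, probe_address):
        self.code = list(instructions)
        self.originals = {probe_address: self.code[probe_address]}
        self.code[probe_address] = TRAP
        self.stop_the_world = threading.Lock()  # models "all other threads suspended"
        self.probe_hits = 0
        self.executions = 0

    def step(self, address):
        """Execute the instruction at `address` on behalf of the calling thread."""
        with self.stop_the_world:
            if self.code[address] == TRAP:
                self.probe_hits += 1                          # probe actions
                self.code[address] = self.originals[address]  # restore the original
                self.executions += 1                          # single-step it
                self.code[address] = TRAP                     # copy the trap back
            else:
                self.executions += 1


def run_threads(program, address, thread_count, passes):
    """Have `thread_count` threads each execute `address` `passes` times."""
    def worker():
        for _ in range(passes):
            program.step(address)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


program = LosslessProgram(["push", "mov", "call", "ret"], probe_address=2)
run_threads(program, 2, thread_count=4, passes=1000)
```

Because only one thread at a time can be inside `step`, every execution of the probed instruction fires the probe, at the cost of the threads no longer running that code concurrently.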
Every location in a computer's memory is given an address. The content at a given memory location can be accessed by specifying its address (known as addressing). In PC-relative addressing, the address of the desired memory location is computed relative to the current value of the program counter (PC). The AMD64 instruction set architecture developed by Advanced Micro Devices (AMD) Corporation (Sunnyvale, Calif.) and the Nocona architecture developed by Intel Corporation (Santa Clara, Calif.) are both examples of architectures that support PC-relative addressing.
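As a worked illustration of the address computation, the sketch below assumes the AMD64 convention in which a signed displacement is added to the address of the next instruction (i.e., the address of the current instruction plus its length). The function name `pc_relative_target` and the sample addresses, instruction length, and displacements are hypothetical values chosen for the example.

```python
def pc_relative_target(instruction_address, instruction_length, displacement):
    """Return the memory address referenced by a PC-relative operand.

    Assumes the displacement is signed and is applied to the address of the
    instruction that follows the current one, wrapped to 64 bits.
    """
    next_pc = instruction_address + instruction_length
    return (next_pc + displacement) & 0xFFFFFFFFFFFFFFFF


# A 7-byte instruction at 0x401000 with a forward displacement of 0x2000.
forward = pc_relative_target(0x401000, 7, 0x2000)

# The same instruction with a backward (negative) displacement of 0x10.
backward = pc_relative_target(0x401000, 7, -0x10)

# The same displacement evaluated from a different instruction address.
moved = pc_relative_target(0x500000, 7, 0x2000)
```

With these sample values, `forward` is 0x403007 and `backward` is 0x400FF7. The value of `moved` differs from `forward` even though the displacement is the same, because the result depends on where the instruction itself is located.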