Instrumentation is a technique for program analysis tasks such as profiling, performance evaluation, and bottleneck analysis, and for software engineering tasks such as bug detection, enforcing application programming interface (API) compliance, and finding hot logic and dead logic. Due to runtime overhead, performance evaluation may slow program execution, which may distort execution timing, which may cause concurrency malfunctions such as race conditions. For concurrent programs, an exact ordering of events may be preserved, which may further slow program execution but facilitates debugging race conditions or other dynamic conditions.
Instrumentation can be done at various stages: statically at compile time or link time, or dynamically at runtime. In either case, an instrumentation framework causes extra logic to be inserted into an application and executed along with it in order to monitor and observe the behavior of the application. Existing instrumentation frameworks are accordingly either static or dynamic.
With static instrumentation, the compiler inserts extra logic to instrument an application at compile time or at link time. Examples include the gprof and gcov functionalities in the GNU (“GNU's Not Unix”) Compiler Collection (GCC). Another way to statically instrument the application is to use a binary rewriter to rewrite the application after the full application is built. Advantages of static instrumentation tools include low runtime overhead and a more optimized instrumented binary, owing to the additional information available at compile time or link time. However, that approach has several limitations. Although the compiler has more information about what and where to instrument, the approach requires the entire source code for an application, including system libraries, to be compiled by the compiler. Similarly, binary rewriters also need to rewrite shared system libraries, which does not scale when multiple tools use the same profiling framework.
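As an illustration only, the build-time insertion described above can be sketched in a few lines of Python using the standard `ast` module. This is an analogy, not how gprof or gcov is implemented: the names `CALL_COUNTS`, `EntryProbeInserter`, `build_instrumented`, and `APP_SOURCE` are hypothetical and exist only for this sketch. The point it shows is that the probe is inserted before the program runs and that the full source must be available to the tool doing the insertion.

```python
import ast
from collections import Counter

# Hypothetical counter table that the inserted probes write to.
CALL_COUNTS = Counter()


class EntryProbeInserter(ast.NodeTransformer):
    """Insert `CALL_COUNTS[<name>] += 1` at the top of every function body."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also instrument nested functions
        probe = ast.parse(f"CALL_COUNTS[{node.name!r}] += 1").body[0]
        node.body.insert(0, probe)
        return node


def build_instrumented(source, filename="<app>"):
    """Parse, instrument, and compile the source before it ever runs."""
    tree = EntryProbeInserter().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


# Toy application; the instrumenter needs its complete source text.
APP_SOURCE = """
def square(x):
    return x * x

def sum_of_squares(n):
    return sum(square(i) for i in range(n))
"""

namespace = {"CALL_COUNTS": CALL_COUNTS}
exec(build_instrumented(APP_SOURCE), namespace)
result = namespace["sum_of_squares"](4)
print(result, dict(CALL_COUNTS))
```

Any function whose source is not passed through `build_instrumented` (for example, a library imported by the application) runs without probes, which mirrors the limitation noted above for system libraries.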
With dynamic instrumentation, a tool or driver inserts extra logic into an application at runtime. This approach usually does not require source code for the application or its libraries, since it works directly with running code. However, because dynamic instrumentation works at runtime, it does not have full information about the running program (the build process discards some source-level information) and must work at the level of instruction sequences, which is very invasive. Dynamic instrumentation also requires an additional process to monitor and instrument the running program. Dynamic instrumentation may disturb or destroy the concurrency of the running program because the instrumentation logic is typically executed sequentially.
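For contrast, a minimal sketch of runtime insertion is given below, again in Python and again only as an analogy. It uses the standard `sys.setprofile` hook to observe calls in unmodified functions while they execute. The helper `run_with_call_counts` and the toy functions are hypothetical. Unlike the frameworks discussed here, this hook runs inside the same process and operates on function calls rather than on instruction sequences.

```python
import sys
from collections import Counter


def square(x):
    return x * x


def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += square(i)
    return total


def run_with_call_counts(func, *args):
    """Run func under a profile hook and count Python-level calls."""
    counts = Counter()

    def hook(frame, event, arg):
        # "call" fires on entry to a Python function; the hook sees only
        # runtime objects (frames and code objects), not the source text.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)  # attach the observer at runtime
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always detach, even if func raises
    return result, counts


dyn_result, dyn_counts = run_with_call_counts(sum_of_squares, 4)
print(dyn_result, dict(dyn_counts))
```

The application functions are left untouched, and the hook is invoked for every call while it is attached, which is one source of the runtime overhead associated with this style of instrumentation.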
Since inserting extra logic into an application hurts performance, dynamic instrumentation frameworks are typically implemented using a just-in-time (JIT) compiler. Intel's Pin is one such framework. Pin is a popular dynamic instrumentation framework for general purpose programming environments that JIT-compiles x86 binary logic as it inserts instrumentation logic at runtime. Although Pin is a dynamic instrumentation framework, it is infeasible for low power (low capacity) embedded processors because JIT compilation requires too much processing power. In particular, Pin and similar dynamic instrumentation frameworks have several drawbacks for low power embedded processors:
Virtual memory and process abstraction requirement: Pin requires at least two execution processes and support from the operating system to implement various functionalities.
Memory requirement: Since all instructions are instrumented by Pin, and Pin stores all the instrumented instructions in memory for efficiency, too much memory is needed. This might not be a problem for some systems or servers but poses a serious limitation for memory-constrained systems.
Compute requirement: Pin requires a powerful processor since the JIT compiler runs on the same processor as the application and is very compute heavy. Running the JIT compiler on a low power embedded processor interferes with the application execution itself and is infeasible.
Not scalable to multi-core systems: Since the logic cache and instrumentation logic run sequentially in different processes and memory spaces, parallelism in the application is compromised and/or corrupted.