The present disclosure relates to the field of computers, and specifically to the use of hardware interrupts to drive dynamic binary code recompilation.
Dynamic binary code recompilation or dynamic recompilation is a feature of some emulators and virtual machines in which a Data Processing System (DPS) may recompile parts of a computer application during execution. For instance, Java Virtual Machines (JVMs) (JAVA and JVM are trademarks of Sun Microsystems, Inc.) use dynamic recompilation to significantly improve the performance of Java applications. By compiling during execution, the DPS can (i) tailor the generated code to reflect the computer application's run-time environment and (ii) produce more efficient code by exploiting information that is unavailable to a traditional static compiler.
Dynamic recompilation systems typically instrument the application that is currently executing (i.e., insert instrumentation code into it) in order to monitor it. For example, FIG. 1 illustrates an exemplary instrumentation system 100 which applies a typical instrumentation approach. In the exemplary case shown in FIG. 1, the instrumentation system is designed to measure Basic Block (BB) frequencies. A BB is a maximal straight-line sequence of code that is entered only at its first instruction and exited only at its last instruction, with no branch in between. Original program method 104 includes basic blocks “BBentry”, “BB0”, “BB1”, and “BB2”.
Utilizing a statistical sampling approach, an optimizer (not shown) generates cloned program method 102 from original program method 104, which is the method being optimized, and instruments each cloned BB (e.g., “BB0′”, “BB1′”, “BB2′”) by inserting profiling counters 106. Profiling counters 106 are in the form of instrumentation code that keeps track of BB frequencies. When a particular BB is executed, the profiling counter 106 that is associated with that BB is incremented. The optimizer also inserts a branch instruction in original program method 104. The branch instruction causes program execution to jump (represented by arrow 108) to cloned program method 102 on certain executions. Since the jump occurs only occasionally (i.e., the original program method is usually executed), the performance penalty associated with the instrumentation code is mitigated. Moreover, such an instrumentation approach is typically used for coarse measurements such as basic block frequencies, where a single basic block can contain a considerable number of lines of code.
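The sampling scheme described above can be illustrated with a minimal sketch. It is not taken from FIG. 1: the control flow assigned to the basic blocks, the name SAMPLE_INTERVAL, the dispatcher function method, and the use of a simple call counter as the sampling trigger are all assumptions made for illustration.

```python
from collections import Counter

SAMPLE_INTERVAL = 3      # hypothetical: take the instrumented path on every 3rd call

bb_counts = Counter()    # stands in for profiling counters 106
_calls = 0               # drives the occasional jump (arrow 108)


def original_method(x):
    """Uninstrumented version; the block layout here is an assumed example."""
    if x % 2 == 0:       # BB0: ends in a branch
        y = x // 2       # BB1
    else:
        y = 3 * x + 1    # BB2
    return y


def cloned_method(x):
    """Clone of original_method with one counter increment per basic block."""
    bb_counts["BB0'"] += 1
    if x % 2 == 0:
        bb_counts["BB1'"] += 1
        y = x // 2
    else:
        bb_counts["BB2'"] += 1
        y = 3 * x + 1
    return y


def method(x):
    """Entry point: usually runs the original, occasionally jumps to the clone."""
    global _calls
    _calls += 1
    if _calls % SAMPLE_INTERVAL == 0:
        return cloned_method(x)
    return original_method(x)


# Twelve calls; only calls 3, 6, 9, and 12 pay the instrumentation cost.
results = [method(x) for x in range(1, 13)]
```

In this sketch the counters record only the sampled executions, so the block frequencies are estimates whose accuracy depends on the sampling interval. A fixed interval can also alias with regular input patterns, which is one reason a real system might choose a different trigger.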
In contrast to the aforementioned profiling counters, which reside in software, other types of counters, known as Hardware Performance Monitors (HPMs), reside in hardware. An HPM provides comprehensive reports of events that facilitate improved performance on DPSs. In addition to the usual timing information, an HPM is able to gather hardware performance metrics, such as the number of branch mispredictions, the number of misses on all cache levels, the number of floating point instructions executed, and the number of instruction loads that cause Translation Lookaside Buffer (TLB) misses. These metrics help the algorithm designer or programmer identify and eliminate performance bottlenecks. Although it is possible to employ hardware performance monitors to drive dynamic recompilation, one drawback of today's hardware performance monitors is their lack of fine-grained measurement support. Such fine-grained support is needed to re-optimize the computer program at instruction-level granularity.
For example, instead of capturing information about a single, individual instruction, current HPMs merely summarize information, such as the number of cache misses in a code region. One approach is to shrink the code region over which cache misses are counted to the granularity of a single instruction, such that the system could gather instruction-level miss information. However, such an approach would be expensive given existing interfaces between the processor and the HPMs. Moreover, such an approach presents difficulties for an out-of-order execution processor, where, for example, several data storage operations can be in flight at any given time. As a result, it becomes very difficult to single out any one of the in-flight data storage operations/instructions as the offending instruction.
Another existing approach uses a “pull” model to communicate data to the dynamic optimization system. Under a pull model, the dynamic optimization system allocates a thread for polling. The execution threads communicate with the polling thread via data storage, or in some cases via the hardware performance counter registers, as described above. The polling thread then determines when recompilation might be beneficial.
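The pull model can be sketched as follows, with shared data storage standing in for the communication channel between execution threads and the polling thread. This is an illustrative sketch only: the names HOT_THRESHOLD, POLL_INTERVAL_S, and recompile_requests, the method names "foo" and "bar", and the threshold-based decision rule are assumptions rather than details from the disclosure.

```python
import threading

HOT_THRESHOLD = 1000       # hypothetical: executions before a method counts as hot
POLL_INTERVAL_S = 0.01     # hypothetical polling period

counts = {"foo": 0, "bar": 0}   # shared data storage written by execution threads
lock = threading.Lock()
recompile_requests = []         # methods the polling thread has flagged
stop = threading.Event()


def execution_thread(name, iterations):
    """Simulates application work: records each execution of a method."""
    for _ in range(iterations):
        with lock:
            counts[name] += 1


def poll_once():
    """Reads a snapshot of the shared counters and flags hot methods."""
    with lock:
        snapshot = dict(counts)
    for name, n in snapshot.items():
        if n >= HOT_THRESHOLD and name not in recompile_requests:
            recompile_requests.append(name)


def polling_thread():
    """Wakes up periodically and pulls data; nothing is pushed to it."""
    while not stop.is_set():
        poll_once()
        stop.wait(POLL_INTERVAL_S)
    poll_once()  # final pass so updates made just before shutdown are seen


workers = [
    threading.Thread(target=execution_thread, args=("foo", 5000)),
    threading.Thread(target=execution_thread, args=("bar", 10)),
]
poller = threading.Thread(target=polling_thread)
poller.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
stop.set()
poller.join()
```

The sketch makes the cost of the pull model visible: the polling thread consumes cycles whether or not anything has changed, and a hot method is noticed only at the next poll, so detection latency is bounded below by the polling period.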
Typically, interrupts are handled by an operating system (OS), and this handling can incur a significant performance penalty. If additional hardware support were included to ensure that hardware interrupts were raised to drive code recompilation/re-optimization only for frequently executed and problematic instructions, then the overhead of handling interrupts would not be of paramount concern. However, in the absence of such additional hardware support, a more efficient mechanism is required.