The goal of compiler optimization is to improve program performance. Ideally, an optimization improves performance for all programs, but some optimizations can also degrade performance for some programs. Thus, it is sometimes acceptable for an optimization to improve performance on average over a set of programs, even if a small performance degradation is seen for some of these programs.
As a result, aggressive optimizations, which can produce substantial performance gains but also substantial degradations, are often turned off by default in production compilers: it is difficult to know when to apply these optimizations, and the penalty for a wrong decision is high.
Therefore, developing a compiler involves tuning a number of heuristics to find values that achieve good performance on average, without significant performance degradations. Today's virtual machines (VMs) perform sophisticated online feedback-directed optimizations, where profile information is gathered during the execution of the program and immediately used during the same run to bias optimization decisions toward frequently executing sections of the program. For example, many VMs capture basic block counts during the initial executions of a method and later use this information to bias optimizations such as code layout, register allocation, inlining, method splitting, and alignment of branch targets. Although these techniques are widely used in today's high performance VMs, their speculative nature further increases the possibility that an optimization may degrade performance if the profile data is incorrect or if the program behavior changes during execution.
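The use of basic block counts to bias code layout can be illustrated with a minimal Python sketch. The block names, counts, and the layout policy below are invented for illustration; a real VM operates on compiled basic blocks, not strings.

```python
# Hypothetical sketch of profile-directed code layout: basic block counts
# gathered during early (profiled) executions are used to place frequently
# executed blocks first, improving instruction-cache locality on the hot path.

def hot_path_layout(blocks, counts):
    """Order basic blocks so the most frequently executed ones come first."""
    return sorted(blocks, key=lambda b: counts.get(b, 0), reverse=True)

# Counts captured during initial executions of a method (invented values).
profile = {"entry": 1000, "loop_body": 980, "error_handler": 2, "exit": 1000}
layout = hot_path_layout(["entry", "loop_body", "error_handler", "exit"], profile)
# Rarely executed blocks such as the error handler end up last in the layout.
```

The same profile could equally bias inlining or method splitting decisions; code layout is simply the easiest to show compactly.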
The empirical search community takes a different approach toward optimization. Rather than tuning the compiler to find the best “compromise setting” to be used for all programs, they acknowledge that it is unlikely any such setting exists. Instead, the performance of various optimization settings, such as the loop unroll factor, is measured on a particular program, input, and environment, with the goal of finding the best optimization strategy for that program and environment. This approach has been very successful, especially for numerical applications. Architectures and environments vary greatly, especially for Java programs, and tuning a program's optimizations to its running environment is critical for high performance.
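Offline empirical search over a loop unroll factor can be sketched in a few lines of Python. The variants and the search loop below are illustrative stand-ins: a real system would generate differently unrolled machine code, whereas here a Python loop simply mimics the structure of an unrolled loop plus its remainder loop.

```python
import timeit

def make_unrolled_sum(factor):
    """Build a summation routine structured like a loop unrolled by `factor`
    (a stand-in for the compiler emitting differently unrolled variants)."""
    def unrolled_sum(data):
        total, i, n = 0, 0, len(data)
        limit = n - n % factor
        while i < limit:
            for k in range(factor):   # stands in for the unrolled loop body
                total += data[i + k]
            i += factor
        while i < n:                  # remainder loop for leftover iterations
            total += data[i]
            i += 1
        return total
    return unrolled_sum

def empirical_search(factors, data, repeats=5):
    """Time each variant on the actual input and keep the fastest,
    as offline empirical search does for a fixed program and input."""
    timings = {}
    for f in factors:
        fn = make_unrolled_sum(f)
        timings[f] = min(timeit.repeat(lambda: fn(data), number=20, repeat=repeats))
    return min(timings, key=timings.get)

data = list(range(10_000))
best = empirical_search([1, 2, 4, 8], data)
```

The key property being illustrated is that the program, input, and environment are held fixed while only the tuning value varies, which is exactly what an online system cannot guarantee.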
The majority of empirical search has been performed offline. With the rich runtime environment provided by a VM, it becomes possible to perform fully automatic empirical search online as a program executes. Such a system could compile multiple versions of each method, compare their performance, and select the winner. Examples of such online systems are the Dynamic Feedback and ADAPT (M. J. Voss and R. Eigenmann, “High-level adaptive program optimization with ADAPT,” ACM SIGPLAN Notices, 36(7):93-102, July 2001) systems, and the work by Fursin et al. (see G. Fursin, A. Cohen, M. O'Boyle, and O. Temam, “A practical method for quickly evaluating program optimizations,” in Proceedings of the 1st International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), number 3793 in LNCS, pages 29-46, Springer Verlag, November 2005).
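The idea of compiling multiple versions of a method and selecting the winner can be sketched as follows. This is a simplified simulation, not any of the cited systems: the trial policy, the class name, and the use of wall-clock timing per invocation are all invented for illustration.

```python
import time

class OnlineSelector:
    """Sketch of an online system: run each compiled version of a method
    for a trial period, then dispatch all later calls to the observed winner."""
    def __init__(self, versions, trials_per_version=3):
        self.versions = versions          # name -> callable (the compiled versions)
        self.trials = {name: [] for name in versions}
        self.trials_per_version = trials_per_version
        self.winner = None

    def call(self, *args):
        if self.winner is not None:
            return self.versions[self.winner](*args)
        # Trial phase: pick the least-measured version and time this invocation.
        name = min(self.trials, key=lambda n: len(self.trials[n]))
        start = time.perf_counter()
        result = self.versions[name](*args)
        self.trials[name].append(time.perf_counter() - start)
        if all(len(t) >= self.trials_per_version for t in self.trials.values()):
            self.winner = min(self.trials, key=lambda n: min(self.trials[n]))
        return result

# Two versions of the same computation, one deliberately much slower.
def quick(n):
    return n * (n - 1) // 2

def slow(n):
    return sum(range(n))

selector = OnlineSelector({"quick": quick, "slow": slow}, trials_per_version=3)
outputs = [selector.call(200_000) for _ in range(10)]
```

Note that this naive scheme only works because every invocation here has the same argument; the following paragraph explains why that assumption fails in practice.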
The most significant challenge to such an approach is that an online system does not have the ability to run the program (or method) multiple times with the exact same program state. Traditional empirical search, and optimization evaluation in general, is performed by holding as many variables constant as possible, including: 1) the program, 2) the program's inputs, and 3) the underlying environment (operating system, architecture, memory, and so on). In an online system, the program state is continually changing; each invocation of a method may have different parameter values and different global program state. Without the ability to re-execute each optimized version of the method with the exact same parameters and program state, meaningful performance comparisons seem impossible. Current online systems do not provide a solution for general optimizations to address the issue of changing inputs or workload. See P. C. Diniz and M. C. Rinard, “Dynamic feedback: An effective technique for adaptive computing,” ACM SIGPLAN Notices, 32(5):71-84, May 1997, in Conference on Programming Language Design and Implementation (PLDI). See also G. Fursin, A. Cohen, M. O'Boyle, and O. Temam, “A practical method for quickly evaluating program optimizations,” in Proceedings of the 1st International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), number 3793 in LNCS, pages 29-46, Springer Verlag, November 2005; and M. J. Voss and R. Eigenmann, “High-level adaptive program optimization with ADAPT,” ACM SIGPLAN Notices, 36(7):93-102, July 2001.
Compilers, virtual machines, application servers, software libraries, and software systems in general often have a number of “tuning knobs,” or parameterized values that can be adjusted or “tuned” to improve performance. In compilers, examples include loop unrolling depth and method inlining heuristics. In application servers, examples include thread pool sizes and maximum network connections. Often it is difficult to identify the optimal value for these knobs, and the best value can vary depending on a number of factors, such as the hardware, operating system (OS), VM, workload on the machine, benchmark, or benchmark input. All high-performance VMs use a selective optimization strategy, where methods are initially executed using an interpreter or a non-optimizing compiler. A coarse-grained profiling mechanism, such as method counters or timer-based call-stack sampling, is used to identify frequently executed methods, which are then compiled by the just-in-time (JIT) compiler.
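Selective optimization via method counters can be sketched as a small dispatcher. The class name, threshold value, and the two callables standing in for interpreted and compiled code are all invented for illustration.

```python
class CountingDispatcher:
    """Sketch of selective optimization: a method starts out on the slow
    ('interpreted') path and is promoted to a 'compiled' version once its
    invocation counter crosses a threshold."""
    def __init__(self, interpreted, compiled, threshold=10):
        self.interpreted = interpreted
        self.compiled = compiled
        self.threshold = threshold
        self.count = 0
        self.tier = "interpreted"

    def call(self, *args):
        self.count += 1
        if self.tier == "interpreted" and self.count >= self.threshold:
            self.tier = "compiled"      # hot: hand the method to the JIT
        fn = self.compiled if self.tier == "compiled" else self.interpreted
        return fn(*args)

# Both tiers compute the same result; only the execution path differs.
square = CountingDispatcher(lambda x: x * x, lambda x: x * x, threshold=10)
tiers = [(square.call(3), square.tier) for _ in range(12)]
```

Timer-based call-stack sampling differs only in how the counter is driven (by a periodic sampler rather than per-invocation increments); the promotion decision has the same shape.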
VMs use the JIT compiler's multiple optimization levels to trade off the cost of higher-level optimizations against their benefit; when a method continues to consume cycles, higher levels of optimization are employed. Some VMs, such as J9, will compile the hottest methods twice: once to insert instrumentation that gathers detailed profile information about the method, and a second time to take advantage of that profile information after the method has run for some duration. Thus, in modern VMs a particular method may be compiled many times (with different optimization strategies) during a program's execution. Overhead is kept low by performing such compilations on only a small subset of the executing methods, and often by compiling concurrently on a background compilation thread.
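The compile-twice scheme described above can be sketched as a small state machine. The thresholds, the level numbering, and the value-profiling scheme are invented for illustration and are much simpler than what a real VM such as J9 does.

```python
class TieredMethod:
    """Sketch of tiered recompilation: after `hot` invocations the method is
    recompiled with instrumentation to gather a value profile; after `profiled`
    further invocations it is recompiled again using that profile."""
    def __init__(self, hot=5, profiled=5):
        self.hot, self.profiled = hot, profiled
        self.count = 0
        self.level = 0                  # 0 = baseline, 1 = instrumented, 2 = optimized
        self.value_profile = {}

    def call(self, x):
        self.count += 1
        if self.level == 0 and self.count >= self.hot:
            self.level = 1              # first recompile: insert instrumentation
        elif self.level == 1 and self.count >= self.hot + self.profiled:
            self.level = 2              # second recompile: use gathered profile
        if self.level == 1:             # instrumented body records operand values
            self.value_profile[x] = self.value_profile.get(x, 0) + 1
        return x * 2                    # the method's actual computation

m = TieredMethod()
levels = []
for _ in range(12):
    m.call(7)
    levels.append(m.level)
```

In a real VM the level-2 compile would use `value_profile` to specialize the code (for example, for a dominant operand value); here the profile is merely collected to show the pipeline.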
Even in cases where a single best value exists for a given environment, it may never be discovered, because so many of these knobs exist and offline tuning of each one requires developer effort. Even in production systems, these knobs are often set to a “best guess” or minimally tuned value, and there is generally little time or incentive for a developer to expend effort searching for the best value.
It would be ideal to tune these knobs at run time, but this approach has a number of challenges. The optimization could be tried with multiple tuning values, and the highest performing version could be selected. However, the biggest problem with this approach is that, unlike in offline tuning, the program cannot be stopped and restarted to execute the same code multiple times with the same input for comparing performance. The online system continues executing new code, thus making performance comparisons of two versions difficult.
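Why naive online comparison misleads can be shown concretely. In this sketch, the two versions, the inputs, and the use of deterministic operation counts in place of wall-clock time are all invented for illustration; the point is that each version is measured on a different invocation with a different input.

```python
# Each version is "timed" on a different invocation, so input-size differences
# swamp the real difference between versions. Work is counted deterministically
# (number of element touches) rather than timed, to keep the example reproducible.

def version_a(data):
    """One pass over the data: the genuinely faster version."""
    return sum(data), len(data)

def version_b(data):
    """Two passes over the data: genuinely slower per element."""
    total = 0
    for x in data:
        total += x
    for _ in data:                      # redundant second pass
        pass
    return total, 2 * len(data)

# Online setting: the faster version happens to be invoked on the large input.
_, cost_a = version_a(list(range(1000)))
_, cost_b = version_b(list(range(10)))
naive_choice = "a" if cost_a < cost_b else "b"   # picks the slower version
```

Because `version_a` was measured on 1000 elements and `version_b` on 10, the naive comparison selects the version that is slower per element, illustrating why changing inputs defeat direct measurement.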
One approach would be to “sandbox” the execution. Sandboxing refers to taking a snapshot of the program state, and re-executing a particular program path multiple times to evaluate the impact of different tuning knobs, or different optimization decisions. However, this is difficult to engineer because executing the code changes the program state; one cannot always execute code repeatedly without having unacceptable side effects. In addition, executing in a sandbox may skew the performance, even if just slightly, making it hard to predict subtle performance anomalies. Therefore, there is a need for a method and system that overcomes the above shortcomings of the prior art.
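The sandboxing idea discussed above can be sketched for the benign case where the program state is a plain in-memory structure. The function name, the state layout, and the variants below are invented; the sketch works only because deep-copying captures the entire state, which is exactly what fails for I/O and other external side effects.

```python
import copy

def sandboxed_trial(state, variants):
    """Sketch of sandboxing: snapshot the program state, run each variant
    against a fresh copy, and compare outcomes without mutating the original.
    This breaks down when code performs I/O or other external side effects
    that a snapshot cannot capture or undo."""
    results = {}
    for name, fn in variants.items():
        trial_state = copy.deepcopy(state)   # snapshot of the program state
        results[name] = fn(trial_state)      # variant may freely mutate the copy
    return results

state = {"cache": [], "requests": [3, 1, 2]}
results = sandboxed_trial(state, {
    "sorted": lambda s: sorted(s["requests"]),
    "reversed": lambda s: list(reversed(s["requests"])),
})
# The original state is untouched; only the copies were mutated.
```

Even in this idealized form, the copying itself adds work that the real execution would not perform, hinting at the measurement skew mentioned above.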