There are two generally accepted ways to execute computer software written in a high-level language. The source code may first be compiled by a compiler into object code (machine-executable instructions), which can then be executed on a specific hardware (or simulated hardware) platform. Examples of computer languages that typically involve compilation are C, C++, and Fortran. Alternatively, the source code may be read by an interpreter, one line at a time, which directly causes the underlying hardware platform to carry out the instructions. LISP is one example of an interpreted computer language.
Other computer programming languages take a hybrid approach to achieve both portability and performance. For example, Java™ has gained popularity as a computer language for producing “write once, run anywhere” software, in addition to its object-oriented nature. (Java is a trademark of Sun Microsystems, Inc. in the United States, other countries, or both.) Java source code is first compiled, on any platform, into bytecode, an intermediate representation of the software. This bytecode can then be taken to another platform for execution. At the second platform, the Java bytecode is interpreted by a Java virtual machine (JVM), which supports Java components on the platform with basic Java functionality. The interpretation process causes execution of the program to be slow. A widely available way to improve the performance of Java code execution is to use a Java Just-in-Time compiler (JIT compiler), which converts bytecode into native code that can be immediately executed on the platform. The term JIT is also used to describe any runtime compiler in a virtual machine, even if it is used selectively to compile some Java class methods while others are interpreted (see later).
Virtual machines (VMs), such as the Java virtual machine, execute the program code (bytecode) dynamically and typically incorporate both an interpreter and an optimizing JIT compiler to speed up program execution. When the JIT compiler is included in the VM package, it optimizes the “hot” program code, that is, the program code that has executed some given number of times. If an interpreter is present in the VM package, it is usually the first component that starts executing the program code, and if the code is “hot” enough, it invokes the JIT compiler to compile the executing method (a segment of the program code, also denoted in this document as “code method”). Once a segment of the program code is compiled, the VM uses the compiled version to speed up program execution.
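As an illustration only, the detection of “hot” code by invocation counting can be sketched as follows. The class name HotnessCounter, the integer method identifiers, and the threshold parameter are hypothetical and do not describe any particular VM; real VMs choose and tune their own criteria.

```java
// Hypothetical sketch: counts invocations per method and signals when a
// method reaches an assumed "hot" threshold. Not any real VM's data structure.
final class HotnessCounter {
    private final int threshold;   // assumed threshold, chosen by the VM designer
    private final int[] counts;    // one counter per (hypothetical) method id

    HotnessCounter(int methodCount, int threshold) {
        this.counts = new int[methodCount];
        this.threshold = threshold;
    }

    /**
     * Records one invocation of the given method. Returns true exactly once,
     * on the invocation at which the counter reaches the threshold, so the
     * caller can request compilation at that point.
     */
    boolean recordInvocation(int methodId) {
        if (counts[methodId] >= threshold) {
            return false;          // already hot; the counter saturates
        }
        counts[methodId]++;
        return counts[methodId] == threshold;
    }

    /** Current (saturating) invocation count for the given method. */
    int count(int methodId) {
        return counts[methodId];
    }
}
```

With a threshold of 3, the third call to recordInvocation for a given method returns true, and every other call returns false. The counter saturates at the threshold, so it cannot overflow.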
The interpreter executes code more slowly than a JIT-compiled version of the same code, but it does not need time to compile and optimize the bytecode. Therefore, choosing between the interpreter and the JIT compiler involves a performance tradeoff. For methods that execute infrequently, it is more efficient to interpret, since the compilation overhead is higher than the gains obtainable from better-optimized code. For methods that run frequently, on the other hand, the JIT compiler should be employed, which results in a performance gain in the long run. The definition of “hot” code, that is, code that is frequently executed, depends on the VM implementation.
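The tradeoff can be made concrete with a simple cost model, offered here as an assumption rather than as a description of any VM: a one-time compilation cost, a fixed cost per interpreted call, and a smaller fixed cost per compiled call. Under that model, compilation pays off once compileCost + n * compiledCostPerCall is less than n * interpCostPerCall. The class and parameter names below are hypothetical.

```java
// Hypothetical cost model: all costs are in the same arbitrary time unit.
final class CompileTradeoff {
    /**
     * Returns the smallest invocation count n for which compiling is strictly
     * cheaper than interpreting, i.e.
     *   compileCost + n * compiledCostPerCall < n * interpCostPerCall,
     * or -1 if compiled code is not faster per call and compiling never pays off.
     */
    static long breakEvenInvocations(double compileCost,
                                     double interpCostPerCall,
                                     double compiledCostPerCall) {
        double savingPerCall = interpCostPerCall - compiledCostPerCall;
        if (savingPerCall <= 0) {
            return -1;
        }
        // n must be strictly greater than compileCost / savingPerCall.
        return (long) Math.floor(compileCost / savingPerCall) + 1;
    }
}
```

For example, with a compilation cost of 1000 units, 10 units per interpreted call, and 2 units per compiled call, the saving is 8 units per call and the method must run 126 times before compilation is the cheaper choice. Real compilation and execution costs are not constant, so such a figure is only a rough guide.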
In general a typical VM, containing both a JIT compiler and an interpreter, runs in a mixed mode execution environment. For each piece of code that is executing, the VM knows whether the code is compiled (previously JIT compiled) or in raw bytecode format. When the code is called to be executed, the VM decides what to invoke, e.g. call the JIT compiled binary version of the code or call the interpreter component to process the raw bytecode.
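A minimal sketch of this dispatch decision is given below. It assumes a single-threaded VM and uses hypothetical names (CodeForm, MethodEntry, installCompiled); a real VM would also have to handle concurrency, deoptimization, and calling conventions, none of which are modeled here.

```java
// Hypothetical stand-in for "something the VM can run": either a call into
// the interpreter on raw bytecode or a call into JIT-compiled native code.
interface CodeForm {
    int run(int arg);
}

// Hypothetical per-method record kept by the VM.
final class MethodEntry {
    private final CodeForm interpreted; // always available: interprets the raw bytecode
    private CodeForm compiled;          // null until the JIT compiler has produced code

    MethodEntry(CodeForm interpreted) {
        this.interpreted = interpreted;
    }

    /** Called once the JIT compiler has finished compiling this method. */
    void installCompiled(CodeForm code) {
        this.compiled = code;
    }

    boolean isCompiled() {
        return compiled != null;
    }

    /** Mixed-mode dispatch: prefer the compiled form, else fall back to the interpreter. */
    int invoke(int arg) {
        CodeForm target = (compiled != null) ? compiled : interpreted;
        return target.run(arg);
    }
}
```

Before installCompiled is called, every invoke goes through the interpreter form; afterwards, every invoke goes through the compiled form, and callers do not need to know which one ran.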
The optimizing JIT compiler performs a number of different optimizations on the program code, and these depend heavily on the amount of time the JIT compiler is allotted to spend optimizing the code. In general, the more the optimizing compiler knows about the program execution, the better the code it can produce. For example, if the compiler knows the internal program execution flow, that is, which conditional branches were taken and which were not, it can lay out the code so that the most common flow path is favored. Code layout is usually very important for maximal program execution performance, since a favorable layout reduces the CPU branch misprediction rate and improves CPU instruction cache locality.
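As a simplified illustration of how branch information can guide layout, the sketch below chooses which successor of a conditional branch to place directly after it (the fall-through position), given how often the branch was taken and not taken. The names BranchLayout and FallThrough are hypothetical, and production compilers use considerably more elaborate layout algorithms.

```java
// Hypothetical sketch of a profile-guided layout decision for one branch.
final class BranchLayout {
    enum FallThrough { TAKEN_TARGET, NOT_TAKEN_TARGET }

    /**
     * Picks the more frequently executed successor as the fall-through block,
     * so that the common path runs straight through without a jump. On a tie,
     * the original order (not-taken target falls through) is kept.
     */
    static FallThrough chooseFallThrough(long takenCount, long notTakenCount) {
        return takenCount > notTakenCount
                ? FallThrough.TAKEN_TARGET
                : FallThrough.NOT_TAKEN_TARGET;
    }
}
```

For a branch recorded as taken 90 times and not taken 10 times, the taken target is placed in the fall-through position, which in practice means the compiler inverts the branch condition.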
Since the “hot” code segments are usually compiled almost immediately after the program starts executing, the performance of the interpreter component of the VM becomes less important. Therefore, a moderate slowdown in interpreter performance is acceptable if it results in superior performance of the JIT-compiled code. In particular, the interpreter component can be modified to collect information about the program execution. This information is later used by the JIT compiler to produce better code and a better-performing program in the long run. This process, called profiling, selects a set of inputs for a program, executes the program with these inputs, and records the run-time behavior of the program. By carefully selecting inputs, one can derive an accurate estimate of program run-time behavior with profiling.
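The idea of an interpreter that records run-time behavior while it executes can be sketched with a toy example. The three-instruction set (DEC, JNZ, HALT), the single accumulator, and the class name ProfilingInterpreter are invented for this illustration and bear no relation to real Java bytecode; the only point shown is that the branch handler updates per-instruction counters as a side effect of interpretation.

```java
// Hypothetical toy interpreter that profiles conditional branches.
// Each instruction is {opcode, operand}; the operand is used only by JNZ.
final class ProfilingInterpreter {
    static final int DEC = 0;   // accumulator = accumulator - 1
    static final int JNZ = 1;   // jump to operand if accumulator != 0
    static final int HALT = 2;  // stop and return the accumulator

    final long[] taken;         // taken[pc]: times the branch at pc jumped
    final long[] notTaken;      // notTaken[pc]: times the branch at pc fell through
    private final int[][] program;

    ProfilingInterpreter(int[][] program) {
        this.program = program;
        this.taken = new long[program.length];
        this.notTaken = new long[program.length];
    }

    /** Interprets the program from instruction 0 and returns the final accumulator. */
    int run(int acc) {
        int pc = 0;
        while (true) {
            int[] insn = program[pc];
            switch (insn[0]) {
                case DEC:
                    acc--;
                    pc++;
                    break;
                case JNZ:
                    // Profiling hook: record the branch outcome before acting on it.
                    if (acc != 0) {
                        taken[pc]++;
                        pc = insn[1];
                    } else {
                        notTaken[pc]++;
                        pc++;
                    }
                    break;
                case HALT:
                    return acc;
                default:
                    throw new IllegalStateException("unknown opcode " + insn[0]);
            }
        }
    }
}
```

Running the three-instruction countdown program DEC; JNZ 0; HALT with an initial accumulator of 4 records the branch at index 1 as taken three times and not taken once. The extra counter updates are the kind of moderate interpreter slowdown discussed above.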
One of the biggest challenges VM designers face when designing profiling frameworks is how to collect and use the profiling data efficiently so as not to affect application performance. For example, if the interpreter took a long time to collect the profiling information, or if it took a long time to process the interpreter-collected data, the initial start-up performance of the application would be severely impeded. Another important aspect of the data collection process is the memory footprint overhead: the data collected by the profiler could take up a significant amount of memory. On the other hand, having more profiling information, and more precise information, is the key to better code and better run-time execution performance.
Many present dynamic or static profiling frameworks collect only one type of information, restricting the profiling to branches and calls. This is usually an artifact of their implementation, and it is hard to change such a framework to collect a more complex set of information. It would be advantageous to collect arbitrary profiling data about the program execution.