1. Field of the Invention
The present invention generally relates to compilation processing of computer code. More specifically, a modification to the conventional NET (Next-Executing-Tail) selection, referred to as N-E-C (Next-Executing-Cycle) selection, in combination with an adaptive trace selection mechanism, permits production of high quality traces without excessive code explosion.
2. Description of the Related Art
A compiler is a computer program that transforms source code written in a programming language, the source language, into another computer language, the target language, often having a binary form known as object code. Compilation is typically performed to create an executable program from a computer program written in a higher-level language.
For compilers, trace-driven approaches are becoming important in binary translation systems and dynamic optimizers. Examples include Dynamo, a binary optimizer; TraceMonkey, a trace-based JIT (Just-In-Time) compiler for JavaScript (in Firefox v3.5); and DynamoRIO, a dynamic software translator.
Trace-based code optimizers compile traces (sequences of executed instructions formed at runtime) instead of methods. In the context of the present invention, the term “methods” is used to broadly refer to functions, subroutines, and methods that are defined in conventional programming languages. Traces are easier to optimize because of their simple topology, which is mostly linear. Traces can also naturally incorporate runtime properties, making trace-based compilation an attractive approach for optimizing highly dynamic languages.
Trace selection is the process of forming traces out of executed instructions at runtime. It is an active area of research and patent activity because the formation of traces is key to the effectiveness of a trace-based system.
The ability of a trace-driven language compiler to produce quality code depends heavily on the traces formed by its selection engine. The present invention describes a trace selection method that forms quality traces that are amenable to effective code optimization while avoiding excessive code explosion and still achieving a high trace cache hit ratio.
In trace-based compilation or dynamic translation systems, a trace is treated as a unit of compilation and optimization. A trace is a single-entry, multiple-exit entity. Trace selection is the process of forming traces, typically a sequence of instructions or basic blocks, out of hot execution paths at runtime. Once a trace is compiled, the generated binary code is placed in a memory area called a “trace cache.” When the interpreter reaches the entry address of a compiled trace (trace-head), the interpreter transfers control to the generated binary code in the trace cache. When the program reaches the end of the generated binary (trace-exit), or branches out from the middle of a compiled trace (side-exit), control either returns to the interpreter or jumps directly into the next compiled trace if one exists in the trace cache.
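The control transfer between the interpreter and the trace cache described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems. The program model (a map from each address to its successor), the representation of a compiled trace as a callable that returns its exit address, and the names `run`, `program`, and `trace_cache` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: the program maps each address to its successor
# (None marks program end). A compiled trace is a callable that returns
# the address at which control leaves the trace (trace-exit or side-exit).
Program = Dict[int, Optional[int]]
CompiledTrace = Callable[[], Optional[int]]


def run(program: Program, trace_cache: Dict[int, CompiledTrace],
        start: int, max_steps: int = 1000) -> List[Tuple[str, int]]:
    """Execute `program`, preferring compiled traces over interpretation."""
    log: List[Tuple[str, int]] = []
    pc: Optional[int] = start
    for _ in range(max_steps):
        if pc is None:
            break
        compiled = trace_cache.get(pc)
        if compiled is not None:
            # pc is a trace-head: transfer control to the compiled code.
            log.append(("trace", pc))
            pc = compiled()
        else:
            # No compiled trace starts here: interpret one step.
            log.append(("interp", pc))
            pc = program[pc]
    return log


program = {0: 1, 1: 2, 2: 3, 3: None}
trace_cache = {1: lambda: 3}  # a trace covering addresses 1 and 2, exiting to 3
log = run(program, trace_cache, start=0)
```

In this sketch the same lookup serves both cases named above: after a trace exits, the next iteration either finds another compiled trace at the exit address or falls back to the interpreter.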
Many trace selection algorithms follow the two-step approach pioneered by the next-executing-tail (NET) selection in Dynamo and in U.S. Pat. No. 6,470,492 to Bala, et al., referred to as “NET-like trace selection”, as exemplarily shown in the flowchart 100 of FIG. 1.
In a first step 101, trace-head selection identifies the likely starting point of a hot region using a lightweight mechanism. It starts with some heuristics to identify an initial set of targets 103 as potential trace-heads to monitor. Each identified target is allocated a hotness counter 104, which is incremented every time the target is encountered.
One important set of potential trace-heads consists of loop headers or targets of backward branches, since program execution time is mainly spent in loops. Another important set of potential trace-heads consists of the targets that follow the exit of an already formed trace (exit-heads). The ability to select trace-heads out of exit-heads ensures that execution that does not directly originate from the initial set of targets (e.g., loops) has the chance to be captured into a trace. Trace-heads are selected from potential trace-heads 103 when their associated counters 104 exceed a predefined threshold.
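The counter-based trace-head selection described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Dynamo or of any cited patent. The class name `TraceHeadSelector`, its methods, and the use of integer addresses (so that a backward branch is one whose target does not exceed its source) are hypothetical.

```python
from collections import defaultdict


class TraceHeadSelector:
    """Counts executions of potential trace-heads and reports hot ones."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counters = defaultdict(int)  # hotness counter per potential head
        self.exit_heads = set()           # targets that follow a trace exit
        self.trace_heads = set()          # heads already selected as hot

    def note_trace_exit(self, target: int) -> None:
        """Register the target of a trace exit as a potential trace-head."""
        self.exit_heads.add(target)

    def on_branch(self, source: int, target: int) -> bool:
        """Return True exactly when `target` has just become a trace-head."""
        if target in self.trace_heads:
            return False
        # Assumption: addresses are integers, so target <= source is backward.
        is_backward = target <= source
        if not (is_backward or target in self.exit_heads):
            return False  # not a potential trace-head, so not monitored
        self.counters[target] += 1
        if self.counters[target] > self.threshold:
            self.trace_heads.add(target)
            return True
        return False


selector = TraceHeadSelector(threshold=2)
results = [selector.on_branch(source=5, target=2) for _ in range(3)]
```

With a threshold of 2, the loop header at address 2 is selected on its third execution, which is the point at which trace recording would be triggered.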
In the second step 102, trace recording is triggered to form a trace immediately after a hot trace-head is identified. The mechanism records every instruction or every branch following the execution of the selected trace-head until an end-of-a-trace condition is met. While the speed and extent to which a trace selection algorithm captures the program working set are largely determined by trace-head selection, the length and shape of the traces being formed are largely determined by the end-of-a-trace conditions.
Three end-of-a-trace conditions (or their more restrictive forms) are commonly present in prior trace selection algorithms, as demonstrated by US Patent Application Publication No. 2007/0079293, now U.S. Pat. No. 7,694,281, to Wang, et al., where the algorithm stops recording a trace:
1. When encountering the head of an already formed trace; or
2. When detecting a likely cycle in trace recording, e.g., when revisiting an entry already recorded in a trace buffer, or when encountering a possible backward taken branch; or
3. When the trace recording buffer exceeds a pre-defined length threshold.
In the following discussion explaining the present invention, the first termination condition identified above is referred to as the “stop-at-a-trace-head” condition. This condition has a known limitation that the present invention aims to address.
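The three end-of-a-trace conditions listed above can be illustrated with a short recording loop. The sketch below is a simplification under stated assumptions and does not reproduce the algorithm of any cited reference: the function `record_trace`, the successor function `next_pc`, the reason strings, and the extra "program-end" case are hypothetical, and the cycle check uses only the "revisiting an entry already recorded" form of the second condition.

```python
from typing import Callable, List, Optional, Set, Tuple


def record_trace(head: int,
                 next_pc: Callable[[int], Optional[int]],
                 formed_heads: Set[int],
                 max_len: int) -> Tuple[List[int], str]:
    """Record addresses executed after `head` until a stop condition holds."""
    trace = [head]
    pc = next_pc(head)
    while True:
        if pc is None:
            return trace, "program-end"           # assumption: not in the text
        if pc in formed_heads:
            return trace, "stop-at-a-trace-head"  # condition 1
        if pc in trace:
            return trace, "cycle"                 # condition 2
        if len(trace) >= max_len:
            return trace, "length-limit"          # condition 3 (buffer full)
        trace.append(pc)
        pc = next_pc(pc)


# Hypothetical execution path: a loop 10 -> 11 -> 12 -> 20 -> 21 -> 10.
successors = {10: 11, 11: 12, 12: 20, 20: 21, 21: 10}

full = record_trace(10, successors.get, set(), max_len=16)
cut = record_trace(10, successors.get, {20}, max_len=16)
capped = record_trace(10, successors.get, set(), max_len=2)
```

Here `full` records the whole cyclic path, whereas `cut` ends after three addresses because address 20 already heads another trace, which is the behavior that the stop-at-a-trace-head condition imposes.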
Trace grouping combines single traces into larger regions for a compiler to process. It is a post-processing step of a trace selection engine, since runtime recording can only produce linear traces (called single traces). U.S. Patent Application Publication No. 2007/0226700 to Gal, et al., demonstrates the construction of tree structures on top of single traces that resemble nested loops (called Trace-Tree selection).
A current limitation of Trace-Tree selection is that it can only capture computation in loops that are reasonably sized. As such, Trace-Tree selection may not achieve very high coverage for code that is not loop intensive. Another drawback of trace grouping is that it adds complexity to the optimizer because grouped traces are no longer single-entry, multiple-exit entities. If traces are optimized as a group, the optimizer can no longer exploit the simple topology of traces. If traces are optimized individually, secondary traces of a trace tree are fragmented as in the NET approach because they are cut short by the head of the primary trace.
Application-specific trace selection defines an application-specific point to terminate a trace. For example, PyPy's Tracing JIT only terminates a trace when it has traced one loop iteration in a user-defined Python program such that the trace essentially unrolls the interpreter loop.
The drawback is that this kind of tracing requires special knowledge of the application being traced. For PyPy, it traces a Python interpreter that is written in a restricted subset of Python.
Path-profiling-based trace selection profiles branch histories (or execution paths) and identifies hot traces based on occurrences of the recorded branch histories.
U.S. Pat. No. 6,351,844 to Bala uses a translator to interpret the program and generate branch history data that includes a starting address and a branch history sequence. The branch history sequence records every taken branch from the starting address until it encounters a backward taken branch or an indirect branch. A branch history data is selected as a hot trace if within a time frame the occurrence of the branch history data exceeds a predefined threshold. The selection algorithm also combines traces to form cyclic traces.
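Path-profiling-based selection of this general kind can be sketched as follows. The sketch is a loose illustration and not the method of U.S. Pat. No. 6,351,844. The names `Branch`, `branch_histories`, and `hot_paths` are hypothetical, the terminating branch is assumed to be included in the history, addresses are assumed to be integers, and the time frame is simplified to a single batch of recorded branches.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

History = Tuple[int, Tuple[Tuple[int, int], ...]]


class Branch(NamedTuple):
    source: int
    target: int
    indirect: bool = False


def branch_histories(start: int, taken: List[Branch]) -> List[History]:
    """Split a stream of taken branches into (start address, sequence) pairs."""
    histories: List[History] = []
    current: List[Tuple[int, int]] = []
    for br in taken:
        current.append((br.source, br.target))
        # A backward taken branch or an indirect branch ends the history.
        if br.indirect or br.target <= br.source:
            histories.append((start, tuple(current)))
            start, current = br.target, []
    return histories


def hot_paths(histories: List[History], threshold: int) -> List[History]:
    """Return histories whose occurrence count exceeds `threshold`."""
    counts = Counter(histories)
    return [h for h, n in counts.items() if n > threshold]


# Hypothetical loop body: a forward branch 12 -> 20, then a back edge 25 -> 10.
iteration = [Branch(12, 20), Branch(25, 10)]
histories = branch_histories(10, iteration * 3)
hot = hot_paths(histories, threshold=2)
```

Unlike a per-address hotness counter, each profile entry here is keyed by an entire branch sequence, which is why path profiling in software carries a higher bookkeeping cost.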
US Patent Application Publication No. 2005/0081107 to DeWitt, et al., and U.S. Pat. No. 6,647,491 to Hsu, et al., use special hardware to record branch history and form traces in software based on the branch history information.
When profiling is done in software, path profiling is more expensive than the counter profiling of trace-heads used in the NET approach. Most software translation and optimization systems today follow the NET approach. When profiling is done in hardware, the selection algorithm applies to binary traces only and requires special hardware support.
The present inventors have recognized that there are drawbacks of these known solutions.
More particularly, one drawback of NET-like trace selection algorithms, as recognized by the present inventors, is the stop-at-a-trace-head termination condition described above. This condition was introduced to avoid duplication across traces and to capture most of the execution quickly into traces, which is required by most binary optimizers and translation systems.
However, when applying the trace-driven approach to a language dynamic compiler (that is, a compiler that compiles language intermediate representation directly to binaries), the ability to form quality traces becomes much more important, as better formed regions can be translated into more efficient code.
With the stop-at-a-trace-head termination condition, many traces are prematurely cut short. For instance, when a trace is formed at the entry-point of a method, the stop-at-a-trace-head condition prevents any subsequent traces from “inlining” part of the method into the trace. This is important for object-oriented languages such as Java, C++, Python, and others, where programming style encourages writing small methods that are called from multiple places.
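The premature cut described above can be made concrete with a small worked example. The block names, the helper `cut_at_trace_heads`, and the two call sites are hypothetical and serve only to illustrate the effect; they do not come from the source.

```python
from typing import List, Set


def cut_at_trace_heads(path: List[str], formed_heads: Set[str]) -> List[str]:
    """Return the part of `path` recorded before reaching a formed trace-head."""
    trace = [path[0]]
    for block in path[1:]:
        if block in formed_heads:
            break  # stop-at-a-trace-head: recording ends here
        trace.append(block)
    return trace


# Two hypothetical callers invoke the same small method `get`.
path_a = ["a.loop", "a.call", "get.entry", "get.body", "get.ret", "a.after"]
path_b = ["b.loop", "b.call", "get.entry", "get.body", "get.ret", "b.after"]

# Once a trace has been formed at the method entry, both caller traces stop there.
formed = {"get.entry"}
trace_a = cut_at_trace_heads(path_a, formed)
trace_b = cut_at_trace_heads(path_b, formed)

# Without the condition, each caller trace would cover the whole invocation.
inlined_a = cut_at_trace_heads(path_a, set())
```

In this example each caller trace holds only two blocks and never includes the body of `get`, so an optimizer working on either trace cannot specialize the method for its call site.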
Therefore, an exemplary objective of the present invention is to provide a selection algorithm that can produce quality single traces that can be effectively optimized by language compilers. The qualities being aimed at are longer traces and traces that cover an entire path through a method invocation or an entire cyclic path through a loop. These qualities provide higher efficiency in compilation.
Thus, the present inventors have recognized that a need exists for improving efficiency of a trace-driven language compiler and have developed a solution that modifies the conventional NET-like compilers in the manner explained below to arrive at an adaptive mechanism that reduces a disadvantage of this conventional method while utilizing its advantage.