1. Field of the Invention
The present invention relates generally to computer program (i.e., software source code) compilers and more particularly to computer program compilers that perform optimizations.
2. Related Art
Compilation of a computer program consists of a series of transformations that begins with source code, passes through a series of intermediate representations, and eventually produces binary executables. This series of transformations includes translation and lowering, inlining and linking, and optimizations at all levels.
Optimizing compilers analyze the program code in order to produce more efficient executable code. Aggressive compiler optimization is therefore a cornerstone of achieving high performance on modern microprocessors. Such compilers may perform one or more of several types of optimization known to those skilled in the relevant art(s) (e.g., dead code elimination, dead store elimination, branch elimination, partial redundancy elimination, procedure inlining, loop unrolling, etc.).
Ideally, each transformation of the computer program should be chosen to obtain maximum efficiency of the resulting executable binary code. Static analysis performed on the code at compile time can produce great improvements, but compile-time indeterminacies prevent the attainment of maximum efficiency. Dynamic analysis therefore attempts to minimize the non-deterministic nature of compiler speculation by gathering information about the behavior of the program during run time.
One common technique for addressing these compile-time indeterminacies is feedback-directed optimization. Feedback-directed optimization involves compiling a program, executing it to generate profile (or “feedback”) data, and then recompiling the program using the feedback data in order to optimize it. The resulting improvement typically comes from making the frequently executed paths of the program (i.e., “hot spots” or “hot regions”) execute more quickly, at the possible expense of making other (i.e., less frequently executed) paths execute more slowly. In essence, feedback data measures dynamic data use and control flow during sample executions in an attempt to minimize the non-deterministic nature of compiler speculation.
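During recompilation, the compiler must decide from the feedback data which regions are hot. The following is a minimal Python sketch of one such decision, assuming the profile is a mapping from hypothetical basic-block names to execution counts and that the hot region is the smallest set of blocks covering a chosen fraction of all block executions (the 90% default is an arbitrary illustrative threshold, not a value taken from this document).

```python
from typing import Dict, List


def select_hot_blocks(block_counts: Dict[str, int], coverage: float = 0.9) -> List[str]:
    """Return the fewest basic blocks, taken in descending order of
    execution count, whose combined count reaches `coverage` of the total."""
    total = sum(block_counts.values())
    if total == 0:
        return []
    hot: List[str] = []
    covered = 0
    # Greedily take the most frequently executed blocks first; this yields
    # the minimum number of blocks needed to reach the coverage threshold.
    for block, count in sorted(block_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        hot.append(block)
        covered += count
    return hot


# Hypothetical profile from one training run of a loop with a rarely taken error path.
profile = {"entry": 1, "loop_test": 901, "loop_body": 900, "error": 2, "exit": 1}
print(select_hot_blocks(profile))  # ['loop_test', 'loop_body']
```

An optimizer could then, for example, lay out or unroll only the returned blocks, leaving the cold error path untouched.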
Two methods exist for identifying these so-called “hot regions” of a program: sampling and instrumentation.
First, sampling is the periodic measurement of the target computer's register contents (e.g., the program counter) during execution of the program binary. Sampling thus identifies where most of the time is spent during execution of a particular computer program. The drawback of sampling, however, is that it obtains information only about the behavior of the binary. It is difficult to translate this data into information about the source code or about any of the intermediate representations produced during the compilation process. An example of sampling-based code analysis is described in detail in Jennifer M. Anderson et al., “Continuous Profiling: Where have all the cycles gone?”, ACM Transactions on Computer Systems, pp. 357–390 (November 1997), which is incorporated herein by reference in its entirety.
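To make the binary-level nature of sampling concrete, the following minimal Python sketch attributes sampled program-counter values to functions through a symbol table. The symbol table, addresses, and function names are hypothetical, and each function is assumed to extend up to the next symbol's start address.

```python
import bisect
from collections import Counter
from typing import Iterable, List, Tuple


def attribute_samples(pc_samples: Iterable[int], symbols: List[Tuple[int, str]]) -> Counter:
    """Build a per-function histogram from sampled program-counter values.

    `symbols` is a list of (start_address, function_name) pairs sorted by
    start address; each function is assumed to extend up to the next
    symbol's start address (the last one is treated as unbounded).
    """
    starts = [start for start, _ in symbols]
    histogram = Counter()
    for pc in pc_samples:
        # Index of the last symbol whose start address is <= pc.
        i = bisect.bisect_right(starts, pc) - 1
        histogram[symbols[i][1] if i >= 0 else "<unknown>"] += 1
    return histogram


# Hypothetical symbol table and samples taken at a fixed interval.
symbols = [(0x1000, "main"), (0x1400, "parse"), (0x2000, "hash_lookup")]
samples = [0x1010, 0x2004, 0x2010, 0x2100, 0x1404, 0x0800]
print(attribute_samples(samples, symbols).most_common(1))  # [('hash_lookup', 3)]
```

The histogram is keyed only by machine addresses and the functions that contain them; nothing in it records which source statement or intermediate-representation block produced the sampled instructions, which is the translation difficulty noted above.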
Second, instrumentation is the insertion of extra instructions into the code during compilation in order to collect information at run time. The disadvantages of instrumentation follow from its intrusive nature: instrumentation increases code size, slows execution of the binary, and may alter program behavior. From the compiler's standpoint, however, the primary advantage of instrumentation is that it provides data about the behavior of the code as it is represented at the moment of instrumentation. Conventional instrumentation schemes typically instrument the binary code of a computer program, although the source code and the intermediate representations can also be instrumented. Instrumentation thus provides run-time measurements for a snapshot of the code taken at the point during compilation at which instrumentation was performed. An example binary instrumentation scheme is described in detail in Amitabh Srivastava and Alan Eustace, “ATOM: A System for Building Customized Program Analysis Tools,” ACM SIGPLAN Notices, 29(6), pp. 196–205 (June 1994), which is incorporated herein by reference in its entirety.
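The following minimal Python sketch simulates edge instrumentation of a small loop. The program, its block names, and the `probe` helper are hypothetical; each `probe` call stands in for the extra counter-increment instruction that an instrumenting compiler would place on one edge of the flow graph.

```python
from collections import Counter
from typing import Dict, Tuple

Edge = Tuple[str, str]


def run_instrumented(n: int) -> Dict[Edge, int]:
    """Execute a toy loop whose control-flow edges carry inserted counters.

    The hypothetical program being instrumented is:
        for i in range(n):
            if i % 3 == 0: A() else: B()
    """
    edge_counts: Dict[Edge, int] = Counter()

    def probe(src: str, dst: str) -> None:
        # Inserted instruction: bump the counter for edge src -> dst.
        edge_counts[(src, dst)] += 1

    probe("entry", "test")
    i = 0
    while i < n:
        probe("test", "branch")
        if i % 3 == 0:
            probe("branch", "A")
            probe("A", "latch")
        else:
            probe("branch", "B")
            probe("B", "latch")
        probe("latch", "test")
        i += 1
    probe("test", "exit")
    return edge_counts


counts = run_instrumented(6)
print(counts[("branch", "A")], counts[("branch", "B")])  # 2 4
```

The counts are exact, but they are keyed by the edges of this particular flow graph, i.e., the snapshot that existed when the probes were inserted.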
Sampling imposes less run-time overhead than instrumentation and can be performed by the hardware. However, once run-time data has been obtained for a particular snapshot of the code representation, maintaining and updating that data to reflect later compiler transformations is generally less difficult than projecting sampled data backwards through the compilation transformations.
Compilers typically represent a given program (i.e., its intermediate representation) as a control flow graph. Compiler optimizations performed on the program result in changes to the flow graph, so that the program is represented by many different, though semantically equivalent, flow graphs during the compilation process. The problem, however, is that feedback data gathered during sample runs corresponds to only one instance of the program's flow graph. As a result, feedback data mapped to the flow graph representation is correct at only one point during compilation.
When optimizations result in transformations of the flow graph, the associated feedback data often cannot be precisely and correctly updated in the new flow graph. As the compiler continues to optimize, the discrepancies accumulate, defeating the original intent of deterministically measuring program flow.
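Procedure inlining, one of the optimizations listed above, shows how such a discrepancy can arise. The following minimal Python sketch assumes a simple proportional-scaling heuristic: when a callee is inlined, its aggregate edge counts are scaled by the call site's share of all calls. The callee, its edges, and all counts are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def scale_inlined_profile(callee_counts: Dict[Edge, int],
                          callee_entry_count: int,
                          call_site_count: int) -> Dict[Edge, float]:
    """Estimate edge counts for one inlined copy of a callee.

    Assumes the callee behaves the same at every call site, so its
    aggregate profile is scaled by this call site's share of all calls.
    """
    if callee_entry_count == 0:
        return {edge: 0.0 for edge in callee_counts}
    share = call_site_count / callee_entry_count
    return {edge: count * share for edge, count in callee_counts.items()}


# Hypothetical aggregate profile of a callee `check`, gathered before inlining:
# it was entered 200 times, 100 from each of two call sites.
check_profile = {("entry", "fast"): 100, ("entry", "slow"): 100}

estimate_site1 = scale_inlined_profile(check_profile, 200, 100)
# Suppose every call from site 1 actually took the fast path.
actual_site1 = {("entry", "fast"): 100, ("entry", "slow"): 0}

print(estimate_site1)  # {('entry', 'fast'): 50.0, ('entry', 'slow'): 50.0}
print(estimate_site1 == actual_site1)  # False
```

Because the pre-inlining profile merges the behavior of both call sites into one set of counts, no scaling rule applied to that profile alone can recover the per-site split; the inlined copy at site 1 is left with a 50/50 estimate for a branch that in fact always went one way.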
Therefore, what is needed is a method and computer program product, within an optimizing compiler, for precise feedback data generation and updating for compile-time optimizations. The method and computer program product should perform instrumentation on the source code of the computer program and maintain accurate feedback data, especially when dealing with inlined procedures in programming languages such as C++ and when code is cloned during certain optimizations.