Field of the Invention
This invention relates to the field of optimizing compilers. Specifically, this invention is a method, apparatus and computer program product for providing a high level language compiler with the capability to re-optimize a previously compiled binary executable.
Background
FIG. 1 illustrates a prior art optimizing compiler, indicated by general reference character 100, for compiling a source program to create an optimized binary executable. The compiler 100 consumes source information 101 through a compiler front-end segment 103. The compiler front-end segment 103 processes the syntax and semantics of the source information 101 according to the rules of the programming language applicable to the source information 101. The compiler front-end segment 103 generates at least one version of an intermediate code representation 104 (IR) of the source information 101. The intermediate code representation generally includes data structures that either represent, or can be used to create, data dependency graphs (DDGs) and execution flow graphs. The intermediate code representation 104 is then optimized by an intermediate representation optimizer segment 105. The intermediate representation optimizer segment 105 operates on, and adjusts, the intermediate code representation 104 of the source information 101 to optimize the execution of a program in a variety of ways well understood in the art. The intermediate representation optimizer segment 105 generates an optimized intermediate representation 106. A code generator segment 107 consumes the optimized intermediate representation 106, performs low level optimizations, allocates physical registers and generates binary module 109 (and conditionally assembler source code) from the optimized intermediate representation 106. The binary module comprises binary computer instructions (binary code) in a module that can be linked with other modules to create a binary executable. The assembler source code is a series of symbolic statements in an assembler source language. Both the assembler source code and the binary code are targeted to a particular computer application binary interface (ABI).
DDGs embody the information required for an optimizer to determine which statements are dependent on other statements. The nodes in the graph represent statements in a programmed block and arcs represent the data dependencies between nodes. In particular, the scope of a variable extends from a "def" of the variable to a "use" of the variable. A "def" corresponds to an instruction that modifies a variable (an instruction "defines" a variable if the instruction writes into the variable). A use corresponds to an instruction that uses the contents of the variable. For example, the instruction "x=y+1;" "defs x" and "uses y". An arc in the DDG extends from the def of a variable to the use of the variable. DDGs are described in chapter 4 of Supercompilers for Parallel and Vector Computers, by Hans Zima, .COPYRGT. 1991, ACM press, ISBN 0-201-17560-6, 1991.
As mentioned above, the code generator segment 107 performs low level optimizations and generates either (or both) binary code or assembler source code. The intermediate representation of the program generally references virtual registers. That is, the intermediate representation optimizer assumes that the target computer contains an unlimited number of registers. During the operation of the code generator segment 107, these virtual registers are assigned to the physical registers of the target computer. This resource management is performed in the code generator segment 107 by a register allocation (expansion) process.
One aspect of the register allocation process is that the contents of physical registers are often "spilled" to memory at various points during the execution of the program so that the limited number of physical registers can be used to hold values of more immediate relevance to the program at those various points. Those values that are spilled to memory are often restored to the registers when the program advances to different points of execution. An example of register allocation and register spilling techniques is provided in Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Addison-Wesley Publishing Co. .COPYRGT. 1988, ISBN 0-201-10088-6, pages 541-546.
Execution flow graphs represent the sequence of operations executed by the program. Information can be included on the graph's edges to provide scheduling information such as dependency information, frequency of execution information or other information that is useful for optimizing the operations represented by the execution flow graph.
Software pipelining is a technique for scheduling the execution of instructions. In the cage of simple basic block loops, software pipelining schedules different overlapping iterations of the loop body to exploit a computer's underlying parallel computation units. The execution schedule includes of a prologue, a kernel, and an epilogue. The prologue initiates the first p iterations thus starting each iteration, A steady state is reached after the first p*II cycles, where II is the initiation interval where each initiated iteration is executing instructions in parallel. In this steady state or kernel, one iteration of the loop is completed every II cycles. Once the kernel initiates the last iteration in the loop, the epilogue completes the last p iterations of the loop that were initiated by the kernel. Often the instruction schedule requires that a particular instruction be initiated after some delay--thus, unfilled instruction slots in the instruction schedule are filled with "no-operation" (NOP) instructions.
Computer manufacturers often make a family of computers with similar architectures. One problem for both computer manufacturers and computer application developers is the conflict between the desire of computer manufacturers to provide more powerful computers with extended capabilities and that of the program application developers who tend to optimize an application to execute on the largest number of computers of a particular family. Although the models in the architecture family are similar, they often have differences. (For example, the SPARC.TM. architecture includes different application binary interfaces (ABIs), numbers of pipelines, and other differences between the V8, V8+, and V9 SPARC based models.) These differences are generally the result of cost/performance trade-offs or new architectural features added to later models. Commercial applications are generally compiled and optimized to execute using only the capabilities of the architecture that are shared by each model of the architecture family. Thus, the application does not use the advanced capabilities available to the more advanced models. Because application developers generally do not provide source code, the user of the application is unable to optimize the application to take advantage of the additional features of the more advanced models--thus, the application will not perform as efficiently as if it were optimized specifically for the computer model that executes the application.
Another problem is that compiler optimization generally assumes that each execution flow path is equally likely to be executed during the operation of the application. This means that the compiler does not optimize the execution flow path according to how the program actually operates. However, applications can be instrumented to capture an execution profile when operating on a particular data set. This profile information could be captured, and used to optimize an application specifically for use with that particular data set. In addition, some computer architectures provide memory performance information, such as cache-miss information for the memory caches. This information could also be used to optimize memory organization for a particular data set and usage pattern. These optimizations could include restructuring and rescheduling the code, using pre-fetch instructions and non-faulting loads, if the result of a branch instruction can be predicted, and other optimizations based on how the application performs when it executes. However, because application developers do not provide the application's source code for the user to compile, these optimizations are not available to the user of the application.
Yet another problem is that operating system facilities sometimes become obsolete. These obsolete services are usually retained and can be invoked by an obsolete operating system service invocation, but newer more efficient services are also provided. These newer services are invoked by preferred operating system service invocations. An application compiled to use the obsolete services cannot use the newer services.
Thus, it would be advantageous to provide a high level language compiler with the capability to re-optimize a binary executable, originally not optimized, partially optimized, or optimized for a particular computer system, so that the binary executable is optimized for another targeted computer system.