The present invention relates to a method and apparatus for the dynamic optimization of computer programs.
The source code of a computer program is generally written in a high-level language that is humanly readable, such as FORTRAN or C. The source code is translated by a compiler program into an assembly language. The binary form of the assembly language, called object code, is the form of the computer program that is actually executed by a computer. Object code generally comprises assembly language or machine language for a target machine, such as a Hewlett-Packard PA-RISC microprocessor-based computer. The object code is generally first produced in object code modules, which are linked together by a linker. For the purposes of the present invention, the term “compile” refers to the process of both producing the object code modules and linking them together.
Because computer programs are written as source code by humans, they are usually not written in a way that achieves optimal performance when eventually executed as object code by a computer. Computer programs can be optimized in a variety of ways. For example, optimizing compilers perform static optimization before the code is executed. Static optimization can be based on particular rules or assumptions (e.g., assuming that all “branches” within the code are “Taken”), or can be profile-based. To perform profile-based optimization (“PBO”), the code is executed under test conditions, and profile information about the performance of the code is collected. That profile information is fed back to the compiler, which recompiles the source code using the profile information to optimize performance. For example, if certain procedures call each other frequently, the compiler can place them close together in the object code file, resulting in fewer instruction cache misses when the application is executed.
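The procedure-placement example can be made concrete with a small sketch. The following Python function is a simplified greedy heuristic, assumed here for illustration rather than taken from any particular compiler: given profiled call counts between pairs of procedures, it merges the chains containing the procedures joined by the heaviest call edges first, so that frequently interacting procedures tend to be laid out near one another. The name order_procedures and its input format are hypothetical.

```python
def order_procedures(call_counts):
    """Return a procedure layout from {(caller, callee): count} profile data.

    Simplified greedy heuristic (illustrative assumption): process call edges
    from heaviest to lightest and concatenate the chains containing the two
    endpoints, so procedures that call each other often tend to be placed
    near one another in the resulting layout.
    """
    chains = {}  # procedure name -> the chain (list) it currently belongs to
    for caller, callee in call_counts:
        for proc in (caller, callee):
            chains.setdefault(proc, [proc])

    edges = sorted(call_counts.items(), key=lambda kv: kv[1], reverse=True)
    for (caller, callee), _count in edges:
        first, second = chains[caller], chains[callee]
        if first is second:
            continue  # both procedures are already in the same chain
        merged = first + second
        for proc in merged:
            chains[proc] = merged

    # Emit each distinct chain once, in the order chains were first seen.
    layout, emitted = [], set()
    for chain in chains.values():
        if id(chain) not in emitted:
            emitted.add(id(chain))
            layout.extend(chain)
    return layout
```

For instance, with profile data in which main calls parse 100 times and parse calls lex 90 times, those three procedures end up adjacent in the layout, while rarely called procedures are appended after them.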
Dynamic optimization refers to the practice of optimizing computer programs as they execute. Dynamic optimization differs from static optimization in that it occurs during runtime, not during compilation before runtime. Generally, dynamic optimization is accomplished as follows. While a computer program is executing, a separate dynamic optimization program observes the executing computer program and collects profile data. The dynamic optimization program can be implemented as a dynamically loadable library (DLL), as a subprogram inserted into the computer program by a compiler before runtime, or by a variety of other means known in the art.
The profile data can be collected by “instrumenting” the object code. Instrumentation of code refers to the process of adding code that writes specific information to a log during execution. The dynamic optimization program uses that log to collect profile data. Instrumentation allows collection of the minimum specific data required to perform a particular analysis. General-purpose trace tools can also be used as an alternative method for collecting data. Instrumentation can be performed by a compiler during translation of source code to object code. Those skilled in the art will recognize that the object code can also be directly instrumented by a dynamic translator performing an object-code-to-object-code translation, as explained more fully in U.S. Pat. No. 5,815,720, issued to William B. Buzbee on Sep. 29, 1998, and incorporated by reference herein.
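As a rough, source-level analogy to object-code instrumentation, the following Python sketch wraps a routine with added code that appends an entry to a log each time the routine runs, and a separate step aggregates that log into profile counts. The names PROFILE_LOG, instrument, and collect_profile are hypothetical; a real instrumenting compiler or dynamic translator would insert equivalent logging code into the object code itself.

```python
from collections import Counter
from functools import wraps

PROFILE_LOG = []  # hypothetical log written by the instrumentation code


def instrument(fn):
    """Wrap fn with added code that records each entry into fn in PROFILE_LOG."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        PROFILE_LOG.append(fn.__name__)  # the inserted "instrumentation"
        return fn(*args, **kwargs)
    return wrapper


def collect_profile(log):
    """Optimizer side: turn the raw log into per-routine execution counts."""
    return Counter(log)


@instrument
def square(x):
    return x * x


@instrument
def sum_of_squares(n):
    return sum(square(i) for i in range(n))
```

Calling sum_of_squares(3) logs one entry for sum_of_squares and three for square, so the collected profile reports square as the more frequently executed routine.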
Once the code is instrumented and executed, the dynamic optimization program collects the profile data generated by the instrumentation. The dynamic optimization program analyzes the profile data, looking, for example, for “hot” instruction paths (series of consecutive instructions that are executed often during execution). The dynamic optimization program then optimizes portions of the computer program based on the profile data.
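One simple way to look for hot instruction paths in collected profile data is to count how often each fixed-length run of consecutive basic blocks appears in an execution trace. The sketch below assumes a trace of block identifiers; the function name hot_paths, the path length, and the threshold are illustrative assumptions, not values from the source.

```python
from collections import Counter


def hot_paths(trace, length=3, threshold=2):
    """Return (path, count) pairs for runs of `length` consecutive blocks
    that occur in the trace at least `threshold` times, most frequent first."""
    counts = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    hot = [(path, n) for path, n in counts.items() if n >= threshold]
    hot.sort(key=lambda item: item[1], reverse=True)
    return hot
```

For a trace such as A, B, C, A, B, C, A, B, D, the path A-B-C is reported as hot, while A-B-D, which occurs only once, is not.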
Dynamic optimization is generally accomplished without recompilation of the source code. Rather, the dynamic optimization program rewrites a portion of the object code in optimal form and stores that optimized translation into a code cache. A hot instruction path, for example, might be optimized by moving that series of instructions into sequential cache memory locations. Once the optimized translations are written into the code cache, the dynamic optimization program switches execution flow of control to the optimized translations in the code cache when any of the optimized instructions are thereafter called.
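The code-cache redirection can be modelled as a lookup performed before each call: if an optimized translation has been installed for an entry address, control goes there; otherwise the original code runs. The CodeCache class and dispatch function below are hypothetical names in a toy Python model; an actual dynamic optimizer redirects control by rewriting branches in machine code rather than by a dictionary lookup.

```python
class CodeCache:
    """Toy code cache mapping an original entry address to an optimized translation."""

    def __init__(self):
        self._translations = {}

    def install(self, address, translation):
        self._translations[address] = translation

    def lookup(self, address):
        return self._translations.get(address)


def dispatch(address, program, cache, *args):
    """Transfer control to the optimized translation if one exists,
    otherwise to the original code at `address`."""
    translation = cache.lookup(address)
    target = translation if translation is not None else program[address]
    return target(*args)
```

Before a translation is installed, dispatch runs the original routine; after install is called for the same address, every later call reaches the translation instead, with the same result.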
Prior art dynamic optimizers have several disadvantages. First, because the dynamic optimization program is located in user memory space, it has limited privileges. Computer memory is allocated by the computer's operating system into several categories. One demarcation is between user memory space and kernel memory space. Kernel memory space is generally reserved for the computer operating system kernel and associated programs. Programs residing in kernel memory space have unrestricted privileges, including the ability to write and overwrite in user memory space. By contrast, programs residing in user space have limited privileges, which causes significant problems when performing dynamic optimization in the user space.
Modern computers permit a computer program to share its program text with other concurrently executing instances of the same program to better utilize machine resources. For example, three people running the same word processing program from a single computer system (whether it be a client server, multiple-CPU computer, etc.) will share the same computer program text in memory. The program text, therefore, sits in “shared user memory space,” which is accessible by any process running on the computer system.
As explained above, however, in order to perform dynamic optimization, the dynamic optimization program must be able to alter the program text to direct flow of control to the optimized translations in the code cache. Because the dynamic optimization program sits in user memory space, it does not have the privilege to write into the program text. Accordingly, as illustrated in FIG. 1, the program text must be emulated so that the dynamic optimization can take place in private memory space dedicated to the particular process being executed.
Referring to FIG. 1, computer program text 10 (object code after compilation) is emulated by a software emulator 20 that is included within a dynamic optimization program 30. “Emulation” refers to the “software execution” of the computer program text 10. The computer program text 10 is never run natively. Rather, the emulator 20 reads the computer program text 10 as data. The dynamic optimization program 30 (including the emulator 20) ordinarily takes the form of a shared library that attaches to different processes running the computer program text 10. In the example shown in FIG. 1, three different instances of the same computer program 10 are being run via processes A, B, and C.
Focusing on Process A, for example, the dynamic optimization program 30 collects profile data during emulation of instructions for Process A. Based on that profile data, the dynamic optimization program 30 creates optimized translations of portions of the instructions called by Process A and inserts them into an optimized translation code cache 40 stored in private memory space allocated for Process A. Thereafter, when a previously optimized instruction is called by Process A, flow of control is forwarded to the optimized translation for that instruction in the code cache 40. Optimization of instructions for each of the other processes (B and C) works in the same manner.
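The per-process arrangement of FIG. 1 can be sketched as a toy emulator in which the program text is a shared tuple of instructions read as data, and each process keeps its own profile counts and private code cache. The instruction set (add and mul only), the hot threshold, and all names are hypothetical. The translation step folds the block into a single affine function, standing in for a real optimizer.

```python
from collections import Counter

# Shared program text, read by the emulator as data and never run natively.
PROGRAM_TEXT = (("add", 1), ("mul", 3), ("add", -2))


class EmulatedProcess:
    """One process (A, B, C, ...) with its own profile data and private code cache."""

    HOT_THRESHOLD = 2  # assumed: translate the block once it has run twice

    def __init__(self, name):
        self.name = name
        self.profile = Counter()
        self.private_cache = {}  # stands in for code cache 40 in private memory
        self.translations_made = 0

    def _interpret(self, value):
        # Software execution: decode and apply each instruction in turn.
        for op, operand in PROGRAM_TEXT:
            value = value + operand if op == "add" else value * operand
        return value

    def _translate(self):
        # Fold the whole block into one function v -> a*v + b.
        a, b = 1, 0
        for op, operand in PROGRAM_TEXT:
            if op == "add":
                b += operand
            else:
                a, b = a * operand, b * operand
        self.translations_made += 1
        return lambda v: a * v + b

    def run(self, value):
        if "block0" in self.private_cache:
            return self.private_cache["block0"](value)  # optimized translation
        self.profile["block0"] += 1
        if self.profile["block0"] >= self.HOT_THRESHOLD:
            self.private_cache["block0"] = self._translate()
        return self._interpret(value)
```

Running three such processes over the same text produces the same translation three times, once per private cache, which illustrates the duplication of work that results when optimized translations are kept in private memory.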
There are several problems with this approach. First, running a computer program 10 through an emulator 20 is on the order of fifty times slower than running the computer program 10 natively. Second, because the optimized translations are stored in private user memory space specific to particular processes, they are not shared across the computer system, thereby causing significant duplication of work. Individual processes do not have access to other processes' optimized translations. Finally, because the dynamic optimization program 30 does not have the privilege of writing into kernel space, there is no way to optimize the computer operating system kernel.
What is needed is a method and apparatus to dynamically optimize computer programs that permits optimized translations to be shared across a computer system.
What is needed is a method and apparatus to dynamically optimize computer programs without degrading the performance of the programs.
What is needed is a method and apparatus to dynamically optimize computer programs that permits optimization of the computer operating system kernel.
The present invention solves the problems of the prior art by using a kernel module to perform dynamic optimizations both of user programs and of the computer operating system kernel itself. The kernel module permits optimized translations to be shared across a computer system without emulation because the kernel module has the privileges necessary to write into the computer program text in shared user memory space. In addition, the kernel module can be used to optimize the kernel itself because it, too, is located in the kernel memory space.
The method of the present invention generally comprises loading a computer program to be optimized into shared user memory space; executing the computer program; analyzing the computer program as it executes, including the substep of collecting profile data about the computer program; providing profile data to a kernel module located in kernel memory space; generating at least one optimized translation of at least one portion of the computer program using the kernel module; and patching the computer program in shared user memory space using the at least one optimized translation as the computer program continues to execute.
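The method steps above can be modelled end to end in a small Python sketch. Everything here is a hypothetical user-level simulation, not kernel code: SharedText stands in for program text in shared user memory, the privileged flag stands in for kernel-mode write access, and KernelModule stands in for the code-rewriting kernel module. Processes run the text directly (no emulation) and return profile data; the module generates a translation once and patches the shared text so that every process picks it up.

```python
from collections import Counter


class SharedText:
    """Program text in shared user memory: routine name -> code.
    User-level callers may read it; writing requires privilege (modelled by a flag)."""

    def __init__(self, routines):
        self._routines = dict(routines)

    def fetch(self, name):
        return self._routines[name]

    def write(self, name, code, privileged=False):
        if not privileged:
            raise PermissionError("user-space code cannot write shared program text")
        self._routines[name] = code


class KernelModule:
    """Hypothetical code-rewriting kernel module residing in kernel memory space."""

    def __init__(self, text, threshold=3):
        self.text = text
        self.threshold = threshold
        self.code_cache = {}  # one cache shared by every process on the system
        self.translations_made = 0

    def receive_profile(self, profile, optimizer):
        for name, count in profile.items():
            if count >= self.threshold and name not in self.code_cache:
                self.code_cache[name] = optimizer(name)  # generate translation
                self.translations_made += 1
                # Patch the shared text so all processes reach the translation.
                self.text.write(name, self.code_cache[name], privileged=True)


def run_process(text, calls):
    """Execute routines from the shared text natively and collect profile data."""
    profile = Counter()
    results = []
    for name, arg in calls:
        profile[name] += 1
        results.append(text.fetch(name)(arg))
    return results, profile
```

Because the patch lands in the shared text, a second process calling the same routine executes the translation without the module translating it again, in contrast to the per-process caches of FIG. 1.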
The apparatus of the present invention generally comprises at least one processor, adapted to execute computer programs, including computer programs that are part of a computer operating system program kernel; and a memory, operatively connected to the processor, adapted to store in kernel memory space a computer operating system program kernel and a code-rewriting kernel module, wherein the code-rewriting kernel module is adapted to receive profile information regarding a computer program while it is executing and to optimize at least a portion of that executing computer program.