1. Field of the Invention
The present invention generally relates to software systems, and in particular to methods for tuning such systems to improve their performance.
2. Description of the Related Art
Building a typical software system encompasses a number of steps and culminates in the production of a bound module (or “executable binary”) that has been tuned to execute on a specific hardware and software configuration. This process, known as “static optimization”, may include the steps of compiling the program with optimization and inserting instrumentation; statically or dynamically binding it with pre-built runtime libraries; testing it with some range of expected user input on some sample of expected user configurations; and recompiling it with optimization based on data collected during testing (referred to as “optimizing by means of profile directed feedback”).
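As a rough illustration of the static pipeline just described, the following Python sketch models the instrument, test, and recompile-with-feedback cycle. The sample program, its block names, and the hottest-first layout rule are hypothetical stand-ins for a real compiler's instrumentation and profile-directed feedback pass; the sketch only shows that the final layout is fixed by whatever training inputs were chosen before deployment.

```python
from collections import Counter
from typing import Callable, Iterable


# Hypothetical instrumented "program": each basic block increments its own
# counter in the profile whenever it executes.
def sample_program(x: int, profile: Counter) -> None:
    profile["entry"] += 1
    if x % 2 == 0:
        profile["even_path"] += 1
    else:
        profile["odd_path"] += 1
    if x > 100:
        profile["large_path"] += 1


def training_run(program: Callable[[int, Counter], None],
                 training_inputs: Iterable[int]) -> Counter:
    """Testing step: run the instrumented build on sample inputs chosen in advance."""
    profile: Counter = Counter()
    for x in training_inputs:
        program(x, profile)
    return profile


def profile_directed_layout(profile: Counter) -> list:
    """Recompilation step (modeled): place the hottest blocks first, as a
    profile-directed code-layout pass might. Ties keep first-seen order."""
    return [block for block, _count in profile.most_common()]


# The profile reflects only these training inputs, not what end users will run.
profile = training_run(sample_program, [2, 4, 6, 7, 200])
layout = profile_directed_layout(profile)
print(layout)  # ['entry', 'even_path', 'odd_path', 'large_path']
```

If end users mostly supplied odd inputs, this layout would remain tuned for the even path, which is the limitation of static techniques discussed next.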
The resulting program is expected to be both robust and optimal across the range of expected user configurations and input data. In reality, however, the spectrum of user environments may be so broad that such programs cannot achieve optimum performance in all cases.
It has long been recognized that static techniques (i.e., the gathering of information about code execution outside the operational environment to allow fine-tuning of code for subsequent executions) are inadequate for generating efficient code because it is difficult, if not impossible, to generate test data representing all possible end-user data and hardware configurations.
Moreover, as instruction-level parallelism increases and pipelines deepen, this inadequacy becomes increasingly significant.
An alternative to static optimization is “dynamic optimization.” Dynamic optimization is similar to static optimization with profile-directed feedback in that it includes the steps of instrumenting the code to be optimized, compiling it, and statically or dynamically binding it with runtime libraries.
However, dynamic optimization differs from static optimization in that the instrumented code is observed during live execution with actual user data, and the executing code is re-optimized in situ; that is, the information gathered about program execution is used during the same run to re-optimize the code for its subsequent execution within that run.
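The in-situ re-optimization described above can be sketched as a toy dispatcher in Python. Everything here is hypothetical and simplified: the class and function names are illustrative, the hot threshold is arbitrary, and "re-optimization" is modeled as switching to a precomputed optimized variant, whereas a real dynamic optimizer would generate the new code at run time. What the sketch does show is the defining property: counts are gathered from live calls, and the switch takes effect for later calls in the same run.

```python
from typing import Callable, Dict, Set

IntFn = Callable[[int], int]


def sum_below_baseline(n: int) -> int:
    """Code as originally compiled: an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_below_optimized(n: int) -> int:
    """Hypothetical re-optimized variant: closed form, same result for n >= 0."""
    return n * (n - 1) // 2


class ToyDynamicOptimizer:
    """Counts calls during live execution and, once a function becomes hot,
    switches to its optimized variant for the remainder of the same run."""

    def __init__(self, hot_threshold: int) -> None:
        self.hot_threshold = hot_threshold
        self.counts: Dict[str, int] = {}
        self.current: Dict[str, IntFn] = {}
        self.variants: Dict[str, IntFn] = {}
        self.reoptimized: Set[str] = set()

    def register(self, name: str, baseline: IntFn, optimized: IntFn) -> None:
        self.current[name] = baseline
        self.variants[name] = optimized
        self.counts[name] = 0

    def call(self, name: str, arg: int) -> int:
        if name not in self.reoptimized:
            # Instrumentation: observe how often this code runs on real input.
            self.counts[name] += 1
            if self.counts[name] >= self.hot_threshold:
                # In-situ re-optimization: later calls in this run use new code.
                self.current[name] = self.variants[name]
                self.reoptimized.add(name)
        return self.current[name](arg)


opt = ToyDynamicOptimizer(hot_threshold=3)
opt.register("sum_below", sum_below_baseline, sum_below_optimized)
results = [opt.call("sum_below", 10) for _ in range(5)]
print(results, opt.counts["sum_below"], "sum_below" in opt.reoptimized)
# prints [45, 45, 45, 45, 45] 3 True
```

Counting stops once the function has been re-optimized, so the instrumentation cost is paid only until the decision is made.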
Code produced by dynamic optimization is often more efficient than code produced by static optimization techniques, because optimization can be focused, for instance, on the heavily utilized portions of the code whose increased efficiency most affects overall performance, as indicated by actual program use.
In addition, dynamic optimization does not require the customer to perform the labor-intensive task of generating hypothetical data sets.
In prior dynamic optimization systems, however, the drawback is that the information gathering and compilation work is interleaved sequentially with the execution of the application code, thus adding to the application's execution time. For any benefit to accrue from dynamic optimization, this additional cost must be outweighed by the resulting improvement in the application's running time.
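The cost-benefit condition above can be made concrete with a simple timing model, assumed here for illustration rather than taken from any particular system. Suppose an unoptimized run takes time T, re-optimization happens after a fraction f of the work, the serialized profiling and compilation overhead is O, and the remaining work then runs s times faster. The run is faster overall only if O < (1 − f)·T·(1 − 1/s). The Python sketch below encodes that model; the function names and parameter values are hypothetical.

```python
def run_time_with_dynamic_opt(base_time: float, overhead: float,
                              fraction_before_opt: float, speedup: float) -> float:
    """Total time when the optimization overhead is serialized with execution:
    the work before re-optimization runs at the original speed, the overhead is
    then paid in full, and the remaining work runs `speedup` times faster."""
    before = fraction_before_opt * base_time
    after = (1.0 - fraction_before_opt) * base_time / speedup
    return before + overhead + after


def break_even_overhead(base_time: float, fraction_before_opt: float,
                        speedup: float) -> float:
    """Largest overhead that still leaves the run no slower than the baseline."""
    return (1.0 - fraction_before_opt) * base_time * (1.0 - 1.0 / speedup)


# Example: a 100 s run, re-optimized after 20% of the work, 10% faster afterwards.
for overhead in (5.0, 10.0):
    t = run_time_with_dynamic_opt(100.0, overhead, 0.2, 1.1)
    print(f"overhead={overhead}: total={t:.2f} s, pays off: {t < 100.0}")
print(f"break-even overhead: {break_even_overhead(100.0, 0.2, 1.1):.2f} s")
```

In this model a large speedup s, such as replacing slow bytecode execution with compiled code, raises the tolerable overhead, while for an already highly optimized binary s is close to 1 and the allowance shrinks toward zero.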
Examples of dynamic optimization include proposed techniques that seek to optimize the execution of Java® applications by dynamically compiling and optimizing Java® bytecodes. The key to such techniques is that, since typical bytecode execution is significantly slower than the execution of most compiled applications, the overhead of the runtime processing may be absorbed by a modest improvement in execution speed. Such opportunities are rarely available in the execution of highly optimized non-Java (binary code) applications.
Moreover, many of the techniques employed, such as optimizing method calls, are not generally applicable to such applications.
Other projects, such as the one described in V. Bala, et al., “Transparent Dynamic Optimization: The Design and Implementation of Dynamo,” HP Laboratories Technical Report HPL-1999-78, June 1999, have addressed the issue of dynamically improving runtime performance by rearranging code layout to improve instruction cache locality, under the assumption that the overhead can be repaid by the resulting improvements in execution time.
The Dynamo technique, however, does not take advantage of instrumentation information, nor, since it is designed for uniprocessor systems, does it consider multiprocessor applications.
However, computer systems today typically comprise more than one processor. Even at the low end, single-chip multiprocessors are becoming ubiquitous. Moreover, in many of these configurations at least one of the processors is frequently underutilized. Earlier attempts to exploit multiprocessor capability to improve program execution have focused on techniques such as automatic parallelization of applications. These techniques have met with limited success for a variety of reasons, the most significant of which are that automatic parallelization is most relevant to numeric-intensive applications, especially those written in Fortran, whereas commercial transaction-processing applications, typically written in C, have proven less amenable to this approach; and that automatic parallelization has proven difficult to implement in practice, even for the more regular types of code.
There is thus a clear need for an optimization process that can exploit the unique properties of multiprocessor systems for a broad category of codes.