Most computer systems utilize cache memories and the like to improve the performance of these systems while reducing the cost of the memory components. For example, when a program accesses a particular word of memory, the cache subsystem checks to see if the contents of that word are already stored in the cache. If the contents are stored in the cache, they are delivered to the processor. If not, the contents of that word, as well as those of neighboring words, are transferred from main memory to the cache subsystem. Since the cache subsystem has a much faster access time than the main memory of the computer, this strategy substantially improves the run-time performance of the program. Other forms of hierarchical memory structures that operate in a manner analogous to cache memories are known to those skilled in the art, and hence, will not be discussed in detail here.
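By way of illustration only, the lookup and fill behavior described above may be sketched as follows. The class name `DirectMappedCache`, the direct-mapped organization, and the line and cache sizes are assumptions chosen for the example; the discussion above does not prescribe any particular cache organization.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that copies a whole line of neighboring words on a miss."""

    def __init__(self, memory, num_lines=4, words_per_line=4):
        self.memory = memory                  # stands in for the slower main memory
        self.num_lines = num_lines            # assumed cache size, in lines
        self.words_per_line = words_per_line  # assumed line size, in words
        self.lines = {}                       # slot index -> (line number, copied words)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line_number = address // self.words_per_line
        slot = line_number % self.num_lines
        entry = self.lines.get(slot)
        if entry is not None and entry[0] == line_number:
            # The word is already in the cache: deliver it directly.
            self.hits += 1
        else:
            # Miss: transfer the requested word and its neighbors into the cache.
            self.misses += 1
            start = line_number * self.words_per_line
            entry = (line_number, self.memory[start:start + self.words_per_line])
            self.lines[slot] = entry
        return entry[1][address % self.words_per_line]


memory = list(range(100, 164))  # 64 words; the word at address a holds 100 + a
cache = DirectMappedCache(memory)
values = [cache.read(a) for a in (8, 9, 10, 11, 8)]
print(values, cache.hits, cache.misses)
```

In this run only the first access misses; the four subsequent accesses fall within the line that the miss brought in and are served from the cache.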
The degree of improvement provided by hierarchical storage systems depends on the degree of correlation of memory accesses. If there is a high probability that the next instruction to be executed after the currently executing instruction is stored close to the currently executing instruction in memory, the cache memory strategy will be more effective than if the next instruction is far from the current instruction. If computer code were always executed in a predetermined sequence, the compiler and linker subsystems of the operating system could arrange the code such that the cache system operates at optimum efficiency. Unfortunately, the vast majority of computer programs include branches and subroutine calls that interrupt the sequential nature of the code, and make it impossible for the compiler and linker to predict the instruction sequence from the code without additional information.
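The dependence on access correlation may be illustrated numerically. The following is a minimal sketch under assumed parameters (a direct-mapped cache of four lines of eight words each, and two synthetic address traces); the function name `hit_rate` and both traces are hypothetical and are not taken from any measured program.

```python
def hit_rate(trace, words_per_line=8, num_lines=4):
    """Fraction of accesses served from a toy direct-mapped cache."""
    slots = {}  # slot index -> line number currently held in that slot
    hits = 0
    for address in trace:
        line = address // words_per_line
        slot = line % num_lines
        if slots.get(slot) == line:
            hits += 1
        else:
            slots[slot] = line  # miss: the line replaces whatever the slot held
    return hits / len(trace)


# Highly correlated accesses: each instruction is adjacent to the previous one.
sequential = list(range(64))
# Poorly correlated accesses: every step jumps 40 words, as a branch-heavy program might.
scattered = [(i * 40) % 640 for i in range(64)]

print(hit_rate(sequential))
print(hit_rate(scattered))
```

With these assumed parameters the sequential trace misses only once per line, whereas every access in the scattered trace lands on a line that is no longer resident.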
One method of obtaining the information needed to optimize the layout of the program is to use profile based optimization. In profile based optimization systems, the program code is run using representative test data and the sequence in which the code executes its instructions is determined. The operating sequence is determined by inserting instructions at key points in the code. These "instrumentation" instructions report data that may be used to determine the execution sequence of the program. This information is then used to rearrange the code so as to increase the correlation between memory accesses.
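The role of the instrumentation instructions may be sketched as follows. The function `probe`, the block labels, and the sample routine `classify` are hypothetical and serve only to show how inserted reporting calls yield an execution sequence; an actual profile based optimization system would insert such instructions at compile time rather than by hand.

```python
execution_trace = []  # filled in by the probes while the program runs


def probe(block_id):
    """Hypothetical instrumentation instruction: report that a point in the code was reached."""
    execution_trace.append(block_id)


def classify(values):
    """Sample routine with probes inserted at the key points of its control flow."""
    probe("entry")
    negatives = 0
    for v in values:
        probe("loop_body")
        if v < 0:
            probe("negative_case")
            negatives += 1
        else:
            probe("non_negative_case")
    probe("exit")
    return negatives


result = classify([3, -1, 4])  # representative test data
print(execution_trace)
```

The recorded trace reflects the test data supplied: a different input would exercise the branches in a different order, which is why the test data must be representative.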
A computer program may be viewed as being separated into a plurality of basic blocks. A basic block is a sequence of instructions that contains no branches within the sequence. The basic block is entered from a branch and ends with a branch. Since the sequence of instructions within a basic block already operates with the maximum correlation in memory accesses, the profile based optimization system need only know the sequence in which the basic blocks are executed. This substantially reduces the number of instrumentation instructions that are needed to determine the sequence in which instructions actually execute when the program runs with real data. In addition to instrumenting the basic blocks, profile based optimization systems also typically instrument procedure calls.
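One conventional way of separating code into basic blocks is the "leader" technique found in compiler textbooks, sketched below. It is offered as an illustration and is not asserted to be the partitioning used by any particular profile based optimization system. The instruction encoding (tuples of an opcode and an optional branch target index) and the function name `split_basic_blocks` are assumptions; instructions that end a block without naming a target, such as returns, are not modeled.

```python
def split_basic_blocks(instructions):
    """Partition (opcode, branch_target_or_None) tuples into lists of instruction indices."""
    if not instructions:
        return []
    leaders = {0}  # the first instruction always starts a block
    for index, (_opcode, target) in enumerate(instructions):
        if target is not None:
            leaders.add(target)             # a branch target starts a block
            if index + 1 < len(instructions):
                leaders.add(index + 1)      # the instruction after a branch starts a block
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return [list(range(start, end)) for start, end in zip(starts, ends)]


program = [
    ("load", None),     # 0
    ("compare", None),  # 1
    ("branch_if", 5),   # 2: conditional branch to instruction 5
    ("add", None),      # 3
    ("jump", 6),        # 4: unconditional branch to instruction 6
    ("sub", None),      # 5
    ("store", None),    # 6
]
blocks = split_basic_blocks(program)
print(blocks)
```

Seven instructions reduce to four blocks in this example, so four instrumentation points suffice where seven would otherwise be needed.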
In profile based optimization systems, the program is first compiled with options that lead to the insertion of the instrumentation code. The resulting executable code is run with sample data. The reports from the instrumentation instructions are stored in a database which keeps track of the transfer of control among the basic blocks. After the execution of the instrumented program is complete, the data is used by an optimizer to rearrange the order in which the basic blocks are stored in memory. The rearrangement assures that when a first basic block often transfers control to a second basic block, the two basic blocks are stored close to one another in memory.
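The use of the recorded transfers by an optimizer may be sketched as follows. The greedy heuristic shown, which places after each block its most frequent not-yet-placed successor, is an assumption made for illustration; practical optimizers use more elaborate layout algorithms. The function names `count_transfers` and `greedy_layout` and the sample trace are hypothetical.

```python
from collections import Counter


def count_transfers(trace):
    """Build the transfer-of-control table: (from_block, to_block) -> count."""
    return Counter(zip(trace, trace[1:]))


def greedy_layout(trace):
    """Order blocks so that each block is followed by its most frequent unplaced successor."""
    transfers = count_transfers(trace)
    remaining = list(dict.fromkeys(trace))  # distinct blocks in first-seen order
    layout = []
    current = remaining[0] if remaining else None
    while remaining:
        layout.append(current)
        remaining.remove(current)
        if not remaining:
            break
        candidates = [b for b in remaining if transfers[(current, b)] > 0]
        if candidates:
            current = max(candidates, key=lambda b: transfers[(current, b)])
        else:
            current = remaining[0]  # no recorded successor left: fall back to first-seen order
    return layout


# Hypothetical trace reported by an instrumented run: the path A, B, D is hot, A, C, D is rare.
trace = ["A", "C", "D", "A", "B", "D", "A", "B", "D", "A", "B", "D"]
layout = greedy_layout(trace)
print(layout)
```

Although block C is encountered before block B in the trace, the layout places B immediately after A because control passes from A to B three times and from A to C only once.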
While profile based optimization systems have been successfully utilized with conventional programs, the use of these techniques with shared libraries, sometimes referred to as dynamic load libraries (DLLs), has been hampered by differences in the manner in which these two classes of programs are used. For the purposes of this discussion, a shared library will be defined to be one or more program libraries that may be accessed from a number of different programs in multi-tasking systems. These libraries may be part of the operating system. Other service programs that are designed to be called by programs in a particular operating environment will also be apparent to those skilled in the art. For example, a spreadsheet or word processing program may be set up as a shared library so that it may be launched by a client program to provide a function required by the client program. For instance, the client program may call the word processing program to generate a printout of a document in that word processing program's format.
Shared libraries differ from simple client programs in that the library may not be loaded at the time the client program is loaded. Similarly, the library may not be unloaded when the client program is finished. Instrumentation of the shared library is also complicated by the fact that the library may be needed to service other programs while its instrumented version is being run with profiling data. If the instrumented library does not support multiple instances of the library operating simultaneously, the system may need to be shut down to load the instrumented version.
Broadly, it is the object of the present invention to provide an improved method for operating a computer to provide profile based optimization data for a shared library.
It is a further object of the present invention to provide a method for obtaining profile based optimization data that allows the instrumentation functions to be externally controlled.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.