In order to improve performance of code generated by various families of computers, it is often necessary to determine where time is being spent by the processor in executing code, such efforts being commonly known in the computer processing arts as locating "hot spots". Ideally one would like to isolate such hot spots at the instruction and/or source line of code level in order to focus attention on areas which might benefit most from improvements to the code.
For example, isolating such hot spots to the instruction level permits compiler writers to find significant areas of suboptimal code generation, so that they may focus their efforts to improve code generation efficiency in these areas. Another potentially important use of instruction level detail is to provide guidance to the designers of future systems. Such designers, with appropriate profiling tools, may find characteristic code sequences and/or single instructions requiring improvement to optimize the available hardware for a given level of hardware technology.
In like manner, isolating hot spots to the source line of code level would provide the level of detail necessary for an application developer to make algorithmic tradeoffs. A programmer's a priori guesses about where a program is spending significant time executing are frequently wrong for numerous reasons. First, the programmer seldom has a comprehensive understanding of the complex dynamics of the hardware and software system. Second, the compiler itself often does not generate code that corresponds to the programmer's assumptions. It was accordingly highly desirable to provide a system for feeding back information to the programmer about the execution dynamics of a program in terms that the programmer could easily understand.
Thus various methods had been developed for monitoring aggregate CPU usage, known as "profiling". One approach was simply to add instructions to the program being analyzed to enable it essentially to assess itself. This, however, introduced the undesirable characteristic of invasiveness, wherein the changes necessary for profiling might alter the dynamics of the very thing one was attempting to measure. Yet another approach to profiling was to develop external specialized hardware monitors. However, this approach also entailed numerous drawbacks, not the least of which were the expense associated with development of such specialized hardware and questions of feasibility in even doing so.
In some environments, the need for such profiling was particularly acute and yet was not satisfied by the existing methods due to the unique characteristics of the environments. An example of such an environment is the RISC System/6000™ line of computers operating the AIX™ Operating System of the IBM Corporation (RISC/6000 and AIX are trademarks of the International Business Machines Corporation). A more detailed description of this hardware and software is provided in "IBM RISC System/6000 Technology", first edition 1990, publication SA23-2619, IBM Corporation.
One specific attempt at providing profiling for such environments was a system known in the art as "Gprof", described in the article "Gprof: A Call Graph Execution Profiler", Proc. ACM SIGPLAN Symposium on Compiler Construction, June, 1982, by S. L. Graham, P. B. Kessler, and M. K. McKusick. Several problems were associated with this profiling system. First, there was no shared library support, thus requiring the compilation of programs with exclusively non-shared libraries. The system did not support the simultaneous profiling of multiple processes; all processes which could be run had to be recompiled for routine-level profiling; the system was invasive (e.g., it modified the executable code to be profiled); and it required dedicating to profiling additional memory amounting to approximately half of the space of the program to be profiled. Moreover, in addition to requiring that the entire set of processes to be profiled be rebuilt, it was capable of providing only routine-level profiling, with no source statement or instruction level profiling; it did not summarize all CPU usage but rather only that of one user program at a time; and it often required a substantial increase in user CPU time, sometimes approaching 300%, due to its invasiveness.
For this reason other approaches were suggested for profiling in such environments including, for example, the PIXIE system of MIPS Computer Systems, Inc. described in "Compilers Unlock RISC Secrets", ESD, December, 1989, pgs. 26-32, by Larry B. Weber.
In this system the executable objects of the processes to be profiled are analyzed and reconstructed with every atomic sequence of instructions, known in the art as a "basic block", being preceded by hooks which emit an event reporting the beginning of execution of the basic block. From the emitted sequence of events the frequency of execution of each basic block can be maintained during run time. In a subsequent post-processing step this frequency of occurrence is correlated to the source statements and routines of the program to provide execution time profiles.
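The two steps described above (counting basic-block events at run time, then correlating the counts to source statements in post-processing) can be illustrated with a minimal sketch. This is not the PIXIE implementation: the block table `BLOCKS`, the routine names, the line numbers, and the use of instruction count as a stand-in for execution time are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical basic-block table (an assumption for illustration):
# block id -> (routine name, source line, number of instructions in the block).
BLOCKS = {
    0: ("main", 10, 4),
    1: ("main", 12, 6),
    2: ("sum_to", 20, 3),
    3: ("sum_to", 21, 5),
}

def count_blocks(events):
    """Run-time step: tally how often each basic block's hook fired."""
    return Counter(events)

def profile_by_line(counts, blocks=BLOCKS):
    """Post-processing step: weight each block's execution count by its
    instruction count (a crude proxy for time) and attribute the result
    to the block's (routine, source line)."""
    totals = Counter()
    for block_id, executions in counts.items():
        routine, line, instructions = blocks[block_id]
        totals[(routine, line)] += executions * instructions
    return totals

# A hypothetical stream of events, one per basic-block entry.
events = [0, 1, 2, 3, 3, 3, 1, 2, 3, 3]
counts = count_blocks(events)
profile = profile_by_line(counts)
```

Sorting `profile` by weight (for example with `profile.most_common()`) lists the source lines that dominate the estimated cost, which is the kind of hot spot report the text describes.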
Whereas this method offers the advantage of direct measurement over estimates obtained from sampling the program counter, it has several disadvantages: no shared library support, no support for multiple processes, an increase in program executable space by up to a factor of 3, and an increase in program executables by factors of 10 or more.
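For contrast with direct measurement, the estimate obtained from sampling the program counter can be sketched as follows. This is a generic illustration, not a description of any particular profiler: the symbol table `SYMBOLS`, the segment end `END`, and the sample addresses are hypothetical values chosen for the example.

```python
import bisect
from collections import Counter

# Hypothetical symbol table (an assumption): (start address, routine name),
# sorted by start address. END marks the end of the last routine.
SYMBOLS = [(0x1000, "main"), (0x1040, "parse"), (0x10C0, "compute")]
END = 0x1200
STARTS = [start for start, _ in SYMBOLS]

def routine_for(pc):
    """Map a sampled program counter value to the routine containing it."""
    if not (STARTS[0] <= pc < END):
        return "<unknown>"
    # The containing routine is the last one whose start address is <= pc.
    return SYMBOLS[bisect.bisect_right(STARTS, pc) - 1][1]

def sample_profile(pcs):
    """Estimate each routine's share of CPU time as its share of samples."""
    hits = Counter(routine_for(pc) for pc in pcs)
    total = sum(hits.values())
    return {name: n / total for name, n in hits.items()}

# Hypothetical program counter values captured at periodic intervals.
samples = [0x1004, 0x10C8, 0x10D0, 0x1044, 0x11F0, 0x10C0, 0x2000, 0x10FC]
shares = sample_profile(samples)
```

Because the program itself is not modified, such sampling leaves the executable unchanged, at the cost of yielding a statistical estimate rather than an exact count.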
Yet additional developments were made in profiling systems, such as that outlined in the following reference: "Non-Intrusive and Interactive Profiling in Parasight", Proc. ACM/SIGPLAN, August, 1988, pgs. 21-30, by Ziya Aral and Ilya Gertner. In this development, the invasiveness resulting in additional run time was decreased by selectively modifying code sequences of interest to directly measure the execution time of the selected code sequences and by employing an additional supplemental process to capture and process the run time measures.
From the foregoing it will be apparent that profiler technology to support the various aforementioned environments needed numerous improvements. Specifically, a profiler was needed which would support multiple process and multiple user environments, shared libraries (dynamically loaded shared objects), kernel as well as user execution spaces, and kernel extensions (dynamically loaded extensions to the kernel).
Requirements which became apparent as particularly desirable and greatly needed in a profiler related to the characteristics of convenience and non-invasiveness. These two factors are strongly related as well as having merit in their own right.
As an example of convenience, it was highly desirable to provide a profiling tool which would enable a user to very easily profile existing running code without requiring special procedures, recompilation, relinking, or rebuilding. Moreover, it was further highly desirable to provide a profiling tool which was comprehensive and non-invasive as well. The comprehensive feature would simply provide for profiling of all processes and all address domains for each process: the kernel, kernel extensions, user, and shared objects. The highly desirable feature of non-invasiveness would contemplate that executables and supporting environments be virtually identical whether profiling or not, with no special effort required to obtain this equivalence. Conventional systems required modification of executables in order to profile at the instruction level, for example, often resulting in excessive CPU and memory utilization. The importance of non-invasiveness is that the gathered statistics are not distorted and all instruction streams and referenced addresses are maintained. The latter is particularly important when looking for performance issues that are related to overuse of hardware facilities such as the TLB, data and instruction caches, registers, and memory.
For all of the foregoing reasons, a profiling tool was highly desirable which could report on the aggregate CPU usage of all users of the environment, including all programs (processes) running and the kernel, during execution of the user programs (as well as the fraction of time the CPU is idle), whereby users might determine CPU usage in a global sense. Such a profiler was further desired as a tool to investigate programs which might be CPU-bound, wherein the programmer would find it useful to know the sections of the program which were being most heavily used by the CPU. Still further, a profiler was highly sought which could be run using the executable program as is, without the need to compile with special compiler flags or linker options, whereby a subprogram profile could be obtained of any executable module that had already been built.