Performance monitoring, or profiling, and analysis of computer systems are important tools for both hardware and software engineering. To evaluate existing and new computer architectures, it may be useful to collect data related to the performance of the computer system. A monitoring or profiling tool may collect a variety of information, for example: cache misses, number of instructions executed, number of cycles executed, amount of CPU time devoted to a user, and the number of instructions that are used to optimize a program, to name just a few.
Different designs of computer hardware structures, such as a computer memory or cache, may exhibit significantly different behavior when running the same set of programs. A monitoring or profiling tool may be useful in identifying design strengths or flaws. Conclusions drawn from the data collected by the profiling tool may then be used to affirm or modify a design as part of a design cycle for a computer structure. Identifying needed design modifications, flaws in particular, before a design is finalized may improve the cost effectiveness of the design cycle.
Software engineers and programmers can utilize a profiling tool to identify regions in the software that are critical to performance. The need to identify critical regions applies to many types of application programs as well as operating systems. Compiler designers, for example, may wish to know how a compiler schedules instructions for execution, or how well conditional branches are predicted during execution. Information about such performance criteria may in turn be used for optimization of the compilation process.
Two common conventional techniques for collecting runtime information about programs executed on a computer processor are instrumentation based profiling and sampling based profiling. Profiling information obtained with these techniques is typically utilized to optimize programs. Conclusions may be drawn about critical regions and constructs of the program by discovering, for example, what portion of the whole program's execution time is spent executing each program construct.
The method of instrumentation based profiling involves the insertion of instructions or code into an existing program. The extraneous instructions or code are inserted at critical points. Critical points of the existing program may be, for example, function entries and exits or the like. The inserted code handles the collection and storage of the desired runtime information associated with critical regions of the program. It should be noted that at runtime the inserted code becomes integral to the program. Once all the information is collected the stored results may be displayed either as text or in graphical form. Examples of instrumentation based profiling tools are prof, for UNIX operating systems, pixie for Silicon Graphics (SGI) computers, CXpa for Hewlett-Packard (HP) computers, and ATOM for Digital Equipment Corporation (DEC) computers.
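As a minimal sketch of the idea described above, the following Python fragment inserts collection code at the entry and exit of a function by wrapping it. The names instrument, call_counts, elapsed_seconds, report and sum_of_squares are hypothetical and chosen only for illustration; the tools named above insert code into the program itself at build time, whereas this sketch substitutes a runtime wrapper.

```python
import functools
import time
from collections import defaultdict

# Hypothetical storage for the collected runtime information.
call_counts = defaultdict(int)
elapsed_seconds = defaultdict(float)


def instrument(func):
    """Wrap func so that each entry and exit is recorded (illustrative helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # code inserted at function entry
        try:
            return func(*args, **kwargs)
        finally:
            # Code inserted at function exit: store the collected data.
            call_counts[func.__name__] += 1
            elapsed_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@instrument
def sum_of_squares(n):
    """Example workload: the region of the program being profiled."""
    return sum(i * i for i in range(n))


def report():
    """Return the stored results as text, one line per instrumented function."""
    return "\n".join(
        f"{name}: calls={call_counts[name]} time={elapsed_seconds[name]:.6f}s"
        for name in sorted(call_counts)
    )


if __name__ == "__main__":
    for _ in range(3):
        sum_of_squares(1000)
    print(report())
```

Note that the wrapper executes on every call, so the inserted code is part of the running program, as stated above.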
The method of sampling based profiling involves sampling the program counter of a processor at regular time intervals. For example, a timer is set up to generate an interrupt signal at the desired time intervals. The time duration between samples is attributed to the program construct, within the profiled code, that the program counter points at when the sample is taken. A program construct may be, for example, a function, a loop, a line of code or the like. Data relating time durations with program constructs provides a statistical approximation of the time spent in different regions of the program. Examples of sampling based profiling tools are gprof by GNU, Visual C++ Profiler and Perfmon, by Microsoft, and VTune by Intel.
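The attribution rule described above may be illustrated with a small simulation. This is a sketch under stated assumptions, not the behavior of any tool named above: the program is modeled as a hypothetical trace of (construct, duration) pairs, samples are taken at times 0, interval, 2 x interval and so on, and each sample credits one full interval to the construct executing at that instant. The function names true_times and sampled_times are illustrative only.

```python
from collections import defaultdict


def true_times(trace):
    """Exact time per construct, summed directly from the trace."""
    totals = defaultdict(int)
    for name, duration in trace:
        totals[name] += duration
    return dict(totals)


def sampled_times(trace, interval):
    """Estimate time per construct by sampling at a fixed interval.

    trace: list of (construct_name, duration) pairs in execution order.
    interval: time between samples, in the same units as the durations.
    """
    # Record the time at which each trace segment ends.
    boundaries = []
    clock = 0
    for name, duration in trace:
        clock += duration
        boundaries.append((clock, name))
    total = clock

    estimate = defaultdict(int)
    segment = 0
    t = 0
    while t < total:
        # Advance to the segment executing at sampling instant t.
        while boundaries[segment][0] <= t:
            segment += 1
        # Credit the whole inter-sample interval to that construct.
        estimate[boundaries[segment][1]] += interval
        t += interval
    return dict(estimate)


if __name__ == "__main__":
    # Hypothetical trace: three constructs executed one after another.
    trace = [("parse", 30), ("loop", 60), ("write", 10)]
    print("true:   ", true_times(trace))
    print("sampled:", sampled_times(trace, 10))
    print("coarse: ", sampled_times(trace, 40))
```

With an interval of 10 the samples happen to align with the construct boundaries and the estimate matches the true times; with an interval of 40 the short "write" construct is never observed and the other two are overestimated, which shows why the result is only a statistical approximation.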
The difficulty with conventional profiling techniques is that many are intrusive, inflexible, and inaccurate. A difficulty that is common to all profiling techniques is that the very operations involved in collecting and storing information about a running program change the runtime characteristics of that program. Therefore, it is appropriate to optimize the process of collecting and storing information such that the effect on the measurements of program performance is minimized.
Instrumentation based profiling is in general more accurate than sampling based profiling since it accurately identifies and captures various operating characteristics of program constructs. Instrumentation based profiling is, however, more intrusive as extraneous code must be inserted in all regions of interest in the program. Thus, instrumentation based profiling changes the very nature of the program to be monitored. Moreover, a program must be prepared, by the insertion of the extraneous code, during the compilation process. This causes the profiling operation to be inflexible, as any change in the desired information or in the regions of the program to be profiled may require recompilation. In particular, recompilation of large programs may result in substantial time costs.
Sampling based profiling is less intrusive but also less accurate than instrumentation based profiling. Sampling based profiling provides a statistical approximation of the information collected. Hence, sampling based profiling may be sensitive to a set of assumptions and inaccuracies involved in approximation. For example, sampling based profiling relies on the assumption that inter-sample time durations are related to a particular program construct identified at the sampling instant. It is possible, however, that the sampling instant is such that the inter-sample time duration bears no relationship to the program construct identified. Such inaccurate information adds noise to the statistical approximation. Further, sampling based profiling is less flexible than instrumentation based profiling. The technique is difficult to use for monitoring the behavior of program constructs that are smaller than functions, as it is difficult to accurately associate the value of the program counter with a particular region of the program. Increasing the sampling rate may improve the accuracy and flexibility of sampling based profiling but then the technique becomes intrusive.
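The trade-off between accuracy and intrusiveness noted above may be sketched with a simulated workload. The assumptions are hypothetical: time is measured in integer units, a long construct ("long_fn", 9 units) and a short one ("short_fn", 1 unit) alternate ten times, and the number of samples taken stands in for the number of timer interrupts, that is, for intrusiveness. The names build_timeline and sample are illustrative only.

```python
from collections import Counter


def build_timeline(trace):
    """Expand (construct, duration) pairs into one entry per time unit."""
    timeline = []
    for name, duration in trace:
        timeline.extend([name] * duration)
    return timeline


def sample(timeline, interval):
    """Sample the timeline every `interval` units.

    Returns (estimated time per construct, number of samples taken).
    Each sample credits one full interval to the construct seen at that instant.
    """
    estimate = Counter()
    samples = 0
    for t in range(0, len(timeline), interval):
        estimate[timeline[t]] += interval
        samples += 1
    return dict(estimate), samples


if __name__ == "__main__":
    # Hypothetical periodic workload: true times are long_fn=90, short_fn=10.
    trace = [("long_fn", 9), ("short_fn", 1)] * 10
    timeline = build_timeline(trace)

    # The sampling period equals the workload period, so every sample lands
    # in long_fn and short_fn is never observed.
    print(sample(timeline, 10))

    # Sampling ten times as often recovers the true times, at the cost of
    # ten times as many interrupts.
    print(sample(timeline, 1))
```

In this sketch the coarse rate takes 10 samples and attributes all 100 time units to long_fn, while the fine rate takes 100 samples and recovers the 90/10 split, which mirrors the observation that a higher sampling rate improves accuracy but makes the technique more intrusive.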