Measuring the performance of an operating computer system is a frequent and extremely important task performed by hardware and software engineers. Hardware engineers need performance data to determine how new computer hardware operates with existing operating systems and application programs. Software engineers need to identify critical portions of operating system, kernel, device driver, and application software programs.
Measuring the performance of modern high-speed pipelined multi-processor systems is a particular problem. In a pipelined multi-processor system, multiple instructions may be issued during each processor cycle. A processor cycle is the basic timing interval for processor operations. Barring stalls, instructions advance one stage through the pipeline in each cycle; this is the ideal case, in which the system operates most efficiently. This means that at any one time multiple instructions can be executing concurrently in the various pipelines. Generally, the operator code and operands of an instruction determine how many processor cycles are required to completely execute the instruction. However, the operator code does not indicate how many cycles will be required to issue the instruction; this can only be determined dynamically while the instruction is executing.
However, some instructions may interfere with each other. For example, a subsequent load or conditional branch instruction may require the results of an as yet uncompleted instruction. In other cases, instructions may be waiting for a processor resource such as a particular floating-point arithmetic unit. In these cases, execution of the later instruction is stalled until the other instruction completes or the resource becomes available. While an instruction is stalled, decoding and processing are suspended, and the processor is operating less efficiently. That is, the number of cycles required to complete the execution of an instruction may be greater than in the ideal case.
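The effect of such a data dependency on issue timing can be illustrated with a minimal sketch. The model below is an assumption made purely for illustration: a single-issue, in-order pipeline in which an instruction issues only once all of its source registers are ready. The names `Instr` and `count_stalls`, the register names, and the latencies are hypothetical and do not describe any particular processor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read
    latency: int             # cycles from issue until the result is usable


def count_stalls(program: List[Instr]) -> List[Tuple[str, int, int]]:
    """Return (name, issue_cycle, stall_cycles) for each instruction.

    Toy model (assumed): in-order, one instruction issued per cycle at most,
    and an instruction stalls until every source register is ready.
    """
    ready_at = {}    # register -> first cycle in which its value can be used
    next_issue = 0   # earliest cycle in which the next instruction could issue
    report = []
    for ins in program:
        operands_ready = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        issue = max(next_issue, operands_ready)
        report.append((ins.name, issue, issue - next_issue))
        if ins.dest is not None:
            ready_at[ins.dest] = issue + ins.latency
        next_issue = issue + 1
    return report


# Hypothetical three-instruction sequence: the add needs the loaded value.
program = [
    Instr("load r1", "r1", (), 3),
    Instr("add r2", "r2", ("r1",), 1),
    Instr("branch", None, ("r2",), 1),
]

for name, issue, stalls in count_stalls(program):
    print(f"{name:8s} issues at cycle {issue}, stalled {stalls} cycle(s)")
```

In this sketch the add instruction cannot issue until the load result is available, so it accumulates stall cycles that its operator code alone would not reveal.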
A profiling system can be used to collect performance data on how frequently instructions are executed. Some prior art profiling systems require that source or object programs be modified to insert instructions which can collect the data when the programs are executing. Modifying the programs means that the programs need to be recompiled and/or relinked.
In addition, prior art profiling systems generally only determine the frequency of execution of particular instructions, and not the number of cycles that are required to issue the instructions. In a pipelined multi-processor system, the number of cycles required to issue the various instructions is a significant indicator of the performance of the system.
As a further restriction of some known systems, profiles can only be generated for instructions of an application program. This means that when the application program calls an operating system procedure, no performance data on the actual instructions executed by the system procedure are collected. Some profiling systems may measure the amount of time it took to process the system call, and attempt to infer performance data by dividing the time for processing a system call by some "average" execution time of instructions. However, nothing concrete is learned about the actual execution of the instructions of system procedures.
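The weakness of the averaging approach can be shown with a small worked example. The figures below are invented for illustration only: two hypothetical system-call traces take the same total number of cycles, so dividing by an assumed average yields the same estimate for both, even though their actual instruction counts and stall behavior differ widely.

```python
def estimate_instruction_count(elapsed_cycles, assumed_avg_cycles_per_instr):
    """Estimate instructions executed from elapsed time and an assumed average."""
    return elapsed_cycles / assumed_avg_cycles_per_instr


# Hypothetical per-instruction cycle counts for two system calls.
trace_a = [1] * 100              # 100 instructions, none stalled
trace_b = [1] * 10 + [10] * 9    # 19 instructions, 9 of them heavily stalled

ASSUMED_AVG = 2.0                # assumed "average" cycles per instruction

for label, trace in (("A", trace_a), ("B", trace_b)):
    elapsed = sum(trace)
    estimate = estimate_instruction_count(elapsed, ASSUMED_AVG)
    print(f"trace {label}: elapsed={elapsed} cycles, "
          f"estimated={estimate:.0f} instructions, actual={len(trace)}")
```

Both traces consume 100 cycles and therefore receive the same estimate, which says nothing about which instructions were executed or where cycles were lost.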
Therefore, it is desired to profile machine executable programs without having to modify source or object code files, so that profiled programs do not need to be recompiled or relinked. Furthermore, it is desired to profile both application and system (kernel) level programs. In addition, it is desired to profile not only the number of times each instruction is executed, but also the average number of stall cycles incurred when each instruction is issued, and the reasons why those stalls occurred.