1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and system for monitoring performance within a data processing system.
2. Description of Related Art
In typical computer systems, system developers desire optimization of software execution for more effective system design. Usually, studies are performed to determine system efficiency in a program's access patterns to memory and interaction with a system's memory hierarchy. Understanding the memory hierarchy behavior helps optimize the system through the development of algorithms that schedule and/or partition tasks as well as distribute and structure data. In addition, utilization of a processor can be studied to understand the manner in which the execution of a program invokes various functions within the processor.
Within state-of-the-art processors, facilities are often provided which enable the processor to count occurrences of software-selectable events and to time the execution of processes within an associated data processing system. These facilities are known as the performance monitor of the processor. Performance monitoring is often used to optimize the use of software in a system. A performance monitor is generally regarded as a facility incorporated into a processor to monitor selected characteristics to assist in the debugging and analyzing of systems by determining a machine's state at a particular point in time. Often, the performance monitor produces information relating to the utilization of a processor's instruction execution and storage control. For example, the performance monitor can be utilized to provide information regarding the amount of time that has passed between events in a processing system. As another example, software engineers may utilize timing data from the performance monitor to optimize programs by relocating branch instructions and memory accesses. In addition, the performance monitor may be utilized to gather data about the access times to the data processing system's L1 cache, L2 cache, and main memory. Utilizing this data, system designers may identify performance bottlenecks specific to particular software or hardware environments. The information produced usually guides system designers toward ways of enhancing performance of a given system or of developing improvements in the design of a new system.
Events within the data processing system are counted by one or more counters within the performance monitor. The operation of such counters is managed by control registers, which are comprised of a plurality of bit fields. In general, both control registers and the counters are readable and writable by software. Thus, by writing values to the control register, a user may select the events within the data processing system to be monitored and specify the conditions under which the counters are enabled.
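By way of illustration only, the relationship between a control register and a counter described above can be sketched in software. The bit layout, the event codes, and the names `PerformanceMonitor`, `write_control`, and `signal` are assumptions made for this sketch and do not correspond to any particular processor.

```python
from dataclasses import dataclass

# Hypothetical bit layout for a control register (not from any real processor).
ENABLE_BIT = 1 << 0             # bit 0: counter is enabled when set
EVENT_SHIFT = 1                 # bits 1-4: event selector field
EVENT_MASK = 0xF << EVENT_SHIFT

# Hypothetical event codes.
EVENT_CYCLES = 0x1
EVENT_L1_MISS = 0x2


@dataclass
class PerformanceMonitor:
    control: int = 0  # software-readable and software-writable control register
    counter: int = 0  # software-readable and software-writable counter

    def write_control(self, enable: bool, event: int) -> None:
        """Software side: select the event to monitor and set the enable bit."""
        enable_bits = ENABLE_BIT if enable else 0
        self.control = ((event << EVENT_SHIFT) & EVENT_MASK) | enable_bits

    def selected_event(self) -> int:
        """Extract the event selector bit field from the control register."""
        return (self.control & EVENT_MASK) >> EVENT_SHIFT

    def signal(self, event: int) -> None:
        """Hardware side: count the event only if it is enabled and selected."""
        if self.control & ENABLE_BIT and event == self.selected_event():
            self.counter += 1
```

In this sketch, writing a value to `control` both selects the monitored event and determines whether the counter is enabled, mirroring the role of the bit fields described above.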
To evaluate the efficiency of a processor, it is necessary to determine how much work is performed and how many resources are consumed on behalf of executing instructions. Many modern processors have the ability to execute instructions in an execution pipeline consisting of multiple stages. An instruction is fetched into a first stage and progresses from one stage to the next stage. Each unit along the pipeline operates on a different instruction by performing a single task for a particular stage of execution of the particular instruction. In addition, many modern processors execute instructions out-of-order with respect to the sequence in which the programmer coded the instructions or in which the compiler generated the instructions. In such processors, instructions are completed, or retired, in program order, but they execute in whatever order their data dependencies allow.
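The distinction between out-of-order execution and in-order completion can be illustrated with a minimal sketch. It assumes that the cycle at which each instruction finishes executing is already known and that any number of instructions may be retired in the same cycle; the function name `retire_in_order` is hypothetical.

```python
def retire_in_order(finish_cycles):
    """Return the cycle at which each instruction retires.

    finish_cycles[i] is the cycle at which instruction i (in program order)
    finishes executing. An instruction cannot retire before every earlier
    instruction has retired, so its retirement cycle is the running maximum.
    """
    retire_cycles = []
    latest = 0
    for cycle in finish_cycles:
        latest = max(latest, cycle)
        retire_cycles.append(latest)
    return retire_cycles


# Instructions 1 and 2 finish executing before instruction 0,
# but they cannot retire until instruction 0 has retired.
print(retire_in_order([5, 2, 3, 7]))  # [5, 5, 5, 7]
```

The gap between the cycle at which an instruction finishes executing and the cycle at which it retires is one reason an aggregate count of events is difficult to attribute to any single instruction.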
The optimization of software for a particular processor and the optimization of hardware for a particular software workload require knowledge about the use of processor resources. Most modern processors implement performance monitor counters that count the occurrence of predefined events associated with the use of resources. However, in a processor with out-of-order execution of instructions, the out-of-order characteristic increases the difficulty of debugging the execution of a set of instructions. This may be especially difficult when one attempts to debug the execution of a set of instructions by interpreting an aggregation of events in a performance monitor counter that includes the execution of some instructions out-of-order. The ability to process instructions out-of-order may be disabled, but doing so may mask or avoid the very problem being debugged.
Therefore, it would be advantageous to have a method and system for accurately monitoring the use of resources within a processor that performs out-of-order execution of instructions. It would be further advantageous to have a method and system for providing knowledge of when the stages of a pipeline execute and how much time is spent in the various stages of the pipeline in a manner that distinguishes such execution at the level of a single instruction.
The present invention provides a method and system for monitoring the performance of an instruction pipeline. A processor may contain a performance monitor for monitoring the occurrence of an event within a data processing system. An event to be monitored may be specified through software control, and the occurrence of the specified event is monitored during the execution of an instruction in the execution pipeline of the processor. A particular instruction may be specified to execute within a threshold time for each stage of the instruction pipeline. The specified event may be the completion of a single tagged instruction beyond the specified threshold interval for a stage of the instruction pipeline. The performance monitor may contain a number of counters for counting multiple occurrences of specified events during the execution of multiple instructions, in which case the specified events may be the completion of tagged instructions beyond a threshold interval for any stage of the multiple stages of the execution pipeline. As the instruction moves through the processor, the performance monitor collects the events and provides the events for optimization analysis.
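As a rough software analogy, and not a description of the claimed hardware, the per-stage threshold events may be sketched as follows. The stage names, the cycle values, and the functions `threshold_events` and `monitor` are assumptions introduced for illustration only.

```python
from collections import Counter

# Hypothetical pipeline stage names; a real pipeline may differ.
STAGES = ("fetch", "decode", "execute", "complete")


def threshold_events(stage_cycles, thresholds):
    """Return the stages in which one tagged instruction exceeded its threshold.

    stage_cycles maps a stage name to the cycles the instruction spent there;
    thresholds maps a stage name to the allowed number of cycles.
    """
    return [s for s in STAGES if stage_cycles.get(s, 0) > thresholds[s]]


def monitor(tagged_instructions, thresholds):
    """Count threshold events per stage across several tagged instructions."""
    counters = Counter()
    for stage_cycles in tagged_instructions:
        counters.update(threshold_events(stage_cycles, thresholds))
    return counters
```

For example, given thresholds of 2, 1, 4, and 1 cycles for the four stages, a tagged instruction that spends 6 cycles in the execute stage would produce one event for that stage, and the per-stage counters would accumulate such events across all tagged instructions.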