1. Field of the Invention
The present invention relates to performance monitoring of a computer system or of some aspect of a computer system, such as a processor or memory or software running on the system, and, more particularly, to managing counters for such performance monitoring.
2. Related Art
In the IBM AIX operating system, a performance monitor function of the operating system (“OS”) services a performance monitoring API. This servicing includes accessing 64-bit performance monitoring accumulators. (The AIX operating system is a product of, and “AIX” is a trademark of, International Business Machines Corporation.) The accumulators are accessed by means of operations in the “system” state, since the accumulators are conventionally located in system memory. The Power and PowerPC processor architectures provide a set of 32-bit performance monitor counters. These counters are registers on the Power and PowerPC processors. (Power and PowerPC processors are products of, and “Power” and “PowerPC” are trademarks of, International Business Machines Corporation.) Conventionally, all the counter registers on the processor are used for storing performance-measurement-related counts for a single processing thread. Consequently, each time there is a thread switch, the OS performance monitoring function reads the 32-bit performance monitor counters for the thread losing control and adds the counter values to the respective 64-bit performance monitoring accumulators. The OS performance monitoring function then resets the 32-bit counters so that the counts all start over at zero for the thread that is gaining control. This resetting tends to prevent the counters from overflowing.
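The conventional accumulate-and-reset step described above can be sketched as follows. This is only an illustrative model, not the AIX implementation: the array `hw_counter`, the constant `NUM_COUNTERS`, the type `thread_perf_t`, and the function `on_thread_switch` are hypothetical names, and the hardware registers are simulated by an ordinary array.

```c
#include <stdint.h>

/* Hypothetical number of counter registers; the real number depends on the processor. */
#define NUM_COUNTERS 8

/* Stand-in for the 32-bit performance monitor counter registers (simulated in memory). */
static uint32_t hw_counter[NUM_COUNTERS];

/* Per-thread 64-bit accumulators, conventionally kept in system memory. */
typedef struct {
    uint64_t accum[NUM_COUNTERS];
} thread_perf_t;

/*
 * Called for the thread that is losing control. Each 32-bit count is
 * added to the matching 64-bit accumulator, and the counter is then
 * reset so that the thread gaining control starts counting from zero.
 */
void on_thread_switch(thread_perf_t *outgoing)
{
    for (int i = 0; i < NUM_COUNTERS; i++) {
        outgoing->accum[i] += hw_counter[i];  /* widened to 64 bits before the add */
        hw_counter[i] = 0;                    /* reset tends to prevent overflow */
    }
}
```

In this model a 64-bit accumulator can exceed the 32-bit range after repeated switches, whereas each simulated register restarts at zero on every switch.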
Also, according to the Power and PowerPC processor architectures, a first such 32-bit counter may affect another 32-bit counter if the count value of the first counter exceeds a certain limit. For this architecture, resetting of a counter value by the performance monitor is also useful to avoid unwanted counter interaction.
It is known to use the performance counters and accumulators to measure a wide variety of events, for example, to measure how many instructions have completed for a subroutine. Ideally, the sampling time for measuring performance of an event is small in comparison with the duration of the event itself. However, some measured events occur very quickly. For example, some subroutines are only a few instructions long. As previously stated, the conventional performance monitoring operation that manages the 64-bit performance monitoring accumulators involves the system state. Unfortunately, the overhead for invoking the system state amounts to perhaps thousands of instructions.
If an arrangement for measuring the duration of a performance event cannot provide a fast sampling time in comparison with the measured event, then the delay associated with the measurement sampling time should at least be consistent from one measurement instance to the next. However, the above-described arrangement does not provide consistent measurement overhead. That is, each measurement requires the above-described system-state-related operation, and for a system call involving 1000 instructions the variation in execution time from one instance to the next can be significant in comparison with the execution time of a subroutine of a few instructions. Thus, the previously known arrangement for measuring performance of short-duration events is problematic.
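The effect of inconsistent overhead can be illustrated with a small arithmetic sketch. The function name `estimate_event_count` and all the numbers mentioned below are hypothetical and are chosen only to show the scale of the problem; they are not measurements of any real system.

```c
#include <stdint.h>

/*
 * Estimate the count attributable to a measured event: the raw
 * difference between two counter samples, minus the overhead that the
 * sampling itself is assumed to contribute. The estimate is exact only
 * if the actual overhead equals assumed_overhead.
 */
int64_t estimate_event_count(uint64_t before, uint64_t after,
                             uint64_t assumed_overhead)
{
    return (int64_t)(after - before) - (int64_t)assumed_overhead;
}
```

Suppose, hypothetically, that a subroutine completes 5 instructions and that the sampling overhead is assumed to be 1000 instructions. If the overhead is exactly 1000, the samples differ by 1005 and the estimate is 5. If the overhead in one instance is instead 1040, the samples differ by 1045 and the estimate is 45, an error eight times the size of the event being measured.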
The related case discloses an arrangement that addresses this problem. According to an embodiment of an invention disclosed therein, 32-bit hardware registers on a processor are architected as performance monitor counters and are used with logic for maintaining coherent counts despite thread switching. This enables the reading of coherent values directly from the 32-bit hardware registers in the user state, which can be done very quickly. Also, the related case discloses a way to read performance counters from 64-bit system memory in which values from the 32-bit hardware registers are accumulated, and discloses a way to do so with reduced sample time overhead. However, a need still exists for a way to very quickly read a performance monitor count that is larger than the number of bits in a single one of the architected performance monitoring hardware registers.