1. Technical Field
The inventions described herein relate to computers and computer programs. In particular, the inventions relate to using highly skewed clocks for application based tracing and application based normalization of processor clocks in a symmetric multiprocessor (SMP) environment.
2. Description of Related Art
A computer central processing unit (CPU) may include a high frequency clock used for various functions and applications. For example, a high frequency clock can define a step in a fetch, decode, and execute cycle for the processor. High frequency clocks may be distinguished from other system clocks, which provide date and time facilities for a computer system. The two kinds of clock are kept separate because a high frequency clock is updated at a relatively high frequency, whereas a system clock that provides date and time facilities does not need an extremely high update frequency. The precise frequency of a high frequency clock is dependent upon the operational clock speed of a particular processor. For example, a processor configured to operate at a clock speed above one gigahertz will include a high frequency clock capable of providing a timing resolution of about a nanosecond or less.
High frequency clocks in CPUs have many applications. For example, such clocks are useful for the precise measurement of elapsed time and therefore have useful applications in the measurement of performance statistics for computer programs executing in a processor. In particular, high frequency clocks may be used for application based tracing to determine the performance of an application.
The high resolution of the clock allows the measurement of elapsed time for very short program fragments, such as fragments requiring only a few hundred processor cycles. A typical approach to making such a measurement is illustrated in the following pseudo-code:
start_time = getHighFrequencyClockTicks
<program fragment>
end_time = getHighFrequencyClockTicks
elapsed_time = end_time − start_time
The <program fragment> is the program fragment to be measured. The pseudo-code “getHighFrequencyClockTicks” corresponds to processor instructions to obtain a value of the high frequency clock and is typically implemented as a few instructions in order to avoid consuming a significant amount of processor time. For example, in the Intel IA32 processor, “getHighFrequencyClockTicks” corresponds to the RDTSC (read time stamp counter) instruction.
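The measurement pattern above can be sketched in Python as follows. This is only an illustration under stated assumptions: `time.perf_counter_ns` is used as a portable stand-in for “getHighFrequencyClockTicks” (it returns nanoseconds from a monotonic counter, not raw processor cycles as RDTSC does), and `measure` and `program_fragment` are hypothetical names introduced for the example.

```python
import time


def get_high_frequency_clock_ticks() -> int:
    # Assumption: a monotonic nanosecond counter stands in for the
    # processor's high frequency clock (e.g. RDTSC on IA32).
    return time.perf_counter_ns()


def measure(fragment) -> int:
    """Return the elapsed ticks (nanoseconds here) spent running fragment()."""
    start_time = get_high_frequency_clock_ticks()
    fragment()
    end_time = get_high_frequency_clock_ticks()
    return end_time - start_time


def program_fragment() -> None:
    # Hypothetical short workload to be timed.
    sum(range(1000))


elapsed_time = measure(program_fragment)
print(f"elapsed: {elapsed_time} ns")
```

The two clock reads bracket the fragment as tightly as possible, so that the cost of reading the clock contributes little to the measured interval.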
While the use of such high frequency clocks is advantageous for measuring elapsed time on a single processor, problems can arise in a multiprocessor system because it is not possible to guarantee that the clocks in each processor are synchronized in the sense that they express an identical standard time. The difference between a time value of one processor clock and a time value of another processor clock is termed clock skew. Clock skew, coupled with the possibility that a running program fragment can be switched between processors during execution, makes accurately measuring an elapsed time very difficult. The difficulty arises because the start_time and end_time may be measured on different clocks in different CPUs. For example, the start_time may be measured on a clock in a processor on which the program fragment commenced execution, and the end_time may be measured on a clock in a processor on which the program fragment ceased execution. In this situation, the elapsed time includes not only the time taken to execute the program fragment, but also the unwanted clock skew.
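The effect of clock skew on such a measurement can be illustrated with a small simulation. The sketch below is a toy model, not a measurement of real hardware: the class `SimulatedSmp`, its fixed per-processor offsets, and the figures of 300 and 5000 ticks are all assumptions chosen for the example.

```python
class SimulatedSmp:
    """Toy model: each processor's clock is true time plus a fixed offset."""

    def __init__(self, offsets):
        self.offsets = offsets  # per-processor clock offset, in ticks
        self.true_time = 0      # hidden reference time, in ticks
        self.current = 0        # processor the program is running on

    def get_high_frequency_clock_ticks(self):
        # Reads the clock of whichever processor is currently executing.
        return self.true_time + self.offsets[self.current]

    def run(self, ticks, migrate_to=None):
        """Advance time by `ticks`; optionally switch to another processor."""
        self.true_time += ticks
        if migrate_to is not None:
            self.current = migrate_to


def measure(smp, fragment_ticks, migrate_to=None):
    start_time = smp.get_high_frequency_clock_ticks()
    smp.run(fragment_ticks, migrate_to)
    end_time = smp.get_high_frequency_clock_ticks()
    return end_time - start_time


# Assumed skew: processor 1's clock is 5000 ticks ahead of processor 0's.
same_cpu = measure(SimulatedSmp([0, 5000]), 300)
migrated = measure(SimulatedSmp([0, 5000]), 300, migrate_to=1)
print(same_cpu)  # 300
print(migrated)  # 5300
```

In the migrated case the reported value is the 300 ticks of real work plus the 5000 ticks of skew, and nothing in the measurement itself reveals which part is which.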
One solution to this problem is to identify the processor on which the program fragment commences execution and the processor on which the program fragment ceases execution. Thus, it is possible to determine when the elapsed time measurement is based on clock values for the same processor. For example, the pseudo-code could be amended to:
start_processor = getProcessorID
start_time = getHighFrequencyClockTicks
<program fragment>
end_time = getHighFrequencyClockTicks
end_processor = getProcessorID
elapsed_time = end_time − start_time
if start_processor = end_processor then elapsed_time is valid
Instructions or operating system facilities are known for obtaining an identifier for a processor (nominally indicated as “getProcessorID”). However, such instructions may require operating system support or may be synchronizing instructions which interfere with the measurement of time. Further, it is possible that the performance measurement program is switched to a different processor between the “getProcessorID” instruction and the “getHighFrequencyClockTicks” instruction. Consequently, the “getHighFrequencyClockTicks” instruction will obtain a clock value for a processor which is different than the processor identified by the “getProcessorID” instruction.
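The weakness of the amended pseudo-code can also be shown with a toy simulation. The sketch below assumes a hypothetical `SimulatedSmp` model with fixed clock offsets, and the parameters `switch_after_id` and `switch_during` are illustrative hooks for forcing a processor switch at a chosen point; none of these names come from a real operating system interface.

```python
class SimulatedSmp:
    """Toy model: each processor's clock is true time plus a fixed offset."""

    def __init__(self, offsets):
        self.offsets = offsets
        self.true_time = 0
        self.current = 0

    def get_processor_id(self):
        return self.current

    def get_high_frequency_clock_ticks(self):
        return self.true_time + self.offsets[self.current]

    def migrate(self, processor):
        self.current = processor

    def run(self, ticks):
        self.true_time += ticks


def guarded_measure(smp, fragment_ticks, switch_after_id=None, switch_during=None):
    """Return (elapsed_time, is_valid) following the amended pseudo-code."""
    start_processor = smp.get_processor_id()
    if switch_after_id is not None:
        smp.migrate(switch_after_id)  # switch between the ID read and the clock read
    start_time = smp.get_high_frequency_clock_ticks()
    smp.run(fragment_ticks)
    if switch_during is not None:
        smp.migrate(switch_during)    # switch while the fragment is running
    end_time = smp.get_high_frequency_clock_ticks()
    end_processor = smp.get_processor_id()
    return end_time - start_time, start_processor == end_processor


# Assumed skew: processor 1's clock is 5000 ticks ahead of processor 0's.
clean = guarded_measure(SimulatedSmp([0, 5000]), 300)
caught = guarded_measure(SimulatedSmp([0, 5000]), 300, switch_during=1)
missed = guarded_measure(SimulatedSmp([0, 5000]), 300,
                         switch_after_id=1, switch_during=0)
print(clean)   # (300, True)
print(caught)  # (5300, False)
print(missed)  # (-4700, True)
```

The third case is the problematic one: the program leaves processor 0 after reading its identifier and returns to it before the final reads, so the identifiers match and the check reports a valid measurement even though the start time was taken from a different processor's clock.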
Thus, it would be advantageous to provide a method and mechanism for identifying a time value of a high frequency clock in a processor and for identifying the processor in a single indivisible operation. By using such a method and mechanism, an intervening operation, such as a redispatch operation that causes the program to be performed on another processor, can be accounted for when determining the time required to execute the program.