The present invention relates to measuring an elapsed period of time for the execution of a software routine. More particularly, it relates to identifying a valid measure of an elapsed period of time for the execution of a software routine in a multiprocessor system.
A computer central processing unit (CPU) may include a high frequency clock. For example, such a high frequency clock can define a step in a fetch, decode and execute cycle for the processor. Such clocks are to be distinguished from other system clocks which provide date and time facilities for a computer system, since high frequency clocks are updated far more often. The precise frequency of such a high frequency clock is dependent upon the operational clock speed of a particular processor. By way of example, a processor configured to operate at a clock speed above one gigahertz will include a high frequency clock capable of providing a timing resolution of the order of a nanosecond. By comparison, a system clock may provide a resolution a thousand or more times coarser than that of such a high frequency clock.
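The relationship between clock speed and timing resolution can be illustrated with a short calculation. This is a minimal sketch: the function name `tick_period_ns` and the factor of one thousand used for the system clock are illustrative assumptions drawn from the example figures above, not properties of any particular processor.

```python
def tick_period_ns(clock_hz: float) -> float:
    """Return the duration of one clock tick in nanoseconds."""
    return 1e9 / clock_hz


# A processor clock running at one gigahertz ticks once per nanosecond.
cpu_tick_ns = tick_period_ns(1e9)

# Assumption for illustration: a system clock a thousand times coarser.
system_tick_ns = cpu_tick_ns * 1000
```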
High frequency clocks in CPUs can be used to precisely measure elapsed time and therefore have useful applications in the measurement of performance statistics for computer programs executing in a processor. The high resolution of the clock allows the measurement of elapsed time for very short program fragments, such as fragments requiring only a few hundred processor cycles. A typical approach to such a measurement is illustrated in pseudo-code below:
start_time = getHighFrequencyClockTicks
<program fragment>
end_time = getHighFrequencyClockTicks
elapsed_time = end_time − start_time
The <program fragment> above is the program fragment for measurement. The pseudo-code “getHighFrequencyClockTicks” corresponds to processor instructions to obtain a value of the high frequency clock and is typically implemented as a few instructions in order to avoid consuming a significant amount of processor time. For example, in the Intel IA32 processor, “getHighFrequencyClockTicks” corresponds to the RDTSC (read time stamp counter) instruction.
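The measurement pattern above might be sketched in Python as follows. This is an illustration under stated assumptions: `time.perf_counter_ns()` is used as a stand-in for the high frequency clock (it is a monotonic nanosecond counter supplied by the runtime, not a raw processor cycle counter such as that read by RDTSC), and the helper names `get_high_frequency_clock_ticks` and `measure` are hypothetical.

```python
import time


def get_high_frequency_clock_ticks() -> int:
    # Stand-in (assumption): a monotonic nanosecond counter from the runtime,
    # not a raw per-processor cycle counter.
    return time.perf_counter_ns()


def measure(fragment) -> int:
    """Return the elapsed ticks taken to run fragment() once."""
    start_time = get_high_frequency_clock_ticks()
    fragment()  # the program fragment for measurement
    end_time = get_high_frequency_clock_ticks()
    return end_time - start_time


elapsed_time = measure(lambda: sum(range(1000)))
```

Because the stand-in counter is monotonic, this sketch does not exhibit the multiprocessor problems discussed next; it only shows the shape of the measurement.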
Whilst the use of such high frequency clocks is advantageous for measuring elapsed time on a single processor, in a multiprocessor system problems can arise because it is not possible to guarantee that the clocks in each processor are synchronized in the sense that they express an identical clock time. The difference between a value of one processor clock and a value of another processor clock is termed clock skew. This characteristic of multiprocessor systems coupled with a possibility that a running program fragment can be switched between processors during execution makes it very difficult to accurately measure an elapsed time for a program. This arises because the start_time and end_time may be measured on different clocks in different CPUs. For example, the start_time may be measured on a clock in a processor on which the program fragment commenced execution, and the end_time may be measured on a clock in a processor on which the program fragment ceased execution. In this situation the elapsed time includes not only the time taken to execute the program fragment, but also the unwanted clock skew.
The clock skew may be anything from zero to billions of cycles. Such clock skew invalidates a measurement where two processors have clocks which are highly skewed (i.e. there is a large difference between the clock values). Furthermore, where a clock skew between processors is small but the elapsed time of a program fragment is even smaller, the elapsed time as measured using the pseudo-code above may be negative. Since values of time in such high frequency clocks are often represented as unsigned data types, a negative difference wraps around to a very large positive value and the measurement becomes meaningless. For example, the Intel IA32 processor stores the high frequency clock value as a sixty-four bit unsigned cycle value.
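The effect of clock skew on an unsigned sixty-four bit subtraction can be modelled as follows. This is a sketch only: the tick values, the skew of 1000 cycles and the fragment length of 300 cycles are hypothetical figures chosen for illustration, and the mask emulates unsigned arithmetic because Python integers are not fixed width.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit unsigned counter


def unsigned_elapsed(start_ticks: int, end_ticks: int) -> int:
    """Subtract as a 64-bit unsigned type would (wraps modulo 2**64)."""
    return (end_ticks - start_ticks) & MASK64


# Hypothetical values: the fragment takes 300 cycles, and the clock on
# CPU 1 is 1000 cycles behind the clock on CPU 0.
true_elapsed = 300
skew = -1000
start_on_cpu0 = 5_000_000
end_on_cpu1 = start_on_cpu0 + true_elapsed + skew

# Start and end both read on CPU 0: the result is the true elapsed time.
same_cpu = unsigned_elapsed(start_on_cpu0, start_on_cpu0 + true_elapsed)

# Start read on CPU 0, end read on CPU 1: the negative difference wraps.
migrated = unsigned_elapsed(start_on_cpu0, end_on_cpu1)
```

Here `same_cpu` is 300, whereas `migrated` is 2**64 − 700, a value bearing no relation to the time actually taken.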
Various techniques have been used to address this problem with limited success. Some of these are outlined briefly below.
One possibility is to reset clocks before running the program fragment. This may improve the accuracy of measurements in the short term, but the value of clocks can diverge from each other (known as clock drift) causing increasing clock skew. Eventually, as clock skew increases above the elapsed time for a program fragment, invalid negative elapsed time measurements will occur.
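The way clock drift undermines a reset can be sketched with a simple linear model. The drift rate of ten parts per million, the fragment length and the function names are all assumptions made for illustration; real drift need not be linear.

```python
def skew_after(ticks_since_reset: int, drift_ppm: int) -> int:
    """Skew accumulated since both clocks were reset (linear drift model)."""
    return ticks_since_reset * drift_ppm // 1_000_000


def measured_elapsed(true_elapsed: int, ticks_since_reset: int, drift_ppm: int) -> int:
    # Worst case: the start is read on the faster clock and the end on the
    # slower clock, so the accumulated skew is subtracted from the result.
    return true_elapsed - skew_after(ticks_since_reset, drift_ppm)


FRAGMENT_TICKS = 300  # hypothetical fragment length
DRIFT_PPM = 10        # hypothetical drift rate

# Shortly after the reset the skew is 10 ticks and the error is modest.
soon = measured_elapsed(FRAGMENT_TICKS, 1_000_000, DRIFT_PPM)

# Later the skew is 1000 ticks, which exceeds the fragment length.
later = measured_elapsed(FRAGMENT_TICKS, 100_000_000, DRIFT_PPM)
```

In this model `soon` is 290 ticks while `later` is −700 ticks, the invalid negative measurement described above.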
An alternative approach is to measure the elapsed time on multiple occasions in order to generate a distribution of elapsed time measurements. Subsequently, those measurements which are outside a “normal” range (which can be defined using a statistical method) can be discarded, and a mean value of the distribution can be used as a statistical measure. This requires the additional overhead of generating and maintaining the distribution of measurements and may result in a less accurate mean measure.
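The statistical approach could be sketched as below. The text does not specify how the "normal" range is defined, so this example assumes one possible choice: keeping samples within a multiple of the median absolute deviation from the median. The function name `filtered_mean`, the parameter `num_mads` and the sample values are hypothetical.

```python
import statistics


def filtered_mean(samples, num_mads=3.0):
    """Mean of the samples lying within num_mads median absolute
    deviations of the median (an assumed definition of 'normal')."""
    median = statistics.median(samples)
    mad = statistics.median(abs(s - median) for s in samples)
    if mad == 0:
        return median  # all central samples agree; nothing to filter on
    kept = [s for s in samples if abs(s - median) <= num_mads * mad]
    return statistics.mean(kept)


# Hypothetical measurements, one of which was corrupted by clock skew.
samples = [300, 310, 295, 305, 2**64 - 700, 298]
typical_elapsed = filtered_mean(samples)
```

The sketch makes the overhead visible: every measurement must be retained and the whole distribution processed before a single figure is available.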
A further alternative is to identify the processor on which the program fragment commences execution and the processor on which it ceases execution. In this way it is possible to determine when the elapsed time measurement is based on clock values for the same processor. For example, the pseudo-code could be amended to:
start_processor = getProcessorID
start_time = getHighFrequencyClockTicks
<program fragment>
end_time = getHighFrequencyClockTicks
end_processor = getProcessorID
elapsed_time = end_time − start_time
if start_processor = end_processor then elapsed_time is valid
However, the instruction to obtain an identifier for a processor (nominally indicated as “getProcessorID”) is typically a synchronising instruction which interferes with the measurement. Also, it is possible that the performance measurement program is switched to a different processor between the “getProcessorID” instruction and the “getHighFrequencyClockTicks” instruction. Consequently, the “getHighFrequencyClockTicks” instruction will obtain a clock value for a processor which is different to the processor identified by the “getProcessorID” instruction.
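The weakness of the processor identifier check can be shown with a toy simulation. Everything here is a hypothetical model: the `SimulatedMachine` class, its two clocks and the explicit `migrate` calls stand in for a real scheduler, which would switch processors without the program's knowledge.

```python
class SimulatedMachine:
    """Toy two-CPU machine in which each CPU has its own tick counter."""

    def __init__(self, clocks):
        self.clocks = list(clocks)  # current tick value per CPU
        self.current_cpu = 0        # CPU the measuring program runs on

    def get_processor_id(self):
        return self.current_cpu

    def get_high_frequency_clock_ticks(self):
        return self.clocks[self.current_cpu]

    def run(self, ticks):
        # Time passes equally on every CPU; only the offsets differ.
        self.clocks = [c + ticks for c in self.clocks]

    def migrate(self, cpu):
        self.current_cpu = cpu


# Hypothetical skew: the clock on CPU 1 is 1000 ticks behind CPU 0.
machine = SimulatedMachine(clocks=[5_000_000, 4_999_000])

start_processor = machine.get_processor_id()            # reads CPU 0
start_time = machine.get_high_frequency_clock_ticks()   # CPU 0's clock
machine.run(300)                                        # the program fragment
machine.migrate(1)                                      # switched during the fragment
end_time = machine.get_high_frequency_clock_ticks()     # CPU 1's clock
machine.migrate(0)                                      # switched back before the ID read
end_processor = machine.get_processor_id()              # reads CPU 0

elapsed_time = end_time - start_time
judged_valid = start_processor == end_processor
```

In this run the identifiers match, so the check reports the measurement as valid, yet `elapsed_time` is −700 ticks because the two clock values came from different processors.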
Thus it would be advantageous to provide a mechanism for identifying accurate and valid measurements of elapsed time for a program fragment in a multiprocessor system.