Traditionally, computer software has been written for serial execution. That is, a computer algorithm was constructed and implemented as a serial stream of instructions. These instructions may have been executed on a single central processing unit (CPU) that is part of a computer system to perform a desired function.
More recently, computer systems that include multiple processors have been developed that may be operative to implement parallel computing functionality. Parallel computing is a form of computation in which multiple calculations or operations are carried out simultaneously, operating on the principle that larger problems can often be divided into smaller ones, which are then solved concurrently.
Generally, parallel computer systems may be classified according to the level at which the hardware of the computers supports parallelism. For example, some multiprocessor computers include multiple processing elements (e.g., multiple CPUs) within a single machine. Conversely, other computer systems use multiple individual computers to work on the same task (e.g., clusters, massively parallel processors (MPP), grids, and the like).
Often, it may be necessary or desirable to determine the amount of time that a certain task or process takes to execute on a computer system. One way to achieve this is to count the number of CPU clock cycles that elapse between the start and the end of the task to be measured. Then, using the known frequency of the CPU clock, software may calculate the elapsed execution time by dividing the number of elapsed clock cycles by that frequency.
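As a minimal sketch, assuming a cycle counter that ticks at a fixed, known frequency and does not wrap around during the measurement, the calculation reduces to dividing the cycle difference by the clock frequency. The function name `elapsed_seconds` and the 3 GHz frequency are illustrative assumptions only.

```python
def elapsed_seconds(start_cycles: int, end_cycles: int, cpu_hz: float) -> float:
    """Convert a cycle-count difference into seconds.

    Assumes both counts come from the same clock, which runs at a fixed
    frequency of cpu_hz cycles per second and does not wrap between reads.
    """
    if end_cycles < start_cycles:
        # A smaller end count means the readings are not comparable
        # (e.g., counter wraparound or readings taken from different clocks).
        raise ValueError("end count precedes start count")
    return (end_cycles - start_cycles) / cpu_hz


# 3,000,000 cycles elapsed on a hypothetical 3 GHz clock.
print(elapsed_seconds(1_000_000, 4_000_000, 3.0e9))  # 0.001
```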
While a CPU clock is well suited to measuring the elapsed time of a task that executes on a single CPU, problems may arise in a multiprocessor system because the clocks of the individual CPUs may not be synchronized with one another. For example, during a system reset, the individual CPUs may be reset at slightly different times. The difference between the value of one processor clock and the value of another processor clock at the same instant is termed "clock skew." Consequently, if the "start time" for a process or task is measured on one CPU and the "end time" for the process is measured on a different CPU, the clock skew between the two CPUs may yield an inaccurate calculation of the execution time for the process. Therefore, it may be desirable to account for the clock skew between the CPUs of a multiprocessor system so that accurate process time measurements may be made.
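The effect of clock skew on such a measurement can be illustrated with a short sketch. It assumes, purely for illustration, that the offset of each CPU's counter relative to a reference CPU is already known; how such offsets would be determined is not addressed here. The names `naive_elapsed`, `skew_corrected_elapsed`, and `offsets`, as well as the 1 GHz frequency and 2,000-cycle skew, are hypothetical.

```python
def naive_elapsed(start_cycles: int, end_cycles: int, cpu_hz: float) -> float:
    # Treats both readings as if they came from the same clock.
    return (end_cycles - start_cycles) / cpu_hz


def skew_corrected_elapsed(start_cycles: int, start_cpu: int,
                           end_cycles: int, end_cpu: int,
                           offsets: dict[int, int], cpu_hz: float) -> float:
    # offsets[cpu] is assumed to equal (that CPU's counter - reference CPU's
    # counter) sampled at the same instant. Subtracting it maps each reading
    # onto the reference CPU's clock before taking the difference.
    start_ref = start_cycles - offsets[start_cpu]
    end_ref = end_cycles - offsets[end_cpu]
    return (end_ref - start_ref) / cpu_hz


# Hypothetical 1 GHz clocks: CPU 1 left reset 2,000 cycles before CPU 0,
# so its counter reads 2,000 higher than CPU 0's at any given instant.
offsets = {0: 0, 1: 2_000}
start = 10_000  # read on CPU 0 when the task starts
end = 17_000    # read on CPU 1 when the task ends (CPU 0 would read 15,000)

print(naive_elapsed(start, end, 1e9))                          # 7e-06 (skewed)
print(skew_corrected_elapsed(start, 0, end, 1, offsets, 1e9))  # 5e-06
```

Without the correction, the 2,000-cycle skew is counted as part of the task's 5,000-cycle run and inflates the measured time by 40%.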