In computer systems, a common method for monitoring the execution of programs is to collect an event trace. An event trace is a record of significant events that occurred during program execution, together with information about the timing of those events. The utility of an event trace is enhanced when the timing measurements, also referred to as the time base, are precise and accurate. Many systems offer highly precise local time bases, such as the cycle counter on Alpha processors. In a distributed computer system with multiple processors, each having an independent clock, a precise global time base is not available. Without one, it is difficult to compare the timing of events on different processors. For example, it is difficult to determine the time taken to send a message from one processor to another.
A simple model of the relationship between the global time g and a processor's local time t is

$$t = A \cdot g + B \tag{1}$$
where A is the drift, the rate of the local time t relative to the global time g, and B is the offset between readings of g and t. Generally, the drift and the offset differ for each processor. The problem of labeling events in an event trace with their global times therefore reduces to calculating the drift and offset parameters for each processor, so that local times measured on that processor can be converted to a common global time base.
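As an illustration of the conversion implied by equation (1), the following minimal Python sketch inverts the model to recover a global time from a local reading, g = (t - B) / A. The class name `ClockModel`, its methods, and all numeric values are hypothetical and chosen only for illustration; the sketch assumes the drift and offset of each processor are already known, which is the very problem discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockModel:
    """Linear clock model t = A*g + B (hypothetical helper, not from the source)."""
    drift: float   # A in equation (1)
    offset: float  # B in equation (1)

    def to_local(self, g: float) -> float:
        # Forward model: global time -> this processor's local time.
        return self.drift * g + self.offset

    def to_global(self, t: float) -> float:
        # Inverse of equation (1): local time -> global time.
        return (t - self.offset) / self.drift


# Assumed parameters for two processors (illustrative values only).
sender = ClockModel(drift=1.0001, offset=500.0)
receiver = ClockModel(drift=0.9999, offset=-200.0)

# A message is sent at global time 1000.0 and received at 1000.25.
t_send = sender.to_local(1000.0)     # what the sender's clock records
t_recv = receiver.to_local(1000.25)  # what the receiver's clock records

# Subtracting raw local readings gives a meaningless latency.
naive_latency = t_recv - t_send

# Converting both readings to the global time base first recovers it.
true_latency = receiver.to_global(t_recv) - sender.to_global(t_send)
```

With these assumed values the naive difference of local readings is negative, while the difference of converted readings recovers the message latency in the global time base.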
Past attempts to solve the local-to-global time problem have focused on clock-synchronization protocols in which processors exchange messages in a predetermined pattern before beginning a computation, in order to collect data that can be used to estimate the relationship between their local time bases and an agreed-upon global time base. These protocols can be effective, but they perturb the computation being traced. First, they require additional message communication that is not part of the computation being monitored. Second, they require computation at run time to translate local clock readings to corresponding global clock readings. These extraneous activities can perturb measurements beyond the basic perturbation that results from collecting an event trace. Moreover, the drift values of the local clocks must be known a priori, since they are not calculated by the clock-synchronization algorithm.
It remains desirable to have an efficient method for synchronizing local clocks to a global time base in a computer system.