In distributed computing systems, it is assumed that each processor node consults its own internal clock. Since clocks in distributed systems drift apart, they must be periodically resynchronized, that is, brought very close together in value. This resynchronization is necessary in order to carry out many protocols for distributed systems. One such protocol is described in Strong et al, application Ser. No. 06/485,575, filed on Apr. 18, 1983, entitled "A Method for Achieving Multiple Processor Agreement Optimized for No Faults".
In a seminal article, "Time, Clocks, and the Ordering of Events in a Distributed System", Vol. 21, Communications of the Association for Computing Machinery, pages 558-565, July 1978, Lamport uses the concept of one event happening before another to define a partial ordering of events. Further, Lamport describes a protocol for extending this partial ordering to a total ordering for synchronizing events, and then applies this to a system of physical clocks. This guarantees that a set of correct clocks will differ by no more than a specifiable amount.
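The happened-before idea can be illustrated with a minimal logical-clock sketch. This is an illustration only, not code from the cited article: the class name LamportClock, its method names, and the tie-breaking helper total_order_key are hypothetical, and physical clocks are not modeled.

```python
class LamportClock:
    """Hypothetical logical clock: a counter that respects happened-before."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # The receiver moves past both its own time and the sender's timestamp,
        # so the send event is always ordered before the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time


def total_order_key(timestamp, process_id):
    # Assumption: ties between equal timestamps are broken by process id,
    # which extends the partial order to a total order.
    return (timestamp, process_id)


a, b = LamportClock(), LamportClock()
t_send = a.send()
t_recv = b.receive(t_send)
```

Sorting events by total_order_key gives every processor the same sequence, which is the property the total ordering is meant to provide.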
A paper by Lamport and Melliar-Smith, "Synchronizing Clocks in the Presence of Faults", SRI Technical Reports, published July 13, 1981, describes clock resynchronization in a distributed system in which each processor is required to broadcast its time value. In turn, each processor receives the clock values from every other processor, discards extreme values, and uses an averaging process for synchronization. In order to achieve clock synchronization in the presence of f faults, Lamport and Melliar-Smith require (2f+1) processors.
Since an averaging process is used, there must be more non-faulty processors than faulty ones for the described technique to work. Note that clock synchronization in this context is simply the condition that clocks differ by no more than a specified upper bound.
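The discard-and-average step can be sketched as a trimmed mean. This is a simplified illustration under stated assumptions, not the algorithm of the cited report: the function name trimmed_average is hypothetical, the f lowest and f highest readings are treated as the "extreme values", and message delays and clock drift are ignored.

```python
def trimmed_average(readings, f):
    """Average clock readings after dropping the f lowest and f highest.

    Assumption: at most f readings come from faulty processors, and at
    least 2f+1 readings are available, so at least one value survives.
    """
    if len(readings) < 2 * f + 1:
        raise ValueError("need at least 2f+1 readings to tolerate f faults")
    ordered = sorted(readings)
    kept = ordered[f:len(ordered) - f]  # discard extremes at both ends
    return sum(kept) / len(kept)


# Five readings, one of them wildly wrong; tolerate f = 1 fault.
readings = [100.0, 99.0, 5000.0, 101.0, 102.0]
new_time = trimmed_average(readings, f=1)
```

With exactly 2f+1 readings the sketch reduces to taking the median, which shows why a majority of non-faulty processors is needed: a faulty majority could place its own value in the surviving position.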
In the previously mentioned copending Strong application, there is described a method for achieving Byzantine Agreement among n processors in a reliable (f+1)-connected network with guaranteed early stopping in the absence of faults, and eventual stopping for f<(n/2) faults. Byzantine Agreement is a protocol that guarantees that eventually all correct processors will agree on a value. By way of contrast, clock synchronization protocols must guarantee that all correct processors agree (within a specified margin of error) on a time.
Prior art protocols have required a significant quantity of message passing. In the Lamport and Melliar-Smith case approximately n^(f+1) messages are exchanged, where n is the total number of processors and f is the number of processor faults.
In the copending Halpern et al application, the described method requires that, during each period, the network of processors has previously agreed upon an ordered list of participants, and that, at a specified time in the period, the first processor on the list attempts to synchronize all others to its own clock. The result is either a synchronization of all correct processors (clocks) to within the desired tolerance, or an agreement among all other correct processors that the first on the list has failed. If the first fails, then the second tries, and so on.
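The ordered-list scheme just described can be sketched as follows. This is a rough illustration under strong assumptions, not the method of the Halpern et al application: the function run_period and the has_failed predicate are hypothetical, the agreement that a candidate has failed is abstracted into that predicate, and message delays and the synchronization tolerance are not modeled.

```python
def run_period(ordered_participants, clocks, has_failed):
    """Let each listed processor in turn try to act as synchronizer.

    ordered_participants: the agreed list, in priority order.
    clocks: dict mapping processor name to its current clock value.
    has_failed: assumed predicate standing in for the agreement among
        correct processors that a given candidate has failed.
    Returns the processor that synchronized the others, or None.
    """
    for candidate in ordered_participants:
        if has_failed(candidate):
            continue  # the others agree it failed; the next on the list tries
        reference = clocks[candidate]
        for node in ordered_participants:
            if not has_failed(node):
                clocks[node] = reference  # adopt the synchronizer's clock
        return candidate
    return None


clocks = {"p1": 10.00, "p2": 10.03, "p3": 9.98}
failed = {"p1"}
chosen = run_period(["p1", "p2", "p3"], clocks, lambda n: n in failed)
```

Here the first processor on the list is treated as failed, so the second becomes the synchronizer and the remaining correct clocks adopt its value.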