Computer clusters are an increasingly popular alternative to more traditional computer architectures. A computer cluster is a collection of individual computers (known as nodes) that are interconnected to provide a single computing system. The use of a collection of nodes has a number of advantages over more traditional computer architectures. One easily appreciated advantage is the fact that nodes within a computer cluster tend to fail independently. As a result, in the event of a node failure, the majority of nodes within a computer cluster may survive in an operational state. This has made the use of computer clusters especially popular in environments where continuous availability is required.
A fundamental problem with clusters is that the clock of each cluster node generally drifts away from the correct time, and each does so at a different rate. The rate at which a clock drifts is typically measured in parts per million (ppm). For example, the clocks used within the Tandem NonStop_UX S4000 computer series are specified to have a drift of less than 25 ppm. This makes the clocks of these systems accurate to within approximately 2 seconds per day. Without a correction mechanism, the clocks within a computer cluster will eventually drift far enough apart that applications expecting synchronized time may begin to behave incorrectly.
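The ppm figure converts directly into an accumulated error: a drift of r ppm means a free-running clock gains or loses up to r microseconds for every second of elapsed time. The following minimal sketch shows that arithmetic for the 25 ppm specification above, which is where the "approximately 2 seconds per day" figure comes from. The function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(drift_ppm: float, elapsed_s: float) -> float:
    """Worst-case error (s) a free-running clock accumulates over elapsed_s.

    A drift of 1 ppm means the clock gains or loses up to 1 microsecond
    for every second of real time.
    """
    return drift_ppm * 1e-6 * elapsed_s


# The 25 ppm specification quoted above, over one day.
print(f"{drift_seconds(25, SECONDS_PER_DAY):.2f} s per day")  # 2.16 s per day
```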
Several methods have been developed to reduce node-to-node clock differences in computer networks and clusters. One simple method is to set the clock of each node at boot time. This method is useful for reducing large node-to-node time differences. Setting clocks at boot time does little, however, to reduce inaccuracies due to clock drift. Thus, each clock may start at the correct time, but the clocks across the cluster will grow increasingly inaccurate as time passes. A second method for reducing node-to-node clock differences is to periodically synchronize the time of each node against a master clock. If the interval between synchronizations is short, each clock can drift only a limited amount before it is corrected again. As a result, node-to-node clock differences can be kept within tolerable limits.
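The contrast between boot-time setting and periodic synchronization can be illustrated with two hypothetical nodes, one running 25 ppm fast and one running 25 ppm slow, so that their errors add. This is a minimal sketch under assumptions not taken from the text: both clocks are set exactly to master time at boot and, in the periodic case, again every 60 seconds, with no error in the synchronization itself.

```python
from typing import Optional


def skew(t: float, drift_ppm: float, resync_interval: Optional[float] = None) -> float:
    """Offset (s) between a clock running drift_ppm fast and one running drift_ppm slow.

    Assumes (for illustration) that both clocks are set exactly to master time
    at boot (t = 0) and, if resync_interval is given, again at every multiple of
    that interval. Between settings the offset grows linearly.
    """
    rate = drift_ppm * 1e-6
    since_last_set = t if resync_interval is None else t % resync_interval
    # One clock gains while the other loses, so their errors add.
    return 2 * rate * since_last_set


DAY = 86_400.0
for days in (1, 7, 30):
    t = days * DAY - 1.0  # one second before a 60 s resynchronization point
    boot_only = skew(t, 25)
    periodic = skew(t, 25, resync_interval=60.0)
    print(f"day {days:>2}: boot-time only {boot_only:8.3f} s, every 60 s {periodic * 1e3:.2f} ms")
```

With boot-time setting alone, the offset in this model grows to roughly 130 s after 30 days; with a 60 s resynchronization interval it never exceeds 2 × 25 ppm × 60 s = 3 ms.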
Protocols for synchronizing time against a master clock must account for the propagation delays that exist between the node where the master clock is located (the master node) and the nodes that are to be synchronized (the slave nodes). Otherwise, the clock of each slave node will lag behind the clock of the master node by an amount approximately equal to the propagation delay to that slave node. In cases where computers are connected using ethernet-type networks, a relatively simple mechanism exists for accurately calculating propagation delays. To use this mechanism, the master node sends a message to a slave node, and the slave node responds with an acknowledgment message. The master node then calculates the propagation delay by measuring the round-trip time (of the message and its acknowledgment) and dividing by two. Finally, the master node synchronizes the slave node by sending it a message containing the sum of the master's current clock time and the propagation delay.
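The exchange described above can be sketched as a small simulation on the master's timeline. This is a minimal sketch under stated assumptions: the function names are hypothetical, the slave is assumed to acknowledge instantly, and the one-way delays are assumed to stay constant between the probe and the synchronization message. The function returns how far the slave's clock ends up from the master's.

```python
def estimate_delay(t_send: float, t_ack: float) -> float:
    """Ethernet-style estimate: round-trip time divided by two."""
    return (t_ack - t_send) / 2


def synchronize(master_time: float, delay_to_slave: float, delay_to_master: float) -> float:
    """Run one probe/ack/sync exchange and return the slave's resulting offset.

    All times are in seconds on the master's clock. Assumptions (hypothetical
    model): the slave acknowledges instantly, and the one-way delays do not
    change between the probe and the sync message. A positive return value
    means the slave ends up ahead of the master.
    """
    # 1. Master sends a probe; the acknowledgment returns after a full round trip.
    t_send = master_time
    t_ack = t_send + delay_to_slave + delay_to_master
    delay = estimate_delay(t_send, t_ack)

    # 2. Master sends its current clock time plus the estimated delay.
    sync_value = t_ack + delay

    # 3. The slave sets its clock to sync_value when the message arrives,
    #    at which point the master's clock reads t_ack + delay_to_slave.
    master_at_arrival = t_ack + delay_to_slave
    return sync_value - master_at_arrival


# Symmetric link, as with a single shared connection: 20 microseconds each way.
print(synchronize(1000.0, 20e-6, 20e-6))
```

On a symmetric link the returned offset is zero up to floating-point rounding, which is exactly the property the ethernet simplification relies on.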
The simple mechanism used to calculate propagation delays in ethernet-type networks works because nodes in these networks use a single connection for both sending and receiving messages. The use of a single connection means that the propagation times to and from a node are approximately equal, which allows the propagation delay to a node to be computed as the round-trip time divided by two. Unfortunately, some highly desirable network types do not provide the same uniformity of sending and receiving propagation delays. Networks of this type include Tandem Computers' Servernet products. Each node in a Servernet network has separate network connections: a first for sending and a second for receiving. Separate connections mean that the propagation delays to and from a node may differ. This makes the mechanism used in ethernet-type networks unsuitable for networks like Tandem's Servernet.
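The error introduced by the round-trip-halving estimate on such a link can be worked out directly. Writing d_ms for the master-to-slave delay and d_sm for the slave-to-master delay (notation introduced here for illustration), and assuming the slave acknowledges immediately, the estimated delay and the resulting slave offset are:

```latex
\begin{aligned}
\hat{d} &= \frac{\mathrm{RTT}}{2} = \frac{d_{ms} + d_{sm}}{2}
  && \text{(estimated propagation delay)} \\
\varepsilon &= \hat{d} - d_{ms} = \frac{d_{sm} - d_{ms}}{2}
  && \text{(slave offset after synchronization)}
\end{aligned}
```

The offset vanishes only when the two delays are equal. With, say, 10 µs outbound and 30 µs return, the estimate is 20 µs and the slave is set 10 µs ahead of the master.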
Based on the preceding discussion, it is not hard to appreciate that a need exists for time synchronization systems that are suitable for use in networks where the ethernet simplification does not apply. There is also a need for new or extended time synchronization systems that fulfill a range of other currently unmet needs. For example, currently available time synchronization systems often fail when faced with significant clock frequency errors. They may also fail when faced with heavily loaded or congested networks. Both of these failures indicate that currently available time synchronization systems lack the desired degree of fault tolerance. Currently available time synchronization systems may also require the network to process large numbers of synchronization messages, which consume network bandwidth that would otherwise be available to other computing tasks.
Thus, there is a need for fault-tolerant techniques for synchronizing system clocks across the nodes of a cluster, techniques that have minimal effect on, and are minimally affected by, communication traffic throughout the cluster.