Multinode multiprocessor computer systems typically have multiple processors in each node. The nodes are connected together through a system interconnect to facilitate communication between the processors. In some applications, the nodes may be divided into physical partitions, or domains, wherein each physical partition is capable of operating as a separate computer. Typically, the processors on the nodes need access to a system clock to determine the times at which events start, stop, time out, and so on.
Keeping consistent time among different nodes in a network is a fundamental requirement of many distributed applications. The internal clocks of nodes are usually not accurate enough and tend to drift apart from each other over time, generating inconsistent time values. Network clock synchronization allows these devices to correct their clocks to match a global reference of time, such as Coordinated Universal Time (UTC), by performing time measurements through the network. For example, for the Internet, network clock synchronization has been an important subject of research and several different protocols have been proposed. These protocols serve various legacy and emerging applications with diverse precision requirements, such as banking transactions, communications, traffic measurement and security protection.
In particular, in modern wireless cellular networks, time-sharing protocols need an accuracy of several microseconds to guarantee the efficient use of channel capacity. Another example is the recently announced Google Spanner, a globally distributed database, which depends on globally synchronized clocks whose drift is bounded to at most several milliseconds.
As another example, as part of the Transmission Control Protocol/Internet Protocol (TCP/IP), processors must measure the round-trip time for TCP/IP packets to travel between source and destination computers. Yet another example is the running of a debugging application that places timestamps on events and stores the timestamps in a log file. In such debugging applications, the exact time and sequence of events are important. Because different processors on different nodes store timestamps in the log file, it is important that all the processors have access to a common time base. If the processors access different clocks and those clocks are not synchronized, the timestamps would be meaningless and events would appear erroneously out of order.
Clock synchronization on computer networks has been a subject of study for more than 20 years. Established protocols for IP networks include the Network Time Protocol (NTP), the Precision Time Protocol (PTP), and the Coordinated Cluster Time (CCT) protocol.
NTP is one of the oldest Internet protocols in use and is intended to synchronize all participating computers to within a few milliseconds of Coordinated Universal Time (UTC). NTP uses a modified version of Marzullo's algorithm to select accurate time servers and is designed to mitigate the effects of variable network latency. NTP is a low-cost, purely software-based solution whose accuracy mostly ranges from hundreds of microseconds to several milliseconds, which is often not sufficient for applications that require microsecond-level precision.
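As an illustration of how a protocol such as NTP measures time through the network, the following minimal sketch computes the clock offset and round-trip delay from the four timestamps of a single request/response exchange, using the standard NTP on-wire formulas. The function name and the sample timestamp values are assumptions made for illustration only; the sketch omits server selection, filtering, and clock discipline.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)

    The offset estimate is exact only if the forward and return
    network delays are equal; any asymmetry appears as offset error.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)            # time spent on the network
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange: client is 50 ms behind the server,
    # 10 ms one-way delay in each direction, 1 ms server processing.
    offset, delay = ntp_offset_and_delay(100.000, 100.060, 100.061, 100.021)
    print(f"offset = {offset:.6f} s, delay = {delay:.6f} s")
```

Because the estimate assumes symmetric path delays, variable OS latency and network congestion translate directly into measurement error, which is one reason software-only accuracy is limited.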
On the other hand, IEEE 1588 PTP gives superior performance by achieving sub-microsecond or even nanosecond accuracy. However, it is relatively expensive as it requires special hardware support to achieve those accuracy levels and may not be fully compatible with legacy cluster systems.
More recently, new synchronization protocols, such as the CCT protocol, have been proposed with the objective of balancing accuracy and cost. The CCT protocol is able to provide better performance than NTP without additional hardware. Its success is based on a skew estimation mechanism that progressively adapts the clock frequency without offset corrections. Another alternative is the Robust Absolute and Difference Clock Project (RADclock) protocol, which decouples skew compensation from offset corrections by decomposing the clock into a high-performance difference clock for measuring time differences and a less precise absolute clock that provides UTC time.
There are two major difficulties that make the network clock synchronization problem challenging. First, the frequency of hardware clocks is sensitive to temperature and is constantly varying. Second, the latency introduced by the operating system (OS) and by network congestion results in errors in the time measurements. Thus, most protocols introduce different ways of estimating the frequency mismatch, referred to as “skew”, and of measuring the time difference, referred to as “offset”. This has led to an extensive literature on skew estimation, which suggests that explicit skew estimation is necessary for clock synchronization. However, focusing on skew estimation may be unnecessary.
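For reference, the conventional notion of explicit skew and offset estimation can be sketched as a least-squares line fit over a series of offset measurements: the slope of the fitted line approximates the skew and the intercept approximates the offset at local time zero. This is a generic illustration under assumed inputs, not the method of any protocol named above; the function name and the sample values are hypothetical.

```python
def estimate_skew_and_offset(local_times, measured_offsets):
    """Fit measured_offset ~ offset0 + skew * local_time by least squares.

    local_times:      local clock readings at each measurement (seconds)
    measured_offsets: reference time minus local time at each reading (seconds)

    Returns (skew, offset0), where skew is the relative frequency
    error (dimensionless) and offset0 is the offset at local time 0.
    """
    n = len(local_times)
    if n < 2 or n != len(measured_offsets):
        raise ValueError("need at least two paired measurements")

    mean_t = sum(local_times) / n
    mean_o = sum(measured_offsets) / n
    var_t = sum((t - mean_t) ** 2 for t in local_times)
    if var_t == 0:
        raise ValueError("local times must not all be identical")

    cov = sum((t - mean_t) * (o - mean_o)
              for t, o in zip(local_times, measured_offsets))
    skew = cov / var_t
    offset0 = mean_o - skew * mean_t
    return skew, offset0


if __name__ == "__main__":
    # Hypothetical noiseless data: 0.5 s initial offset, 50 ppm skew.
    times = [0.0, 10.0, 20.0, 30.0]
    offsets = [0.5 + 50e-6 * t for t in times]
    print(estimate_skew_and_offset(times, offsets))
```

In practice the measurements are corrupted by OS latency and congestion delay, so the fit is noisy, and because the clock frequency varies with temperature the estimated skew must be tracked continuously rather than computed once.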
An objective of the invention, therefore, is to provide a clock synchronization system and methods that are able to synchronize to any source of time without affecting the operation of running clocks on other nodes.