The present invention relates generally to multiprocessor computer systems and, more particularly, to a multinode multiprocessor system with distributed local clocks.
Multinode multiprocessor computer systems typically have multiple processors in each node. The nodes are connected together through a system interconnect to facilitate communication between the processors. In some applications, the nodes may be divided into physical partitions, or domains, wherein each physical partition is capable of operating as a separate computer. Typically, the processors on the nodes need access to a system clock to determine the times at which events start, stop, time out, and so on. For example, as part of the TCP/IP protocol, processors must measure the round-trip time for packets to travel between source and destination computers. Another example is a debugging application that places timestamps on events and stores the timestamps in a log file. In such debugging applications, the exact time and sequence of events are important. Because different processors on different nodes store timestamps in the log file, it is important that all the processors have access to a common time base. If the processors access different clocks and those clocks are not synchronized, the timestamps are meaningless and events appear erroneously out of order.
The simplest mechanism for providing a common time base in a multinode system is a single system clock accessible by all processors. However, the latency to access such a clock is high and unpredictable, so the value a processor reads may already be inaccurate by the time it arrives. Some systems have instead provided a local clock on each node that is accessible to each processor or set of processors and that can be read with low and predictable latency. Of course, for such a system to operate properly, all of the local clocks must be synchronized. One synchronization technique is to have all the local clocks operate from a single oscillator source. To synchronize the clocks, they are reset together. The clocks then remain in lock step with each other because they operate from the single oscillator source.
In a multinode computer system, it is often desirable to dynamically add a node or modify a partition after the local clocks are reset. Such a change allows the system to dynamically modify processor resources to maximize processor efficiency. However, after a node is dynamically added or a partition modified, it is not acceptable to reset the local clocks in nodes that are already running. For example, a node may be executing a TCP/IP transfer, and resetting the local clock would result in an inaccurate time measurement for packet transfer.
An objective of the invention, therefore, is to provide a distributed clock synchronization system wherein a local clock can be synchronized without affecting the operation of running clocks on other nodes.
The present invention provides a multinode computer system with distributed local clocks wherein a local clock may be synchronized with other clocks in the system. The synchronization may occur while nodes are fully operational without resetting, stopping, or affecting the local clocks on the fully operational nodes. This synchronization allows for dynamic partitioning wherein processor resources may be modified during operation of the computer system. For example, a node may be added to the system while the system is running and a local clock on the added node may be synchronized to other clocks in the system without affecting the operation of the other clocks.
In one aspect, a local clock to be synchronized is reset and then counts the elapsed time since the reset. Substantially simultaneously with resetting the local clock, a clock value is captured from a clock on a source node, which can be any other node in the system. The captured clock value is copied to the node being synchronized and added to the elapsed time. The resulting sum is then stored in the local clock so that the local clock is synchronized to the clock on the source node.
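The sequence in this aspect can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the hardware described here: the class LocalClock, the function synchronize, and the parameter copy_latency_ticks are hypothetical names, and the model assumes both clocks advance at the same rate (for example, from a common oscillator) and that the reset and the capture happen on the same tick.

```python
class LocalClock:
    """Hypothetical model of a free-running tick counter on one node."""

    def __init__(self, value=0):
        self.value = value

    def tick(self, n=1):
        # Advance the counter by n oscillator ticks.
        self.value += n

    def reset(self):
        self.value = 0


def synchronize(target, source, copy_latency_ticks, all_clocks):
    """Synchronize `target` to `source` without disturbing `source`.

    `copy_latency_ticks` models the (possibly unknown) time needed to copy
    the captured value between nodes. `all_clocks` must contain every clock
    exactly once, including `target` and `source`.
    """
    # Step 1: reset the target and capture the source value at the same tick.
    target.reset()
    captured = source.value

    # Step 2: while the captured value is in transit, every clock keeps
    # counting; the target is now measuring elapsed time since its reset.
    for clock in all_clocks:
        clock.tick(copy_latency_ticks)

    # Step 3: add the captured value to the elapsed time and store the sum.
    target.value = captured + target.value


source = LocalClock(1000)   # clock on a running node
target = LocalClock(37)     # arbitrary value on the newly added node
synchronize(target, source, copy_latency_ticks=250, all_clocks=[source, target])
print(source.value, target.value)  # prints "1250 1250"
```

In this model the copy latency cancels out: the target counts exactly the ticks that pass while the captured value is in transit, so the stored sum matches the source clock regardless of how long the copy takes, and the source clock is only ever read.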
In another aspect, a local clock includes a dynamic portion and a base portion, and an adder adds the two portions together to generate an output of the local clock. For a node being synchronized, the dynamic portion is reset and counts an elapsed time while the base portion is loaded with a clock value copied from the source node.
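The split arrangement in this aspect can be modeled in the same spirit. The sketch below is illustrative only: SplitClock and its method names are hypothetical, the adder is represented by a sum computed on each read, and both clocks are assumed to tick at the same rate.

```python
class SplitClock:
    """Hypothetical model of a clock built from a base and a dynamic portion."""

    def __init__(self, base=0, dynamic=0):
        self.base = base        # loaded value; does not count
        self.dynamic = dynamic  # counts oscillator ticks

    def tick(self, n=1):
        # Only the dynamic portion advances.
        self.dynamic += n

    def read(self):
        # The adder: clock output is the sum of the two portions.
        return self.base + self.dynamic

    def reset_dynamic(self):
        # Start measuring elapsed time from zero.
        self.dynamic = 0

    def load_base(self, value):
        # Load the value copied from the source node.
        self.base = value


source = SplitClock(base=500, dynamic=20)
target = SplitClock()
target.tick(7)                      # arbitrary state before synchronization

captured = source.read()            # capture the source output ...
target.reset_dynamic()              # ... at the same moment the target resets

for clock in (source, target):      # copy is in transit; both keep counting
    clock.tick(100)

target.load_base(captured)          # the dynamic portion is left untouched
print(source.read(), target.read()) # prints "620 620"
```

In this sketch, loading the base never interferes with the running counter, because the write goes to the portion that does not count. The target's output is not meaningful between the reset and the base load.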
In yet another aspect, a clock register stores both dynamic and base portions. For a node being synchronized, the clock register is reset and allowed to count an elapsed time. A clock value from a source node is then added to the clock register and the resulting summation is stored in the clock register.
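The single-register variant can be sketched the same way. Again this is a model under assumptions rather than the described hardware: RegisterClock and add_and_store are hypothetical names, the clocks are assumed to tick at the same rate, and the add-and-store step is assumed to complete without a tick occurring in the middle of it.

```python
class RegisterClock:
    """Hypothetical model of a clock held in one counting register."""

    def __init__(self, value=0):
        self.register = value

    def tick(self, n=1):
        self.register += n

    def read(self):
        # No adder on the read path: the register is the clock output.
        return self.register

    def reset(self):
        self.register = 0

    def add_and_store(self, source_value):
        # Read-modify-write: the register currently holds the elapsed time;
        # add the copied source value and store the sum back.
        self.register = self.register + source_value


source = RegisterClock(900)
target = RegisterClock(12345)       # arbitrary value before synchronization

captured = source.read()            # capture at the moment of the reset
target.reset()

for clock in (source, target):      # both count while the value is copied
    clock.tick(40)

target.add_and_store(captured)
print(source.read(), target.read()) # prints "940 940"
```

In this sketch a read returns the register directly, whereas the split model sums two portions on each read; the cost is that the update is a read-modify-write on a register that is also counting.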