Large-scale shared-memory multiprocessor computer systems typically have a large number of processing nodes, each with, e.g., one or more microprocessors and local memory, that cooperate to perform a common task. Such systems often use some type of synchronization construct, e.g., barrier variables or spin locks, to ensure that all executing threads maintain certain program invariants. For example, such a computer system may have some number of nodes that cooperate to multiply a large matrix. To do this rapidly and efficiently, the system typically divides the task into discrete parts, each of which is executed by one of the nodes. The nodes are synchronized, however, so that they concurrently execute their corresponding steps of the task.
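As an illustration only, the following minimal sketch shows how a barrier can keep cooperating workers in step during a divided matrix computation. It uses Python threads in a single process as a stand-in for processing nodes. The function name parallel_square_twice, the row-interleaved partitioning, and the two-step computation (squaring a matrix twice) are assumptions chosen for the example and are not taken from any particular system.

```python
import threading


def parallel_square_twice(a, num_workers=4):
    """Compute (a @ a) @ (a @ a) for a square matrix `a` given as a list of lists.

    Hypothetical example: each worker owns an interleaved subset of rows.
    Step 2 reads every row of the step-1 result, so no worker may start
    step 2 until all workers have finished step 1.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]  # result of step 1: a @ a
    d = [[0] * n for _ in range(n)]  # result of step 2: c @ c
    barrier = threading.Barrier(num_workers)

    def multiply_rows(left, right, out, rows):
        # Fill only the rows of `out` assigned to the calling worker.
        for i in rows:
            for j in range(n):
                out[i][j] = sum(left[i][k] * right[k][j] for k in range(n))

    def worker(worker_id):
        rows = range(worker_id, n, num_workers)
        multiply_rows(a, a, c, rows)
        # Wait here until every worker has written its rows of `c`.
        barrier.wait()
        multiply_rows(c, c, d, rows)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d
```

Without the barrier, a worker that finishes its first step early could read rows of the intermediate result that other workers have not yet written, which is the kind of invariant violation that synchronization constructs are meant to prevent.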
The necessary synchronization in a multi-node system often involves a real time clock (RTC) signal value to which the nodes synchronize so that they can operate in step with one another. For example, such a global RTC signal value may be useful for generating various interrupts, network throttle triggers, time stamps for error events, etc. Typically, existing systems distribute RTC signal values using dedicated cable wires, but dedicated RTC wires may waste significant amounts of network bandwidth as network link frequency increases. Prior clock distribution systems usually relied on fanning out a single clock signal value, e.g., using a spanning tree, which means that a single node failure could cause the entire RTC system to go down. In addition, a static RTC distribution tree is usually configured by software for a given system, and the tree generally needs to be reconfigured whenever the system changes. Such reconfiguration results in significant downtime for the system.
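The fragility of a static fan-out tree can be sketched as follows. This is a hypothetical model, not a description of any existing distribution system: the function name nodes_receiving_clock, the dictionary representation of the tree, and the seven-node topology are all assumptions made for illustration.

```python
from collections import deque


def nodes_receiving_clock(children, root, failed=frozenset()):
    """Return the set of nodes that still receive the clock value.

    `children` maps each node to the nodes it forwards the clock to
    (a static spanning tree). A failed node forwards nothing, so its
    entire subtree is cut off.
    """
    if root in failed:
        return set()
    reached = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in failed:
                reached.add(child)
                queue.append(child)
    return reached


# Hypothetical seven-node tree: node 0 is the clock source.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}

all_ok = nodes_receiving_clock(tree, 0)
interior_failure = nodes_receiving_clock(tree, 0, failed={1})
source_failure = nodes_receiving_clock(tree, 0, failed={0})
```

In this model, the failure of interior node 1 also cuts off nodes 3 and 4, and the failure of the source node leaves no node with a clock value. Restoring distribution requires building a new tree, which corresponds to the software reconfiguration described above.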