1. Field of the Invention
The present invention relates generally to the provision of a central reference for local time-of-day clocks in a computer complex, and more particularly to an improved method and apparatus for providing such a central reference for a complex of processors interconnected through a central switch via fiber optic links.
2. Description of the Related Art
In a computer system employing multiple processors and shared resources, a mechanism for synchronizing the local time-of-day (TOD) clocks and keeping them within a specified tolerance is essential. Reliable operation of such a computer system requires maintaining a log record of all accesses to shared resources in the correct order so that recovery can be effected in the event of a failure. The transactions are ordered in the log by means of a time stamp, and all the TOD clocks in the system must be synchronized closely enough to ensure consistency of the log. Lack of synchronization can also cause deadlocks and loss of integrity of shared databases.
Processors in a multiple processor complex have their own sources for the TOD clock. Although these clocks operate at the same nominal frequency, they may drift apart, resulting in a significant accumulated error over time unless they are synchronized periodically. A typical drift rate is 50 parts per million (ppm). Because two clocks may drift in opposite directions, the clocks of two computers in the complex may deviate with respect to each other at a maximum rate of 100 ppm, that is, 100 microseconds per second. At this rate, it takes 10 seconds for the clocks to be 1 millisecond apart. How often the clocks need to be synchronized depends on how much deviation can be tolerated between the individual computer systems in the complex.
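The arithmetic above can be illustrated with a minimal sketch. The function name and parameters are hypothetical and chosen only for illustration; the sketch assumes that both clocks drift at the same maximum rate in opposite directions, which is the worst case described in the text.

```python
def resync_interval_seconds(tolerance_us, drift_ppm):
    """Worst-case time for two free-running clocks to drift apart by tolerance_us.

    tolerance_us: largest acceptable deviation between the two clocks, in microseconds.
    drift_ppm:    drift rate of a single clock, in parts per million.

    One ppm equals one microsecond of error per second, and two clocks
    drifting in opposite directions separate at twice the single-clock rate.
    """
    relative_rate_us_per_s = 2 * drift_ppm
    return tolerance_us / relative_rate_us_per_s


# 50 ppm per clock and a 1 millisecond (1000 microsecond) tolerance
print(resync_interval_seconds(1000, 50))  # 10.0
```

A tighter tolerance shortens the interval proportionally; for example, a 100 microsecond tolerance at the same drift rate would call for synchronization about once per second.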
Two distinct approaches to synchronization of TOD clocks have been proposed in the past. If synchronization is required only at a coarse granularity, for example once every 10 seconds, a distributed algorithm can be used. With one such distributed approach, the processors exchange messages containing the local TOD information at periodic intervals and update their clocks to correspond to the fastest clock in the system as closely as possible. The synchronization messages are sent over the same network that is used for data communication. Hence, the queuing delays in each node should be accounted for in calculating the frequency of updates as well as the correction factor to be applied. Since the queuing delays are difficult to estimate, the algorithm is useful only in a conflict-free network where every node is guaranteed access to the network over a time period. Further, because of the shared use of the links, a synchronization message may have to wait for a long period while a large block of data is being transferred. This large variance in communication time causes the granularity of clock synchronization to be large.
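One round of such a distributed approach can be sketched as follows. This is a simplified illustration, not the algorithm of any particular prior system: the function name, the dictionary of node readings, and the optional per-node delay estimates are all assumptions, and real message exchange over a network is replaced by a single function call.

```python
def synchronize_to_fastest(local_tods, delay_estimates=None):
    """Return updated TOD values after one exchange round.

    local_tods:      dict mapping a node name to its local TOD reading (seconds).
    delay_estimates: optional dict mapping a node name to the estimated
                     queuing and transmission delay (seconds) incurred before
                     that node applies the update. Hypothetical correction factor.
    """
    if delay_estimates is None:
        delay_estimates = {}
    # Every node learns all readings and adopts the fastest (largest) one.
    fastest = max(local_tods.values())
    # Each node adds its own delay estimate to compensate for message latency.
    return {
        node: fastest + delay_estimates.get(node, 0.0)
        for node in local_tods
    }


readings = {"cpu_a": 100.000, "cpu_b": 100.004, "cpu_c": 99.998}
print(synchronize_to_fastest(readings))
```

The accuracy of the result depends entirely on how well the delay estimates match the actual delays, which is why a large variance in communication time limits the granularity that this approach can achieve.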
An alternative past proposal is to use dedicated links for passing the TOD information between systems, allowing deterministic transmission delays. In one example of this approach, timing information is generated external to the processor complex by a reliable and accurate reference clock and distributed to individual processors in the complex over dedicated fiber optic links. The reference clock is duplicated for reliability. Synchronization of individual TOD clocks is achieved at two levels. At a low level, phase-locked loops (PLLs) at the computers are kept synchronized at the bit level to the time reference by transmitting a continuous signal on the links. At a higher level, messages containing TOD information are encoded into the signal stream; the messages are decoded by the individual processors and used to start or verify clock synchronization. Exemplary systems of this type are shown in the commonly assigned applications of T. B. Smith, U.S. Ser. No. 07/262,416, filed Oct. 25, 1988, now U.S. Pat. No. 5,146,585, issued Sep. 8, 1992; L. H. Appelbaum et al., U.S. Ser. No. 07/392,812, filed Aug. 11, 1989, now U.S. Pat. No. 5,249,206, issued Sep. 28, 1993; and W. A. Moorman et al., U.S. Ser. No. 07/537,389, filed Jun. 12, 1990, now U.S. Pat. No. 5,041,798, issued Aug. 20, 1991, the specifications of which are incorporated herein by reference.
This approach using dedicated links provides clock synchronization at a very fine granularity, but at a high cost. It is necessary to install and maintain separate boxes for the timing reference as well as a dedicated network of fiber optic links for distribution of the clock information. However, it may not be necessary to verify synchronization frequently if a phase-locked loop is used to continuously maintain low-level synchronization. Without a requirement for frequent synchronization verification, a dedicated link is not necessary.