1. Field of the Invention
This invention relates to a fault-tolerant clock for use in a multicomputer complex, especially one in which the individual computers are separated by distances that are comparable to the clock wavelength.
2. Description of the Related Art
Multicomputer complexes having computers that are widely separated (e.g., by distances of up to several kilometers) are relatively common in the art. The signal propagation delay between individual computers typically exceeds the period of the high-frequency time-of-day (TOD) clocks which are used by the computers to measure time intervals and to time-stamp events. TOD clocks for high-performance computers are typically driven at a frequency of several tens of megahertz. Accordingly, maintaining absolute synchronism between such high-frequency clocks is difficult, and often not attempted. It is nevertheless desirable for such a multicomputer complex to have synchronized lower-frequency clock signals for such purposes as cross-system time-stamping and the like. Such clock signals may have frequencies on the order of several kilohertz.
What are particularly desirable are TOD clocks that accomplish both objectives simultaneously--that is, clocks that have a high resolution for internal time-stamping purposes (i.e., within a particular computer of the complex), but are also synchronized on a coarser scale with other clocks in the complex for cross-system time-stamping. This may be implemented by providing each computer with a slave TOD clock which is phase-locked to a system-wide clock which runs at a submultiple of the TOD clock frequency.
Synchronized clock signals of the type referred to above are typically implemented by providing one or more clock sources at each computer location and phase-locking each clock source to a consensus signal derived from the other clock sources. Systems containing 3f+1 such mutually coupled clock sources are capable of tolerating f individual points of failure, and are disclosed in such references as Fletcher et al. U.S. Pat. No. 3,900,741, Smith et al. U.S. Pat. No. 4,239,982, and the copending application of applicant T. B. Smith, Ser. No. 262,416, filed Oct. 25, 1988, entitled "Synchronized Fault Tolerant Clocks for Multiprocessor Systems", owned by the assignee of this application.
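The 3f+1 relationship stated above can be illustrated with a short sketch. The function names are hypothetical and are not taken from the cited references; the code only restates the arithmetic of the bound.

```python
def sources_required(f: int) -> int:
    """Number of mutually coupled clock sources in a 3f + 1 system
    that tolerates f individual points of failure."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def faults_tolerated(n: int) -> int:
    """Largest f for which 3f + 1 does not exceed n clock sources."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


# Illustrative values only: source counts for tolerating one and two faults.
single_fault_sources = sources_required(1)
double_fault_sources = sources_required(2)
```

Under this relationship, adding a fifth or sixth source to a four-source system would not by itself raise the number of tolerated failures; a seventh source would be needed to tolerate two.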
Of particular interest are quad oscillator systems, consisting of four mutually coupled clock sources, which tolerate any single point of failure. Thus, the Smith application referred to above discloses a quad system in which two clock sources are associated with each of two physically separated clock sites. The clock sites are physically separated to reduce the likelihood that a common source of failure will affect both sites simultaneously. Each computer receives two synchronized TOD clock signals, one from a clock source at each site, so that it may continue to receive a TOD clock signal for such purposes as time stamping even if one of the clock sources or clock sites should fail.
As noted above, the cross-system TOD clock synchronization signals have a much lower frequency than the high-frequency clocks used to drive the high-resolution TOD clocks of the separate computers. However, synchronization of the redundant clock sources can still be a problem if improved synchronization accuracy is desired and the clock sites are widely separated, owing to variations in propagation delay between the two sites. For example, if the separation between the two clock sites is changed from 1 km to 3 km, as may happen if one of the clock sites is relocated, the propagation delay between the two sites is correspondingly changed from about 5 to about 15 microseconds, a difference of about 10 microseconds. This figure does not include such other contributions to variations in delay as might be introduced by the encoder-decoder logic used to transmit the clock signals between the two locations. Moreover, the propagation speed through such dielectric media as fiber optic cables may itself vary and be a source of propagation delay variations, even if the distance between the two sites remains unchanged.
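The delay figures in the example above can be reproduced with a minimal sketch. The constant of 5 microseconds per kilometer is an assumption inferred from the stated figures (about 5 microseconds at 1 km); the actual value depends on the transmission medium, and the names used here are hypothetical.

```python
# Assumption: about 5 microseconds of one-way delay per kilometer,
# as implied by the 1 km -> 5 us figure in the text.
DELAY_US_PER_KM = 5.0


def one_way_delay_us(distance_km: float,
                     delay_us_per_km: float = DELAY_US_PER_KM) -> float:
    """Approximate one-way propagation delay in microseconds.

    Ignores encoder-decoder latency and any variation in propagation speed.
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return distance_km * delay_us_per_km


delay_before_us = one_way_delay_us(1.0)   # clock sites 1 km apart
delay_after_us = one_way_delay_us(3.0)    # one site relocated, now 3 km apart
delay_change_us = delay_after_us - delay_before_us
```

The sketch deliberately models only the distance term; the encoder-decoder and medium-dependent contributions mentioned above would add to the computed change.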
Because of such propagation delay uncertainties, clock sources at widely separated locations, even though in phase, may observe leading or lagging clock signals from other sources, owing to uncompensated delays. They may therefore speed up or slow down in a mistaken attempt to match the phase of either a single distant clock signal or a consensus signal derived in part from such signal or signals. As a result, the clock sources may exhibit a significant phase skew relative to one another, on the order of many microseconds, or may shift in frequency, possibly beyond the maximum system design limits, and break lock altogether.
The copending application referred to above discloses the use of a static delay element to delay the local clock signal to compensate for the propagation delay of the distant clock signal. However, such a static delay compensation would have to be selected or adjusted for a particular installation and would have to be readjusted if, as is common, the link path between the two locations were changed. Even then, the static delay element would not readjust itself for such variations in propagation delay as may arise from temperature variations or the like. Further, this would not be a satisfactory expedient in a system where more than two clock sites are used, since the required delay may vary for each pair of sites.
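The limitation of a static delay element can be illustrated numerically. The following is a sketch under stated assumptions, not a description of the copending application's circuit: the function names are hypothetical, the 5 and 15 microsecond delays follow the relocation example given earlier, and the 5 kHz synchronization frequency is an illustrative value chosen from the "several kilohertz" range.

```python
def residual_skew_us(actual_delay_us: float,
                     static_compensation_us: float) -> float:
    """Uncompensated delay left over when a fixed delay element
    no longer matches the actual propagation delay."""
    return actual_delay_us - static_compensation_us


def skew_as_phase_degrees(skew_us: float, clock_hz: float) -> float:
    """Express a time skew as a phase angle of a clock at clock_hz."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    cycles = skew_us * clock_hz / 1e6   # fraction of one clock period
    return cycles * 360.0


# Assumption: delay element set to 5 us for the original installation.
static_compensation_us = 5.0

# Link path unchanged: the static element cancels the delay.
skew_original_us = residual_skew_us(5.0, static_compensation_us)

# Link path changed so the actual delay is 15 us: the element is now wrong.
skew_after_change_us = residual_skew_us(15.0, static_compensation_us)

# Assumption: 5 kHz synchronization signal (hypothetical value).
phase_error_deg = skew_as_phase_degrees(skew_after_change_us, 5_000.0)
```

With more than two clock sites, each pair of sites would in general need its own value of static_compensation_us, which is the difficulty noted in the preceding paragraph.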