In the telecommunications industry and the computer industry, leading manufacturers are continuing to develop equipment designs employing increased clock rates while simultaneously pursuing architectures with hundreds and even thousands of distributed machine elements. These elements may be, for example, peripheral interface modules, time-switch modules in a digital switching machine, individual processor elements in a "connectionist" type machine, pipelined floating-point logic units in an array processing supercomputer, or systolic array processors for signal processing or radar applications.
For maximum performance and efficiency in these applications, each equipment module requires a clock signal that is phase-synchronous with the clock in every other module. Clocking every element of the machine synchronously at the highest possible rate provides the best chance of approaching system speeds equal to the switching speeds of individual logic elements of the technology employed.
Whether the processing modules involved are circuit packs, meters apart running at 50 MHz, or subcircuits of a wafer-scale VLSI system, millimeters apart running at GHz rates, the basic problem is that of "clock distribution" to a large number of state devices distributed over a distance where propagation delays are a significant fraction of the clock period, extending even to multiples of the clock period.
The conventional engineering approach to clock distribution is hierarchical, with a tree of increasing fanout at each stage. In this method, a central clock source is distributed, either by electrical or optical transmission media, through a tree-like structure with each device to be synchronized terminating one leaf of the tree. Intermediate branches of the tree buffer and split the signal incoming to them into a larger number of copies of the signal with which they drive subsequent branches in the tree. The limitations and problems of this approach are well known but to date have been dealt with through careful and conservative design. Some of the undesirable problems of hierarchical clock distribution are:
1. A significant fraction of total system power consumption can be dissipated in the many clock-driver buffers and transmission lines present in total throughout the system.
2. High pinout count and extensive track layout exist. Usually, balanced transmission is necessary requiring two pins at each end and two tracks per clock signal. Impedance controlled track layout may be necessary, often requiring expensive design iterations. High pincount impacts cost, size and MTBF of the equipment design.
3. For large fanout, with any given technology, the number of hierarchical levels required increases as log_n(N) where n is the fanout per stage and N is the total system population. Each expansion stage is the source of increased clock skew.
4. In-service growth of a system using hierarchical clock distribution can be limited or impossible unless initial provision was adequately made within the clock distribution tree to accommodate new modules without exceeding wiring limits, power limits or maximum skew limits.
5. Hierarchical electrical clock distribution makes a machine design particularly apt to emission at the clock rate or its harmonics that may exceed FCC requirements. If so, expensive redesign or shielding may be required. In addition, long electrical clock distribution paths sometimes lead to electromagnetic susceptibility problems.
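The growth of tree depth with system population noted in the list above can be illustrated with a short calculation. The following is a minimal sketch only: the function names are hypothetical, and the assumption that each stage adds a fixed worst-case skew which accumulates linearly is a simplification made for illustration, not a figure taken from this description.

```python
def tree_levels(total_loads, fanout):
    """Smallest number of buffer stages such that fanout**levels >= total_loads."""
    if total_loads < 1 or fanout < 2:
        raise ValueError("need total_loads >= 1 and fanout >= 2")
    levels = 0
    reach = 1  # number of loads a tree of the current depth can drive
    while reach < total_loads:
        reach *= fanout
        levels += 1
    return levels


def accumulated_skew_ns(levels, per_stage_skew_ns):
    """Assumed model: every stage contributes the same worst-case skew."""
    return levels * per_stage_skew_ns


if __name__ == "__main__":
    for population in (64, 1000, 10000):
        depth = tree_levels(population, fanout=8)
        print(population, depth, accumulated_skew_ns(depth, 0.5))
```

Under this model, raising the population from 64 to 1000 modules at a fanout of 8 doubles the number of stages, and with it the assumed skew budget consumed by the tree.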
injecting reference pulses at a predetermined frequency into an injection site of the reference path such that the pulses travel along the outgoing path to a remote site and return to the injection site along the return path;
determining, at each application module, the time interval for each reference pulse to travel from the predetermined site to the corresponding site associated with the application module;
monitoring at each application module the elapsed time interval for each reference pulse to travel between the predetermined and corresponding sites associated with the application module;
repeatedly producing, at each application module, a local phase reference signal when the elapsed time interval is one-half a last determined one of the time intervals; and
synchronizing the clock of each application module with the local phase reference signal.
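The reason that halving the measured interval yields a common phase reference can be checked numerically. The sketch below rests on assumptions that are not stated in the steps above: the outgoing and return paths are taken to run side by side with equal propagation velocity, each module is taken to tap both paths at the same position, and the remote site is taken to be the turn-around point at the far end of the trunk. The function and parameter names are hypothetical.

```python
import math


def local_reference_times(tap_positions_m, trunk_length_m, velocity_m_per_s):
    """Instant at which each module produces its local phase reference.

    A pulse injected at time zero passes a module at position x on the
    outgoing path, turns around at the far end, and passes the same module
    again on the return path. The module fires half-way between the two.
    """
    references = []
    for x in tap_positions_m:
        t_outgoing = x / velocity_m_per_s
        t_return = (2.0 * trunk_length_m - x) / velocity_m_per_s
        interval = t_return - t_outgoing  # differs from module to module
        references.append(t_outgoing + interval / 2.0)
    return references


if __name__ == "__main__":
    refs = local_reference_times([0.0, 3.0, 7.5, 10.0], 10.0, 2.0e8)
    turn_around = 10.0 / 2.0e8
    print(all(math.isclose(r, turn_around) for r in refs))
```

Each module measures a different interval, yet the half-way instant coincides in every case with the moment the pulse reaches the turn-around point, so no module needs to know its own position on the trunk. For a steady pulse train, using the last determined interval for the next pulse gives the same result as the single-pulse calculation shown here.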
Accordingly, present system designs that use hierarchical clock distribution rely critically on the control of wiring lengths and track layout and, ultimately, on the limiting of clock rates, as the means to minimize clock skew and to obtain adequate margins against the remaining clock timing variations due to varying loads on fanout devices and device-to-device speed variations in the various branches of the clock distribution tree. The clock skew in such systems often limits the usable clock rate to well below the clock rate that would be usable if the system were limited by the operating times of the logic circuits themselves.
Recently the problem of low-skew high fanout clock distribution has also been treated as a limiting factor on the speed of operation of VLSI circuits.
With conventional clock distribution systems on VLSI, the high fanout clock drivers consume significant circuit area and power and the clock distribution lines that they drive require well isolated low resistance tracks to avoid crosstalk and to control clock signal loading. Such clock distribution lines again consume significant circuit area, particularly if all track lengths are to be equal for minimal skew. Holograms for clock distribution have been proposed but many development problems including mechanical stability over time and temperature must be solved to the level required to suit coherent optical techniques before this approach could be viable.
The specific synchronization problem addressed here is to be distinguished from some related problems and methods in the area of distributed hierarchical timing control. There are several schemes which use similar terminology but actually address only the problem of distributed frequency lock without a requirement for the control of absolute phase. It is to be noted that the terms "absolute" time and phase used herein are with respect to an imaginary perfect clock in the same rest frame as the entire distributed system under consideration. Perfect synchronization conceptually means that if one could view every clock in the system from one point without the speed-of-light delay in observation, every clock would appear in step. In practice, perfect synchronization means that if every system clock were connected to an oscilloscope through probes of precisely equal delay, then each trace on the scope would align and each clock taken individually would show the identical phase with respect to one designated master clock.
The phase synchronism of a group of clock signals at spatially distributed locations is characterized by the skew of the system. Skew is defined as the absolute value of the maximum variations, over a distributed phase-synchronous system, of the time at which the active edge of the distributed clock makes its transition at each of the locations requiring the clock signal.
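As an illustration of this definition, skew can be computed from a set of measured edge times. The sketch below assumes that all edge times refer to the same clock cycle and are expressed against a common time base, as with the equal-delay probes described earlier; the function name is hypothetical.

```python
def clock_skew(edge_times_ns):
    """Spread between the earliest and latest active edge across all sites."""
    if not edge_times_ns:
        raise ValueError("at least one edge time is required")
    return max(edge_times_ns) - min(edge_times_ns)


if __name__ == "__main__":
    # Active-edge arrival times at three sites, in nanoseconds.
    print(clock_skew([10.0, 10.5, 9.5]))
```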
Loop timing of remote equipment communicating with a telecommunications central office digital switch is a common application in which it is desired that a given channel bank (or other interface equipment) sample and multiplex the speech waveforms (or data) at its site with the same 8 kHz frequency that is used at the central office. This method avoids the speech sample "slips and repeats" that would occur if the channel bank were to free-run on its own 8 kHz frame rate. "Loop timed synchronization" is achieved when the remote equipment derives its internal sampling frequency from the high-speed bit-timing information in the transmission signal received by that equipment from the central office.
This loop timing does not address the aforementioned problems because it does not control the absolute phase of the synchronized equipment clock with respect to any other equipment that is similarly loop timed from the same source. If a large number of devices were loop timed from one central hub by this method, their internal clocks would be of the same frequency but the phase skew of this system would be uncontrolled unless the propagation delay to each site was equalized, in which case this reduces to hierarchical timing distribution using clock extraction from the data signal in place of direct clock distribution.
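The point can be made concrete with a small calculation. The sketch below assumes an idealized recovery circuit that adds no phase offset of its own, so that the phase of each loop-timed clock relative to the hub is simply the path delay taken modulo the clock period. The function name and the delay values are hypothetical.

```python
def loop_timed_phase_ns(path_delay_ns, clock_period_ns):
    """Phase offset of a loop-timed clock relative to the hub clock."""
    return path_delay_ns % clock_period_ns


if __name__ == "__main__":
    period_ns = 20  # one cycle of a 50 MHz clock
    for delay_ns in (35, 50, 120):
        print(delay_ns, loop_timed_phase_ns(delay_ns, period_ns))
```

All three sites run at the same frequency, but their phases differ by amounts set only by the unequal path delays.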
In telecommunications, it is known to synchronize central office switching machines in order to provide multi-trunk transmission without introducing slips or repeats through digital switching. This is a distributed synchronization situation, but, once again, requires frequency synchronization only and is addressed by a variation of hierarchical clock distribution from one or more central references via loop timing of subordinate central offices, downwards in a tree.
A reviewer of this area may also find reference to "mutual synchronization" schemes in which two or more central office clock controllers exchange phase wander (low frequency drift) information measured between their own clocks and the clocks received from their neighbours. The mutually-synchronizing offices repeatedly average the error values fed back from their neighbours and adjust the frequency of their own clocks slightly in accordance with the phase drift rates with respect to their neighbours. In this manner, a network of central offices can become mutually frequency synchronized. They may further be locked to the wider network by injecting a master reference at a designated site. Once again, however, this method achieves frequency lock amongst a number of distributed sites, but does not control the phase of the clocks at each site. For all phases to be nominally equal at equilibrium of the mutual synchronization net, the propagation delay between all nodes would once again have to be controlled.
Another class of synchronization problems involves distributed synchronization amongst a network of devices for the purpose of coordinated (contention free) access to a shared transmission medium. Canadian Patent No. 1,158,739 entitled "Distributed Synchronization System" is an example of such a system. According to this patent, a number of active communicating devices are attached to a linear bi-directional trunk and branch transmission topology (the CATV network topology to be precise) and employ time division burst multiplexing for communications between the distributed stations and a head-end master site. The head-end controller transmits toward all sites in a continuous TDM format from which all remote sites extract their bit clock, the TDM frame timing at their location, and the payload data for the location. However, a different method is required for upstream transmission and burst mode TDM is used.
To avoid collisions when the remote sites transmit upstream, the controller maintains a coarse form of synchronization amongst the dependent stations through a continuously active process of adaptive delay adjustment. The method proceeds as follows: The controller times the interval between sending its message to a remote site and receiving a response, as a means to deduce the time-of-flight delay from the site to the head-end. The controller then downloads a delay-adjust value to that remote unit so that, in conjunction with the downstream TDM frame reference, transmissions from that site are timed to avoid collision. When all remote sites are so delay adjusted, an upstream synchronization order is established.
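A generic form of this kind of delay adjustment can be sketched as follows. This is not the algorithm of the cited patent, whose details are not reproduced here. It assumes that each remote site responds with negligible turn-around time, that every site is padded out to match the longest measured round trip, and that times are held as integer nanoseconds. All names are hypothetical.

```python
def delay_adjustments(round_trip_ns):
    """Map each site to the extra delay that equalizes its round trip.

    round_trip_ns: dict of site name -> measured round-trip time in ns.
    """
    longest = max(round_trip_ns.values())
    return {site: longest - rtt for site, rtt in round_trip_ns.items()}


def burst_arrival_ns(round_trip, adjust, slot_offset):
    """Arrival time at the head-end, measured from the frame reference."""
    return round_trip + adjust + slot_offset


if __name__ == "__main__":
    measured = {"near": 10_000, "mid": 18_000, "far": 30_000}
    adjust = delay_adjustments(measured)
    for site, rtt in measured.items():
        print(site, adjust[site], burst_arrival_ns(rtt, adjust[site], 0))
```

Once every site is padded in this way, bursts sent in distinct slot offsets arrive at the head-end in slot order regardless of distance, which is the upstream ordering described above.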
Although there may appear to be superficial similarities between this scheme and the present invention, this scheme is different in both method and objective from the present invention. First, it does not seek to attain, nor does it attain, phase synchronism of the high speed clock at the distributed sites, nor does it seek to attain distributed frame synchronization to a timing accuracy that would permit the local generation of a frequency/phase locked high speed clock. The scheme only achieves a level of synchronization necessary to coordinate transmission bursts without collision. Several whole-bit guard bands of time are still required at the start and end of each burst. Second, continuous active control is required to continually update an adjustment to each remote site with at least three message transactions per update. The system crashes if this polling/update processor fails or falls behind. As will become clearer later, the method of the present invention involves no messaging, no central control and no computer processor of any type. Third, the latter scheme distributes the master-rate clock directly in addition to messaging to coordinate time-of-flight measurement and compensation. The present invention has neither of these requirements. Fourth, the latter scheme performs its intended function in a trunk and branch (CATV-type) architecture. The present invention is considerably simpler but is intended only for operation with a synchronization trunk transmission layout that has no branch stubs. The inherent round-trip delay-halving mechanism of the present invention only works as intended if all synchronized points are on the main trunk of the synchronizing paths.