This invention pertains to clock skew bounds in synchronous computing structures, particularly to arbitrarily large computing structures with a constant skew bound.
The simplest way to control pipelined and concurrent computing structures, such as arrays, parallel processors, multiprocessors, and vector processors, is to use a global clock to tie the sequence of computations to time. This method results in "synchronous" computations, and is widely used in many applications, including VLSI implementations. Many computing applications require a consistent view of time across all computing cells or processors, which can be achieved by using a global clock.
The problem of clock skew has long been recognized as a major obstacle to implementing very large synchronous systems, that is, systems employing thousands, millions, or more individual computing units or processing cells. Ideally, a clocking event should arrive at every affected cell simultaneously. However, a number of factors can cause these arrival times to vary: different threshold voltages, different signal propagation delays on wires due to their resistances and capacitances, different buffer delays, etc. Thus effective clock arrival times may vary from cell to cell, or even from component to component on a single chip. The term "clock skew" refers to the difference in arrival times of a single clocking event or clock pulse at different cells. For general background see Seitz, "System Timing," ch. 7 in Mead et al. (eds.), Introduction to VLSI Systems (1980), which is incorporated by reference.
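As a minimal sketch of the definition above, the following Python models each cell's clock arrival time as a simple sum of three delay components (wire, buffer, and threshold-related delay) and computes the skew of one clocking event as the spread between its earliest and latest arrivals. Treating the delays as independent additive terms is a simplifying assumption, and the names `CellPath`, `arrival_ns`, and `clock_skew_ns`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class CellPath:
    """Hypothetical model of the clock path to one cell (all delays in ns)."""
    name: str
    wire_delay_ns: float       # RC propagation delay of the wiring to this cell
    buffer_delay_ns: float     # summed delay of the clock buffers on the path
    threshold_delay_ns: float  # extra delay attributed to this cell's threshold voltage

    def arrival_ns(self) -> float:
        # Effective arrival time of one clocking event at this cell
        # (assumption: the delay components simply add).
        return self.wire_delay_ns + self.buffer_delay_ns + self.threshold_delay_ns

def clock_skew_ns(paths: list[CellPath]) -> float:
    """Skew of one clocking event: latest arrival minus earliest arrival."""
    times = [p.arrival_ns() for p in paths]
    return max(times) - min(times)

paths = [
    CellPath("A", 1.00, 0.50, 0.00),
    CellPath("B", 1.50, 0.50, 0.50),
    CellPath("C", 1.25, 0.25, 0.00),
]
print(clock_skew_ns(paths))  # 1.0 (B arrives at 2.5 ns; A and C at 1.5 ns)
```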
Larger skews require slower clock rates for a synchronous system to operate properly. A general (but somewhat simplified) condition is that the clock period should be at least equal to the sum of (1) the time to distribute the clock signal, (2) the time to perform the computation, and (3) the maximum clock skew between any two cells which communicate directly with one another. To the knowledge of the inventor, in nearly all prior clock distribution schemes for synchronous systems, clock skew grows without bound as the system size grows--in all prior structures of two or more dimensions, and in nearly all prior one-dimensional arrays as well.
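The simplified timing condition above can be evaluated directly: the minimum clock period is the distribution time plus the computation time plus the largest skew between any two cells that communicate directly. The sketch below applies it to a hypothetical four-cell linear array in which the clock reaches each successive cell 0.5 ns later; the function name `min_clock_period` and all numbers are assumptions for illustration. Because only directly communicating pairs enter the skew term, the end-to-end (global) skew of the array can be larger than the skew that actually limits the period.

```python
def min_clock_period(t_dist, t_comp, arrival, links):
    """Simplified condition: period >= distribution time + computation time
    + maximum skew between any two cells that communicate directly.

    arrival: dict mapping cell -> clock arrival time (ns)
    links:   iterable of (cell, cell) pairs that exchange data directly
    """
    local_skew = max((abs(arrival[a] - arrival[b]) for a, b in links), default=0.0)
    return t_dist + t_comp + local_skew

# Hypothetical 4-cell linear array; each successive cell sees the clock 0.5 ns later.
arrival = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5}
links = [(0, 1), (1, 2), (2, 3)]  # only nearest neighbours communicate

global_skew = max(arrival.values()) - min(arrival.values())
print(min_clock_period(2.0, 3.0, arrival, links))  # 5.5 ns, using neighbour skew 0.5 ns
print(2.0 + 3.0 + global_skew)                     # 6.5 ns, if global skew 1.5 ns were used
```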
Several studies have estimated the effects of system size increase on synchronization and skew. The most optimistic estimate known to the inventor is found in Kugelmass and Steiglitz, "An Upper Bound on Expected Clock Skew in Synchronous Systems," IEEE Trans. Comput., vol. C-39, no. 12, pp. 1475-1477 (December 1990), not admitted to be prior art. The statistical upper bound on skew was theoretically predicted to grow on the order of $N^{1/4}(\log N)^{1/2}$, where $N$ is the number of processors. To the knowledge of the inventor, no prior clock distribution scheme in a structure of more than one dimension predicts or guarantees a constant upper bound on clock skew, regardless of system size.
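To make this growth rate concrete, the sketch below evaluates the term $N^{1/4}(\log N)^{1/2}$ with all constant factors dropped. It uses the natural logarithm (the base changes only a constant factor), and the function name `skew_growth_term` is hypothetical. The printed values show relative growth only; they are not skew estimates in any unit.

```python
import math

def skew_growth_term(n: int) -> float:
    """N^(1/4) * (log N)^(1/2), constant factors omitted; relative growth only."""
    return n ** 0.25 * math.sqrt(math.log(n))

# Going from N = 10**3 to N = 10**6 multiplies the term by
# (10**3)**0.25 * sqrt(log(10**6) / log(10**3)) = 10**0.75 * sqrt(2), about 7.95.
for n in (10**3, 10**6, 10**9):
    print(f"N = {n:>13,}  growth term ~ {skew_growth_term(n):.1f}")
```

The term keeps increasing with $N$, which is the contrast with a constant bound that does not depend on system size.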
For very large systems, clock skew is a nontrivial problem which can significantly reduce the clock rate that would otherwise be achievable, or which can render a synchronous discipline impractical. For a discussion of the practical significance of this problem, see Hoshino, "Pax Computer: High-Speed Parallel Processing and Scientific Computing," §8.3 (1985). Prior approaches to minimizing the effects of clock skew have included equalization of wire lengths, careful screening of off-the-shelf parts, symmetric design of the distribution network, design guidelines to reduce skew due to process variations, and digital phase adjustment.
One method which has been suggested for certain high-speed systems uses a number of delay lines to shift the phase of an incoming signal (at a utilization circuit) by different values. One of the delay line outputs is selected such that the signal arriving at the utilization circuitry has the proper phase to reduce sampling errors. See U.S. Pat. No. 4,700,347. The circuit described in this reference cannot, however, be used for clock distribution. One copy of the circuit is required for every signal arriving at the utilization circuitry with an unknown phase, which makes this method relatively complex and costly. Further, due to lack of synchronism between the incoming signal and the utilization circuitry, a metastable failure is possible.
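The tap-selection idea described above can be illustrated with a small model. This is not the circuit of U.S. Pat. No. 4,700,347. It is a hypothetical sketch that assumes each delay line adds a fixed, known delay and that the "proper phase" is the one placing the incoming signal's transitions as far as possible, measured around one clock period, from the local sampling edge. The names `circular_distance` and `select_tap` and all numbers are illustrative assumptions.

```python
def circular_distance(a: float, b: float, period: float) -> float:
    """Distance between two times treated as phases on a circle of one clock period."""
    d = (a - b) % period
    return min(d, period - d)

def select_tap(transition_ns: float, tap_delays_ns: list[float],
               sample_ns: float, period_ns: float) -> int:
    """Index of the delay tap whose delayed transition lies farthest from the
    sampling edge (assumed selection criterion for this sketch)."""
    return max(
        range(len(tap_delays_ns)),
        key=lambda k: circular_distance(transition_ns + tap_delays_ns[k],
                                        sample_ns, period_ns),
    )

period = 10.0                 # hypothetical clock period, ns
taps = [0.0, 2.5, 5.0, 7.5]   # hypothetical delay-line taps, ns
# Transition at 1.0 ns, sampling edge at 0.0 ns: tap 2 moves the transition
# to 6.0 ns, which is 4.0 ns from the edge, the largest margin available.
print(select_tap(1.0, taps, 0.0, period))  # 2
```

Because the selector acts on a single incoming signal, one copy is needed for every signal of unknown phase, which is the cost the reference's approach incurs.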
Fisher and Kung, "Synchronizing Large VLSI Processor Arrays," Proc. 10th Ann. Intl. Symp. Comp. Arch., pp. 54-58 (1983), which also appears in IEEE Trans. Comp., Vol. C-34, No. 8, pp. 734-740 (August 1985), investigated clock skew in large two-dimensional arrays clocked by an "H-tree" network. The H-tree structure places each computing cell at approximately the same physical distance from the source of the clock signal. Fisher and Kung concluded that a constant skew bound could not be achieved in two-dimensional arrays clocked this way. They found that certain one-dimensional arrays could be clocked such that skew is bounded by a constant.
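The equal-distance property of the H-tree can be checked with a short recursive construction. The sketch below assumes a 2**k by 2**k grid of unit-size cells with the clock source at the center of the grid; at each level of the H, a horizontal arm and then a vertical arm, each one quarter of the current region's side, lead to the centers of the four quadrants. The function name `h_tree_leaves` and the unit-cell geometry are assumptions for illustration.

```python
def h_tree_leaves(k: int) -> dict[tuple[float, float], float]:
    """Map each cell center of a 2**k x 2**k unit-cell grid to its H-tree
    wire length from the root (hypothetical geometry, unit cells)."""
    side = 2 ** k
    leaves: dict[tuple[float, float], float] = {}

    def build(x: float, y: float, size: float, length: float) -> None:
        if size == 1:                 # reached a single cell: record its path length
            leaves[(x, y)] = length
            return
        q = size / 4                  # horizontal arm of q, then vertical arm of q
        for dx in (-q, q):
            for dy in (-q, q):
                # Recurse into the quadrant centered at (x + dx, y + dy).
                build(x + dx, y + dy, size / 2, length + 2 * q)

    build(side / 2, side / 2, side, 0.0)
    return leaves

leaves = h_tree_leaves(3)
print(len(leaves), sorted(set(leaves.values())))  # 64 [7.0]
```

Every cell lies at the same wire length from the root (side - 1 units for a side of 2**k cells). Equal nominal length equalizes only nominal delay; it does not remove the per-path variation in wire, buffer, and threshold delays, and two physically adjacent cells on either side of the center line lie on branches that diverge at the root, so their clock paths share almost no wiring.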
Alternative methods have been proposed for controlling very large computing structures, including asynchronous modes, hybrid synchronous/asynchronous modes, self-timing, handshake protocols, and analog phase-locked loops. These methods are generally more complex and more costly to implement than is a purely synchronous method. In addition, in most of these alternative methods there is always a nonzero probability of a metastable failure due to the inability to guarantee the safe timing requirements of latches and flip-flops. Moreover, the synchronous discipline, the most widely used discipline in digital systems, can potentially provide faster processing rates than can its alternatives. To the knowledge of the inventor, no previous work has successfully achieved a constant skew bound in an arbitrarily large, synchronous, two- or higher-dimensional network.