Multi-stage switching networks are gaining acceptance as a means for interconnecting multiple devices within modern digital computing systems. In particular, in Parallel Processing (PP) Systems, it is common to use a multi-stage switching network to interconnect N system elements, where N can range from a few to thousands of processors or combinations of processors and other system elements.
The state-of-the-art switch approaches have many shortcomings which prevent them from being ideal for PP systems. They tend to be expensive, slow, non-expandable, hard to reconfigure, serial in nature, and, worst of all, have clocking problems. The clocking problems grow steadily worse as digital computing systems continue to operate at higher and higher frequencies (usually doubling every several years). This is further complicated by increased system size and distance between system elements, especially in the field of PP systems. The result is an enormous clocking problem: how to keep the various elements of the system synchronized for communication purposes. Several state-of-the-art methods have been employed somewhat successfully, but they are cumbersome and risky, and they leave one wondering whether they will work for the next increase in clock frequency or system size.
The state-of-the-art clocking methods include:
a) Distributing a central clock and carefully controlling the delay time to each individual element (processor, I/O device, and switch), so that a synchronized and aligned clock, tuned to within a small tolerance, arrives at each element. In addition, to enable various system elements to communicate without losing synchronization, it is necessary for all communication links to have a transfer time from one element to another which is less than one cycle time of the common clock.
b) Using the same central, synchronized clocking scheme as a), but further allowing flexibility in the communication time between elements by permitting transfer times to be longer than one clock cycle time. Transfer time is allowed to be any multiple of the common clock cycle time, and every connecting wire between elements is hand-tuned by individually trimming each wire, so that it produces a delay which is a multiple of the common clock cycle within a specified tolerance.
c) Distributing a central clock without controlling the delay time to each individual element, while still providing exactly the same frequency to each element. With this approach, each interconnecting wire can be of any length and does not have to be hand-tuned. Synchronization between elements is established using a calibration method which is performed at fixed intervals of time, with the very first calibration being performed each time power is applied to the system. Each communication link is individually calibrated, one at a time, by sending test messages over the link and varying the phase of the sending clock. After all possible sending clock phases have been tried, the clock that best suits the length of the individual cable is chosen to be used on a permanent basis (until the next recalibration exercise). The calibration of every interconnection wire in the system can be accomplished automatically, but it is cumbersome, time-consuming, and possibly subject to drift over time and temperature.
d) Using two separate chips of different speeds: one to establish and control interconnections and the other to actually pass data between any two elements of the PP system. The control chip operates at a slower, more easily synchronized frequency, while the data chip provides the actual connection between elements when commanded to do so by the control chip. The data chip can provide data transfer at a faster and even asynchronous rate. There are several disadvantages to this approach. First, the two-chip requirement increases the number of interfaces and connections required, making the system more expensive and more complex. Second, the selection, being controlled remotely in a second chip, is usually a slow and serial operation, which defeats the important concept of performing operations in parallel that is required in efficient switching network systems. In addition, the concept is difficult to expand beyond the two chips, so that, to provide interconnection between as few as sixty-four elements, serial data transfer is usually specified to keep the chip pin count to a reasonable size. Finally, the set-up time for establishing a switch connection via the control chip is usually the dominant, slowest factor, making rapid dynamic changing of switch connections almost impossible. These types of switching networks are not applicable to PP systems, but are being used effectively in I/O areas where switch selections occur infrequently and large amounts of data are transferred over a path once it has been established by the control chip.
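The timing rules of methods a) and b) and the phase sweep of method c) can be illustrated with a short sketch. This is a simplified model for illustration only, not a description of any actual system: the function names, the nanosecond units, the link model in make_link, and the rule of choosing the middle of the longest run of passing phases are all assumptions, since the methods above say only that the phase that "best suits" the cable is chosen.

```python
def link_meets_timing(delay, cycle, tolerance, allow_multiples):
    """Check one link against method a) or method b) (hypothetical helper).

    Method a): the transfer time must be less than one clock cycle.
    Method b): the transfer time must lie within `tolerance` of a whole
    multiple of the clock cycle (assumed here to be at least one cycle).
    """
    if not allow_multiples:
        return delay < cycle
    nearest = max(1, round(delay / cycle))
    return abs(delay - nearest * cycle) <= tolerance


def make_link(delay, cycle, margin):
    """Assumed link model: a test message is received correctly only if it
    arrives at least `margin` away from the receiver's clock edge."""
    def send_test(phase):
        arrival = (phase + delay) % cycle
        return margin <= arrival <= cycle - margin
    return send_test


def calibrate_link(send_test, phases):
    """Method c): try every sending-clock phase, then keep one.

    Assumption: the phase kept is the middle of the longest run of
    consecutive passing phases. Wrap-around from the last phase back to
    the first is ignored to keep the sketch short.
    """
    results = [send_test(p) for p in phases]
    best_start, best_len, start = None, 0, None
    for i, ok in enumerate(results + [False]):  # sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        raise RuntimeError("no sending-clock phase passed the test message")
    return phases[best_start + best_len // 2]


# Usage: a cable with an uncontrolled delay of 23 ns on a 10 ns clock.
link = make_link(delay=23, cycle=10, margin=2)
chosen_phase = calibrate_link(link, list(range(10)))
```

In this model the sweep must be repeated for every link and again whenever drift is suspected, which is the cumbersome aspect noted for method c).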
As examples of prior art attempts, representative patents are U.S. Pat. No. 4,307,446 issued Dec. 22, 1981; U.S. Pat. No. 4,314,233 issued Feb. 2, 1982; U.S. Pat. No. 4,481,623 issued Nov. 6, 1984; U.S. Pat. No. 4,518,960 issued May 21, 1985; U.S. Pat. No. 4,237,447 issued Dec. 2, 1980; U.S. Pat. No. 4,251,879 issued Feb. 17, 1981; and U.S. Pat. No. 4,307,378 issued Dec. 22, 1981. These patents were followed by U.S. Pat. No. 4,484,325 issued Nov. 20, 1984; U.S. Pat. No. 4,482,996 issued Nov. 13, 1984; and U.S. Pat. No. 4,475,188 issued Oct. 2, 1984. U.S. Pat. No. 4,475,188 suggested an arbitration switch which strips routing bits from a message and, with a priority of first come, first served, connects multiple input lines to multiple output lines using self-routing path selection followed by data in a clockless environment. As such, this development is different from the more standard serial, crossbar, or crosspoint switches which have been used. However, the patent requires handshaking and has a different arbitration method than we have found useful. It appears that, should the device be used in typical situations, it would become progressively slower with long cables; its speed is not independent of cable length. This and other differences will appear upon a review of our improvements, in which no buffering, queues, wait periods, or discontinuation of message transmittal is involved when resolving contention. As such, these earlier attempts do not provide the necessary switching network capabilities which are required. In this connection, reference could also be had to U.S. Pat. No. 4,952,930 issued Aug. 28, 1990, and the references cited therein.
Thus, the state-of-the-art solutions do not provide the switching network characteristics required for modern and future PP systems. The characteristics that are required include the ability to dynamically and quickly establish and break element interconnections, to do so cheaply and easily in one chip, to have expandability to many thousands of elements, to permit non-calibrated interconnection wires of any length, to solve the distributed clocking problems and allow future frequency increases, and to permit parallel establishment and data transmittal over N switching paths simultaneously.
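The general idea of self-routing with stripped routing bits and first-come, first-served arbitration, as mentioned above, can be pictured with a toy model. This sketch is not taken from U.S. Pat. No. 4,475,188 or any other cited patent: the function names, the bit-string message format, the fixed number of routing bits per stage, and the tie-break by input line number are all assumptions made for illustration, and the handling of losing requests is deliberately left out.

```python
def self_route(message_bits, stages, bits_per_stage=1):
    """Toy self-routing model (hypothetical format).

    Each stage reads its own routing bits from the front of the message,
    uses them to pick an output port, strips them, and forwards the rest.
    Returns the port chosen at each stage and the data that remains.
    """
    ports = []
    remaining = message_bits
    for _ in range(stages):
        if len(remaining) < bits_per_stage:
            raise ValueError("message ran out of routing bits")
        header = remaining[:bits_per_stage]
        remaining = remaining[bits_per_stage:]
        ports.append(int(header, 2))
    return ports, remaining


def arbitrate(requests):
    """Toy first-come, first-served arbitration.

    `requests` is a list of (arrival_time, input_line, output_line).
    Each output line is granted to its earliest requester; ties fall to
    the lower input line number (an assumption of this sketch).
    """
    granted = {}
    for _arrival, input_line, output_line in sorted(requests):
        granted.setdefault(output_line, input_line)
    return granted


# Usage: two stages of 4-way switches, then 8 bits of data.
ports, payload = self_route("1011" + "11110000", stages=2, bits_per_stage=2)
winners = arbitrate([(5, 0, 1), (3, 2, 1), (4, 1, 0)])
```

Here the message selects port 2 at the first stage and port 3 at the second, and input line 2 wins output line 1 because its request arrived first.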