Parallel computing systems consist of a plurality of processors that communicate via an interconnection network. One popular network for providing the interconnection for a plurality of processors is the torus network.
Torus switches and networks are described by William J. Dally in his paper "Performance Analysis of k-ary N-cube Interconnection networks" published in the IEEE "Transactions on Computers"; Vol. 39 #6, June 1990, pg 775. Torus switches have interesting characteristics for providing the interconnection media for parallel processing elements, but their performance is very poor, usually requiring on the order of 50 microseconds to establish a connection link between any two nodes of the parallel system. The connection times required in the modern parallel arena are on the order of nanoseconds, not microseconds.
The torus switch has interesting characteristics, such as the ability to avoid the necessity for a centralized switching element. The torus can be completely distributed such that all portions of the switching element associated with a given node can physically reside at the node. This gives both excellent fault tolerance and modular expansion capabilities. Therefore, it is a meaningful goal to try to overcome the deficiencies of present torus implementations for the purpose of providing a powerful modern switching apparatus.
Many state-of-the-art switch solutions do not provide the switching network characteristics and low-latency concepts required for modern interconnect systems. The characteristics that are required include the ability to dynamically and quickly establish and break element interconnections, to do so cheaply and easily in one chip, to have expandability to many thousands of elements, to permit non-calibrated interconnection wires of any length, to solve the distributed clocking problems and allow future frequency increases, and to permit parallel establishment and data transmittal over N switching paths simultaneously.
One switch which does provide the distributed and fully parallel interconnect properties required by modern parallel systems is the ALLNODE Switch (Asynchronous, Low Latency, inter-NODE switch), which is disclosed in U.S. Ser. No. 07/677,543. The ALLNODE switch as disclosed in U.S. Ser. No. 07/677,543 is a multistage network switch which is not directly applicable to the torus network. The ALLNODE switch approach is applied to the torus switch in the present invention to bring the low latency and high bandwidth concepts to the torus network. The ALLNODE switch provides a circuit switching capability at high bandwidths, and includes distributed switch path connection set-up and tear-down controls individually within each switch, thus providing parallel set-up, low latency, and elimination of central point failures. The detailed description further describes the adaptation of the ALLNODE switch, the parent disclosure, to the present invention.
In research report No. AAA92A000704, IBM Research in San Jose, Calif. reported on "A Theory of Wormhole Routing in Parallel Computers" by P. Raghavan and E. Upfal in December, 1991. Raghavan and Upfal claim that the current trend in multicomputer architecture and torus networks is to use wormhole routing. In wormhole routing a message is transmitted as a continuous stream of bits, physically occupying a sequence of nodes/edges in the network. Thus, a message resembles a worm burrowing through the network without interrupting the local node processors. Theoretical analyses of simple wormhole routing algorithms have shown them to be nearly optimal for torus networks. The detailed description further describes the adaptation of the present invention to wormhole routing for optimal transfers.
The count of pins providing input/output connections to a functional design is always a fundamental constraint when it comes to packaging switches on real chips. The signal pin count for the disclosed torus is (w+c)*(2*D+1)*2, where w is the width of the data path, c is the number of control lines, and D is the dimensionality of the switch. It can be seen from the equation that for a given number of available pins, the local connectivity of the chip (expressed by the width of the data path, w) can be traded against the dimensionality of the switch (D) to provide various other switch options. The most notable work in this field was performed by William J. Dally in his paper "Performance Analysis of k-ary N-cube Interconnection networks" published in the IEEE "Transactions on Computers"; Vol. 39 #6, June 1990, pg 775. This is the fundamental paper that took parallel computers from hypercubes (high D, low w) to two-dimensional toruses (low D, high w). It was Dally's PhD research for Chuck Seitz that led Intel to select the torus for use in its parallel machines; however, Intel's torus switches are high latency solutions that require orders of magnitude more set-up time than proposed by the apparatus and approach in this disclosure.
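The trade-off expressed by the pin count equation can be illustrated with a short calculation. The following is a minimal sketch only; the function names, the 240-pin budget, and the choice of c = 4 control lines are assumptions made for illustration and do not come from the disclosure.

```python
def signal_pin_count(w: int, c: int, d: int) -> int:
    """Signal pins for a torus switch chip, per the formula (w + c) * (2*D + 1) * 2."""
    if w < 0 or c < 0 or d < 1:
        raise ValueError("w and c must be non-negative and D must be at least 1")
    return (w + c) * (2 * d + 1) * 2


def max_data_width(pin_budget: int, c: int, d: int) -> int:
    """Largest data path width w that fits in pin_budget for a given c and D.

    Returns 0 when the budget cannot carry any data lines beyond the control lines.
    """
    lines_per_path = pin_budget // (2 * (2 * d + 1))  # this is the largest (w + c)
    return max(lines_per_path - c, 0)


if __name__ == "__main__":
    budget, control_lines = 240, 4  # hypothetical values, for illustration only
    for dims in (2, 3, 4):
        w = max_data_width(budget, control_lines, dims)
        used = signal_pin_count(w, control_lines, dims)
        print(f"D={dims}: widest data path w={w}, pins used={used}")
```

Under these assumed numbers, raising D from 2 to 4 shrinks the widest data path that fits from 20 lines to 9, which is the low-D, high-w versus high-D, low-w trade-off described above.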
An interesting approach for improving the performance of a circuit switched network is the ability to try multiple paths of a network simultaneously, and to allow the first path to reach the destination to be the winner. This concept was pioneered by Joel Gould et al. for the torus switch and given the name of "Flash-Flooding" in IBM Docket No. SA889-042. A variation of the flash-flooding concept called multipath transmission is applied in this disclosure which is different from the original "Flash-Flooding" concept. "Flash-Flooding" decisions require intelligence in the individual torus switching units and cause additional overhead and latency; whereas, the multipath decisions of the present invention are made by the sending node and do not require additional intelligence, overhead, or latency at each individual torus switching unit.
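The idea that route alternatives are chosen at the sending node, rather than inside each switching unit, can be sketched in software. This is an illustrative model only and not the disclosed apparatus: it assumes a two-dimensional k-by-k torus, considers just two candidate routes (X-then-Y and Y-then-X), and checks them one after another, whereas the hardware would try paths simultaneously. All names below are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def ring_step(a: int, b: int, k: int) -> int:
    """Return +1 or -1, the shorter direction from a to b around a ring of size k."""
    forward = (b - a) % k
    return 1 if forward <= k - forward else -1


def dimension_order_path(src: Node, dst: Node, k: int, order: Sequence[int]) -> List[Node]:
    """Walk from src to dst, correcting one dimension at a time in the given order."""
    path = [src]
    cur = list(src)
    for dim in order:
        if cur[dim] == dst[dim]:
            continue
        step = ring_step(cur[dim], dst[dim], k)
        while cur[dim] != dst[dim]:
            cur[dim] = (cur[dim] + step) % k  # wrap around the torus edge
            path.append((cur[0], cur[1]))
    return path


def first_free_path(src: Node, dst: Node, k: int, busy: Set[Link]) -> Optional[List[Node]]:
    """Sender-side choice: return the first candidate route with no busy link, else None."""
    candidates = [
        dimension_order_path(src, dst, k, (0, 1)),  # X first, then Y
        dimension_order_path(src, dst, k, (1, 0)),  # Y first, then X
    ]
    for path in candidates:
        links = zip(path, path[1:])
        if all(link not in busy for link in links):
            return path
    return None


if __name__ == "__main__":
    busy_links = {((0, 0), (1, 0))}  # assumed: the first X hop is already in use
    print(first_free_path((0, 0), (1, 1), 4, busy_links))
```

In this model the switching units only report whether a link is free; the list of alternatives and the choice among them live entirely at the sender, which mirrors the distinction drawn above between multipath transmission and "Flash-Flooding".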