1. Field of the Invention
The invention relates to intercommunication networks for use in parallel processing systems and, more particularly, to interconnection networks suited for use in massively parallel processing systems.
2. Description of the Prior Art
In general, distributed processing involves extending a processing load across a number of separate processors, with some type of interconnection scheme being used to couple all of the processors together in order to facilitate message passing and data sharing. Many variants of distributed processing architectures exist. Some entail use of only a relatively small number of interconnected processors, typically two and often fewer than ten separate, highly sophisticated central processing units of the kind used in a traditional mainframe or super-mini-computer. These processors can be interconnected either directly through an interprocessor bus, or indirectly through a multi-ported shared memory. By contrast, massively parallel processing systems involve a relatively large number, often in the hundreds or even thousands, of separate microprocessor-based processing elements that are interconnected in a network by high-speed switches, with each such processing element at a separate node in the network. In operation, the network routes messages, typically in the form of packets, from any one of these nodes to another to provide communication therebetween. The present invention is directed to the manner of interconnecting the switches in such networks of massively parallel processing systems.
The overall performance of a massively parallel processing system can be heavily constrained by the performance of the underlying network. Generally speaking, if the network is too slow, particularly to the point of adversely affecting overall system throughput, it sharply reduces the attractiveness of using a massively parallel processing system.
Given the substantial number of processing elements that is generally used within a typical massively parallel processing system and the concomitant need for any one element in this system to communicate at any one time with any other such element, the network must also be able to simultaneously route a relatively large number of messages among the processing elements. One problem in communication is the lack of paths available to accomplish the efficient transfer between the nodes. This problem can be understood by reference to FIG. 1 which shows a prior art "2-D Mesh" network with sixteen nodes.
As shown in FIG. 1, the nodes 1 each contain a processor 2 and a switch 3. The nodes are arranged in rows and columns and are connected to each of the adjacent nodes by bi-directional connections 4. Communication between nodes 1 is through the bi-directional connections 4 and the switches 3 in the nodes. While communication between processors in adjacent nodes can be quick and efficient, communication between separated nodes must pass through switches in a number of intermediate nodes. For instance, when the node in the top row of the leftmost column communicates with the node in the bottom row of the rightmost column, the message must pass through five intermediate nodes. With all nodes in the array communicating at the same time, there can be insufficient links 4 to carry all messages at the same time. The situation is best illustrated by the case where each node in the two lefthand columns 5 in FIG. 1 wants to send a message to a different node in the two righthand columns 6. There are only four bi-directional paths 4a to carry eight messages in this sixteen node array. The problem is worse in arrays with more nodes. As a result, a 2-D mesh array is said not to "scale" well.
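The counts given above for the sixteen node array of FIG. 1 can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the prior art being described. It assumes a square mesh with k rows and k columns, shortest-path routing along rows and columns, and a vertical cut between the two middle columns; the function names are hypothetical.

```python
def intermediate_nodes(src, dst):
    """Nodes strictly between src and dst on a shortest mesh path.

    src and dst are (row, column) pairs. A shortest path uses
    |row difference| + |column difference| links, and every link
    except the last one ends at an intermediate node.
    """
    (r1, c1), (r2, c2) = src, dst
    hops = abs(r1 - r2) + abs(c1 - c2)
    return max(hops - 1, 0)


def links_across_middle(k):
    """Bi-directional links crossing a vertical cut through the middle
    of a k-by-k mesh: one link per row."""
    return k


def nodes_in_left_half(k):
    """Nodes in the left half of a k-by-k mesh (k assumed even)."""
    return k * (k // 2)


def messages_per_link(k):
    """Messages that must share each middle link when every node in the
    left half sends one message to the right half."""
    return nodes_in_left_half(k) // links_across_middle(k)


k = 4  # the sixteen node array of FIG. 1
corner_to_corner = intermediate_nodes((0, 0), (k - 1, k - 1))
links = links_across_middle(k)
messages = nodes_in_left_half(k)

print(corner_to_corner, links, messages)  # 5 4 8

# Under the same assumptions, the load on each middle link grows with k.
for size in (4, 8, 16, 32):
    print(size * size, "nodes:", messages_per_link(size), "messages per link")
```

Under these assumptions the load on each middle link is k/2 messages, so it grows as the array grows, which is consistent with the statement that the 2-D mesh does not scale well.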
A more scalable network is the folded butterfly variety of multistage network. This type of network decreases the number of nodes that must be traversed between the most distant processing nodes and provides redundant paths between each of the processing nodes. As in the 2-D mesh network, each node is associated with switches. However, there is more than one processor at a node and the switch sets have two stages. For the simple array shown in FIG. 2, which has two processors per node 10, the switch sets 12 are made of two four-way switches 14 and 16. This arrangement permits either processor to communicate with the other processor in the source node through the switch 14 and with processors in other nodes through connections between switch sets. The cross-coupling between switch sets gives rise to the "butterfly" in the term folded butterfly array, while "folded" in that term comes from the fact that the last column of processor nodes wraps around and is connected to the first column.
While the connections between switches in the 2-D mesh arrangement shown in FIG. 1 are in orderly rows and columns, the interconnections 20 between the switch sets in folded butterfly arrays can be chaotic. In the simple system illustrated in FIG. 2, the diagonal cross-coupling of the wiring between nodes appears manageable, but in larger arrays with hundreds or even thousands of processors the diagonal cross-couplings between switch sets have given rise to what has been referred to as the "ball of wires" problem. In such systems, the wiring during assembly and the tracing of connections during servicing can be daunting.