This invention relates generally to the field of parallel processing computers and, more specifically, to interprocessor communication networks in distributed memory parallel processor computers.
One of several parallel processor computer architectures is the distributed memory, parallel processor computer (DMPPC). A DMPPC has processor-memory pairs called processing elements, and information is transferred among the processing elements by the interprocessor communication network. A given problem is "parallelized" by being divided into concurrent tasks which are assigned to the processing elements. These tasks exchange information via the interprocessor communication network. The topology of a network refers to the identification of the processing element pairs between which information can be directly transferred. The primary purpose of the interprocessor communication network is efficient transportation of information between processors. Unless the transport time can be overlapped with the processing time, the transport time becomes computational overhead which degrades the computer's performance. Therefore, to maximize parallel processor performance, the time required to move information, usually in bit sequences called data packets, between processors must be minimized. For a single packet moving through a network, the travel time for the packet is proportional to the distance it travels from its source to its destination. Links are the direct communication channels between processor nodes. Links may allow packet transport either in one direction, i.e., unidirectional, or in both directions, i.e., bidirectional.
One of the enticements for parallel processing is that for bigger problems one can use more processors and, for a perfectly parallel problem, the execution time will be constant and relatively independent of problem size. One central issue is the difficulty of combining parallel processor computers having a smaller number of processors into larger computers. The difficulty of expandability is exemplified by attempts to combine smaller hypercube systems into a single larger hypercube system. As a hypercube network grows, the distance that an information packet must travel grows as log.sub.2 (n), where n is the number of nodes in the network. But to accommodate the expansion, the design of every single node must be reconfigured. That is, communication ports would have to be added, and allocation wires and address locations would have to be reconfigured; in essence, every aspect of how information packets move through a node from input port to output port would have to be modified in expansion.
The time required to transfer information in a system is proportional to the distance over which the packet must travel and inversely proportional to the bandwidth of a link. For a realizable network, the bandwidth of a link is not independent of the network topology. The total I/O bandwidth of a network node is finite; for example, the bandwidth may be limited by the number of available pins on a chip multiplied by the transmission bandwidth through each pin. Neglecting the bandwidth between the node and its attached processing element(s) and assuming the bandwidths of all links are the same and fixed in time, i.e., the total node bandwidth is not dynamically reapportioned between links, the bandwidth of any one link is the total node bandwidth divided by the number of links attached to the node. Each direction of a link is counted separately. Therefore, as the number of links connected to a node increases, the link bandwidth decreases. Thus, the time required to transport data packets from one node to another significantly increases.
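The bandwidth division described above can be illustrated with a minimal sketch. The function name `link_bandwidth` and the pin count and per-pin rate are hypothetical, chosen only for illustration. The model assumes, as the text does, that the node bandwidth is shared equally among links, is fixed in time, and that each direction of a link counts separately.

```python
def link_bandwidth(total_node_bandwidth, bidirectional_links, unidirectional_links=0):
    """Bandwidth of one link direction under equal, fixed apportionment."""
    # Each direction of a link is counted separately, so a bidirectional
    # link consumes two shares of the node's total I/O bandwidth.
    shares = 2 * bidirectional_links + unidirectional_links
    if shares <= 0:
        raise ValueError("a node must have at least one link")
    return total_node_bandwidth / shares


if __name__ == "__main__":
    # Hypothetical node: 64 pins at 10 Mbit/s per pin (illustrative numbers only).
    total = 64 * 10.0
    for links in (4, 8):  # e.g. hypercube nodes with 4 and 8 bidirectional links
        print(links, "links:", link_bandwidth(total, links), "Mbit/s per direction")
```

Under these assumptions, doubling the number of links attached to a node halves the bandwidth available to each link.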
One approach to obtain the desired logarithmic distance properties without having the log.sub.2 (n) increase in the number of connections is to configure the number of communication ports per node independently of network size. But known systems with a fixed number of communication ports per node have only unidirectional flow of information packets. Unidirectional information flow is problematic because these systems are not easily fault tolerant. If there is a failure in the system to prevent or retard the flow of information, global reconfiguration and rerouting is required.
Information does not usually flow through a packet-switched network unimpeded. A conflict occurs when two packets desire to traverse the same output link from a switch point at the same time and is resolved by having one packet wait in a queue while the other packet traverses the link. The total time required to send a packet from its source to its destination is thus dependent on two factors, the transit time between the ports through the network and the number of conflicts the packet experiences.
When network traffic density is low, conflicts are less common and the packet transfer time is primarily a function of the distance between the source port and the destination port. When traffic density is high, the transfer time is primarily a function of the time the packets spend in queues at switch points waiting for routing through contested links.
Store-and-forward deadlock occurs in a packet-switched network when from among a group of packets, no packet has arrived at its destination and no packet can make the next hop toward its destination. A cycle of nodes exists where no node can accept a packet from the previous node in the cycle because no storage is available for a packet arriving at the node. Movement of packets through the links in the cycle is stopped, and without special procedures, it can never be restarted. Usually, deadlocking in part of the network leads to deadlocking in the entire network. Either the network design should avoid deadlocked states or it should facilitate restarting itself from a deadlocked state without loss of packet information.
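The cycle condition described above can be sketched as a search over a "wait-for" relation between nodes. This is a simplified illustration, not a procedure taken from any of the networks discussed here: the names `find_deadlock_cycle`, `next_hop`, and `buffer_full` are hypothetical, and the model assumes each node has a single queue whose head packet needs exactly one next node.

```python
def find_deadlock_cycle(next_hop, buffer_full):
    """Return a list of nodes forming a store-and-forward deadlock cycle, or None.

    next_hop[node]    -> node that the head packet queued at `node` must reach next
    buffer_full[node] -> True when `node` has no storage for an arriving packet
    """
    # A node is blocked when its head packet needs a neighbor whose buffer is full.
    blocked = {u: v for u, v in next_hop.items() if buffer_full.get(v, False)}
    finished = set()  # nodes already shown not to lead into a cycle
    for start in blocked:
        path, position = [], {}
        node = start
        # Follow the chain of blocked nodes until it ends or repeats.
        while node in blocked and node not in finished and node not in position:
            position[node] = len(path)
            path.append(node)
            node = blocked[node]
        if node in position:
            # The chain returned to a node on the current path: a cycle exists.
            return path[position[node]:]
        finished.update(path)
    return None


if __name__ == "__main__":
    hops = {0: 1, 1: 2, 2: 3, 3: 0}
    print(find_deadlock_cycle(hops, {0: True, 1: True, 2: True, 3: True}))
    print(find_deadlock_cycle(hops, {0: True, 1: True, 2: False, 3: True}))
```

In the first call every node in the cycle is full, so no packet can make its next hop. In the second, one free buffer lets a packet advance and the cycle is broken.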
Reliability and fault tolerance are two increasingly important issues in interprocessor communication network design. A network's fault tolerance is its ability to perform in the presence of component failures. As the number of network components, e.g., nodes and links, increases, the probability of a component failure within the network also increases. For single stage networks with one port per network node, reducing the number of network links reduces the probability of having a component failure within the network. Unfortunately, reducing the number of components "at risk" does not necessarily make the network more robust. Assuming that the tasks assigned to a failed node's attached processing element can be reassigned to other processing elements, a network fails when it can no longer move information between arbitrary pairs of nodes. For example, the failure of two bidirectional links in the ring network causes the network to fail because the ring is converted into two disjoint line networks unable to pass information between them.
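The ring example can be checked with a small connectivity test. The following is a minimal sketch; the helper names `ring_links` and `is_connected` are hypothetical, nodes are numbered 0 to n-1, and a bidirectional link is represented as a pair of node numbers.

```python
from collections import deque


def ring_links(n):
    """Bidirectional links of an n-node ring, as (node, next_node) pairs."""
    return {(i, (i + 1) % n) for i in range(n)}


def is_connected(n, links):
    """True if every node can still reach every other node over `links`."""
    adjacency = {i: set() for i in range(n)}
    for a, b in links:
        adjacency[a].add(b)  # links are bidirectional, so record both directions
        adjacency[b].add(a)
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from node 0
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == n


if __name__ == "__main__":
    links = ring_links(8)
    print(is_connected(8, links - {(0, 1)}))          # one failed link
    print(is_connected(8, links - {(0, 1), (4, 5)}))  # two failed links
```

With one failed link the ring degrades to a single line network and remains connected. With two failed links it splits into two disjoint line networks.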
Networks may also be made fault tolerant by having redundant paths between source-destination node pairs so that information can move around failed network components. For fault tolerance to be implemented, the routing switching function must be able to generate auxiliary paths around the failed components.
Several reviews of proposed interprocessor communication network designs can be found in the references of Feng; Agrawal et al.; and Siegel. The hypercube networks, the Hypertree network, the perfect shuffle network, and the Illiac IV network, are all network topologies that fall into the taxonomic class of single stage, nonreconfigurable, packet-switched networks.
Also in that taxonomic class, the topologies of the indirect binary n-cube network, the SW-banyan network, and the loop-structured switching network all are similar. The MAN-YO network of Koike and Ohmoir also has a similar topology, but allows only unidirectional flow of information. Each node can receive information from two nodes and send information to two different nodes. The CBAR network of Balasubramanian and Bannerjee also allows only unidirectional information flow, but its topology is slightly different because the inverse perfect shuffle of links does not occur below the last stage. Wong and Ito's Loop Structured Switching Network is very similar to the MAN-YO network, but has two processing elements attached to each network node. Each processing element's input and output ports are split across the node to allow the two processing elements attached to the node to exchange information through the node. In addition, Wong and Ito specifically consider problems of deadlock avoidance. The unidirectional systems are not fault tolerant. As discussed, if there is a failure in the system, the information packet must be rerouted through the network again, often through a global controller, and if there is a complete break in a link or other component, the packet will be unable to get to a specific destination processor.
An example of an interconnection network having unidirectional flow and a global controller is given in U.S. Pat. No. 4,811,210 to McAuley. McAuley teaches a network which uses one stage of a butterfly interconnection network to connect N/2 processors to two N/2 by N/2 optical crossbar switches requiring global control of the crossbar switches. With global control, an omniscient controller sets the individual switch points to provide a path between the information source and destination. With global control the network does not require a regular topological structure because the omniscient controller can always discern the shortest path between any two ports. Unfortunately, for larger networks with heavy information traffic, the controller must be omnipotent as well as omniscient.
Two-directional information flow networks include the lens network and the cube-connected cycles network. The lens network of Finkel and Solomon uses buses, which are nondirect paths between processing elements, to connect sets of processing elements. Thus, the throughput or traffic density is lower, and each bus requires arbitration for a shared resource, thereby requiring additional logic and more hardware. There are typically three processors attached to each bus. In the cube-connected cycles network of Preparata and Vuillemin, each network node is connected to three other nodes by bidirectional links, and if pairs of nodes are considered, each pair has four bidirectional communication links. However, these networks do not address routing, packet movement, conflict resolution, or deadlock avoidance.
With respect to expandability, some networks, such as the ring and shuffle-exchange networks, are incrementally expandable, while others must be expanded in fixed amounts or by fixed factors, for example the hypercube must be expanded by a factor of two. In addition, some networks require substantial modifications to each node to expand. Again, the hypercube requires an additional bidirectional link at each node to increase to the next larger network size. Other networks, such as the shuffle-exchange, require significant rewiring between nodes as the network expands.
Also, some networks such as the hypercube may require preprocessing of a problem to configure the problem for the machine, a process called proximal mapping. Unfortunately, the general mapping problem, which maps problems onto processing elements so that computationally adjacent tasks which exchange information are assigned to proximal processing elements, has been shown to be NP-complete, i.e., the solution time of the problem increases exponentially with the number of pieces in the problem. Even when solved, the generated mapping is only effective for problems in which the pairs of communicating tasks are known a priori and do not change in time.
Several network topologies shown as prior art in FIG. 1 exemplify the range of interprocessor distance possibilities when using bidirectional communication links between network nodes. In the ring network, every network node is connected to two neighboring network nodes to form the ring. The maximum and mean distances that a packet must travel in a ring network with an odd number n of nodes are floor(n/2) and (n+1)/4, respectively. In the edge-wrapped 4-connected mesh network, also called a toroidal network, each network node is connected to four other network nodes in a square mesh pattern. Nodes along the edges are connected to nodes on the opposite edge. For n nodes, where .sqroot.n is odd, the maximum and mean transit distances are 2 floor(.sqroot.n/2) and (.sqroot.n+1)/2, respectively. For the binary hypercube of dimension d, each of the n=2.sup.d nodes is connected to d other nodes, and the mean and maximum transit distances are n log.sub.2 (n)/(2(n-1)) and log.sub.2 (n), respectively. The connections represent the edges of a binary hypercube and the addresses of two directly connected nodes differ in only a single bit. In a hypercube, having O(log n) distance, each node must have a number of neighbors equal to the log.sub.2 of the number of processing elements, n. For example, if the hypercube network has sixteen processing elements, then each node will have four neighbors, and if the hypercube network has 256 processing elements, then each node has eight neighbors. In the fully connected network, every node is connected directly to every other node. The mean and maximum distances between network nodes for this network are both 1. For a network with n nodes, these distances thus range from O(n) to O(1).
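The distance figures for these topologies can be checked numerically with a breadth-first search over each network. The following is a minimal sketch: the builder and function names are hypothetical, links are taken to be bidirectional, and the mean is taken over the n-1 other nodes, which is an assumption about how the mean distances quoted above are defined.

```python
from collections import deque
from math import log2


def distance_stats(adjacency):
    """Return (maximum, mean) hop distance over all ordered pairs of distinct nodes.

    Assumes a connected network with at least two nodes.
    """
    total, pairs, longest = 0, 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest hop counts
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
        longest = max(longest, max(dist.values()))
    return longest, total / pairs


def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def torus(side):
    """Edge-wrapped 4-connected mesh with side*side nodes."""
    return {(r, c): [((r - 1) % side, c), ((r + 1) % side, c),
                     (r, (c - 1) % side), (r, (c + 1) % side)]
            for r in range(side) for c in range(side)}


def hypercube(dimension):
    """Neighbors differ from a node's address in exactly one bit."""
    return {i: [i ^ (1 << b) for b in range(dimension)] for i in range(1 << dimension)}


def fully_connected(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


if __name__ == "__main__":
    print("ring, n=7:        ", distance_stats(ring(7)))
    print("torus, n=9:       ", distance_stats(torus(3)))
    print("hypercube, n=16:  ", distance_stats(hypercube(4)))
    print("fully connected:  ", distance_stats(fully_connected(5)))
    n = 16
    print("n log2(n)/(2(n-1)):", n * log2(n) / (2 * (n - 1)))
```

For the 7-node ring this gives a maximum of 3 (the integer part of n/2) and a mean of 2.0, which equals (n+1)/4. For the 16-node hypercube the maximum is 4 and the mean agrees with n log.sub.2 (n)/(2(n-1)).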