A number of parallel processing computer systems and message routing techniques are well known in the prior art. Generally, in such parallel processing systems, a large number of processors are interconnected in a network. In such networks each of the processors may execute instructions in parallel and may transfer messages to other processors in the network.
U.S. Pat. No. 4,598,400, issued to Hillis, describes an n-dimensional parallel processing computer system in which an array of nodes is interconnected in a pattern of two or more dimensions. Communication between the nodes is directed by addresses indicating displacement of the nodes. Hillis specifically discloses a system in which a message packet may be routed from one node to another in an n-dimensional network. The message packet comprises relative address information and information to be communicated between the nodes.
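Routing by relative address, as described above, may be illustrated by the following minimal sketch. The function name `route_by_displacement` and the choice to exhaust one dimension before moving to the next are assumptions made for illustration only; they are not taken from the Hillis patent.

```python
def route_by_displacement(displacement):
    """Return the hops needed to reduce a relative address to all zeros.

    `displacement` holds one signed offset per dimension. Each hop is a
    (dimension, step) pair, where step is +1 or -1. Resolving dimensions
    in index order is an illustrative assumption.
    """
    hops = []
    for dim, delta in enumerate(displacement):
        step = 1 if delta > 0 else -1
        # One hop per unit of displacement remaining in this dimension.
        for _ in range(abs(delta)):
            hops.append((dim, step))
    return hops


if __name__ == "__main__":
    # A message addressed two nodes forward in dimension 0 and one node
    # back in dimension 1.
    print(route_by_displacement((2, -1)))
```

A message whose displacement is already zero in every dimension produces no hops, which corresponds to a message that has reached its destination node.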
Many known parallel processing computer systems utilize a store-and-forward mechanism for communicating messages from one node to another. The Hillis system describes such a store-and-forward mechanism. Such store-and-forward mechanisms are more clearly described in Parviz Kermani and Leonard Kleinrock, Virtual Cut-Through: A New Computer Communication Switching Technique, Computer Networks, Vol. 3, 1979, pp. 267-286. Kermani et al. distinguishes store-and-forward systems from circuit switching systems. Specifically, a circuit switching system is described as a system in which a complete route for communication between two nodes is set up before communication begins. The communication route is then tied up during the entire period of communication between the two nodes. In store-and-forward (or message) switching systems, messages are routed to a destination node without establishing a route beforehand. In such systems, the route is established dynamically during communication of the message, generally based on address information in the message. Generally, messages are stored at intermediate nodes before being forwarded to a selected next node. Kermani et al. further discusses the idea of packet switching systems. Packet switching systems rest on the recognition that improved utilization of resources and reduced network delay may be realized in some network systems by dividing a message into smaller units termed packets. In such systems, each packet (rather than each message) carries its own addressing information.
Kermani et al. observes that extra delay is incurred in known systems because a message (or packet) is not permitted to be transmitted from one node to the next before the message is completely received. Therefore, Kermani et al. discloses an idea termed "virtual cut-through" for establishing a communication route. The virtual cut-through system is a hybrid of circuit switching and packet switching techniques in which a message may begin transmission on an outgoing channel upon receipt of routing information in the message packet and selection of an outgoing channel. This system leads to throughput times exactly the same as in a store-and-forward system when all intermediate channels are busy. When all intermediate nodes are idle, this system leads to throughput times similar to a circuit switched system. However, the system disclosed by Kermani et al. still requires sufficient buffering to allow an entire message to be stored at each node when all channels are busy.
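The two limiting cases described above may be illustrated with a simplified delay model. This is a sketch under stated assumptions, not the analysis given by Kermani et al.: the function name, the unit of time (one unit per byte per channel), the fixed header length, and the neglect of any time spent waiting for a busy channel to clear are all hypothetical simplifications.

```python
def virtual_cut_through_delay(message_len, header_len, intermediate_busy):
    """Estimate delivery time for one message under virtual cut-through.

    message_len       -- message length in bytes (includes the header)
    header_len        -- bytes that must arrive before routing can occur
    intermediate_busy -- one bool per intermediate node; True means the
                         selected outgoing channel is busy, so the whole
                         message is buffered there before being forwarded

    Assumes one time unit per byte per channel and ignores queueing time.
    """
    delay = 0
    for busy in intermediate_busy:
        # A busy node behaves like store-and-forward; an idle node adds
        # only the time needed to receive the routing header.
        delay += message_len if busy else header_len
    # The final channel always carries the complete message.
    return delay + message_len


if __name__ == "__main__":
    print(virtual_cut_through_delay(100, 4, [True, True, True]))
    print(virtual_cut_through_delay(100, 4, [False, False, False]))
```

With every intermediate node busy, the model reduces to the store-and-forward delay over the same number of channels; with every node idle, the delay is dominated by a single transmission of the message, as in a circuit switched system.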
W. J. Dally, A VLSI Architecture for Concurrent Data Structures, Ph.D. Thesis, Department of Computer Science, California Institute of Technology, Technical Report 5209, March 1986, discusses a message-passing concurrent architecture designed to achieve reduced message-passing latency. In Chapter 3, Dally discusses a balanced binary n-cube architecture.
In Chapter 5, Dally discusses an application for reducing message latency. In general, Dally discloses use of a wormhole routing method, rather than a store-and-forward method. A wormhole routing method is characterized by a node beginning to forward each byte of a message to the next node as the bytes of the message arrive, rather than waiting for the arrival of the entire packet before beginning transmission to the next node. Wormhole routing thus results in a message latency that is the sum of two terms, one of which depends on the message length L and the other of which depends on the number of communication channels traversed D. Store-and-forward routing, by contrast, yields a latency that depends upon the product of L and D. (See Dally at page 153.)
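The difference between the sum and the product of L and D may be seen in the following minimal sketch. The function names and the `byte_time` parameter are hypothetical, and the model assumes that every channel moves one byte per `byte_time` with no contention; the exact constants in Dally's expressions are not reproduced here.

```python
def store_and_forward_latency(length, hops, byte_time=1.0):
    """Each of `hops` channels carries the whole message in turn: L * D."""
    return length * hops * byte_time


def wormhole_latency(length, hops, byte_time=1.0):
    """Bytes are pipelined across the route: one term in L plus one in D."""
    return (length + hops) * byte_time


if __name__ == "__main__":
    # Compare both models for a 100-byte message over routes of growing length.
    for hops in (1, 4, 8):
        sf = store_and_forward_latency(100, hops)
        wh = wormhole_latency(100, hops)
        print(f"D={hops}: store-and-forward={sf}, wormhole={wh}")
```

Under these assumptions the store-and-forward latency grows in proportion to the route length, whereas the wormhole latency grows by only one byte time for each additional channel traversed.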
A further advantage of a wormhole routing method is that communications do not use up the available memory of intermediate nodes. In the Dally system, packets do not interact with the processor or memory of intermediate nodes along the route, but rather remain strictly within a routing chip network until they reach their destination.
Dally at pages 154-157 further discloses a message packet comprising relative X and Y address fields, a variable size data field comprising a plurality of non-zero data bytes and a tail byte.
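A packet of the kind described above may be sketched as follows. The field widths (one signed byte for each relative address) and the use of a zero byte as the tail marker are assumptions made for illustration; the text above states only that the data bytes are non-zero and that a tail byte follows them. The names `encode_packet`, `decode_packet`, and `TAIL` are hypothetical.

```python
import struct

TAIL = 0x00  # assumed tail marker; distinguishable because data bytes are non-zero


def encode_packet(dx, dy, data):
    """Build relative X, relative Y, the data bytes, then the tail byte."""
    if not data or 0 in data:
        raise ValueError("data must be one or more non-zero bytes")
    # "bb" packs two signed 8-bit integers (an assumed field width).
    return struct.pack("bb", dx, dy) + bytes(data) + bytes([TAIL])


def decode_packet(packet):
    """Split a packet into (dx, dy, data) by scanning for the tail byte."""
    dx, dy = struct.unpack("bb", packet[:2])
    end = packet.index(TAIL, 2)  # first zero byte after the address fields
    return dx, dy, packet[2:end]


if __name__ == "__main__":
    pkt = encode_packet(3, -2, b"\x10\x20")
    print(pkt.hex())
    print(decode_packet(pkt))
```

Because the data field is of variable size, a receiving node in this sketch finds the end of the packet by scanning for the tail byte rather than by reading a length field.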
The wormhole routing method alone does not guarantee a deadlock-free routing system. The e³ algorithm discussed in Dally provides deadlock-free routing; however, Dally does not provide an adaptive routing system.
Researchers have been motivated to develop adaptive routing techniques in order to alleviate the congestion that is characteristic of heavily loaded multi-dimensional networks and to make more efficient use of channels between nodes. Adaptive routing can also exploit the multiplicity of processors to achieve some level of fault-tolerance in multiprocessor or parallel processing computer systems. These adaptive routing techniques allow messages to dynamically take alternate routes when busy or inoperative channels are encountered; however, message packet buffers are typically required in the intermediate routing elements in order to avoid both deadlock and livelock. These buffers add considerably to the complexity of the implementations and can add significantly to the latency of the routing systems even in the absence of congestion. The adaptive routing technique of the present invention is provably free of both deadlock and livelock without the need for packet buffering in the intermediate routing elements.
It is therefore an object of the present invention to develop an improved method of communication between nodes in a multi-dimensional network.
As another object of the present invention, it is desired to develop a parallel processing computer system having reduced message passing latency.
As another object of the present invention, it is desired to develop a system which efficiently and adaptively passes messages without requiring buffering of message packets at each node.