1. Field of the Invention
This invention relates to deadlock prevention in richly-connected multiprocessor computer systems and, more specifically, to virtual channel techniques for preventing deadlock in large multi-node computing systems interconnected by complex topologies, including but not limited to Kautz and de Bruijn topologies.
2. Description of the Related Art
Massively parallel computing systems have been proposed for scientific computing and other compute-intensive applications. Such a system typically includes many nodes, and each node may contain several processors. Various interconnect topologies have been proposed to connect the nodes, including hypercubes, butterfly and omega networks, tori of various dimensions, fat trees, and random networks.
One of the problems encountered in building computer systems with complex, richly-connected communication networks is deadlock. Deadlock is the condition in which actions are mutually blocked from making progress because each is waiting for a resource held by another. That is, a cycle of resource dependencies exists that cannot be satisfied.
FIG. 1 depicts a simple communications system that illustrates the deadlock problem. In FIG. 1, the system has three nodes, 102, 104, and 106, each with a communication buffer, 108, 110, and 112, respectively. The buffers are resources that the nodes need in order to send data to, or receive data from, a connected node (as suggested by links 114, 116, and 118). For node 102 to send its data forward from buffer 108 to buffer 110 in node 104, buffer 110 must be empty (that is, free or available). In this example, however, there is a cycle of resource dependency: each node needs the adjacent node's buffer to be empty so that it can transmit data into that buffer without prematurely overwriting the data already there. Because the dependencies form a cycle from node 102 to node 104 to node 106 and back to node 102, it is possible that no node will be able to send traffic forward, with each node waiting for the adjacent node's buffer to empty. FIG. 1 thus shows a system susceptible to deadlock.
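The deadlock in FIG. 1 can be reproduced with a small simulation. The sketch below is a hypothetical model, not part of the described system: nodes 0, 1, and 2 stand in for nodes 102, 104, and 106, each node has a single-packet buffer that records the destination of the packet it holds (or None when empty), and the helper names `step` and `run` are illustrative. A packet is consumed when it reaches its destination; otherwise it advances only if the downstream buffer was empty at the start of the step.

```python
from typing import List, Optional


def step(buffers: List[Optional[int]]) -> bool:
    """Advance the ring by one step, in place; return True if anything changed.

    buffers[i] is the destination node of the packet held in node i's single
    buffer, or None if that buffer is empty (hypothetical model of FIG. 1).
    """
    n = len(buffers)
    before = list(buffers)  # decisions are based on the state at step start
    changed = False
    for i, dest in enumerate(before):
        if dest is None:
            continue
        if dest == i:  # packet has arrived: consume it and free the buffer
            buffers[i] = None
            changed = True
        elif before[(i + 1) % n] is None:  # downstream buffer free: forward
            buffers[(i + 1) % n] = dest
            buffers[i] = None
            changed = True
        # otherwise the packet waits for the downstream buffer to empty
    return changed


def run(buffers: List[Optional[int]], max_steps: int = 10) -> bool:
    """Step until all buffers drain (True) or no progress is possible (False)."""
    for _ in range(max_steps):
        if all(b is None for b in buffers):
            return True
        if not step(buffers):
            return False  # every occupied buffer is waiting on a full neighbor
    return all(b is None for b in buffers)


print(run([2, 0, 1]))     # False: every buffer is full and nothing can move
print(run([2, 0, None]))  # True: one free buffer lets the traffic drain
```

With every buffer holding a packet two hops from its destination, each node waits on its neighbor and `run([2, 0, 1])` reports no progress; leaving a single buffer empty breaks the circular wait and all traffic is delivered.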
Many communication architectures and protocols, including TCP/IP and the global telephone network, address the deadlock issue by discarding traffic whenever a potential deadlock situation arises. This approach, however, imposes a substantial cost on any higher-level protocol that aims to be reliable, because such a protocol must recognize and recover from the loss of message traffic. It is therefore very desirable to provide guarantees against deadlock, rather than to detect potential deadlock and then discard traffic.
Another well-known strategy for deadlock prevention is the use of “virtual channels,” as described by Dally and Seitz (W. J. Dally and C. L. Seitz, “Deadlock-free message routing in multiprocessor interconnection networks,” IEEE Transactions on Computers, C-36(5):547-553, 1987). This approach assigns a separate set of buffers to the traffic on each virtual channel and structures the flow of data through those virtual channels so that there is no cycle of dependency among the channels. By preventing cycles, this strategy ensures that all traffic is able to leave the network, and therefore that deadlock does not occur. Virtual channel assignment is constant: a communication travels its route using the same virtual channel assignment throughout. Buffer resources are typically mapped to, and fixed by, the virtual channel assignments.
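The requirement that there be no cycle of dependency among the channels can be checked mechanically by building a channel dependency graph and searching it for a cycle. The following Python sketch is illustrative only: it models the three-node system of FIG. 1 as a unidirectional ring (link i runs from node i to node (i + 1) % 3), treats each (link, virtual channel) pair as a separate channel, and gives every route one fixed virtual channel, consistent with the constant assignment described above. The rule `1 if s > d else 0` is a hypothetical choice that happens to break the cycle for this three-node example; it is not presented as a general scheme, and the function names are assumptions.

```python
from itertools import product

N = 3  # nodes 0, 1, 2 stand in for nodes 102, 104, 106 (hypothetical model)


def route(src, dst):
    """Links used from src to dst on a unidirectional ring; link i runs i -> i+1."""
    links, node = [], src
    while node != dst:
        links.append(node)
        node = (node + 1) % N
    return links


def dependency_graph(vc_of):
    """Map each channel (link, vc) to the channels a packet holding it may wait on."""
    deps = {}
    for src, dst in product(range(N), repeat=2):
        if src == dst:
            continue
        vc = vc_of(src, dst)  # one fixed virtual channel for the whole route
        hops = [(link, vc) for link in route(src, dst)]
        for held, wanted in zip(hops, hops[1:]):
            deps.setdefault(held, set()).add(wanted)
    return deps


def find_cycle(deps):
    """Return a dependency cycle as a list of channels, or None if acyclic."""
    state = {}  # channel -> "active" (on the DFS path) or "done"
    path = []

    def visit(channel):
        state[channel] = "active"
        path.append(channel)
        for nxt in deps.get(channel, ()):
            if state.get(nxt) == "active":
                return path[path.index(nxt):] + [nxt]  # back edge closes a cycle
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[channel] = "done"
        return None

    for channel in list(deps):
        if channel not in state:
            found = visit(channel)
            if found:
                return found
    return None


single = dependency_graph(lambda s, d: 0)                 # all traffic on one channel
split = dependency_graph(lambda s, d: 1 if s > d else 0)  # hypothetical two-channel rule
print(find_cycle(single))  # [(0, 0), (1, 0), (2, 0), (0, 0)]
print(find_cycle(split))   # None
```

With a single virtual channel, the search finds the same circular dependency that FIG. 1 illustrates; with the two-channel assignment, the wrap-around routes are separated from the others and the dependency graph becomes acyclic.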
Another potential problem in richly-connected networks is livelock, or starvation. In this situation, an action is starved (as opposed to blocked) of access to the resources it needs. Typically, timers are used to age the action: if the action becomes old enough, its priority may be elevated to increase the probability that it will win arbitration when contending for a required resource (such as access to a buffer or an output link).
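Age-based priority elevation can be sketched as an arbiter that tracks how long each request has been waiting. In this hypothetical Python model, age is measured in lost arbitration rounds rather than by a hardware timer, `AGE_THRESHOLD` is an arbitrary illustrative value, a request whose age reaches the threshold outranks every request that has not, and the winner's age is reset to zero.

```python
from dataclasses import dataclass
from typing import List

AGE_THRESHOLD = 3  # hypothetical: lost rounds before a request is promoted


@dataclass
class Request:
    name: str
    priority: int  # larger value = higher base priority
    age: int = 0   # consecutive arbitration rounds lost so far


def arbitrate(requests: List[Request]) -> Request:
    """Grant one request: aged requests first, then by priority, then by age."""
    winner = max(requests, key=lambda r: (r.age >= AGE_THRESHOLD, r.priority, r.age))
    for r in requests:
        r.age = 0 if r is winner else r.age + 1  # losers grow older
    return winner


def simulate(rounds: int) -> List[str]:
    """A high- and a low-priority requester contend for one resource every round."""
    high = Request("high", priority=2)
    low = Request("low", priority=1)
    return [arbitrate([high, low]).name for _ in range(rounds)]


print(simulate(8))  # ['high', 'high', 'high', 'low', 'high', 'high', 'high', 'low']
```

Under fixed priorities the low-priority request would never win while the high-priority requester keeps contending; with aging, it is promoted after losing `AGE_THRESHOLD` rounds and is guaranteed periodic access.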