1. Field of the Invention
The present invention relates to the field of message-passing data networks. Such a network is used, for example, in a distributed-memory message-passing parallel computer applied to high-performance computation.
2. Description of the Prior Art
A message-passing data network serves to pass messages between users of the network, referred to herein as “nodes.” Each node can perform operations independently of the other nodes. Nodes can act in concert by passing messages to one another over the network. An example of such a network is that of a distributed-memory parallel computer. Each of its nodes has one or more processors that operate on local memory. An application using multiple nodes of such a computer coordinates their actions by passing messages between them.
A message-passing data network consists of switches and links. A link merely passes data between two switches. Unless stated otherwise, a link is bi-directional. In other words, a link supports messages in either direction between the two switches. A switch routes incoming data from a node or link to another node or link. A switch may be connected to an arbitrary number of nodes and links. Depending on the network and on the nodes' location in the network, a message between two nodes may need to cross several switches and links. In general, the fewer such crossings required, the more efficient the network. This efficiency has two aspects. First, the fewer the crossings, the shorter the latency, that is, the time required for a message to pass from its source to its destination. Second, the fewer the crossings, the greater the effective bandwidth of the network, since each message occupies fewer links.
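The notion of link crossings described above can be illustrated with a minimal sketch. It assumes the network is modeled as an undirected graph whose vertices are switches and whose edges are bi-directional links. The function name `count_link_crossings` and the `(switch_a, switch_b)` pair representation are hypothetical, chosen only for this illustration.

```python
from collections import deque

def count_link_crossings(links, src_switch, dst_switch):
    """Return the fewest links a message must cross between two switches.

    `links` is an iterable of (switch_a, switch_b) pairs; each link is
    treated as bi-directional. Returns None if no route exists.
    This is an illustrative model, not a routing implementation.
    """
    # Build an adjacency map; every link is usable in both directions.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search visits switches in order of increasing hop count.
    frontier = deque([(src_switch, 0)])
    seen = {src_switch}
    while frontier:
        switch, hops = frontier.popleft()
        if switch == dst_switch:
            return hops
        for neighbor in adjacency.get(switch, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None

# Example: four switches connected in a ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_link_crossings(ring, 0, 2))
```

In this model, a lower return value corresponds to the lower latency and higher effective bandwidth discussed above.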
Networks to date efficiently support some communication patterns, but not all. For example, a three-dimensional (3D) torus network efficiently supports 3D nearest-neighbor communication. By construction, each switch is linked to its neighbors, so for nearest-neighbor communication, each message crosses only a single link. Efficient support of nearest-neighbor communication is required in various situations, including many numerical algorithms executing on a distributed-memory parallel computer. In contrast, the 3D torus does not efficiently support communication to a randomly chosen node. On average, such a message crosses one quarter of the N links along each dimension. So on an N*N*N torus, a message to a random destination crosses on average N*¾ links. Efficient support of communication to a randomly chosen node is required in various situations, including the all-to-all communication pattern used in many numerical algorithms executing on a distributed-memory parallel computer. The problem is to create a network which efficiently supports various communication patterns, including nearest-neighbor and all-to-all.
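The N*¾ figure can be checked by brute force. The following is a minimal sketch, assuming shortest-path routing on an N*N*N torus with one link between adjacent switches in each dimension, and a destination chosen uniformly from all switches (including the source itself). The function names are hypothetical.

```python
from itertools import product

def ring_distance(a, b, n):
    """Fewest links between positions a and b on a ring of n switches."""
    d = abs(a - b)
    return min(d, n - d)  # go whichever way around the ring is shorter

def mean_torus_hops(n):
    """Average links crossed from one switch to a uniformly random
    destination on an n*n*n torus, assuming shortest-path routing."""
    total = 0
    # By symmetry of the torus, fixing the source at the origin suffices.
    for x, y, z in product(range(n), repeat=3):
        total += (ring_distance(0, x, n)
                  + ring_distance(0, y, n)
                  + ring_distance(0, z, n))
    return total / n**3

for n in (4, 8, 16):
    print(n, mean_torus_hops(n), 3 * n / 4)
```

For even N the brute-force average equals N*¾ exactly under these assumptions; for odd N it is slightly smaller. A nearest-neighbor message, by comparison, always crosses one link regardless of N.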
In general, at a given bandwidth, an external link between switches costs more than a link within a switch. Thus, a further problem is to create a network which efficiently uses the external links between switches, even if this introduces inefficiencies in the use of links internal to a switch.
It would thus be highly desirable to provide a network architecture that efficiently supports various communication patterns, including nearest-neighbor and all-to-all, and that further makes efficient use of external links between switches.