The present invention relates to routing data through a parallel computer system, and more particularly to selecting an efficient path for routing data through the parallel computer system.
A large parallel computer system, such as IBM's BLUEGENE™ parallel computer system, has many interconnected nodes. In the IBM BLUEGENE™ parallel computer system, each node is interconnected with its neighbors along multiple dimensions in a torus topology. For example, the IBM BLUEGENE™/L or BLUEGENE™/P parallel computer system can be configured with a three-dimensional torus network topology.
The nodes communicate with each other by injecting data packets into the torus network. At the sending node, the data packets are stored in an injection FIFO buffer and injected into the torus network by a processor or by DMA (direct memory access) logic. The receiving node stores the received data packets in a reception FIFO buffer or directly at an arbitrary location in memory. In a three-dimensional torus, each node has 6 possible links on which it can receive a data packet and 6 possible links on which it can send a data packet to a neighboring node. These links may be labeled ‘+x’, ‘−x’, ‘+y’, ‘−y’, ‘+z’, and ‘−z’.
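To make the six link labels concrete, the following minimal Python sketch computes the neighbor reached over each link of a three-dimensional torus. The function name `neighbors` and the coordinate convention (nodes numbered 0 to n−1 in each dimension) are illustrative assumptions, not part of any BLUEGENE™ interface; the wraparound modulo each dimension size is what distinguishes a torus from a plain mesh.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]

# The six link labels, in the order the text lists them.
LINKS = ("+x", "-x", "+y", "-y", "+z", "-z")


def neighbors(node: Coord, dims: Coord) -> Dict[str, Coord]:
    """Return the neighbor reached over each of the six links of a 3-D torus.

    Hypothetical helper: coordinates wrap around modulo the size of each
    dimension, so the node at the edge of a dimension links back to the
    node at the opposite edge.
    """
    result: Dict[str, Coord] = {}
    for axis, name in enumerate("xyz"):
        for sign in (+1, -1):
            coord = list(node)
            coord[axis] = (coord[axis] + sign) % dims[axis]
            label = ("+" if sign > 0 else "-") + name
            result[label] = (coord[0], coord[1], coord[2])
    return result
```

For example, `neighbors((0, 0, 0), (4, 4, 8))` maps ‘−x’ to `(3, 0, 0)` and ‘−z’ to `(0, 0, 7)`, showing the wraparound links of the torus.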
Prior art BLUEGENE™ parallel computer systems use dynamic routing to communicate data between nodes. Each data packet contains a ‘dynamic’ bit, which, if set, indicates that the packet may be dynamically routed. Dynamic routing can improve throughput by avoiding busy links between nodes. Each data packet also contains the destination coordinates of the receiving node and ‘hint bits’ that indicate which links may be used to move the data packet towards its destination.
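The effect of the ‘dynamic’ bit can be illustrated with a minimal Python sketch of an output-link choice at one node. Everything here is a hypothetical model: the `PacketHeader` fields, the `free_space` map standing in for how busy each link is, and the fixed x-then-y-then-z fallback order for packets that are not dynamically routed are assumptions for illustration, not the actual BLUEGENE™ router logic.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# Hypothetical fixed order used when the dynamic bit is clear
# (dimension-ordered: x first, then y, then z).
FIXED_ORDER = ("+x", "-x", "+y", "-y", "+z", "-z")


@dataclass
class PacketHeader:
    dest: Tuple[int, int, int]  # destination coordinates of the receiving node
    hints: FrozenSet[str]       # links allowed to move the packet toward dest
    dynamic: bool               # the 'dynamic' bit


def choose_link(header: PacketHeader, free_space: Dict[str, int]) -> str:
    """Pick an output link for a packet (illustrative model only).

    free_space maps each link label to its free buffer space, a stand-in
    for how busy the link is. A dynamically routable packet takes the
    allowed link with the most free space; otherwise it takes the first
    allowed link in FIXED_ORDER.
    """
    allowed = [link for link in FIXED_ORDER if link in header.hints]
    if not allowed:
        raise ValueError("no hint bits set: no allowed output link")
    if header.dynamic:
        # max() keeps the first link on ties, so the choice is repeatable.
        return max(allowed, key=lambda link: free_space.get(link, 0))
    return allowed[0]
```

With ‘+x’ congested (no free space) and ‘+y’ idle, a packet whose hint bits allow both is sent on ‘+y’ if its dynamic bit is set, and on ‘+x’ if it is not. This shows how dynamic routing can steer around a busy link.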
In a data packet header for a three-dimensional torus, there are 6 hint bits, corresponding to the ‘+x’, ‘−x’, ‘+y’, ‘−y’, ‘+z’, and ‘−z’ directions from the sending node towards the receiving node. These hint bits indicate the allowable directions in which the data packet may move towards its destination and are used to permit early arbitration of the packet.
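As an illustration of how the 6 hint bits relate to source and destination coordinates, the following minimal Python sketch sets one bit per dimension for the direction that reaches the destination in fewer hops around the torus. The bit layout (‘+x’ as the most significant of the six bits), the tie-break (when both directions are equally short, only the ‘+’ bit is set), and the helper names `hint_bits` and `hint_labels` are all assumptions; the actual header encoding used by BLUEGENE™ hardware may differ.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# Assumed bit layout: one bit per link, in the order the text lists them,
# with '+x' as the most significant of the six bits.
HINT_ORDER = ("+x", "-x", "+y", "-y", "+z", "-z")


def hint_bits(src: Coord, dst: Coord, dims: Coord) -> int:
    """Compute a hypothetical 6-bit hint mask for minimal torus routing.

    In each dimension the packet may move in whichever direction reaches
    the destination coordinate in fewer hops, counting wraparound. On a
    tie (distance exactly half the ring) only the '+' bit is set; that
    tie-break is an assumption of this sketch.
    """
    mask = 0
    for axis in range(3):
        n = dims[axis]
        plus = (dst[axis] - src[axis]) % n  # hops in the + direction
        if plus == 0:
            continue                        # already aligned in this dimension
        minus = n - plus                    # hops in the - direction
        index = 2 * axis if plus <= minus else 2 * axis + 1
        mask |= 1 << (5 - index)
    return mask


def hint_labels(mask: int) -> List[str]:
    """Decode a hint mask back into link labels."""
    return [name for i, name in enumerate(HINT_ORDER) if mask & (1 << (5 - i))]
```

For a packet travelling from `(0, 0, 0)` to `(3, 1, 0)` in a 4×4×8 torus, `hint_labels(hint_bits(...))` yields `['-x', '+y']`: one hop backwards around the x ring is shorter than three hops forward, and no z movement is needed.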
An important communication pattern in parallel computer systems is ‘All-to-All’, in which each node sends data packets to every other node. Generally, communication of data packets over a symmetrical torus, i.e., one in which the number of nodes is the same in each dimension, is more efficient than communication over an asymmetrical torus. In an asymmetrical torus, performance may degrade due to head-of-line blocking effects.
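One way to see why an asymmetrical torus is harder to use efficiently for All-to-All traffic is to count how many hops each packet spends in each dimension under minimal routing. The minimal Python sketch below does this; it assumes one packet from every node to every other node along a shortest path, and the helper names are hypothetical. It does not model head-of-line blocking itself. It only shows the per-dimension load imbalance that an asymmetrical shape creates.

```python
from itertools import product
from typing import Tuple


def ring_distance(a: int, b: int, n: int) -> int:
    """Shortest hop count between positions a and b on a ring of n nodes."""
    d = (b - a) % n
    return min(d, n - d)


def hops_per_dimension(dims: Tuple[int, int, int]) -> Tuple[float, ...]:
    """Average hops spent in each dimension by one All-to-All packet.

    Every node sends one packet to every other node along a minimal path.
    Because a torus looks the same from every node, it suffices to fix
    the source at the origin and average over all destinations.
    """
    total = [0, 0, 0]
    count = 0
    for dst in product(*(range(n) for n in dims)):
        if dst == (0, 0, 0):
            continue
        count += 1
        for axis in range(3):
            total[axis] += ring_distance(0, dst[axis], dims[axis])
    return tuple(t / count for t in total)
```

In a symmetrical 4×4×4 torus the three averages are equal. In an asymmetrical 4×4×8 torus the z average is exactly twice the x or y average. Because every dimension has the same number of links, the links along the longer z dimension must carry twice as much All-to-All traffic.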
Thus, there is a need in the art for a method and system that improves communication of data within a parallel computer system. Specifically, what is needed is a method and system that improves the efficiency of communicating data within an asymmetrical torus.