The present invention relates to routing data through a parallel computing system, and more particularly to selecting a routing path for collective operations in the parallel computer system.
A large parallel computer system, such as IBM's BLUEGENE™ parallel computer system, has many nodes interconnected with each other. In the IBM BLUEGENE™ parallel computer system, each node is interconnected along multiple dimensions in a torus topology. For example, the IBM BLUEGENE™/L or P parallel computer system can be configured as a three-dimensional network topology.
Prior art IBM BLUEGENE™/L and P parallel computer systems use a separate collective network, such as the logical tree network disclosed in commonly assigned U.S. Pat. No. 7,650,434, for performing collective communication operations. The uplinks and downlinks between nodes in such a collective network needed to be carefully constructed to avoid deadlocks between nodes when communicating data. In a deadlock, packets cannot move due to the existence of a cycle in the resources required to move the packets. In networks these resources are typically buffer spaces in which to store packets.
If logical tree networks are constructed carelessly, then packets may not be able to move between nodes due to a lack of storage space in a buffer. For example, a packet (packet 1) stored in a downlink buffer for one logical tree may be waiting on another packet (packet 2) stored in an uplink buffer of another logical tree to vacate the buffer space. Furthermore, packet 2 may be waiting on a packet (packet 3) in a different downlink buffer to vacate its buffer space and packet 3 may be waiting for packet 1 to vacate its buffer space. Thus, none of the packets can move into an empty buffer space and a deadlock ensues. While there is prior art for constructing deadlock free routes in a torus for point-to-point packets (Daily “Deadlock-Free Message Routing in Multiprocessor Interconnection Networks” IEEE TRANSACTIONS ON COMPUTERS, VOL. C-36, NO. 5, MAY 1987 and Duato “A General Theory for Deadlock-Free Adaptive Routing Using a Mixed Set of Resources” IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 12, NO. 12, DECEMBER 2001), there are no specific rules for constructing deadlock free collective class routes in a torus network, nor is it obvious how to apply Duato's general rules in such a way to avoid deadlocks when constructing multiple virtual tree networks that are overlayed onto a torus network. If different collective operations are always separated by barrier operations (that do not use common buffer spaces with the collectives nor block on common hardware resources as the collectives), then the issue of deadlocks does not arise and class routes can be constructed in an arbitrary manner. However, this increases the time of the collective operations and therefore reduces performance.
Thus, there is a need in the art for a method and system for performing collective communication operations within a parallel computing network without the use of a separate collective network and in which multiple logical trees can be embedded (or overlayed) within a multiple dimension torus network in such a way as to avoid the possibility of deadlocks. Virtual channels (VCs) are often used to represent the buffer spaces used to store packets. It is further desirable to have several different logical trees using the same VC and thus sharing the same buffer spaces