The present invention relates generally to data processing and, in particular, to an improved interconnection network topology for large scale, high performance computing (HPC) systems.
Scalable, cost-effective, and high performance interconnection networks are a prerequisite for large scale HPC systems. The dragonfly topology, described, for example, in US 2010/0049942, is a two-tier hierarchical interconnection network topology. At the first tier, a number of routers are connected in a group to form a large virtual router, with each router providing one or more ports to connect to other groups. At the second tier, multiple such groups of routers are connected such that the groups form a complete graph (full mesh), with each group having at least one link to every other group.
The main motivation for the dragonfly topology is that it effectively leverages large-radix routers to create a network that scales to very high node counts with a low diameter of just three hops, while providing high bisection bandwidth. Moreover, the dragonfly minimizes the number of expensive long optical links, giving it a clear cost advantage over fat tree topologies, which require more long links to scale to networks of similar size.
However, when considering exascale systems, fat tree and two-tier dragonfly topologies run into scaling limits. Assuming a per-node peak compute capacity Rn=10 TFLOP/s, an exascale system (1 EFLOP/s) would require N=100,000 nodes. A non-blocking fat tree network with N end nodes built from routers with r ports requires n=1+log(N/r)/log(r/2) levels (with n rounded up to the next integer); therefore, using current InfiniBand routers with r=36 ports, this system scale requires a network with n=4 levels, which amounts to 2n−1=7 router ports per end node and (2n−1)/r=0.19 routers per end node. To achieve this scale in just three levels, routers with a radix r=74 are needed, which corresponds to (2n−1)/r=5/74=0.068 routers per node.
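The fat tree figures above can be reproduced with a short calculation. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that a non-blocking fat tree with n levels of radix-r routers supports at most r·(r/2)^(n−1) end nodes, which is the level formula above solved for N.

```python
from itertools import count


def fat_tree_levels(num_nodes: int, radix: int) -> int:
    """Smallest n such that radix * (radix/2)**(n-1) >= num_nodes."""
    if radix < 4:
        raise ValueError("radix must be at least 4")
    levels, capacity = 1, radix
    while capacity < num_nodes:
        # Each additional level multiplies capacity by r/2
        # (floored for odd radices, which is an assumption of this sketch).
        capacity *= radix // 2
        levels += 1
    return levels


def min_radix(num_nodes: int, levels: int) -> int:
    """Smallest radix that reaches num_nodes within the given level count."""
    return next(r for r in count(4) if fat_tree_levels(num_nodes, r) <= levels)


N = 100_000                              # 1 EFLOP/s divided by 10 TFLOP/s per node
n = fat_tree_levels(N, 36)               # levels needed with 36-port routers
ports_per_node = 2 * n - 1               # router ports per end node
routers_per_node = ports_per_node / 36   # routers per end node
r3 = min_radix(N, 3)                     # radix needed for three levels

print(n, ports_per_node, round(routers_per_node, 2), r3, round(5 / r3, 3))
```

Under these assumptions the sketch yields four levels, 7 ports per node, and 0.19 routers per node for r=36, and a minimum radix of 74 with 0.068 routers per node for three levels, in agreement with the values stated above.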
A balanced (i.e., providing a theoretical throughput bound of 100% under uniform traffic) two-tier dragonfly network (p, a, h)=(12, 26, 12) can also scale to about 100,000 nodes, where p is the “bristling factor” indicating the number of terminals connected to each router, a is the number of routers in each group, and h is the number of channels in each router used to connect to other groups. This corresponds to 1/12=0.083 routers per node and 49/12=4.1 ports per node, which is significantly more cost-effective than the four-level fat tree, and about on par with the three-level fat tree, which requires much larger routers.
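The dragonfly figures can be checked in the same way. The sketch below is illustrative and rests on assumptions not stated in the text above: that the network is built to its maximum size with exactly one global link between each pair of groups (giving a·h+1 groups), and that the a routers within a group are fully connected (a−1 local ports per router). The function name is hypothetical.

```python
def dragonfly_size(p: int, a: int, h: int) -> dict:
    """Size and cost figures for a maximum-size two-tier dragonfly (p, a, h)."""
    groups = a * h + 1        # assumption: one global link per pair of groups
    routers = a * groups      # a routers in every group
    nodes = p * routers       # p terminals attached to every router
    radix = p + (a - 1) + h   # terminal + local (full mesh) + global ports
    return {
        "groups": groups,
        "routers": routers,
        "nodes": nodes,
        "radix": radix,
        "routers_per_node": routers / nodes,   # equals 1/p
        "ports_per_node": radix / p,
    }


d = dragonfly_size(12, 26, 12)
print(d["groups"], d["nodes"], d["radix"],
      round(d["routers_per_node"], 3), round(d["ports_per_node"], 1))
```

Under these assumptions, (p, a, h)=(12, 26, 12) gives 313 groups and 97,656 nodes with 49-port routers, consistent with the "about 100,000 nodes", 0.083 routers per node, and 4.1 ports per node stated above.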