1. Field of the Invention
This invention relates generally to deadlock-free routing in fat tree networks.
2. Description of Related Art
Fat tree networks are hierarchical, recursively constructed networks comprising multiple switches connected in a multi-rooted tree-like topology. These networks enjoy widespread popularity and are used in many modern-day high-performance computing systems and commercial data center installations. There are a number of variations of the fat tree topology, but the basic principles of these networks can be understood from FIG. 1 of the accompanying drawings. This shows an example of a specific, highly regular fat tree network known as a “k-ary n-tree”. In this designation, k is the radix of the tree (i.e., the number of children and/or parents at each level), and n is the number of levels. A k-ary n-tree has n levels of switches, each switch having radix 2k, with half of the ports connecting downwards and the other half connecting upwards. The switches in the top level (i.e., the roots of the tree) in principle only require radix k as they have no parents, but in practice these ports may be present and unconnected to allow for future network extensions. The network of FIG. 1 is a binary 4-tree, whereby k=2 and n=4, with the squares representing switches and the lines between them representing inter-switch links. The switches form an interconnection network of an indirect network topology, i.e. the end nodes (compute nodes, servers, etc.) are connected at the edges of the network, providing a clear distinction between compute and networking. The end nodes in FIG. 1 are represented by circles and connect to the bottom layer of switches, i.e. the leaves of the tree topology.
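As a rough illustration of the sizes implied by the "k-ary n-tree" designation, the following sketch computes element counts from k and n. It assumes the standard construction in which every level holds k^(n-1) switches and each leaf switch attaches k end nodes. The function name and the dictionary keys are illustrative only and are not taken from the drawings.

```python
def kary_ntree_size(k, n):
    """Element counts for a k-ary n-tree (assumed standard construction)."""
    switches_per_level = k ** (n - 1)
    return {
        "end_nodes": k ** n,                    # k end nodes per leaf switch
        "switches_per_level": switches_per_level,
        "switches": n * switches_per_level,     # n levels of switches
        "switch_radix": 2 * k,                  # k ports down, k ports up
        # every switch below the top level has k upward links
        "inter_switch_links": (n - 1) * switches_per_level * k,
    }


# The binary 4-tree discussed above: k=2, n=4.
binary_4_tree = kary_ntree_size(2, 4)
```

Under these assumptions a binary 4-tree has 16 end nodes and 32 switches of radix 4, arranged as four levels of eight switches.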
Packets can be transmitted between any pair of end nodes in fat tree networks via a simple routing procedure involving two routing phases. The first routing phase is an “up phase”, in which the route follows one or more switch-to-switch hops in the upwards direction (i.e. towards the roots) of the topology. This is followed by a “down phase” in which the route follows one or more switch-to-switch hops in the downwards direction of the topology. With this routing strategy, shortest-path routing is straightforward and, because routes include only up/down turns and not down/up turns, deadlock is avoided. Deadlock can occur if there are cyclic dependencies between resources in the channel dependency graph as this can result in irreconcilable conflict between resource requests in operation of the network. There are also multiple, equal-length paths between any source and destination (not attached to the same leaf switch), enabling multi-pathing and load-balancing in network operation. Fat tree networks also offer high bisection-bandwidth, and the hierarchical structure is readily scalable to very large networks.
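The two-phase procedure can be sketched for a k-ary n-tree as follows. The sketch assumes a common labelling convention that is not stated in the text above: each end node is written as n base-k digits, its leaf switch is identified by the first n-1 digits, and the up phase climbs only as far as the nearest level from which the destination is reachable downwards. The function names are hypothetical.

```python
def to_digits(node, k, n):
    """Write an end-node index as n base-k digits, most significant first."""
    digits = []
    for _ in range(n):
        digits.append(node % k)
        node //= k
    return digits[::-1]


def up_down_route_shape(src, dst, k, n):
    """Return (up_hops, down_hops, equal_length_paths) for a shortest route.

    Assumption: the leaf switch of an end node is given by its first n-1 digits.
    """
    s, d = to_digits(src, k, n), to_digits(dst, k, n)
    for j in range(n - 1):
        if s[j] != d[j]:
            # The first differing digit fixes how far up the route must climb.
            hops = (n - 1) - j
            # Each upward hop may take any of k parents; the way down is fixed.
            return hops, hops, k ** hops
    return 0, 0, 1  # both end nodes share a leaf switch


# Binary 4-tree examples (k=2, n=4).
same_leaf = up_down_route_shape(0, 1, 2, 4)
neighbours = up_down_route_shape(0, 2, 2, 4)
far_apart = up_down_route_shape(0, 15, 2, 4)
```

In this model the number of equal-length paths grows as k to the power of the number of upward hops, which is the path diversity that multi-pathing and load-balancing can draw on.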
Due to the indirect nature of fat tree networks, routing algorithms for these networks only deal with traffic that flows from one end node to another and do not provide connectivity from any switch to any other switch in the network. Full switch-to-switch connectivity would require use of routes with down/up turns and these turns can introduce deadlock in the network, which must be avoided at all costs. However, direct switch-to-switch connectivity can be highly beneficial for several network management functions, such as communicating topology changes (addition or removal of nodes and/or switches), distributing local fault events (e.g. breaking of a link), and diagnostics (measuring latency or throughput between an arbitrary pair of switches). Also, current InfiniBand switches often have an embedded subnet manager which needs connectivity with all switches in the network. (InfiniBand is a trade mark of the InfiniBand Trade Association.)
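The effect of down/up turns on the channel dependency graph can be illustrated with a minimal sketch. It assumes a hypothetical two-level tree with leaf switches A and B and root switches R and S, treats each directed link as one channel, and records a dependency whenever a route holds one channel while requesting the next. The topology, routes and function names are assumptions made for illustration only.

```python
from collections import defaultdict


def channel_dependencies(routes):
    """Map each channel (directed link) to the channels requested while holding it."""
    deps = defaultdict(set)
    for route in routes:
        channels = list(zip(route, route[1:]))
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    state = {}  # channel -> "open" while on the DFS stack, "done" afterwards

    def visit(channel):
        state[channel] = "open"
        for nxt in deps.get(channel, ()):
            if state.get(nxt) == "open":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[channel] = "done"
        return False

    return any(ch not in state and visit(ch) for ch in list(deps))


# Leaf-to-leaf routes: one hop up, then one hop down.
up_down = [["A", "R", "B"], ["A", "S", "B"], ["B", "R", "A"], ["B", "S", "A"]]
# Root-to-root routes: these need a down/up turn at a leaf switch.
down_up = [["R", "A", "S"], ["S", "B", "R"]]

up_down_only = has_cycle(channel_dependencies(up_down))
with_turns = has_cycle(channel_dependencies(up_down + down_up))
```

With only the up/down routes the graph is acyclic. Adding the two root-to-root routes closes the cycle R→A, A→S, S→B, B→R, so deadlock becomes possible.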
A proposal for full connectivity in fat tree networks is described in “sFtree: A fully connected and deadlock-free switch-to-switch routing algorithm for fat-trees”, Bogdanski et al., ACM Trans. Architecture and Code Optimization, vol. 8, no. 4, January 2012. This proposal designates a particular inverted sub-tree, within the overall indirect network topology, in which down/up turns can occur during routing. If conventional two-phase up-down routing does not provide connectivity between a source and destination switch, then a four-phase up-down-up-down route is used, with the down/up turn occurring in the designated sub-tree. This provides deadlock-free routing by route restriction, thereby avoiding hardware modifications such as the use of virtual channels, which the authors deem undesirable for these networks. The proposed routing method is not shortest-path, nor does it fully exploit path diversity. Moreover, because it concentrates switch-to-switch traffic in one inverted sub-tree, the network is prone to congestive effects and loss of connectivity due to network faults.
Various mechanisms are known for deadlock avoidance in networks in general, including use of virtual channels (i.e. partitioning of resources such as switch buffers and links to provide plural logical channels within one physical channel), and flow control mechanisms such as injection restriction to prevent any single resource from stopping transit. Use of virtual channels for deadlock avoidance in an arbitrary node-to-node network topology is discussed in “Deadlock-free Oblivious Routing for Arbitrary Topologies”, Domke et al., in Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Anchorage, USA, pp. 613-624, May 2011. This proposes a complex system based on detailed analysis of the network topology. This and similar algorithms typically require two to twelve virtual channels to guarantee deadlock freedom. “Effective Methodology for Deadlock-Free Minimal Routing in InfiniBand Networks”, Sancho et al., in Proc. IEEE International Conference on Parallel Processing (ICPP), Vancouver, Canada, pp. 409-418, August 2002, discloses use of virtual channels and service levels for deadlock-free routing in InfiniBand networks. This requires complex network analysis involving minimal path computation and mapping to a spanning tree of the arbitrary topology, with virtual channels being allocated, if available, to break deadlock. These various techniques for arbitrary topologies lead to poor performance in fat tree networks because they fail to exploit the tree's multi-path capabilities.
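The general principle by which virtual channels break cyclic dependencies can be sketched as follows. This is not the algorithm of either cited paper. It is a minimal illustration that assumes a hypothetical two-level tree (leaf switches A and B, root switches R and S), models a channel as a (from, to, virtual channel) triple, and arbitrarily places routes with down/up turns on a second virtual channel. All names are illustrative.

```python
from collections import defaultdict, deque


def vc_dependencies(routes_with_vc):
    """Build channel dependencies where a channel is (from, to, virtual_channel)."""
    deps = defaultdict(set)
    for route, vc in routes_with_vc:
        channels = [(u, v, vc) for u, v in zip(route, route[1:])]
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def is_acyclic(deps):
    """Kahn's algorithm: acyclic if and only if every channel can be removed."""
    nodes = set(deps) | {w for targets in deps.values() for w in targets}
    indegree = {c: 0 for c in nodes}
    for targets in deps.values():
        for w in targets:
            indegree[w] += 1
    ready = deque(c for c in nodes if indegree[c] == 0)
    removed = 0
    while ready:
        c = ready.popleft()
        removed += 1
        for w in deps.get(c, ()):
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(nodes)


end_node_routes = [["A", "R", "B"], ["A", "S", "B"], ["B", "R", "A"], ["B", "S", "A"]]
switch_routes = [["R", "A", "S"], ["S", "B", "R"]]  # contain down/up turns

# Everything on one virtual channel: the dependencies form a cycle.
one_vc = is_acyclic(vc_dependencies(
    [(r, 0) for r in end_node_routes + switch_routes]))
# Routes with down/up turns moved to virtual channel 1 (illustrative assignment).
two_vcs = is_acyclic(vc_dependencies(
    [(r, 0) for r in end_node_routes] + [(r, 1) for r in switch_routes]))
```

In this toy case a single virtual channel leaves a cycle, whereas separating the two traffic classes onto distinct virtual channels yields an acyclic graph, at the cost of partitioning buffer resources on every link.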