Data centers contain large numbers of servers to achieve economies of scale [19], and the number is increasing exponentially [13]. For example, it is estimated that Microsoft's Chicago data center has about 300,000 servers [1]. The huge number of servers makes it challenging for the data center network (DCN) to offer proportionally large bandwidth to interconnect them [30]. As a result, modern DCNs usually adopt multi-rooted hierarchical topologies, such as the fat tree [2], VL2 [11], DCell [13], and BCube [12], which offer multipath capability for large bisection bandwidth and improved fault tolerance. For example, FIG. 1 shows a diagram of a hierarchical fat tree topology. The topology in FIG. 1 has four layers, from bottom to top: hosts, edge switches, aggregation switches, and core switches. The four core switches act as the multiple roots of the network. Consequently, there are two different paths between hosts A and B, as shown in different colors (green and red).
However, traditional link-state and distance-vector [16] routing algorithms (e.g., those used in the Internet) cannot readily utilize the multipath capability of multi-rooted topologies. Traditional routing algorithms calculate routes based only on packet destinations, and thus all packets to the same destination share the same route. Although equal cost multipath (ECMP) [9] supports multipath routing, it performs static load splitting based on packet headers without accounting for bandwidth, allows only paths of the same minimum cost, and supports too few paths [14]. Further, traditional routing algorithms usually prefer the shortest path in order to reduce propagation delay. Because of the small geographical distances involved, DCNs are less concerned with propagation delay and instead give priority to bandwidth utilization.
Typical DCNs offer multiple routing paths for increased bandwidth and fault tolerance. Multipath routing can reduce congestion by taking advantage of the path diversity in DCNs. Typical layer-two forwarding uses a spanning tree, in which there is only one path between source and destination nodes. A recent work provides multipath forwarding by computing a set of paths that exploits the redundancy in a given network and merging these paths into a set of trees, each mapped to a separate VLAN [19]. At layer three, ECMP [9] provides multipath forwarding by performing static load splitting among flows. ECMP-enabled switches are configured with several possible forwarding paths for a given subnet. When a packet arrives at a switch with multiple candidate paths, the switch forwards it onto the path that corresponds to a hash of selected fields of the packet header, thus splitting the load to each subnet across multiple paths. However, ECMP does not account for flow bandwidth when making allocation decisions, which may lead to oversubscription even for simple communication patterns. Further, current ECMP implementations limit the number of paths to between 8 and 16, which is fewer than would be required to deliver high bisection bandwidth for larger data centers [14].
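The hash-based selection described above can be illustrated with a minimal sketch. It is not taken from any cited work or switch implementation: the `Flow` fields, the use of CRC32 as the hash, and the helper names `ecmp_select` and `path_loads` are all assumptions chosen for illustration. The sketch shows that the chosen path depends only on header fields, so a flow's bandwidth demand plays no part in the decision.

```python
import zlib
from typing import Dict, NamedTuple, Sequence, Tuple


class Flow(NamedTuple):
    """Hypothetical 5-tuple of header fields used as the hash input."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def ecmp_select(flow: Flow, paths: Sequence[str]) -> str:
    """Pick a path from a hash of the header fields only (CRC32 is an assumption)."""
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]


def path_loads(demands: Sequence[Tuple[Flow, float]],
               paths: Sequence[str]) -> Dict[str, float]:
    """Sum the demand that lands on each path; the demand never affects the choice."""
    loads = {path: 0.0 for path in paths}
    for flow, demand in demands:
        loads[ecmp_select(flow, paths)] += demand
    return loads


if __name__ == "__main__":
    paths = ["via-core-1", "via-core-2"]
    demands = [
        (Flow("10.0.0.1", "10.2.0.1", 40000, 80, 6), 0.8),
        (Flow("10.0.0.2", "10.2.0.2", 40001, 80, 6), 0.8),
    ]
    # If both flows hash to the same path, that path carries 1.6 units of
    # demand while the other stays idle, regardless of link capacity.
    print(path_loads(demands, paths))
```

Because the mapping is a pure function of the header, every packet of a flow follows the same path, but two large flows may still collide on one path while another path is unused.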
There exist multipath solutions for DCNs, including Global First-Fit and Simulated Annealing [4]. The former simply selects the first path, among all the possible paths, that can accommodate a flow, but it needs to maintain all paths between a pair of nodes. The latter performs a probabilistic search for the optimal path, but converges slowly. The ElasticTree DCN power manager uses two multipath algorithms, Greedy Bin-Packing and Topology-Aware Heuristic [14]. The former evaluates possible paths and chooses the leftmost one with sufficient capacity. The latter is a fast heuristic based on the topological features of fat trees, but it relies on the impractical assumption that a flow can be split among multiple paths. The MicroTE framework supports multipath routing, coordinated scheduling of traffic, and short-term traffic predictability [5].
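As a rough illustration of capacity-aware path selection in the first-fit style discussed above, the following sketch scans an ordered list of candidate paths and reserves bandwidth on the first one whose links can all accommodate the flow. It is a simplified sketch and not the algorithm of [4] or [14]: the `Link` representation, the `residual` capacity table, and the function name `first_fit` are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

Link = Tuple[str, str]  # hypothetical (from_node, to_node) pair


def first_fit(demand: float,
              paths: Sequence[Sequence[Link]],
              residual: Dict[Link, float]) -> Optional[int]:
    """Return the index of the first path that fits the demand, or None.

    Paths are scanned in the given order. On success, the demand is
    subtracted from the residual capacity of every link on the chosen path.
    """
    for index, path in enumerate(paths):
        if all(residual[link] >= demand for link in path):
            for link in path:
                residual[link] -= demand
            return index
    return None


if __name__ == "__main__":
    # Two candidate paths between hosts A and B, one per core switch.
    paths = [
        [("A", "core1"), ("core1", "B")],
        [("A", "core2"), ("core2", "B")],
    ]
    residual = {link: 1.0 for path in paths for link in path}
    for demand in (0.6, 0.6, 0.6):
        print(demand, first_fit(demand, paths, residual))
```

Note that the caller must supply every candidate path between the pair of nodes, which mirrors the state-maintenance cost mentioned above for Global First-Fit.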