Multi-rooted tree structures are commonly deployed in production Data Center Networks (DCNs) to provide high bisection bandwidth. Load balancing strategies, such as Equal-Cost Multi-Path routing (ECMP), are typically used to spread data traffic across multiple parallel paths between nodes (e.g., commodity network switches, routers) in the DCN. However, when link failures occur, the highly symmetric DCN becomes asymmetric. This asymmetry undermines the load balancing of existing traffic-oblivious routing protocols, because those protocols are designed to ensure destination reachability over least-cost paths rather than to account for link capacity. Thus, existing load balancing strategies cannot simultaneously balance traffic and fully utilize link capacities. As a result, network congestion occurs, reducing data throughput in the DCN.
DCNs commonly use logical links between nodes. Such a logical link is commonly referred to as a Link Aggregation Group (LAG) and generally consists of multiple physical links. The use of LAGs complicates the load-balancing problem further, because a physical link failure in a LAG causes a partial capacity loss in the logical link. Existing routing protocols (e.g., Open Shortest Path First (OSPF)) are generally not aware of such changes in logical link capacity, and thus continue to route the same amount of load to the degraded LAG, which causes persistent congestion.
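The effect of a partial LAG failure can be illustrated with a small calculation. The following is a minimal sketch, not part of any routing protocol: the function name `ecmp_utilization`, the two-LAG topology, the 10 Gbps member links, and the 30 Gbps demand are all assumptions chosen for illustration. It splits the demand equally across the LAGs, as a capacity-unaware equal-cost scheme would, and reports the resulting utilization of each LAG.

```python
from fractions import Fraction


def ecmp_utilization(demand_gbps, lag_capacities_gbps):
    """Split demand equally over all LAGs, ignoring their capacities,
    and return the utilization (load / capacity) of each LAG."""
    share = Fraction(demand_gbps, len(lag_capacities_gbps))
    return [share / capacity for capacity in lag_capacities_gbps]


# Assumed scenario: two LAGs, each built from two 10 Gbps physical links,
# carrying 30 Gbps of demand in total.
healthy = ecmp_utilization(30, [20, 20])

# One physical link in the second LAG fails, so its capacity drops to
# 10 Gbps, but the equal split still sends it the same 15 Gbps.
degraded = ecmp_utilization(30, [20, 10])

print([float(u) for u in healthy])   # utilization with both LAGs intact
print([float(u) for u in degraded])  # second LAG is now oversubscribed
```

Under these assumed numbers, each LAG runs at 75% utilization before the failure, while afterwards the degraded LAG is offered 150% of its remaining capacity even though the first LAG still has headroom.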
To handle this problem with existing routing protocols, the whole LAG, including the healthy physical links of the partially failed LAG, is often disabled. Additional links may also be disabled, for example, to help the routing protocol find a different path and temporarily mitigate the congestion caused by the unbalanced load. However, the sacrificed link capacity often leads to congestion on a larger scale, especially when overall link capacity in the network is highly utilized. For example, when facing link failures, OSPF balances load improperly across asymmetric and non-link-disjoint paths, even if information about the physical link failure in a LAG is available.
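The capacity sacrificed by disabling a whole LAG can be sketched in the same way. This is an illustrative calculation only: the helper `usable_capacity_gbps`, the `(up_links, total_links)` representation of a LAG, and the 10 Gbps link speed are hypothetical and not taken from any protocol or product.

```python
def usable_capacity_gbps(lags, link_gbps, disable_degraded):
    """Sum the capacity of the LAGs that remain in service.

    lags: list of (up_links, total_links) pairs, one per LAG.
    disable_degraded: if True, any LAG with a failed member link is
        taken out of service entirely, healthy members included.
    """
    total = 0
    for up_links, total_links in lags:
        if disable_degraded and up_links < total_links:
            continue  # the healthy members of this LAG are sacrificed too
        total += up_links * link_gbps
    return total


# Assumed scenario: two LAGs of two 10 Gbps links each; one link in the
# second LAG has failed.
lags = [(2, 2), (1, 2)]

kept = usable_capacity_gbps(lags, 10, disable_degraded=False)
disabled = usable_capacity_gbps(lags, 10, disable_degraded=True)

print(kept, disabled)
```

With these assumed values, keeping the degraded LAG in service leaves 30 Gbps of aggregate capacity, whereas disabling it leaves only 20 Gbps. A 30 Gbps demand that could in principle still be carried no longer fits, which is the larger-scale congestion described above.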