Currently, when dynamic routing protocols such as Open Shortest Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS) are used in a network topology that includes switches/routers configured as a Virtual Router Redundancy Protocol (VRRP)/multi-chassis link aggregation group (MC-LAG) cluster, the nodes of the cluster cannot ensure “short-path” forwarding for all incoming data traffic. This means that some data traffic arriving at the nodes from client machines may unnecessarily traverse the inter-chassis link (ICL) between nodes before the traffic is forwarded to its final destination, possibly leading to congestion, traffic loss, and/or increased latency.
FIGS. 1A-1C depict an exemplary network environment 100 and corresponding flows that illustrate this problem. As shown in FIG. 1A, network environment 100 includes three network devices (e.g., switches/routers) 102, 104, and 106 that are part of a first network 108 (e.g., a provider network). Network devices 102, 104, and 106 are referred to herein as “provider edge” (PE) devices. Network environment 100 further includes a group of servers 110, 112, and 114 that are part of a second network 116 (e.g., a network core) connected to first network 108, and a network device (e.g., switch/router) 118 connected to PE devices 102 and 104 and to a client 120. Network device 118 is referred to herein as a “client edge” (CE) device.
In the embodiment of FIG. 1A, PE devices 102 and 104 are grouped into an MC-LAG (e.g., Multi-Chassis Trunking, or MCT) cluster 122. Accordingly, there is an inter-chassis link (ICL) 128 between PE devices 102 and 104, as well as a plurality of links (124 and 126) that form a link aggregation group (LAG) between PE devices 102, 104 and CE device 118. For example, link 124 connects CE device 118 to interface IP address 1.1.1.1 and interface MAC address AAAA.AAAA.AAAA of PE device 102, and link 126 connects CE device 118 to interface IP address 1.1.1.2 and interface MAC address BBBB.BBBB.BBBB of PE device 104. In addition, PE devices 102 and 104 are configured/configurable to act in concert as a virtual router via VRRP (or variants thereof, such as VRRPe), such that PE devices 102 and 104 share a common virtual IP address (1.1.1.254) and a common virtual MAC address (CCCC.CCCC.CCCC).
The challenge with the configuration of FIG. 1A occurs when a dynamic routing protocol, or DRP, is implemented across network environment 100 (as shown in FIG. 1B). In a scenario where there is no dynamic routing, CE device 118 will typically store, within its Layer 3 (L3) routing table, the virtual IP address of cluster 122 as the next hop address for all routes leading to network core 116. However, in the scenario of FIG. 1B where dynamic routing is used, CE device 118 learns the individual interface IP addresses of PE devices 102 and 104 via the DRP, rather than the virtual IP address. Since links 124 and 126 are equal-cost paths, CE device 118 will store two next hop addresses in its L3 routing table for each destination in network core 116. For instance, in FIG. 1B, L3 routing table 130 includes next hop addresses 1.1.1.1 and 1.1.1.2 (corresponding to PE devices 102 and 104 respectively) for destination server 110 having IP address 2.2.2.2.
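By way of illustration only, the difference between the two scenarios can be sketched as a small Python model of the routing table of CE device 118. The function name `build_routing_table` and the dictionary layout are hypothetical and are not taken from any actual device implementation; only the addresses come from FIGS. 1A and 1B.

```python
# Illustrative model (assumption) of the CE device's L3 routing table.
# Addresses are those of FIGS. 1A and 1B; all names are hypothetical.
VIRTUAL_IP = "1.1.1.254"   # shared VRRP virtual IP of cluster 122
PE_102_IP = "1.1.1.1"      # interface IP of PE device 102
PE_104_IP = "1.1.1.2"      # interface IP of PE device 104
SERVER_110_IP = "2.2.2.2"  # destination server 110


def build_routing_table(dynamic_routing: bool) -> dict:
    """Return a mapping of destination IP -> list of next hop IPs."""
    if dynamic_routing:
        # The DRP advertises each PE's own interface address, and the two
        # links are equal-cost, so both next hops are installed.
        return {SERVER_110_IP: [PE_102_IP, PE_104_IP]}
    # Without a DRP, the virtual IP is typically the single next hop.
    return {SERVER_110_IP: [VIRTUAL_IP]}


static_table = build_routing_table(dynamic_routing=False)
drp_table = build_routing_table(dynamic_routing=True)
```

In this sketch, `drp_table` corresponds to L3 routing table 130 of FIG. 1B, in which the virtual IP address no longer appears as a next hop.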
The foregoing can lead to the problematic sequence of events shown in FIG. 1C. At step (1) (reference numeral 132) of FIG. 1C, client 120 transmits a data packet destined for server 110 to CE device 118. At step (2) (reference numeral 134), CE device 118 selects, as the next hop for the packet, the interface IP address of PE device 102 (i.e., 1.1.1.1), resolves the interface MAC address for 1.1.1.1 (i.e., AAAA.AAAA.AAAA) via Address Resolution Protocol (ARP), and sets the destination MAC address of the packet to the interface MAC address. However, this does not guarantee that the data packet gets forwarded over link 124 to PE device 102. Instead, because links 124 and 126 are part of a LAG, CE device 118 performs a hash on the headers of the data packet, which happens to select link 126 (rather than link 124). Thus, in this case, the data packet is forwarded over link 126 to PE device 104, even though the destination MAC address included in the packet identifies the interface MAC address of PE device 102 (i.e., AAAA.AAAA.AAAA).
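The mismatch described above arises because the next hop selection (which fixes the destination MAC address) and the LAG member selection (which fixes the physical link) are made independently. The following Python sketch illustrates this under a stated assumption: the LAG hash is modeled as a toy XOR of the Layer 4 ports, which is hypothetical and stands in for whatever header hash a real CE device applies. The function names and the example port numbers are likewise illustrative.

```python
# Illustrative sketch (assumption): next hop resolution and LAG link
# selection on the CE device are independent decisions.
ARP_TABLE = {
    "1.1.1.1": "AAAA.AAAA.AAAA",  # interface MAC of PE device 102
    "1.1.1.2": "BBBB.BBBB.BBBB",  # interface MAC of PE device 104
}
LAG_LINKS = (124, 126)  # link 124 leads to PE 102, link 126 to PE 104


def select_lag_link(src_port: int, dst_port: int) -> int:
    """Toy LAG hash (hypothetical): XOR of L4 ports modulo link count."""
    return LAG_LINKS[(src_port ^ dst_port) % len(LAG_LINKS)]


def ce_forward(next_hop_ip: str, src_port: int, dst_port: int):
    """Return (destination MAC, egress LAG link) for one packet."""
    dst_mac = ARP_TABLE[next_hop_ip]            # set by the routing decision
    link = select_lag_link(src_port, dst_port)  # set by the LAG hash only
    return dst_mac, link


# Next hop is PE device 102 (1.1.1.1), yet the hash picks link 126.
dst_mac, link = ce_forward("1.1.1.1", src_port=49153, dst_port=80)
```

With these example values, the packet carries the interface MAC address of PE device 102 but leaves on link 126 toward PE device 104, which is the situation shown at step (2) of FIG. 1C.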
Upon receiving the data packet, PE device 104 determines that the destination MAC address does not correspond to its own interface MAC address and forwards the packet (at Layer 2) over ICL 128 to PE device 102 (step (3), reference numeral 136). PE device 102 then performs a route lookup in its L3 routing table based on the destination IP address in the packet (i.e., 2.2.2.2) and forwards the packet to the next hop in the shortest path to the destination (i.e., PE device 106) (step (4), reference numeral 138).
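The forwarding decision made by the receiving PE device in steps (3) and (4) can be sketched as follows. This is a simplified model under stated assumptions: the function `forward_path` and its tables are hypothetical, only the two interface MAC addresses of FIG. 1A are modeled, and handling of the virtual MAC address is omitted.

```python
# Illustrative sketch (assumption) of the PE-side decision in FIG. 1C.
PE_MACS = {102: "AAAA.AAAA.AAAA", 104: "BBBB.BBBB.BBBB"}
MAC_OWNER = {mac: pe for pe, mac in PE_MACS.items()}
NEXT_HOP = {"2.2.2.2": "PE 106"}  # shortest-path next hop toward server 110


def forward_path(ingress_pe: int, dst_mac: str, dst_ip: str) -> list:
    """Return the sequence of hops taken from the ingress PE device."""
    path = [f"PE {ingress_pe}"]
    if dst_mac != PE_MACS[ingress_pe]:
        # Destination MAC belongs to the peer: bridge at Layer 2 over the
        # ICL instead of routing locally.
        path += ["ICL 128", f"PE {MAC_OWNER[dst_mac]}"]
    # The PE device that owns the MAC performs the L3 route lookup.
    path.append(NEXT_HOP[dst_ip])
    return path


detour = forward_path(104, "AAAA.AAAA.AAAA", "2.2.2.2")
direct = forward_path(102, "AAAA.AAAA.AAAA", "2.2.2.2")
```

Here `detour` contains the extra hop over ICL 128, whereas `direct` reflects the case in which the LAG hash happens to select the link that matches the chosen next hop.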
The issue with the flow shown in FIG. 1C is that the data packet is unnecessarily sent over ICL 128 from PE device 104 to PE device 102, even though PE device 104 is capable of forwarding the data packet directly to next hop 106. This act of transmitting the data packet over ICL 128 can have a number of detrimental effects. For instance, it can congest ICL 128, which reduces the amount of bandwidth available for other types of cluster communication traffic between PE devices 102 and 104. Further, the data packet can incur unnecessarily higher latency due to traversing ICL 128. Yet further, the uplinks from PE devices 102 and 104 to the network core can become congested due to uneven load sharing.