According to the Ethernet virtual private network (EVPN) Fast Withdraw procedure, when an Ethernet segment identifier (ESI) failure occurs at a provider edge (PE) device, Border Gateway Protocol (BGP) is used to withdraw the Ethernet Auto-Discovery (EAD)/Ethernet segment (ES) route that the PE device (e.g., the local BGP peer) advertised for the failed ESI. When a remote BGP peer receives notification of the withdrawal of the EAD/ES route from the local BGP peer, the remote BGP peer locally withdraws the same EAD/ES route. In addition, the remote BGP peer removes the IP address of the local BGP peer from the equal-cost multi-path (ECMP) set used for forwarding to the failed ESI. The goal of the EVPN Fast Withdraw procedure is to provide data-plane convergence based on a single message instead of relying on the withdrawal of individual EVPN Route Type 2 routes.
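The effect of the single-message withdrawal on a remote BGP peer can be illustrated with a minimal sketch. The class RemotePeer, its method names, and the example addresses are hypothetical and are used for illustration only; the sketch assumes that forwarding for each MAC address is resolved through the ECMP set of its ESI, so that one EAD/ES withdrawal affects every MAC address learned behind that ESI.

```python
from collections import defaultdict


class RemotePeer:
    """Toy model of a remote PE's forwarding state (hypothetical, illustration only)."""

    def __init__(self):
        # ESI -> set of PE IP addresses that advertised an EAD/ES route for it
        self.ecmp_next_hops = defaultdict(set)
        # MAC address -> ESI it was learned behind (from EVPN Route Type 2)
        self.mac_to_esi = {}

    def receive_ead_es(self, esi, pe_ip):
        # An EAD/ES advertisement adds the PE to the ECMP set of the ESI.
        self.ecmp_next_hops[esi].add(pe_ip)

    def receive_mac_route(self, mac, esi):
        # A Route Type 2 advertisement binds a MAC address to an ESI.
        self.mac_to_esi[mac] = esi

    def withdraw_ead_es(self, esi, pe_ip):
        # One withdrawal removes the PE from the ECMP set of the ESI,
        # without touching any per-MAC state.
        self.ecmp_next_hops[esi].discard(pe_ip)

    def next_hops_for(self, mac):
        # Forwarding for a MAC address is resolved through its ESI.
        return sorted(self.ecmp_next_hops[self.mac_to_esi[mac]])


peer = RemotePeer()
peer.receive_ead_es("esi-1", "192.0.2.1")  # local BGP peer (ESI will fail here)
peer.receive_ead_es("esi-1", "192.0.2.2")  # another PE attached to the same ESI
for mac in ("aa:aa:aa:00:00:01", "aa:aa:aa:00:00:02", "aa:aa:aa:00:00:03"):
    peer.receive_mac_route(mac, "esi-1")

# A single EAD/ES withdrawal updates forwarding for every MAC behind the ESI.
peer.withdraw_ead_es("esi-1", "192.0.2.1")
```

In this sketch the cost of convergence is independent of the number of MAC addresses behind the ESI, which is the property the Fast Withdraw procedure is intended to provide.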
Although it seems desirable to prioritize the EAD/ES route withdrawal so that the data plane can converge faster for the remote BGP peers, BGP sends EAD/ES route withdrawals over (relatively slow) Transmission Control Protocol (TCP), where the latency can be on the order of seconds. Thus, the EVPN Fast Withdraw procedure is ultimately limited by the rate at which BGP can send EAD/ES route withdrawals to the remote BGP peers. To reduce this latency, the use of Bidirectional Forwarding Detection (BFD) with BGP sessions has been recommended to detect failure of BGP peers. However, even with BFD monitoring of BGP peer failure, two severe limitations exist when deploying EVPN in data centers.
First, in a data-center Leaf-Spine topology, BGP sessions are not established between top-of-rack (TOR) devices. Instead, each TOR device peers with a Spine device. So, even if BFD monitoring is performed on the BGP session between a TOR device and a Spine device, when an ESI failure occurs at the TOR device, BGP on the Spine device must still send the withdrawal of the EAD/ES route to the other TOR device over the relatively slow TCP path. Therefore, it is only possible to leverage BFD on one of the two hops between the TOR devices. In particular, of the TOR device to Spine hop (i.e., hop 1) and the Spine to other TOR device hop (i.e., hop 2), BFD is leveraged only on hop 1.
Second, for deployments with multiple Spine devices, BGP adds additional churn due to its best-path handling. For example, a data center deployment can include two TOR devices, TOR1 and TOR2, as Leaf nodes that are connected to two Spine devices, Spine1 and Spine2. Before an ESI failure occurs at TOR1, TOR2 holds two EAD/ES routes for that ESI because it peers with both Spine1 and Spine2. When the failure occurs and Spine1 withdraws its EAD/ES route, TOR2 does not yet issue a local withdraw because it still has the EAD/ES route from Spine2. Thus, TOR2 removes TOR1 from ECMP forwarding for the failed ESI only when both EAD/ES routes are withdrawn. Waiting for all the paths for the EAD/ES route to be withdrawn can add more latency to the EVPN Fast Withdraw procedure. This problem can be exacerbated in deployments with more Spine devices.
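The multi-Spine behavior described above can be sketched as follows. The class EadEsTable and its methods are hypothetical names chosen for illustration and do not correspond to any particular BGP implementation; the sketch assumes that TOR2 keeps one path per Spine for each EAD/ES route and removes the originating TOR device from ECMP forwarding only when the last path is gone.

```python
class EadEsTable:
    """Toy model of the EAD/ES paths held by TOR2 (hypothetical, illustration only)."""

    def __init__(self):
        # (ESI, originating TOR) -> set of Spines from which the route was learned
        self.paths = {}

    def advertise(self, esi, originator, via_spine):
        # Record one more path for the same EAD/ES route.
        self.paths.setdefault((esi, originator), set()).add(via_spine)

    def withdraw(self, esi, originator, via_spine):
        """Return True only if this withdrawal removed the last remaining path."""
        key = (esi, originator)
        if key not in self.paths:
            return False
        self.paths[key].discard(via_spine)
        if self.paths[key]:
            # Another Spine still advertises the route, so no local withdraw yet.
            return False
        # Last path withdrawn: the originator can now be removed from ECMP.
        del self.paths[key]
        return True


table = EadEsTable()
table.advertise("esi-1", "TOR1", "Spine1")
table.advertise("esi-1", "TOR1", "Spine2")

# The withdrawal received from Spine1 alone does not trigger a local withdraw.
removed_after_spine1 = table.withdraw("esi-1", "TOR1", "Spine1")
# Only the withdrawal received from Spine2 removes TOR1 from ECMP forwarding.
removed_after_spine2 = table.withdraw("esi-1", "TOR1", "Spine2")
```

Under these assumptions, convergence at TOR2 is gated by the slowest Spine to deliver its withdrawal, which is why adding Spine devices can lengthen the delay.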
It should be understood that, in the data center deployments described above, BFD is used conventionally to detect BGP peer failure, and withdrawal of EAD/ES routes is communicated using BGP. BFD is not used in these examples to detect ESI failure and/or to provide the ability to monitor ESI availability on remote peers as described below.