Bidirectional Forwarding Detection (BFD) is a network protocol used to detect faults between two network elements connected by a link. BFD provides low-overhead fault detection even on physical media that do not support failure detection of any kind, such as Ethernet, virtual circuits, tunnels, and Multiprotocol Label Switching (MPLS) Label Switched Paths. BFD establishes a session between two endpoints over a particular link, and the session is explicitly configured between those endpoints. A BFD session can operate in an asynchronous mode, in which each of the network element endpoints periodically sends Hello packets to the other. If a requisite number of consecutive BFD packets is not received, the session is considered down.
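By way of illustration only, the asynchronous-mode detection described above can be sketched as follows. This is a minimal model, not an implementation of the protocol: the class name `AsyncBfdSession`, its fields, and the example timer values are hypothetical, and the session returns to the up state as soon as any packet arrives, which omits the protocol's actual session state handling.

```python
from dataclasses import dataclass


@dataclass
class AsyncBfdSession:
    """Hypothetical model of one endpoint of an asynchronous-mode session."""

    rx_interval_ms: int      # expected spacing between received packets
    detect_mult: int         # consecutive packets that may be missed
    last_rx_ms: int = 0      # arrival time of the most recent packet
    up: bool = True

    @property
    def detection_time_ms(self) -> int:
        # Silence longer than this is treated as a fault.
        return self.rx_interval_ms * self.detect_mult

    def on_packet(self, now_ms: int) -> None:
        # Simplification: any received packet marks the session up.
        self.last_rx_ms = now_ms
        self.up = True

    def poll(self, now_ms: int) -> bool:
        # Declare the session down once the detection time has elapsed
        # without a packet from the other endpoint.
        if now_ms - self.last_rx_ms >= self.detection_time_ms:
            self.up = False
        return self.up
```

With an assumed receive interval of 300 ms and a multiplier of 3, the session in this sketch stays up while packets keep arriving and is considered down after 900 ms of silence.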
In addition, either endpoint in the BFD session may also initiate an Echo function. When this function is active, a stream of Echo packets is sent, and the other endpoint then sends these Echo packets back to the sender via its forwarding plane. This can be used to test the forwarding path on the remote system.
A number of services can use BFD as a fast fault-detection mechanism to determine whether a link is down and to adjust the service accordingly. For example, Border Gateway Protocol (BGP), Open Shortest Path First (OSPF), Protocol Independent Multicast (PIM), First Hop Redundancy Protocol (FHRP), Link Aggregation (LAG), and/or other services can use BFD to detect that a link used by the service is down and adjust that service accordingly.
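One way to picture how services consume BFD results is a simple subscription model, sketched below. The class `LinkStateNotifier` and its methods are hypothetical and are not drawn from any particular BFD implementation; the sketch only shows services registering interest in a link and being told when its state changes.

```python
from typing import Callable, Dict, List

# A handler receives the link name and whether the link is up.
Handler = Callable[[str, bool], None]


class LinkStateNotifier:
    """Hypothetical fan-out of BFD link state to client services."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, link: str, handler: Handler) -> None:
        # A service (for example, a routing protocol) registers for a link.
        self._subscribers.setdefault(link, []).append(handler)

    def report(self, link: str, is_up: bool) -> None:
        # Called when the BFD session for the link changes state; every
        # subscribed service is notified so that it can adjust itself.
        for handler in self._subscribers.get(link, []):
            handler(link, is_up)
```

In this sketch, a BGP process and an OSPF process could each call `subscribe("link-1", ...)`, and a single `report("link-1", False)` would then prompt both to withdraw routes over that link.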
A problem can occur if the BFD sessions are processed by the control plane of a network element and the control plane fails over from an active central processing unit (CPU) to a standby CPU. Such a failover can take up to 30 seconds before the control plane resumes functioning. During this time, the network element is “headless”: the control plane is unavailable, but the data plane continues to process and forward network data, so the forwarding function of the network element is not down. However, the BFD sessions are disrupted during the failover, which can lead the other network elements in the BFD sessions to believe that this network element is completely down. Those network elements will therefore treat the failover network element as being down even though its data plane is still functioning. This false indication of failure from the BFD sessions can cause disruption in services and churn in the network.
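The scale of the mismatch can be shown with simple arithmetic. The sketch below is an assumption-laden illustration: the function name `peer_view_during_failover` is hypothetical, the 300 ms interval and multiplier of 3 are example values, and the silent period seen by the peer is assumed to equal the failover duration.

```python
from typing import Tuple


def peer_view_during_failover(
    failover_ms: int, rx_interval_ms: int, detect_mult: int
) -> Tuple[bool, int]:
    """Return (declared_down, false_down_ms) as seen by the peer.

    Assumes the failing-over element sends no BFD packets for exactly
    failover_ms while its data plane keeps forwarding traffic.
    """
    detection_time_ms = rx_interval_ms * detect_mult
    # The peer declares the session down once the silence reaches
    # its detection time.
    declared_down = failover_ms >= detection_time_ms
    # Time during which the peer treats a still-forwarding element as down.
    false_down_ms = max(0, failover_ms - detection_time_ms)
    return declared_down, false_down_ms
```

Under these assumptions, a 30-second failover against a 900 ms detection time means the peer declares the session down and treats the element as failed for roughly 29.1 seconds, even though forwarding never stopped.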