The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to claims in this application and any application claiming priority from this application, and are not admitted to be prior art by inclusion in this section.
Border Gateway Protocol (BGP) is a path vector routing protocol for inter-autonomous-system routing. The function of a BGP-enabled network node (a BGP host or peer) is to exchange network reachability information with other BGP-enabled network nodes. To exchange routing information, two BGP hosts first establish a BGP peering session by exchanging BGP OPEN messages. The BGP hosts then exchange their full routing tables. After this initial exchange, each BGP host sends its BGP peer or peers only incremental updates, in one or more BGP UPDATE messages, for new, modified, and unavailable or withdrawn routes. A route is defined as a unit of information that pairs a network destination with the attributes of a network path to that destination. The destination is identified by a network address prefix (also referred to simply as a prefix), and the path attributes include, among other things, the sequence of autonomous systems traversed along the path and the next hop toward the destination.
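The session life cycle described above can be pictured with a small Python model. It is only a sketch under simplifying assumptions: the class names (`Route`, `Update`, `BgpHost`) and their methods are hypothetical, the OPEN exchange is reduced to the two hosts registering each other, and no real BGP message encoding, TCP transport, or path selection is modeled. It does show the two phases: a full-table exchange when the session comes up, followed by incremental announcements and withdrawals only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """A destination prefix paired with attributes of the path to it."""
    prefix: str      # destination, e.g. "203.0.113.0/24"
    as_path: tuple   # autonomous systems traversed, e.g. (65001, 65002)
    next_hop: str


@dataclass
class Update:
    """Simplified stand-in for a BGP UPDATE message."""
    announced: list = field(default_factory=list)  # new or modified routes
    withdrawn: list = field(default_factory=list)  # prefixes no longer reachable


class BgpHost:
    def __init__(self, name: str):
        self.name = name
        self.local_routes = {}  # prefix -> Route this host advertises
        self.peers = {}         # peer name -> BgpHost
        self.learned = {}       # peer name -> {prefix -> Route}

    def open_session(self, peer: "BgpHost") -> None:
        # Stand-in for the OPEN exchange: each side registers the other.
        self.peers[peer.name] = peer
        peer.peers[self.name] = self
        # Initial exchange: each side sends its full table once.
        peer.receive(self.name, Update(announced=list(self.local_routes.values())))
        self.receive(peer.name, Update(announced=list(peer.local_routes.values())))

    def receive(self, peer_name: str, update: Update) -> None:
        table = self.learned.setdefault(peer_name, {})
        for prefix in update.withdrawn:
            table.pop(prefix, None)
        for route in update.announced:
            table[route.prefix] = route

    def announce(self, route: Route) -> None:
        # Incremental update: only the new or modified route is sent.
        self.local_routes[route.prefix] = route
        for peer in self.peers.values():
            peer.receive(self.name, Update(announced=[route]))

    def withdraw(self, prefix: str) -> None:
        # Incremental update: only the withdrawn prefix is sent.
        self.local_routes.pop(prefix, None)
        for peer in self.peers.values():
            peer.receive(self.name, Update(withdrawn=[prefix]))


# Usage
a, b = BgpHost("A"), BgpHost("B")
a.local_routes["203.0.113.0/24"] = Route("203.0.113.0/24", (65001,), "192.0.2.1")
a.open_session(b)  # B learns A's full table
a.announce(Route("198.51.100.0/24", (65001,), "192.0.2.1"))
a.withdraw("203.0.113.0/24")
print(sorted(b.learned["A"]))  # ['198.51.100.0/24']
```

After the session comes up, only the changed route or withdrawn prefix crosses the session, which mirrors the incremental-update behavior described in the paragraph.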
A BGP host stores information about the routes known to the BGP host in a Routing Information Base (RIB). Depending on the particular software implementation of BGP, a RIB may be represented by one or more routing tables. When more than one routing table represents a RIB, the routing tables may be logical subsets of information stored in the same physical storage space, or the routing tables may be stored in physically separate storage spaces.
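As a sketch of the first storage option, the hypothetical `Rib` class below keeps every entry in one dictionary keyed by (table name, prefix) and derives each routing table as a logical view of that shared store. The table names are borrowed from the Adj-RIB-In / Loc-RIB terminology of the BGP-4 specification purely as labels; the class and its methods are illustrative assumptions, not any particular implementation.

```python
class Rib:
    """A RIB kept in one physical store, exposed as several logical routing tables."""

    def __init__(self):
        # Single underlying storage space keyed by (table name, prefix).
        self._store = {}

    def put(self, table: str, prefix: str, attrs: dict) -> None:
        self._store[(table, prefix)] = attrs

    def remove(self, table: str, prefix: str) -> None:
        self._store.pop((table, prefix), None)

    def table(self, name: str) -> dict:
        # A routing table is a logical subset (view) of the shared store.
        return {p: a for (t, p), a in self._store.items() if t == name}


rib = Rib()
rib.put("Adj-RIB-In", "203.0.113.0/24", {"next_hop": "192.0.2.1"})
rib.put("Adj-RIB-In", "198.51.100.0/24", {"next_hop": "192.0.2.2"})
rib.put("Loc-RIB", "203.0.113.0/24", {"next_hop": "192.0.2.1"})
print(len(rib.table("Adj-RIB-In")), len(rib.table("Loc-RIB")))  # 2 1
```

Storing the tables in physically separate spaces would amount to giving each table its own dictionary; the single-store version keeps one place to update at the cost of scanning the whole store to build a table view.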
As networks grow more complex and the number of BGP routes maintained by a particular network element increases, the consequences of a BGP host device, or the BGP process executing on it, becoming non-functional grow more severe. For example, in some scenarios, when a BGP host fails or otherwise becomes non-functional, it can lose all of the route information it maintained. Recovery of the BGP host may then require other BGP hosts to retransmit a large amount of route information and the recovering BGP host to recompute a large amount of network reachability information. During the retransmission period, the recovering BGP host cannot route network traffic. Network equipment vendors and their customers therefore seek to overcome these limitations to improve network availability.
Inter-Chassis Redundancy (ICR) can provide high availability within a network by having one or more network nodes that can be switched to handle the services of another network node that has become non-functional. Typically, one network node functions as an active ICR node while another network node functions as a standby ICR node that is configured to take over at least some operations (e.g., traffic routing operations) of the active ICR node, through a process called “switchover.” Switchover can be triggered by failure of a network link or component of the active ICR node and/or by a network operator (e.g., taking an active ICR node off-line to perform a software/hardware update or other maintenance). The active ICR node handles routing of IP network traffic until it becomes non-functional, at which time switchover occurs with the standby ICR node taking over at least some functionality that was performed by the non-functional ICR node (with the standby ICR node then becoming an active ICR node).
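The active/standby roles and the switchover process can be sketched as a tiny state machine. The names `IcrState`, `Trigger`, and `IcrPair`, and the choice to mark the former active node as off-line after switchover, are assumptions made for illustration; the sketch tracks roles only and does not move any traffic or routing state.

```python
from enum import Enum


class IcrState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    OFFLINE = "offline"  # failed, or taken off-line for maintenance


class Trigger(Enum):
    FAILURE = "link or component failure"
    OPERATOR = "operator-initiated"


class IcrPair:
    def __init__(self, active: str, standby: str):
        self.state = {active: IcrState.ACTIVE, standby: IcrState.STANDBY}
        self.history = []  # (trigger, old active, new active)

    def _node_in(self, wanted: IcrState) -> str:
        for node, state in self.state.items():
            if state is wanted:
                return node
        raise RuntimeError(f"no node in state {wanted.value}")

    def active(self) -> str:
        return self._node_in(IcrState.ACTIVE)

    def switchover(self, trigger: Trigger) -> str:
        old = self.active()
        new = self._node_in(IcrState.STANDBY)
        self.state[old] = IcrState.OFFLINE  # old active stops handling traffic
        self.state[new] = IcrState.ACTIVE   # standby takes over and becomes active
        self.history.append((trigger, old, new))
        return new


pair = IcrPair(active="R1", standby="R2")
print(pair.switchover(Trigger.OPERATOR))  # R2
print(pair.state["R1"].value)             # offline
```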
Although ICR is described herein in the context of the BGP routing protocol, it is not limited thereto and can be used with other Layer 2 (L2) or Layer 3 (L3) network protocols, such as Open Shortest Path First (OSPF).
When routing protocols such as BGP are used, the ICR state (active or standby) of a node is determined based on the best path: the node that the BGP network determines to have the best path becomes the active ICR node, and another node (e.g., the node having the next-best path) becomes the standby ICR node.
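A minimal sketch of role assignment from path ranking follows. As an assumption, it ranks candidates with a heavily simplified subset of the BGP decision process (highest LOCAL_PREF first, then shortest AS_PATH, then node name as an arbitrary tie-breaker) and ignores the remaining BGP tie-breaking rules; `Candidate` and `assign_icr_roles` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node: str
    local_pref: int   # higher is preferred
    as_path_len: int  # shorter is preferred


def assign_icr_roles(candidates):
    """Best-ranked node becomes active, next-best becomes standby."""
    ranked = sorted(candidates, key=lambda c: (-c.local_pref, c.as_path_len, c.node))
    roles = {c.node: "none" for c in ranked}
    if ranked:
        roles[ranked[0].node] = "active"
    if len(ranked) > 1:
        roles[ranked[1].node] = "standby"
    return roles


print(assign_icr_roles([
    Candidate("R1", local_pref=100, as_path_len=3),
    Candidate("R2", local_pref=100, as_path_len=2),
    Candidate("R3", local_pref=90, as_path_len=1),
]))
# {'R2': 'active', 'R1': 'standby', 'R3': 'none'}
```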
Network service failures can arise because the active ICR node does not know of the presence of the standby ICR node. This happens because the BGP router that is the immediate neighbor of the active ICR node suppresses advertisements relating to non-best paths. Consequently, the active ICR node does not learn of other nodes associated with non-best paths (including the standby ICR node) that are advertising their presence using BGP. In contrast, the standby ICR node does become aware of the presence of the active ICR node through BGP.
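The suppression effect can be illustrated with a hypothetical `advertised_to` function that models the neighbor router. Two assumptions are built in and labeled in the code: the neighbor advertises only its single best path per prefix, and it never sends a peer back the path that the peer itself advertised. Under those assumptions the active node (R1) hears nothing about the prefix, while the standby node (R2) learns of R1.

```python
def advertised_to(peer: str, received: dict) -> dict:
    """Return prefix -> advertiser of the path the neighbor sends to `peer`.

    `received` maps each prefix to a list of (advertiser, rank) pairs,
    where a lower rank means a better path.
    """
    out = {}
    for prefix, paths in received.items():
        # Assumption: only the single best path per prefix is advertised.
        best_advertiser, _ = min(paths, key=lambda p: p[1])
        # Assumption: a peer is never sent back the path it advertised itself.
        if best_advertiser != peer:
            out[prefix] = best_advertiser
    return out


received = {"203.0.113.0/24": [("R1", 1), ("R2", 2)]}  # R1 = active, R2 = standby
print(advertised_to("R1", received))  # {}
print(advertised_to("R2", received))  # {'203.0.113.0/24': 'R1'}
```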
A separate ICR transport channel can be established between the active and standby ICR nodes, over which the standby ICR node can send heartbeat messages to the active ICR node. The active ICR node can then discover the presence of the standby ICR node upon receiving these heartbeat messages. However, the active ICR node remains unaware of link failures at the standby ICR node, configuration changes to the standby ICR node, and other events that cause the standby ICR node to become non-functional. This is because the BGP neighbor router of the active ICR node suppresses any changes in advertisements to the active ICR node for as long as the active ICR node has the best path in the network.
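The heartbeat-based discovery can be sketched as follows; `Heartbeat`, `ActiveIcrNode`, and the `sequence` field are hypothetical. The active node simply records when it last heard from each peer over the ICR transport channel, which is enough to discover the standby node's presence.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Heartbeat:
    sender: str    # name of the node sending the heartbeat
    sequence: int  # increasing counter set by the sender


class ActiveIcrNode:
    def __init__(self, name: str):
        self.name = name
        self.last_heard = {}  # peer name -> local time the last heartbeat arrived

    def on_heartbeat(self, hb: Heartbeat, now: Optional[float] = None) -> None:
        # Receipt of a heartbeat over the ICR transport channel reveals the peer.
        self.last_heard[hb.sender] = time.monotonic() if now is None else now

    def knows_peer(self, name: str) -> bool:
        return name in self.last_heard


active = ActiveIcrNode("R1")
print(active.knows_peer("R2"))  # False
active.on_heartbeat(Heartbeat(sender="R2", sequence=1))
print(active.knows_peer("R2"))  # True
```

Nothing in the heartbeat describes the standby node's links or configuration, so as long as heartbeats keep arriving, a link failure or configuration change at the standby node leaves this record unchanged, which is the limitation the paragraph describes.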
Consequently, the active ICR node does not know that the standby ICR node has become non-functional. Therefore, when a network operator attempts a manual switchover, or another event triggers a switchover from the active ICR node to the standby ICR node, the network traffic that is then forwarded to the standby ICR node (instead of to the active ICR node) may not be properly processed and forwarded by the standby ICR node, leading to network service failures. These failures can cause unacceptable degradation of network operations and lost revenue.