The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Border Gateway Protocol (BGP) is a path vector routing protocol for inter-Autonomous System routing. The function of a BGP-enabled network element (a BGP host or peer) is to exchange network reachability information with other BGP-enabled network elements. The most commonly implemented version of BGP is BGP-4, which is defined in RFC1771 (published by the Internet Engineering Task Force (IETF) in March 1995).
To exchange routing information, two BGP hosts first establish a BGP peering session by exchanging BGP OPEN messages. The BGP hosts then exchange their full routing tables. After this initial exchange, each BGP host sends to its BGP peer or peers only incremental updates for new, modified, and unavailable or withdrawn routes in one or more BGP UPDATE messages. A route is defined as a unit of information that pairs a network destination with the attributes of a network path to that destination. The attributes of the network path include, among other things, the network addresses (also referred to as address prefixes or just prefixes) of the computer systems along the path.
A BGP host stores information about the routes known to the BGP host in a Routing Information Base (RIB). Depending on the particular software implementation of BGP, a RIB may be represented by one or more routing tables. When more than one routing table represents a RIB, the routing tables may be logical subsets of information stored in the same physical storage space, or the routing tables may be stored in physically separate storage spaces.
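The route and RIB concepts above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not how any particular BGP implementation organizes its tables: the names `Route`, `Rib`, and `apply_update` are hypothetical, only two path attributes are modeled, and a single route is kept per destination.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """Pairs a destination with attributes of the path to it (hypothetical subset)."""
    prefix: str        # network destination, e.g. "10.0.0.0/8"
    as_path: tuple     # Autonomous Systems along the path
    next_hop: str      # address of the next hop toward the destination


@dataclass
class Rib:
    """Simplified RIB: one routing table keyed by destination prefix."""
    routes: dict = field(default_factory=dict)

    def apply_update(self, announced=(), withdrawn=()):
        # An incremental update carries withdrawn routes and new or modified routes.
        for prefix in withdrawn:
            self.routes.pop(prefix, None)
        for route in announced:
            self.routes[route.prefix] = route


rib = Rib()
# Initial exchange: the peer sends its full table.
rib.apply_update(announced=[
    Route("10.0.0.0/8", (65001,), "192.0.2.1"),
    Route("172.16.0.0/12", (65001, 65002), "192.0.2.1"),
])
# Later: only the changes are sent (one modified route, one withdrawn route).
rib.apply_update(
    announced=[Route("10.0.0.0/8", (65003,), "192.0.2.9")],
    withdrawn=["172.16.0.0/12"],
)
```

After the second call, the table holds only the modified route for `10.0.0.0/8`, which mirrors the incremental-update behavior described above.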
As networks grow more complex and the number of BGP routes maintained by a particular network element increases, the consequences of a BGP host device, or the BGP process executing on the BGP host device, becoming inoperable grow more severe. For example, in some scenarios, when a BGP host fails, it loses all of the route information that it maintained. Thus, recovery of the failed BGP host may require retransmission of a large amount of route information from other BGP hosts and the re-computation of a large amount of network reachability information by the recovering BGP host. During the retransmission period, the failed BGP host cannot route network traffic. Therefore, vendors of network gear and their customers wish to deploy BGP in a high availability manner.
One approach for deploying BGP in a high availability manner is referred to as “stateful switchover” or SSO. SSO is typically implemented with network elements that have dual route processors, each of which can host separate but duplicate instances of various software applications. One route processor is deemed Active and the other is deemed Standby. In one implementation of SSO, the BGP processes, or “speakers,” periodically transfer a copy of large amounts of data from one or more routing tables of the Active BGP speaker to the Standby BGP speaker, in a process referred to as “checkpointing.” In this way, the Standby BGP speaker may operate, using the same routes as previously used by the Active BGP speaker, when the Active BGP speaker becomes inoperable. Consequently, all data accumulated by the Active BGP speaker must be transferred to the Standby BGP speaker before the Standby BGP speaker can start processing BGP UPDATE messages or perform other substantive BGP functions.
However, this bulk data transfer approach is inefficient and does not scale as the volume of routes maintained by the Active BGP speaker increases. For example, the data structures that are transferred must be converted to messages for purposes of inter-process communications. Therefore, all data structures have to be flattened, i.e., pointers present in the data structures cannot be sent in the form of pointers. Further, as data structures change between versions of software, new messages and converter functions are necessary to provide SSO support between the different versions.
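The flattening requirement described above can be illustrated with a short sketch. It assumes a hypothetical in-memory table in which several routes reference one shared `PathAttrs` object; the class name, the `flatten` and `unflatten` helpers, and the JSON message layout are all invented for illustration and do not describe any vendor's checkpointing format.

```python
import json


class PathAttrs:
    """Hypothetical path attribute record shared by reference among routes."""

    def __init__(self, as_path, next_hop):
        self.as_path = as_path
        self.next_hop = next_hop


def flatten(table):
    """Convert {prefix: PathAttrs} into a message containing no object references."""
    attrs = []       # each distinct attribute record, stored once
    index_of = {}    # id(object) -> position in attrs
    routes = []
    for prefix, pa in table.items():
        if id(pa) not in index_of:
            index_of[id(pa)] = len(attrs)
            attrs.append({"as_path": pa.as_path, "next_hop": pa.next_hop})
        # The in-memory reference is replaced by an integer index.
        routes.append([prefix, index_of[id(pa)]])
    return json.dumps({"attrs": attrs, "routes": routes})


def unflatten(message):
    """Rebuild the table on the receiving side, restoring shared references."""
    data = json.loads(message)
    attrs = [PathAttrs(a["as_path"], a["next_hop"]) for a in data["attrs"]]
    return {prefix: attrs[i] for prefix, i in data["routes"]}


shared = PathAttrs([65001], "192.0.2.1")
active_table = {"10.0.0.0/8": shared, "10.1.0.0/16": shared}
standby_table = unflatten(flatten(active_table))
```

Every field added to `PathAttrs` in a later software version would require matching changes to both helpers, which is the versioning burden noted above.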
Some implementations of BGP SSO attempt to limit the amount of data that is transferred from the Active BGP speaker to the Standby BGP speaker at a single time by transmitting data that identifies each change made to the Active BGP speaker to the Standby BGP speaker as soon as the change is made. However, such an approach requires a large amount of overhead in updating the RIB of the Active BGP speaker because the RIB of the Standby BGP speaker must be updated synchronously with the RIB of the Active BGP speaker.
Another approach for deploying BGP in a high availability manner is referred to as “graceful restart.” The graceful restart approach involves, for example, two different BGP hosts, denoted host A and host B herein. According to the graceful restart approach, if host A determines that host B may have become inoperable, host A starts a first timer that reflects the amount of time in which host A must receive a communication from host B before host A concludes that host B has become inoperable. If host A does receive a communication from host B before the expiration of the first timer, then host A starts a second timer that reflects the amount of time in which host B must send all BGP UPDATE messages to host A. On the other hand, if host A does not receive a communication from host B before the expiration of the first timer, then host A updates the RIB it maintains to reflect that host B is not reachable.
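The two-timer behavior of host A described above can be sketched as a small state machine. This is an illustrative model only: the class name `PeerMonitor`, the state names, and the timer durations are assumptions, and time is passed in explicitly rather than read from a clock so that the example is deterministic.

```python
class PeerMonitor:
    """Host A's view of host B under the two-timer scheme described above."""

    def __init__(self, restart_window, update_window):
        self.restart_window = restart_window  # first timer, in seconds (assumed value)
        self.update_window = update_window    # second timer, in seconds (assumed value)
        self.state = "ESTABLISHED"
        self.deadline = None

    def suspect_failure(self, now):
        # Host A suspects host B is inoperable and starts the first timer.
        self.state = "WAITING_FOR_PEER"
        self.deadline = now + self.restart_window

    def peer_heard(self, now):
        # Host B communicated before the first timer expired: start the second timer.
        if self.state == "WAITING_FOR_PEER" and now < self.deadline:
            self.state = "WAITING_FOR_UPDATES"
            self.deadline = now + self.update_window

    def tick(self, now):
        # First timer expired with no word from host B: treat host B as unreachable.
        # Expiry of the second timer is not covered by the text above, so it is not modeled.
        if self.state == "WAITING_FOR_PEER" and now >= self.deadline:
            self.state = "PEER_UNREACHABLE"
        return self.state


# Case 1: host B comes back 30 seconds after host A suspects a failure.
a = PeerMonitor(restart_window=120, update_window=360)
a.suspect_failure(now=0)
a.peer_heard(now=30)
state_recovered = a.tick(now=31)

# Case 2: host B stays silent until the first timer expires.
b = PeerMonitor(restart_window=120, update_window=360)
b.suspect_failure(now=0)
state_lost = b.tick(now=120)
```

In the first case host A ends up waiting for host B's BGP UPDATE messages; in the second, host A would update its RIB to reflect that host B is not reachable.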
Unfortunately, as a result of the time involved in updating the RIBs of each BGP speaker through the exchange of BGP UPDATE messages, the graceful restart approach requires several minutes or more before host A and host B are both updated after one of the hosts comes back online. Further, it is possible that host A would not be notified of a topology change in the network, because host B will not be able to communicate any BGP UPDATE messages to host A if host B is down.
Thus, there is a clear need for an improved technique for recovering from the failover of a BGP speaker on a network element that does not suffer from the disadvantages discussed above.