The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Border Gateway Protocol (BGP) is a path vector routing protocol for inter-Autonomous System routing. The function of a BGP-enabled network element (a BGP host or peer) is to exchange network reachability information with other BGP-enabled network elements. The most commonly implemented version of BGP is BGP-4, which was originally defined in RFC 1771 (published by the Internet Engineering Task Force (IETF) in March 1995) and later revised in RFC 4271 (January 2006).
To exchange routing information, two BGP hosts first establish a peering session by exchanging BGP OPEN messages. The BGP hosts then exchange their full routing tables. After this initial exchange, each BGP host sends to its BGP peer or peers only incremental updates for new, modified, and unavailable or withdrawn routes in one or more BGP UPDATE messages. A route is defined as a unit of information that pairs a network destination with the attributes of a network path to that destination. The attributes of the network path include, among other things, the network addresses (also referred to as address prefixes or just prefixes) of the computer systems along the path. In a BGP host, the routes are stored in a Routing Information Base (RIB). Depending on the particular software implementation of BGP, a RIB may be represented by one or more routing tables. When more than one routing table represents a RIB, the routing tables may be logical subsets of information stored in the same physical storage space, or the routing tables may be stored in physically separate storage spaces.
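To make the exchange concrete, the following is a minimal Python sketch of a RIB that accepts the initial full-table exchange and then applies incremental updates. The `Route` and `Rib` classes, their fields, and the method names are hypothetical illustrations, not part of any BGP implementation. Real BGP UPDATE messages carry withdrawn routes, path attributes, and reachability information in a binary encoding, and a real RIB also performs best-path selection; all of that is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Route:
    """A destination prefix paired with a simplified set of path attributes."""
    prefix: str                 # destination, e.g. "192.0.2.0/24"
    as_path: Tuple[int, ...]    # autonomous systems traversed toward the destination
    next_hop: str               # address of the next hop toward the destination


@dataclass
class Rib:
    """Hypothetical Routing Information Base keyed by destination prefix."""
    routes: Dict[str, Route] = field(default_factory=dict)

    def load_full_table(self, routes: Iterable[Route]) -> None:
        # Initial exchange after the peering session comes up: take the full table.
        self.routes = {r.prefix: r for r in routes}

    def apply_update(self, announced: Iterable[Route], withdrawn: Iterable[str]) -> None:
        # Incremental update: remove withdrawn prefixes first, then install new or
        # modified routes (a newer route for a known prefix replaces the old one).
        for prefix in withdrawn:
            self.routes.pop(prefix, None)
        for r in announced:
            self.routes[r.prefix] = r


rib = Rib()
rib.load_full_table([
    Route("192.0.2.0/24", (65001,), "10.0.0.1"),
    Route("198.51.100.0/24", (65001, 65002), "10.0.0.1"),
])
rib.apply_update(
    announced=[Route("203.0.113.0/24", (65003,), "10.0.0.2")],
    withdrawn=["198.51.100.0/24"],
)
print(sorted(rib.routes))  # ['192.0.2.0/24', '203.0.113.0/24']
```

After the initial load, only the changed prefixes cross the session. This is why a failed BGP host that must rebuild its tables from scratch is so costly.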
As networks grow more complex and the number of BGP routes maintained by a particular element increases, the consequences of the failure of a BGP host device, or the BGP process that it hosts, become more severe. For example, in some scenarios a BGP failure may require retransmission of a large amount of route information and re-computation of a large amount of network reachability information. Therefore, vendors of network gear and their customers wish to deploy BGP in a fault-tolerant manner.
BGP runs over the Transmission Control Protocol (TCP) as defined in RFC 793, which provides a connection-oriented, reliable data delivery service for applications such as BGP. Highly available, reliable TCP connections that can be switched over in the face of failure are therefore a foundational requirement for providing BGP with high availability. TCP is a stateful protocol that provides reliable, ordered byte-stream delivery, flow control, and congestion control for higher-layer applications. To provide these services, a TCP implementation maintains state data that includes variables, such as window sizes and round-trip time estimates; a retransmission queue containing copies of segments that have been sent but not yet acknowledged; and timers. A successful switchover of TCP to a secondary processor requires timely synchronization of such state data to the secondary processor.
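The kinds of TCP state listed above can be sketched as a single Python record. The class and field names are hypothetical. The sequence variables are loosely named after RFC 793's SND.UNA, SND.NXT, SND.WND, RCV.NXT, and RCV.WND. Timer state is reduced to a single timeout value, and sequence-number wraparound is ignored. The sketch shows why the retransmission queue makes synchronization expensive: it changes on every send and every acknowledgment.

```python
import copy
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class TcpConnectionState:
    """Hypothetical per-connection TCP state a standby processor would need."""
    # Sequence-space variables, loosely named after RFC 793's SND.UNA, SND.NXT, etc.
    snd_una: int          # oldest unacknowledged sequence number
    snd_nxt: int          # next sequence number to send
    snd_wnd: int          # send window advertised by the peer
    rcv_nxt: int          # next sequence number expected from the peer
    rcv_wnd: int          # receive window we advertise
    srtt_ms: float = 0.0  # smoothed round-trip time estimate
    rto_ms: int = 1000    # retransmission timeout currently in effect
    # Retransmission queue: (first sequence number, payload) of unacknowledged segments.
    rexmit_queue: Deque[Tuple[int, bytes]] = field(default_factory=deque)

    def send(self, payload: bytes) -> None:
        # Keep a copy of the segment until the peer acknowledges it.
        self.rexmit_queue.append((self.snd_nxt, payload))
        self.snd_nxt += len(payload)

    def on_ack(self, ack: int) -> None:
        # Cumulative ACK; sequence-number wraparound is ignored for simplicity.
        if not self.snd_una < ack <= self.snd_nxt:
            return
        self.snd_una = ack
        while self.rexmit_queue:
            seq, data = self.rexmit_queue[0]
            if seq + len(data) > ack:
                break  # segment not fully acknowledged yet: keep it queued
            self.rexmit_queue.popleft()

    def checkpoint(self) -> "TcpConnectionState":
        # An independent deep copy: what would have to reach the standby processor.
        return copy.deepcopy(self)


conn = TcpConnectionState(snd_una=1000, snd_nxt=1000, snd_wnd=65535,
                          rcv_nxt=5000, rcv_wnd=65535)
conn.send(b"AAAA")      # occupies sequence numbers 1000-1003
conn.send(b"BBBBBB")    # occupies sequence numbers 1004-1009
conn.on_ack(1004)       # peer acknowledges the first segment
standby = conn.checkpoint()
print(conn.snd_una, conn.snd_nxt, list(conn.rexmit_queue))
# 1004 1010 [(1004, b'BBBBBB')]
```

Every `send` and `on_ack` call changes state that the standby must also hold. If any one of these changes were lost, the standby could not resume the connection transparently.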
Highly reliable networks offer high availability by detecting failures and handling the failures in a timely manner with zero or minimal disruption of service. Redundant systems that have at least one secondary processor are often used to achieve high reliability. When the secondary processor is synchronized to the primary processor, and can take over with almost no visible interruption to peer devices, the secondary processor is termed a “hot standby” and the switchover is termed “stateful switchover” or SSO.
SSO can be implemented in a telecommunication network with network elements that have dual route processors, each of which can host separate but duplicate instances of various software applications. One route processor is deemed Active and the other is deemed Standby. When the processors are operating in SSO mode, the active route processor automatically replicates all messages that it receives or sends, for all protocols or activities, and sends the replicated messages to the standby route processor.
In some approaches, the active route processor periodically sends a bulk copy of data representing a particular state (a “checkpoint”) to the standby route processor. While replication and checkpointing enable the standby route processor to achieve synchronization of state with the active route processor, these approaches require considerable use of processing resources and memory, and require extensive use of an inter-processor communication mechanism. When a route processor is managing a large number of BGP sessions and TCP connections, the burden of continually operating in SSO mode may become unacceptable.
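The trade-off between replication and checkpointing can be illustrated with a toy Python model. The `StandbyModel` class, its counters, and the representation of state as a key/value map are hypothetical simplifications. They do not describe how any particular route processor implements SSO. The model shows that replication costs one inter-processor message per change. Each checkpoint, by contrast, costs an amount proportional to the total state, and between checkpoints the standby holds stale state.

```python
import copy
from typing import Dict


class StandbyModel:
    """Hypothetical model of an active/standby pair and its inter-processor traffic."""

    def __init__(self) -> None:
        self.active_state: Dict[str, int] = {}
        self.standby_state: Dict[str, int] = {}
        self.ipc_messages = 0  # messages sent over the inter-processor channel
        self.ipc_entries = 0   # state entries carried by those messages

    def replicate(self, key: str, value: int) -> None:
        # Replication: each change on the active side is mirrored immediately.
        self.active_state[key] = value
        self.standby_state[key] = value
        self.ipc_messages += 1
        self.ipc_entries += 1

    def update_local(self, key: str, value: int) -> None:
        # Checkpointing: the active side changes state without notifying the standby...
        self.active_state[key] = value

    def checkpoint(self) -> None:
        # ...until it periodically sends a bulk copy of its entire state.
        self.standby_state = copy.deepcopy(self.active_state)
        self.ipc_messages += 1
        self.ipc_entries += len(self.active_state)


# Checkpointing: 1000 connections' worth of state, then a single change.
ckpt = StandbyModel()
for i in range(1000):
    ckpt.update_local(f"conn{i}", 0)
ckpt.checkpoint()
ckpt.update_local("conn7", 1)
stale = ckpt.standby_state != ckpt.active_state  # standby lags until next checkpoint
ckpt.checkpoint()                                # one change costs 1000 more entries
print(ckpt.ipc_messages, ckpt.ipc_entries, stale)  # 2 2000 True

# Replication: the same workload costs one message per change.
rep = StandbyModel()
for i in range(1000):
    rep.replicate(f"conn{i}", 0)
rep.replicate("conn7", 1)
print(rep.ipc_messages, rep.ipc_entries, rep.standby_state == rep.active_state)  # 1001 1001 True
```

In this model, both costs grow with the number of sessions. Replication scales with message volume and checkpointing with total state size, which is the burden described above.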
As networks grow larger and more complex, network reliability and throughput depend to a greater extent upon the availability of software processes that implement BGP. For example, when a BGP host becomes unavailable, many other BGP peers may need to re-compute route information to account for the unavailability. Other hosts may lose BGP connectivity during the transition. Thus, present approaches for managing BGP sessions in large networks without SSO capability cause significant network churn. Network administrators are demanding a better solution that does not perturb the network.
Moreover, BGP is merely one example of an application for which high availability is desirable; there are many other applications. BGP and other applications running on top of transport-layer protocols, such as TCP, would benefit greatly from a solution providing true SSO for the TCP connections, achieved in a scalable manner.
Further, users and administrators expect any SSO support for TCP to provide a solution that performs well and scales to large networks that use existing and future platforms without major hardware upgrades.
One approach for providing high-availability TCP involves massive data checkpointing of send and receive windows and related metadata for all established TCP connections. While this approach does allow active and standby processors to maintain identical TCP state information, it is a “brute-force” approach that requires extensive CPU resources. Network administrators desire to have a more efficient approach that is readily scalable to large numbers of connections.
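The scaling problem with brute-force checkpointing can be seen from a rough, worst-case estimate of the checkpoint bandwidth B. The symbols are introduced here for illustration only. N is the number of established connections. W_snd and W_rcv are the bytes held in each connection's send and receive windows. M is the per-connection metadata, and T is the checkpoint interval. The example values (5000 connections, 16 KiB in each window, negligible metadata, one checkpoint per second) are hypothetical.

```latex
B \approx \frac{N\,(W_{\mathrm{snd}} + W_{\mathrm{rcv}} + M)}{T}
\qquad\text{e.g.}\qquad
\frac{5000 \times (16 + 16)\,\mathrm{KiB}}{1\,\mathrm{s}} = 160\,000\ \mathrm{KiB/s} \approx 156\ \mathrm{MiB/s}
```

Under this estimate the cost grows linearly with N, regardless of how little each connection's state actually changed between checkpoints.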