1. Field of the Invention
The invention relates generally to routing protocols used in computer networks and, more particularly, to a technique that speeds up graceful restart of a routing protocol executing on an intermediate node in a computer network.
2. Background Information
A computer network is a geographically distributed collection of interconnected communication links used to transport data between nodes, such as computers. Many types of computer networks are available, with the types ranging from local area networks to wide area networks. The nodes typically communicate by exchanging discrete packets or messages of data according to pre-defined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Computer networks may be further interconnected by an intermediate node, such as a switch or router, to extend the effective “size” of each network. Since management of a large system of interconnected computer networks can prove burdensome, smaller groups of computer networks may be maintained as routing domains or autonomous systems. The networks within an autonomous system (AS) are typically coupled together by conventional “intradomain” routers. Yet it still may be desirable to increase the number of nodes capable of exchanging data; in this case, interdomain routers executing interdomain routing protocols are used to interconnect nodes of the various autonomous systems (ASs).
An example of an interdomain routing protocol is the Border Gateway Protocol version 4 (BGP), which performs routing between autonomous systems by exchanging routing (reachability) information among neighboring interdomain routers of the systems. An adjacency is a relationship formed between selected neighboring (peer) routers for the purpose of exchanging routing information messages and abstracting the network topology. Before transmitting such messages, however, the peers cooperate to establish a logical “peer” connection (session) between the routers. BGP establishes reliable connections/sessions using a reliable/sequenced transport protocol, such as the Transmission Control Protocol (TCP).
The reachability information exchanged by BGP peers typically includes destination address prefixes, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include Internet Protocol (IP) version 4 (IPv4) and version 6 (IPv6) addresses. A prefix implies a combination of an IP address and a mask that cooperate to describe an area of the network that a peer can reach. Each prefix may have a number of associated paths; each path is announced to a peer router by one or more of its peers. Note that the combination of a set of path attributes and a prefix is referred to as a “route”; the terms “route” and “path” may be used interchangeably herein. The BGP routing protocol standard is well known and described in detail in Request For Comments (RFC) 1771, by Y. Rekhter and T. Li (1995), Internet Draft <draft-ietf-idr-bgp4-23.txt> titled, A Border Gateway Protocol 4 (BGP-4) by Y. Rekhter and T. Li (April 2003) and Interconnections, Bridges and Routers, by R. Perlman, published by Addison Wesley Publishing Company, at pages 323-329 (1992), all disclosures of which are hereby incorporated by reference.
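By way of illustration only, the combination of an address and a mask that forms a prefix can be sketched with Python's standard ipaddress module. The addresses used are arbitrary documentation values and the helper name covers is hypothetical; neither is taken from the protocol standard.

```python
import ipaddress

# A prefix combines an IP address with a mask (illustrative values only).
prefix = ipaddress.ip_network("192.0.2.0/24")

def covers(prefix, destination):
    """Return True if the destination address lies within the prefix."""
    return ipaddress.ip_address(destination) in prefix

print(prefix.network_address, prefix.netmask)
print(covers(prefix, "192.0.2.77"))
print(covers(prefix, "198.51.100.1"))
```

A destination address matches the prefix when its leading bits, as selected by the mask, equal those of the prefix address.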
The interdomain routers configured to execute an implementation of the BGP protocol, referred to herein as BGP routers, perform various routing functions, including transmitting and receiving routing messages and rendering routing decisions based on routing metrics. Each BGP router maintains a routing table that lists all feasible paths from that router to a particular network. The routing table is a database that contains routing information used to construct a forwarding table of a forwarding information base (FIB) that is used by the router when performing forwarding decisions on packets.
Periodic refreshing of the routing table is generally not performed; however, BGP peer routers residing in the AS exchange routing information under certain circumstances. For example, when a BGP router initially connects to the network, the peer routers exchange the entire contents of their routing tables. Thereafter, when changes occur to those contents, the routers exchange only those portions of their routing tables that change in order to update their BGP peers' tables. These update messages are thus incremental update messages sent in response to changes to the contents of the routing tables and announce only a best path to a particular network.
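The incremental nature of these update messages can be illustrated with a minimal sketch. It assumes, purely for illustration, that each table is a mapping from a prefix to its current best path, with a path represented as a tuple of AS numbers; the function name incremental_update and the data layout are hypothetical and do not reflect any particular implementation.

```python
def incremental_update(old_best, new_best):
    """Compare two {prefix: best_path} tables.

    Returns (announce, withdraw): prefixes whose best path is new or
    changed, and prefixes that are no longer reachable.
    """
    announce = {p: path for p, path in new_best.items()
                if old_best.get(p) != path}
    withdraw = sorted(p for p in old_best if p not in new_best)
    return announce, withdraw

# Illustrative tables; AS numbers are from the documentation range.
before = {"192.0.2.0/24": (64496, 64497), "198.51.100.0/24": (64498,)}
after = {"192.0.2.0/24": (64496,), "203.0.113.0/24": (64499,)}

announce, withdraw = incremental_update(before, after)
print(announce)
print(withdraw)
```

Only the differences between the two tables are reported, so an unchanged prefix generates no update traffic.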
Broadly stated, a BGP router generates routing update messages for an adjacency, also known as a peer router, by “walking-through” the routing table and applying appropriate routing policies. A routing policy is information that enables a BGP router to rank routes according to filtering and preference (i.e., the “best path”). Routing updates provided by the update messages allow BGP routers of the AS to construct a consistent view of the network topology. The update messages are typically sent using a reliable transport, such as TCP, to ensure reliable delivery. TCP is a transport protocol implemented by a transport layer of the IP architecture; the term TCP/IP is commonly used to denote this architecture. The TCP/IP architecture is well known and described in Computer Networks, 3rd Edition, by Andrew S. Tanenbaum, published by Prentice-Hall (1996).
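Ranking routes according to filtering and preference can be sketched as follows. This is a deliberately simplified illustration that assumes only two preference criteria, a higher local preference followed by a shorter AS path; an actual BGP decision process applies further tie-breakers. The function name best_path, the denied_as parameter, and the dictionary fields are hypothetical.

```python
def best_path(paths, denied_as=frozenset()):
    """Filter candidate paths, then return the preferred one (or None)."""
    # Filtering: discard any path that traverses a denied AS.
    permitted = [p for p in paths
                 if not denied_as.intersection(p["as_path"])]
    if not permitted:
        return None
    # Preference: higher local_pref wins; ties go to the shorter AS path.
    return min(permitted,
               key=lambda p: (-p["local_pref"], len(p["as_path"])))

# Illustrative candidates for one prefix.
candidates = [
    {"peer": "A", "local_pref": 100, "as_path": (64496, 64497)},
    {"peer": "B", "local_pref": 100, "as_path": (64498,)},
    {"peer": "C", "local_pref": 200, "as_path": (64499, 64500, 64501)},
]

print(best_path(candidates)["peer"])
print(best_path(candidates, denied_as=frozenset({64499}))["peer"])
```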
Often, maintenance of routers (such as BGP routers) in a network is planned, thereby leading to shutdown or reset of a BGP process executing in the router. For example, a BGP router may be shut down and removed from service in response to, e.g., upgrading of certain hardware or rebooting of the router following a software upgrade. In addition, the router may be reset in response to changing of BGP parameters, such as when a BGP router identifier is changed. However, a planned router shutdown or reset can result in temporary outages (i.e., loss of routing information exchange) for certain routes for which the shutdown router was the best path.
In order to implement a planned shutdown or reset of BGP, the shutdown router sends a conventional BGP Notification message with error code Cease in order to close its connections with its BGP peers. Subsequent to sending the Notification message, the shutdown router closes the TCP sessions over which the connections are established. In some implementations, the Notification message may be omitted. In response to closing the connections, all original routes advertised on those connections are immediately removed (withdrawn) from service (from the FIBs) by the BGP peers. As a result, some time (i.e., a convergence time) elapses before the network re-converges. In this context, the convergence time is the time that elapses between withdrawal of a route and the time when all subsequent messages triggered by the initial route withdrawal have been exchanged. In general, this can be characterized by the time needed for a BGP router to receive and process update messages from all of its peers, select best paths for each prefix, install those best paths into the routing table and advertise the best paths back to its peers. However, in some networks, multiple such rounds of messages may be required or other factors may also play a part. This approach of simply “halting” the router and re-converging the network leads to temporary loss of routing information due to route withdrawal.
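For illustration, the Notification message with error code Cease mentioned above can be encoded in a few lines. The sketch assumes the message layout of the BGP-4 standard (a 16-octet marker of all ones, a 2-octet length, a 1-octet type, then the error code and subcode) and uses an error subcode of zero with no data field; the function name build_cease_notification is hypothetical.

```python
import struct

BGP_MARKER = b"\xff" * 16   # 16-octet marker field, all ones
BGP_NOTIFICATION = 3        # message type code for NOTIFICATION
ERROR_CEASE = 6             # error code Cease

def build_cease_notification(subcode=0):
    """Encode a NOTIFICATION message carrying error code Cease."""
    length = 19 + 2  # fixed header plus error code and subcode octets
    return struct.pack("!16sHBBB", BGP_MARKER, length,
                       BGP_NOTIFICATION, ERROR_CEASE, subcode)

message = build_cease_notification()
print(len(message), message.hex())
```

After such a message is written to the TCP session, the sender closes the session as described above.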
A BGP router may shut down or become unavailable as a result of a timeout or a failure condition. As such, a failed router's functions may be taken over by a designated failover (backup) router. In either a planned shutdown (above) or a sudden failure, eventually BGP connectivity within the failed router will be reestablished. The prior art defines a so-called “graceful restart” (GR) procedure that allows the BGP connection to be reestablished in a manner that causes the least disruption to other connections and avoids further timeouts. A more-detailed discussion of this procedure is provided in Internet Draft <draft-ietf-idr-restart-10.txt> titled, Graceful Restart Mechanism for BGP by S. Sangli, Y. Rekhter, et al. (December 2004), the teachings of which are expressly incorporated herein by reference.
A graceful restart assumes that the restarting router's peers have first detected that the router's connection has shut down. Subsequently the peers detect that the shutdown router is coming back up and a graceful restart procedure is implemented in an attempt to limit the negative effects on routing caused by the restart of BGP. These negative effects result in part from the need to recompute BGP routes/paths. These processes consume significant system resources.
The conventional graceful restart approach outlined in the above-incorporated Graceful Restart Mechanism for BGP involves entry by the BGP peers into a read-only mode in which they send and receive updates of routes. The peers retrieve their local routing information and generate updates for the restarting router. The restarting router then receives route updates, and thereby updates its local FIB. Once updates are complete, any “stale” paths that are no longer employed by BGP are deleted and read-only mode is exited. At this time, all other activity is completed and the “best path” procedure is run on the restarting router. Best path updates are then transmitted out to peers based upon the best path computation. Once updates are complete, an end-of-RIB (routing information base) marker is sent out by peers to indicate that their updates are now complete. In this approach, the end-of-RIB marker is specified by an update message with no reachable network layer reachability information (NLRI) and empty withdrawn NLRI.
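The end-of-RIB marker and the deletion of stale paths can be sketched as follows. The sketch assumes the IPv4 unicast case, in which the marker is the minimal update message of 23 octets with a zero withdrawn-routes length and a zero path-attribute length. The function names and the per-path "stale" flag are hypothetical and serve only to illustrate the sequence described above.

```python
import struct

BGP_MARKER = b"\xff" * 16   # 16-octet marker field, all ones
BGP_UPDATE = 2              # message type code for UPDATE

def build_end_of_rib():
    """Encode an UPDATE with no withdrawn routes, attributes, or NLRI."""
    return struct.pack("!16sHBHH", BGP_MARKER, 23, BGP_UPDATE, 0, 0)

def is_end_of_rib(message):
    """Return True if the message is the minimal, empty UPDATE."""
    if len(message) != 23:
        return False
    marker, length, msg_type, withdrawn_len, attr_len = struct.unpack(
        "!16sHBHH", message)
    return (marker == BGP_MARKER and length == 23
            and msg_type == BGP_UPDATE
            and withdrawn_len == 0 and attr_len == 0)

def sweep_stale(paths):
    """Keep only the paths that were refreshed (no longer flagged stale)."""
    return {prefix: p for prefix, p in paths.items() if not p["stale"]}

# Paths learned before the restart are flagged stale until refreshed.
paths = {
    "192.0.2.0/24": {"as_path": (64496,), "stale": False},    # refreshed
    "198.51.100.0/24": {"as_path": (64497,), "stale": True},  # not refreshed
}

if is_end_of_rib(build_end_of_rib()):
    paths = sweep_stale(paths)
print(sorted(paths))
```

Tying the sweep to receipt of the marker means a path is removed only after the peer has indicated that it has nothing further to send.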
This procedure for graceful restart can be relatively slow to complete. In particular, it takes time to receive all BGP peers' routes and then to send out best paths to peers. A technique that reduces this latency is highly desirable.
An approach to preserving a BGP connection with less latency, used in so-called “high-availability” implementations of BGP architecture, is to store all state information related to the connection in a standby BGP process that mirrors the primary BGP process, a technique also termed “stateful switchover” (SSO). While this approach allows immediate reestablishment of the connection and preserves the entire session, including TCP information, it is expensive in terms of both hardware and processing overhead.