A computer network is a geographically distributed collection of interconnected communication links used to transport data between nodes, such as computers. Many types of computer networks are available, ranging from local area networks to wide area networks. The nodes typically communicate by exchanging discrete packets or messages of data according to pre-defined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Computer networks may be further interconnected by an intermediate node, such as a switch or router, to extend the effective “size” of each network. Since management of a large system of interconnected computer networks can prove burdensome, smaller groups of computer networks may be maintained as routing domains or autonomous systems. The networks within an autonomous system (AS) are typically coupled together by conventional “intradomain” routers. Yet it still may be desirable to increase the number of nodes capable of exchanging data; in this case, interdomain routers executing interdomain routing protocols are used to interconnect nodes of the various ASs.
An example of an interdomain routing protocol is the Border Gateway Protocol version 4 (BGP), which performs routing between autonomous systems by exchanging routing (reachability) information among neighboring interdomain routers of the systems. An adjacency is a relationship formed between selected neighboring (peer) routers for the purpose of exchanging routing information messages and abstracting the network topology. Before transmitting such messages, however, the peers cooperate to establish a logical “peer” connection (session) between the routers. BGP establishes reliable connections/sessions using a reliable/sequenced transport protocol, such as the Transmission Control Protocol (TCP).
The reachability information exchanged by BGP peers typically includes destination address prefixes, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include Internet Protocol (IP) version 4 (IPv4) and version 6 (IPv6) addresses. A prefix is the combination of an IP address and a mask that together describe an area of the network that a peer can reach. Each prefix may have a number of associated paths; each path is announced to a peer router by one or more of its peers. Note that the combination of a set of path attributes and a prefix is referred to as a “route”; the terms “route” and “path” may be used interchangeably herein. The BGP routing protocol standard is well known and described in detail in Request For Comments (RFC) 1771, by Y. Rekhter and T. Li (1995), Internet Draft <draft-ietf-idr-bgp4-23.txt> titled, A Border Gateway Protocol 4 (BGP-4) by Y. Rekhter and T. Li (April 2003) and Interconnections, Bridges and Routers, by R. Perlman, published by Addison Wesley Publishing Company, at pages 323-329 (1992), all disclosures of which are hereby incorporated by reference.
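To make the notion of a prefix concrete, the following minimal Python sketch uses the standard `ipaddress` module to split a prefix into its address and mask and to test whether a destination address lies within the area the prefix describes. The helper names `describe_prefix` and `covers` are hypothetical, and the prefixes are documentation ranges chosen only for illustration.

```python
import ipaddress


def describe_prefix(prefix_text):
    """Return the (address, mask, mask length) that together define a prefix."""
    net = ipaddress.ip_network(prefix_text)
    return str(net.network_address), str(net.netmask), net.prefixlen


def covers(prefix_text, address_text):
    """True if the destination address falls inside the area the prefix describes.

    An IPv4 address is never inside an IPv6 prefix (and vice versa).
    """
    return ipaddress.ip_address(address_text) in ipaddress.ip_network(prefix_text)


print(describe_prefix("192.0.2.0/24"))           # ('192.0.2.0', '255.255.255.0', 24)
print(covers("192.0.2.0/24", "192.0.2.77"))      # True
print(covers("2001:db8::/32", "2001:db8:1::5"))  # True
```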
The interdomain routers configured to execute an implementation of the BGP protocol, referred to herein as BGP routers, perform various routing functions, including transmitting and receiving routing messages and rendering routing decisions based on routing metrics. Each BGP router maintains a routing table that lists all feasible paths from that router to a particular network. The routing table is a database that contains routing information used to construct a forwarding table of a forwarding information base (FIB) that is used by the router when performing forwarding decisions on packets.
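The relationship between the routing table and the forwarding table can be illustrated with a simplified Python sketch. It assumes a routing table that maps each prefix to its list of feasible paths, picks one best path per prefix using a deliberately reduced ranking (higher LOCAL_PREF, then shorter AS_PATH; the actual BGP decision process has further tie-breakers), and then forwards packets by longest-prefix match. The `Path` class and the functions `best_path`, `build_fib` and `lookup` are hypothetical names, not part of any router implementation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One feasible path to a prefix (hypothetical, heavily simplified)."""
    next_hop: str
    as_path: tuple          # AS numbers the announcement traversed
    local_pref: int = 100   # higher is preferred


def best_path(paths):
    # Simplified ranking (assumption): highest LOCAL_PREF, then shortest AS_PATH.
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


def build_fib(routing_table):
    """Keep only the best path's next hop for every prefix that has a path."""
    return {
        ipaddress.ip_network(prefix): best_path(paths).next_hop
        for prefix, paths in routing_table.items()
        if paths
    }


def lookup(fib, destination):
    """Forward by longest-prefix match; None means no route (packet dropped)."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


routing_table = {
    "203.0.113.0/24": [Path("10.0.0.1", (65001, 65010)), Path("10.0.0.2", (65002,))],
    "203.0.113.128/25": [Path("10.0.0.3", (65003, 65004), local_pref=200)],
}
fib = build_fib(routing_table)
print(lookup(fib, "203.0.113.200"))  # 10.0.0.3 (the /25 is the longer match)
print(lookup(fib, "203.0.113.5"))    # 10.0.0.2 (shorter AS_PATH wins for the /24)
```

Only the best path's next hop reaches the FIB; the other feasible paths stay in the routing table as candidates.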
Periodic refreshing of the routing table is generally not performed; instead, BGP peer routers residing in the ASs exchange routing information under certain circumstances. For example, when a BGP router initially connects to the network, the peer routers exchange the entire contents of their routing tables. Thereafter, when changes occur to those contents, the routers exchange only those portions of their routing tables that have changed in order to update their BGP peers' tables. These update messages are thus incremental: they are sent in response to changes in the contents of the routing tables and announce only a best path to a particular network.
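The incremental nature of these updates can be sketched by comparing a router's previous best paths with its current ones. In this hypothetical Python sketch, each table maps a prefix to its best path (a plain string stands in for the full set of path attributes); the function `incremental_update` is an illustrative name, and real BGP UPDATE messages carry path attributes and withdrawn routes in a binary encoding not modeled here. An empty previous table reproduces the full exchange that occurs when a router first connects.

```python
def incremental_update(old_best, new_best):
    """Compute the update a router sends after its best paths change.

    Both arguments map prefix -> best path. Returns (announce, withdraw):
    prefixes whose best path is new or different, and prefixes that lost all paths.
    """
    announce = {
        prefix: path
        for prefix, path in new_best.items()
        if old_best.get(prefix) != path
    }
    withdraw = sorted(prefix for prefix in old_best if prefix not in new_best)
    return announce, withdraw


# Initial connection: nothing was sent before, so the whole table is announced.
full, _ = incremental_update({}, {"192.0.2.0/24": "via Y", "198.51.100.0/24": "via Y"})

# Later change: one best path moves to Z and one prefix disappears.
announce, withdraw = incremental_update(
    {"192.0.2.0/24": "via Y", "198.51.100.0/24": "via Y"},
    {"192.0.2.0/24": "via Z"},
)
print(announce)  # {'192.0.2.0/24': 'via Z'}
print(withdraw)  # ['198.51.100.0/24']
```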
Broadly stated, a BGP router generates routing update messages for an adjacency, i.e., a peer router, by “walking through” the routing table and applying appropriate routing policies. A routing policy is information that enables a BGP router to rank routes according to filtering and preference (i.e., to select the “best path”). The routing updates carried by the update messages allow BGP routers of the ASs to construct a consistent view of the network topology. The update messages are typically sent over a reliable transport, such as TCP, to ensure their delivery. TCP is a transport protocol implemented by the transport layer of the IP architecture; the term TCP/IP is commonly used to denote this architecture. The TCP/IP architecture is well known and described in Computer Networks, 3rd Edition, by Andrew S. Tanenbaum, published by Prentice-Hall (1996).
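The walk through the routing table with a per-peer policy can be sketched as follows. The `ExportPolicy` class, its deny-list filter, its weighting of the neighboring AS, and `generate_updates` are all hypothetical simplifications chosen to show filtering and preference side by side; production routers express policy through route maps or similar configuration languages with many more match and set actions.

```python
from dataclasses import dataclass, field


@dataclass
class ExportPolicy:
    """Hypothetical per-peer routing policy: a filter plus a preference ranking."""
    denied_prefixes: set = field(default_factory=set)
    neighbor_as_weight: dict = field(default_factory=dict)  # first-hop AS -> weight

    def permits(self, prefix):
        # Filtering: prefixes on the deny list are never sent to this peer.
        return prefix not in self.denied_prefixes

    def preference(self, route):
        # Preference: higher weight for the neighboring AS first, then shorter AS path.
        first_as = route["as_path"][0] if route["as_path"] else None
        return (self.neighbor_as_weight.get(first_as, 0), -len(route["as_path"]))


def generate_updates(routing_table, policy):
    """Walk the routing table and emit one update per prefix the policy allows."""
    updates = []
    for prefix in sorted(routing_table):
        routes = routing_table[prefix]
        if not routes or not policy.permits(prefix):
            continue
        best = max(routes, key=policy.preference)
        updates.append({"prefix": prefix, **best})
    return updates


table = {
    "192.0.2.0/24": [
        {"next_hop": "10.0.0.1", "as_path": (65001, 65100)},
        {"next_hop": "10.0.0.2", "as_path": (65002, 65200, 65100)},
    ],
    "198.51.100.0/24": [{"next_hop": "10.0.0.1", "as_path": (65001,)}],
}
policy = ExportPolicy(denied_prefixes={"198.51.100.0/24"}, neighbor_as_weight={65002: 10})
for update in generate_updates(table, policy):
    print(update)
# Prints:
# {'prefix': '192.0.2.0/24', 'next_hop': '10.0.0.2', 'as_path': (65002, 65200, 65100)}
```

In this example the filter suppresses 198.51.100.0/24 entirely, while the preference for AS 65002 outranks the shorter AS path for 192.0.2.0/24.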
Maintenance of routers (such as BGP routers) in a network is often planned, and such maintenance may require BGP to be shut down or reset. For example, a BGP router may be shut down and removed from service in response to, e.g., an upgrade of certain hardware or a reboot of the router following a software upgrade. In addition, the router may be reset in response to a change of BGP parameters, such as a change of the BGP router identifier. However, a planned router shutdown or reset can result in temporary outages (i.e., loss of routing information exchange) for those routes for which the shutdown router was the best path.
To implement a planned shutdown or reset of BGP, the shutdown router sends a conventional BGP Notification message with error code Cease to close its connections with its BGP peers. After sending the Notification message, the shutdown router closes the TCP sessions over which the connections are established. In some implementations, the Notification message may be omitted. In response to the closing of the connections, the BGP peers immediately remove (withdraw) from service (i.e., from their FIBs) all original routes advertised on those connections. As a result, some time (i.e., a convergence time) elapses before the network re-converges. In this context, the convergence time is the time that elapses between withdrawal of a route and the time when all subsequent messages triggered by the initial route withdrawal have been exchanged. In general, this time can be characterized by the time needed for a BGP router to receive and process update messages from all of its peers, select best paths for each prefix, install those best paths into the routing table and advertise the best paths back to its peers. In some networks, however, multiple such rounds of messages may be required, or other factors may also play a part. This approach of simply “halting” the router and re-converging the network leads to temporary loss of routing information due to route withdrawal.
In particular, the above issue arises when the shutdown router was the best path for one or more routes. In that case, the other BGP routers within the AS will not have access to backup paths: even if such paths are known to certain routers within the AS, announcement of the best path suppresses their advertisement. Thus, when the best path is withdrawn from the network, the convergence time elapses before the alternate paths are propagated and selected, leading to temporary loss of routing information. During that convergence time, traffic for the affected networks can be “black holed”, i.e., service to the affected networks is interrupted.
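The shutdown sequence described above can be sketched in Python. The message layout (a 16-byte all-ones marker, a 2-byte total length, a 1-byte type of 3 for NOTIFICATION, then a 1-byte error code and 1-byte subcode) and the Cease error code 6 follow the BGP-4 specification; the Cease subcodes used here are defined in RFC 4486, which postdates the references cited above, so whether a given peer interprets them is an assumption. The function names and the routing-table representation (each prefix mapped to a list of paths tagged with the peer they were learned from) are hypothetical.

```python
import struct

BGP_MARKER = b"\xff" * 16   # all-ones marker that begins every BGP message
MSG_NOTIFICATION = 3        # BGP message type code for NOTIFICATION
ERR_CEASE = 6               # NOTIFICATION error code "Cease"
# Cease subcodes from RFC 4486 (assumption: the receiving peer understands them).
CEASE_ADMIN_SHUTDOWN = 2
CEASE_ADMIN_RESET = 4


def build_cease_notification(subcode=0, data=b""):
    """Encode a NOTIFICATION/Cease message as sent by the shutdown router."""
    body = struct.pack("!BB", ERR_CEASE, subcode) + data
    length = len(BGP_MARKER) + 2 + 1 + len(body)  # 19-byte header plus body
    return BGP_MARKER + struct.pack("!HB", length, MSG_NOTIFICATION) + body


def withdraw_session_routes(routing_table, peer):
    """Receiving side: drop every path learned from the closed session.

    A prefix whose only paths came from that peer disappears entirely,
    which is the temporary loss of routing information described above.
    """
    remaining = {}
    for prefix, paths in routing_table.items():
        kept = [p for p in paths if p["learned_from"] != peer]
        if kept:
            remaining[prefix] = kept
    return remaining


message = build_cease_notification(CEASE_ADMIN_SHUTDOWN)
print(len(message))  # 21
```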
Previous approaches that avoid the temporary loss of routing information due to route withdrawal fall into the category of persistently advertising those routes which are not best paths. In BGP parlance, such routes are sometimes called “best external routes”. FIG. 1 is a schematic block diagram illustrating an arrangement of intermediate nodes, such as routers, within an AS of a network. Assume nodes X, Y, and Z are all BGP routers within ASN. Router Y learns a path to a destination D via a BGP router in ASM, and router Z learns a path to the same destination D via a BGP router in ASO. However, the path learned by router Y is “better than” that learned by router Z.
Assume there are internal BGP (iBGP) sessions between X and Y, Y and Z, and Z and X. If only the best path is advertised, router X has only a single path to the destination D and the next-hop for this path is router Y. In order for router X to learn a backup path (not the best), router Z has to advertise the path through ASO, and router X has to store this backup path. This would consume extra network resources (e.g., link bandwidth, processor, and memory) for advertising and storing the backup path, thereby adversely changing the scaling properties of the network.
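The trade-off in the FIG. 1 arrangement can be illustrated with a toy Python model. It is a hypothetical simplification: each border router holds one external path to destination D with a numeric rank (lower is better), router X stores whatever is advertised to it over iBGP, and "immediately after shutdown" means the shutdown router's routes have been withdrawn but no new updates have yet been exchanged. The names `EXTERNAL_PATHS`, `paths_stored_at_x` and `paths_right_after_shutdown` are illustrative only.

```python
# Hypothetical model of FIG. 1: X, Y and Z are iBGP peers in ASN.
# Each border router's external path to destination D, ranked (lower rank = better).
EXTERNAL_PATHS = {"Y": ("via ASM", 1), "Z": ("via ASO", 2)}


def paths_stored_at_x(advertise_best_external):
    """Paths to D that router X learns over iBGP, keyed by next-hop router."""
    best_router = min(EXTERNAL_PATHS, key=lambda r: EXTERNAL_PATHS[r][1])
    stored = {best_router: EXTERNAL_PATHS[best_router][0]}
    if advertise_best_external:
        # Routers whose overall best path is internal still announce their best external path.
        for router, (path, _rank) in EXTERNAL_PATHS.items():
            stored.setdefault(router, path)
    return stored


def paths_right_after_shutdown(stored, shutdown_router):
    """X's paths once the shutdown router's routes are withdrawn, before re-convergence."""
    return {router: path for router, path in stored.items() if router != shutdown_router}


for mode in (False, True):
    stored = paths_stored_at_x(mode)
    print(mode, len(stored), paths_right_after_shutdown(stored, "Y"))
# Prints:
# False 1 {}
# True 2 {'Z': 'via ASO'}
```

With best-path-only advertising, X is left with no path to D until Z's route propagates; advertising the best external route keeps a backup in place, at the cost of advertising and storing an extra path.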