A computer network is a collection of interconnected computing devices that exchange data and share resources. In a packet-based network, such as the Internet, the computing devices communicate data by dividing the data into small blocks called packets, which are individually routed across the network from a source device to a destination device. The destination device extracts the data from the packets and assembles the data into its original form. Dividing the data into packets enables the source device to resend only those individual packets that may be lost during transmission.
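The packetize, reassemble, and resend cycle described above can be illustrated with a minimal Python sketch. It assumes a hypothetical fixed payload size of 4 bytes and assumes that the destination already knows how many packets to expect (for example, from a header field). The names `Packet`, `packetize`, `missing`, and `reassemble` are illustrative only and are not taken from any real protocol stack.

```python
from dataclasses import dataclass

# Hypothetical payload size per packet, chosen only for illustration.
PACKET_PAYLOAD_SIZE = 4

@dataclass(frozen=True)
class Packet:
    seq: int        # position of this block within the original data
    payload: bytes

def packetize(data: bytes, size: int = PACKET_PAYLOAD_SIZE) -> list[Packet]:
    """Divide data into fixed-size blocks tagged with sequence numbers."""
    return [Packet(i, data[off:off + size])
            for i, off in enumerate(range(0, len(data), size))]

def missing(received: dict[int, Packet], total: int) -> list[int]:
    """Sequence numbers the destination has not yet received."""
    return [seq for seq in range(total) if seq not in received]

def reassemble(received: dict[int, Packet], total: int) -> bytes:
    """Rebuild the original data once every packet has arrived."""
    if missing(received, total):
        raise ValueError("cannot reassemble: packets still missing")
    return b"".join(received[seq].payload for seq in range(total))

# Usage: lose one packet in transit, then resend only that packet.
data = b"hello, packet world"
packets = packetize(data)
received = {p.seq: p for p in packets if p.seq != 2}
for seq in missing(received, len(packets)):
    received[seq] = packets[seq]   # the source resends only the lost block
restored = reassemble(received, len(packets))
```

Because each packet carries its own sequence number, the destination can report exactly which blocks are missing, and the source does not need to retransmit the whole message.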
Certain devices within the network, such as routers, maintain routing information that describes routes through the network. Each route defines a path between two locations on the network. From the routing information, the routers may generate forwarding information, which the routers use to relay packet flows through the network and, more particularly, to relay the packet flows to a next hop. In reference to forwarding a packet, the “next hop” from a network router typically refers to a neighboring device along a given route. Upon receiving an incoming packet, the router examines information within the packet to identify the destination for the packet. Based on the destination, the router forwards the packet in accordance with the forwarding information.
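The following minimal sketch shows how a router might consult forwarding information to choose a next hop for a destination. It assumes IPv4 prefixes and uses longest-prefix matching, the usual rule for IP forwarding, which the text above does not itself specify. The table contents (documentation addresses from 192.0.2.0/24) and the `next_hop` function are hypothetical. The sketch relies only on Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Optional

# Hypothetical forwarding information: destination prefix -> next-hop address.
FORWARDING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "192.0.2.1",
    ipaddress.ip_network("10.1.0.0/16"): "192.0.2.2",
    ipaddress.ip_network("0.0.0.0/0"): "192.0.2.254",   # default route
}

def next_hop(destination: str, table=FORWARDING_TABLE) -> Optional[str]:
    """Return the next hop of the most specific prefix that covers destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None  # no route: the packet would be dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]
```

A linear scan keeps the example short. Production routers typically use tries or dedicated hardware to perform the same lookup at line rate.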
Large computer networks, such as the Internet, often include many routers that exchange routing information according to a defined routing protocol, such as the Border Gateway Protocol (BGP). When two routers initially connect, the routers exchange routing information and generate forwarding information from the exchanged routing information. Specifically, the two routers initiate a routing communication “session” over which they exchange routing information according to the defined routing protocol. The routers continue to communicate via the routing protocol to incrementally update the routing information and, in turn, update their forwarding information in accordance with changes to the network topology indicated in the updated routing information. For example, the routers may send update messages to advertise newly available routes or routes that are no longer available.
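To illustrate incremental updates, the sketch below applies simplified update messages to routing information and then derives forwarding information from the result. This is not the BGP wire format or the full BGP decision process. Assumptions: an `UpdateMessage` carries a list of advertised routes and a list of withdrawn prefixes, and the best route is the one with the shortest hypothetical `path_length`, with ties broken by next hop. All names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Route:
    next_hop: str
    path_length: int   # simplified metric; real BGP best-path selection has many more steps

@dataclass
class UpdateMessage:
    peer: str
    advertised: dict[str, Route] = field(default_factory=dict)  # prefix -> route
    withdrawn: list[str] = field(default_factory=list)          # prefixes no longer available

class RoutingTable:
    def __init__(self) -> None:
        # prefix -> {peer: route learned from that peer}
        self.rib: dict[str, dict[str, Route]] = {}

    def apply(self, msg: UpdateMessage) -> None:
        """Incrementally update routing information from one update message."""
        for prefix in msg.withdrawn:
            self.rib.get(prefix, {}).pop(msg.peer, None)
            if not self.rib.get(prefix):
                self.rib.pop(prefix, None)   # no peer offers this prefix any more
        for prefix, route in msg.advertised.items():
            self.rib.setdefault(prefix, {})[msg.peer] = route

    def forwarding_info(self) -> dict[str, str]:
        """Derive forwarding information: prefix -> next hop of the best route."""
        return {
            prefix: min(by_peer.values(), key=lambda r: (r.path_length, r.next_hop)).next_hop
            for prefix, by_peer in self.rib.items()
        }
```

Keeping every peer's route, rather than only the current best one, means that a withdrawal can fall back to another peer's route without waiting for new advertisements.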
In the event that one of the routers of a routing communication session detects a failure of the session, i.e., the session “goes down,” the surviving router may select one or more alternative routes through the network to avoid the failed router and continue forwarding packet flows. In particular, the surviving router may update internal routing information to reflect the failure, perform route resolution based on the updated routing information to select one or more alternative routes, update its forwarding information based on the selected routes, and send one or more update messages to inform peer routers of the routes that are no longer available. In turn, the receiving routers update their routing and forwarding information, and send update messages to their peers. This process continues, and the update information propagates outward until it reaches all of the routers within the network. After a network fault, routing information in large networks may take a long time to converge to a stable state because of temporary oscillations, i.e., changes that occur within the routing information until it converges to reflect the current network topology. These oscillations within the routing information are often referred to as “flaps,” and can cause significant problems, including intermittent loss of network connectivity and increased packet loss and latency.
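The four steps the surviving router performs can be sketched as a single function. The sketch makes several simplifying assumptions. Routing information is a plain dictionary of prefix -> {peer: route}. Route resolution picks the route with the shortest hypothetical `path_length`. The function returns the prefixes to withdraw from other peers, together with those that moved to an alternative route (which would typically be re-advertised), rather than building real update messages. All names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    next_hop: str
    path_length: int  # simplified metric standing in for full best-path selection

def best_routes(rib):
    """Route resolution: pick one route per prefix (shortest path, then next hop)."""
    return {prefix: min(by_peer.values(), key=lambda r: (r.path_length, r.next_hop))
            for prefix, by_peer in rib.items() if by_peer}

def handle_session_down(rib, failed_peer):
    """Purge routes learned from failed_peer, reselect, and report the changes.

    Returns (fib, withdrawals, changed): the updated prefix -> next hop map,
    prefixes that are no longer reachable (to be withdrawn from other peers),
    and prefixes whose best route moved to an alternative path.
    """
    old = best_routes(rib)
    for by_peer in rib.values():
        by_peer.pop(failed_peer, None)                  # 1. update routing information
    for prefix in [p for p, by_peer in rib.items() if not by_peer]:
        del rib[prefix]
    new = best_routes(rib)                              # 2. route resolution
    fib = {p: r.next_hop for p, r in new.items()}       # 3. update forwarding information
    withdrawals = sorted(p for p in old if p not in new)  # 4. inform peers
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return fib, withdrawals, changed
```

Each peer that receives these withdrawals would run the same procedure in turn. This repeated per-hop work is what lets updates ripple outward and, in large networks, produce the oscillations described above.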
To reduce the impact of failures, some routers include a primary routing control unit and a secondary routing control unit. In the event the primary routing control unit fails, the secondary routing control unit assumes the responsibility of forwarding packet flows. During failover from the primary routing control unit to the secondary routing control unit, a significant period of time may elapse before the secondary routing control unit reaches a state in which it is able to process and forward packets. For example, the secondary routing control unit may need to reestablish routing communication sessions, e.g., BGP sessions, that were lost when the primary routing control unit failed. During this period, network traffic may be queued or lost.
As another technique for reducing the impact of failures, a router may support “non-stop forwarding,” which refers to the ability of a router whose control module has failed to continue forwarding packets while its routing sessions are reestablished. Redundant components in the failed router maintain forwarding state information during the control module failure, enabling the failed router to continue forwarding packets over routes that were available in the network's last-known state. Concurrently, the failed router relearns the network topology and recalculates its routing information and forwarding information. As a result, impact on current packet flows through the network is reduced.
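The separation that non-stop forwarding relies on can be modeled in a few lines. A forwarding plane keeps serving lookups from its last-known table while the control plane is down, and the recomputed table replaces it only once the control plane has relearned the topology. This is a hypothetical model, not any vendor's implementation. Several assumptions apply: `ForwardingPlane` and `ControlPlane` are illustrative names, lookups are exact-match on prefix strings for brevity, and the relearned table is passed in directly instead of being computed from routing sessions.

```python
class ForwardingPlane:
    """Holds forwarding state that survives a control-plane failure."""

    def __init__(self, fib):
        self._fib = dict(fib)   # last-known forwarding state

    def forward(self, prefix):
        return self._fib.get(prefix)

    def install(self, new_fib):
        # Build the new table first, then swap it in with one assignment, so a
        # lookup sees either the old table or the new one, never a half-built one.
        self._fib = dict(new_fib)

class ControlPlane:
    """Routing side of the router; its state is lost when it fails."""

    def __init__(self, forwarding_plane):
        self.fp = forwarding_plane
        self.up = True

    def fail(self):
        self.up = False   # routing state lost; the forwarding plane is untouched

    def restart(self, relearned_fib):
        """Push recomputed forwarding state after relearning the topology."""
        self.fp.install(relearned_fib)
        self.up = True
```

During the outage, packets still follow the last-known routes. This is why the text describes the impact on current flows as reduced rather than eliminated: those routes may be stale until the new table is installed.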
Some routers support “graceful restart,” which refers to the capability of preserving forwarding information while restarting a routing communication session, e.g., BGP session. When establishing a routing communication session, a router that supports graceful restart advertises the capability to neighboring routers and specifies a restart time. The restart time is the estimated time that it will take for the router to reestablish the routing communication session after failure of the previous session and may be, for example, approximately 120 seconds. Upon failure of the routing communication session, the surviving router preserves forwarding information based on the expectation that the failed router will reestablish the routing communication session shortly. In other words, the surviving router will maintain the failed router within a forwarding path of the surviving router in the event of a failure of the routing communication session. Likewise, the failed router preserves forwarding information in a state that existed prior to the failure. Consequently, the surviving router does not need to find alternative routes unless the failed router does not reestablish the routing communication session within the advertised restart time. As a result, the routing instability caused by routing flaps within the network may be reduced.
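From the surviving router's side, graceful restart amounts to a timer decision. When the session goes down, the surviving router keeps the peer's routes in use and marks them stale. If the peer returns within the advertised restart time, the routes stay in place. Otherwise, they are purged and ordinary route resolution takes over. The sketch below models that decision for one peer. Time is passed in explicitly (in seconds) so the logic is deterministic. The 120-second default mirrors the example value in the text. The class and method names are hypothetical. The model also ignores the peer re-sending its routes after the session returns, which a real implementation would handle before discarding anything that was not refreshed.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GracefulRestartHelper:
    """Surviving-router view of one peer that advertised graceful restart."""
    restart_time: float = 120.0                  # value the peer advertised, in seconds
    routes: dict = field(default_factory=dict)   # prefix -> next hop via this peer
    stale: bool = False
    down_since: Optional[float] = None

    def session_down(self, now: float) -> None:
        # Keep the peer in the forwarding path; only mark its routes stale.
        self.stale = True
        self.down_since = now

    def tick(self, now: float) -> list:
        """Purge stale routes once the restart time has expired; return purged prefixes."""
        if self.stale and now - self.down_since > self.restart_time:
            purged = sorted(self.routes)
            self.routes.clear()   # alternative routes must now be found
            self.stale = False
            self.down_since = None
            return purged
        return []

    def session_reestablished(self, now: float) -> bool:
        """Return True if the peer came back within its advertised restart time."""
        self.tick(now)   # expire first if the restart time already passed
        if self.stale:
            self.stale = False
            self.down_since = None
            return True
        return False

    def forwarding_info(self) -> dict:
        return dict(self.routes)   # stale routes are still used for forwarding
```

In the common case, where the peer restarts quickly, the forwarding information never changes. As a result, no withdrawals are sent and no flaps propagate to the rest of the network.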