A computer network is a collection of interconnected computing devices that exchange data and share resources. In a packet-based network, such as the Internet, the computing devices communicate data by dividing the data into small blocks called packets, which are individually routed across the network from a source device to a destination device. The destination device extracts the data from the packets and assembles the data into its original form. Dividing the data into packets enables the source device to resend only those individual packets that may be lost during transmission.
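By way of illustration only, the division of data into packets and its reassembly at the destination may be sketched as follows. The `Packet` fields and the function names are hypothetical and chosen for this example; real protocols carry considerably more header information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical packet layout used only for this sketch.
    seq: int        # position of this block within the original data
    total: int      # number of packets the data was divided into
    payload: bytes  # the block of data carried by this packet


def packetize(data: bytes, size: int) -> list[Packet]:
    """Divide data into blocks of at most `size` bytes, one per packet."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def missing_sequences(received: list[Packet], total: int) -> list[int]:
    """Return the sequence numbers that have not arrived yet."""
    seen = {p.seq for p in received}
    return [seq for seq in range(total) if seq not in seen]


def reassemble(received: list[Packet]) -> bytes:
    """Restore the original data, regardless of arrival order."""
    ordered = sorted(received, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)
```

In this sketch, a destination that finds a gap with `missing_sequences` needs only the packets at those sequence numbers to be resent, rather than the entire data.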
Certain devices within the network, such as routers, maintain routing information that describes routes through the network. Each route defines a path between two locations on the network. From the routing information, the routers may generate forwarding information, which is used by the routers to relay packet flows through the network and, more particularly, to relay the packet flows to a next hop. In reference to forwarding a packet, the “next hop” from a network router typically refers to a neighboring device along a given route. Upon receiving an incoming packet, the router examines information within the packet to identify the destination for the packet. Based on the destination, the router forwards the packet in accordance with the forwarding information.
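As one possible illustration, the selection of a next hop from forwarding information may be sketched as below. The sketch assumes that forwarding entries are keyed by address prefix and that the most specific matching prefix wins (longest-prefix matching); the passage above does not specify a lookup method, and the `ForwardingTable` class and the next-hop names are hypothetical.

```python
import ipaddress


class ForwardingTable:
    """Hypothetical forwarding information: address prefix -> next hop."""

    def __init__(self):
        self._entries = {}

    def install(self, prefix: str, next_hop: str) -> None:
        # Store the prefix as a network object so membership can be tested.
        self._entries[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, destination: str):
        """Return the next hop for a destination, or None if no entry matches."""
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self._entries if addr in net]
        if not matches:
            return None
        # Assumption: prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self._entries[best]
```

For example, with entries for `10.0.0.0/8` and `10.1.0.0/16` installed, a packet destined for `10.1.2.3` would be relayed to the next hop of the more specific `/16` entry.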
Large computer networks, such as the Internet, often include many routers that exchange routing information according to a defined routing protocol, such as the Border Gateway Protocol (BGP). When two routers initially connect, the routers exchange routing information and generate forwarding information from the exchanged routing information. Particularly, the two routers initiate a routing communication “session” via which they exchange routing information according to the defined routing protocol. The routers continue to communicate via the routing protocol to incrementally update the routing information and, in turn, update their forwarding information in accordance with changes to a topology of the network indicated in the updated routing information. For example, the routers may send update messages to advertise newly available routes or routes that are no longer available.
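The incremental update behavior described above may be sketched, under simplifying assumptions, as follows. The `Update` message format and the single numeric `cost` are hypothetical stand-ins; an actual routing protocol such as BGP selects routes using a richer set of path attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    # Hypothetical update message exchanged over a routing session.
    peer: str               # the router that sent the message
    advertised: tuple = ()  # (prefix, cost) pairs for newly available routes
    withdrawn: tuple = ()   # prefixes for routes that are no longer available


class RoutingTable:
    """Routing information: prefix -> {peer: cost}."""

    def __init__(self):
        self.routes = {}

    def apply(self, update: Update) -> None:
        """Incrementally update routing information from one message."""
        for prefix, cost in update.advertised:
            self.routes.setdefault(prefix, {})[update.peer] = cost
        for prefix in update.withdrawn:
            peers = self.routes.get(prefix, {})
            peers.pop(update.peer, None)
            if not peers:
                self.routes.pop(prefix, None)

    def forwarding_info(self) -> dict:
        """Derive forwarding information: lowest-cost peer per prefix."""
        return {
            prefix: min(peers, key=lambda p: (peers[p], p))
            for prefix, peers in self.routes.items()
        }
```

Each call to `apply` changes only the routes named in the message, and `forwarding_info` can then be regenerated to reflect the updated routing information.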
In the event one of the routers of a routing communication session detects a failure of the session, i.e., the session “goes down,” the surviving router may select one or more alternative routes through the network to avoid the failed router and continue forwarding packet flows. In particular, the surviving router may update internal routing information to reflect the failure, perform route resolution based on the updated routing information to select one or more alternative routes, update its forwarding information based on the selected routes, and send one or more update messages to inform peer routers of the routes that are no longer available. In turn, the receiving routers update their routing and forwarding information, and send update messages to their peers. This process continues and the update information propagates outward until it reaches all of the routers within the network. Routing information in large networks may take a long period of time to converge to a stable state after a network fault due to temporary oscillations, i.e., changes that occur within the routing information until it converges to reflect the current network topology. These oscillations within the routing information are often referred to as “flaps,” and can cause significant problems, including intermittent loss of network connectivity and increased packet loss and latency.
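The sequence of steps performed by the surviving router may be sketched as a single function. This is a minimal sketch under stated assumptions: routing information is modeled as a mapping from prefix to per-peer cost, route resolution simply picks the lowest cost, and the function name and return shape are hypothetical.

```python
def handle_session_failure(routes: dict, failed_peer: str):
    """Model the surviving router's reaction to a failed session.

    `routes` maps prefix -> {peer: cost} and is updated in place.
    Returns (forwarding, withdrawn), where `forwarding` maps each prefix
    to the selected alternative next hop and `withdrawn` lists prefixes
    that no longer have any route and would be reported to peer routers.
    """
    withdrawn = []
    # Step 1: update routing information to reflect the failure.
    for prefix in list(routes):
        peers = routes[prefix]
        if failed_peer not in peers:
            continue
        del peers[failed_peer]
        if not peers:
            # No alternative remains, so the route must be withdrawn.
            del routes[prefix]
            withdrawn.append(prefix)
    # Steps 2 and 3: resolve routes and rebuild forwarding information.
    forwarding = {
        prefix: min(peers, key=lambda p: (peers[p], p))
        for prefix, peers in routes.items()
    }
    # Step 4: the caller would send update messages for `withdrawn`.
    return forwarding, sorted(withdrawn)
```

Each router that receives the resulting update messages would repeat the same steps for its own routing information, which is how the change propagates outward through the network.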
To reduce the impact of failures, some routers include a primary routing control unit and one or more secondary routing control units. In the event the primary routing control unit fails, one of the secondary routing control units assumes routing responsibilities. In some situations, the failed router may support “non-stop forwarding,” which refers to the ability to continue forwarding packets while the routing session is reestablished. Redundant components in the failed router maintain forwarding state information during failure of the primary routing control unit, enabling the failed router to continue forwarding packets over routes that were available in the network's last-known state. As a result, the impact on current packet flows through the network may be reduced.
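A highly simplified model of non-stop forwarding is sketched below. The `ControlUnit` and `Router` classes and their attributes are hypothetical; the sketch only illustrates that the forwarding state is held separately from the routing control units and therefore survives a switchover.

```python
class ControlUnit:
    """Hypothetical routing control unit."""

    def __init__(self, name: str):
        self.name = name
        # True once this unit has established its routing sessions.
        self.sessions_established = False


class Router:
    def __init__(self, forwarding_info: dict):
        # Forwarding state is kept outside the control units, so it is
        # not lost when the primary routing control unit fails.
        self.forwarding_info = dict(forwarding_info)
        self.primary = ControlUnit("primary")
        self.primary.sessions_established = True
        self.secondary = ControlUnit("secondary")
        self.active = self.primary

    def fail_primary(self) -> None:
        """The secondary unit assumes routing responsibilities."""
        self.active = self.secondary

    def forward(self, prefix: str):
        """Forward using the last-known state, whichever unit is active."""
        return self.forwarding_info.get(prefix)

    def can_react_to_topology_changes(self) -> bool:
        """Routing functionality requires established sessions."""
        return self.active.sessions_established
```

After `fail_primary()` is called in this model, `forward` still returns the last-known next hop, while `can_react_to_topology_changes` remains false until the secondary unit has established its sessions.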
However, routing functionality is not restored within the failed router until the secondary routing control unit reestablishes the failed routing communication sessions. Specifically, the secondary routing control unit must establish the routing communication sessions, relearn the network topology including any updates that have occurred since failure of the primary routing control unit, and recalculate its routing information and forwarding information. This process may take a significant period of time. For example, the secondary routing control unit may need to establish a significant number of routing communication sessions, e.g., BGP sessions, that were lost when the primary routing control unit failed. Further, typical routing protocols place significant demands on the router in terms of generating and maintaining state information associated with each routing protocol.
As a result, the failed router is unable to respond to network topology changes that occur during this recovery period. This lengthy delay in reestablishing routing functionality with the secondary routing control unit may result in sub-optimal forwarding decisions. In certain situations, the inability to respond to topology changes may lead to packets being delayed or dropped entirely.