A computer network is a geographically distributed collection of interconnected communication links and subnetworks for transporting data between nodes, such as computers. Many types of computer networks are available, with the types ranging from local area networks (LANs) to wide area networks (WANs). A LAN is an example of a subnetwork that provides relatively short distance communication among the interconnected stations, whereas a wide area network enables long distance communication over links provided by public or private telecommunications facilities. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Computer networks may be further interconnected by an intermediate node, called a router, to extend the effective “size” of each network. Since management of a large system of interconnected computer networks can prove burdensome, smaller groups of computer networks may be maintained as routing domains or autonomous systems. The networks within an autonomous system are typically coupled together by conventional intradomain routers. These routers manage communication among local networks within their domains and communicate with each other using an intradomain routing (or an interior gateway) protocol. An example of such a protocol is the Open Shortest Path First (OSPF) routing protocol described in Request for Comments (RFC) 2328, OSPF Version 2, by J. Moy (1998). The OSPF protocol is based on link-state technology and, therefore, is hereinafter referred to as a link state routing protocol.
Each router running the link state routing protocol maintains an identical link state database (LSDB) describing the topology of the autonomous system (AS). Each individual piece of the LSDB is a particular router's local state, e.g., the router's usable interfaces and reachable neighbors or adjacencies. As used herein, neighboring routers (or “neighbors”) are two routers that have interfaces to a common network, wherein an interface is a connection between a router and one of its attached networks. Moreover, an adjacency is a relationship formed between selected neighboring routers for the purpose of exchanging routing information and abstracting the network topology. One or more router adjacencies may be established over an interface.
The adjacencies are established and maintained through the use of a conventional Hello protocol. Broadly stated, the Hello protocol ensures that communication between neighbors is bi-directional by periodically sending Hello packets out all router interfaces. Bi-directional communication is indicated when the router “sees” itself listed in the neighbor's Hello packet. On broadcast and non-broadcast multi-access (NBMA) networks, the Hello protocol elects a designated router (DR) and backup designated router (BDR) for the network.
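The bi-directionality check described above can be illustrated with a minimal sketch. The class and field names (`Hello`, `HelloState`, `seen_neighbors`) are hypothetical and chosen only for illustration; timers, packet encoding and DR/BDR election are omitted, so this is not a conforming implementation of the Hello protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    """Simplified Hello packet: the sender plus the neighbors it has heard from."""
    sender: str
    seen_neighbors: frozenset = frozenset()


class HelloState:
    """Per-interface Hello bookkeeping for one router (illustrative only)."""

    def __init__(self, router_id):
        self.router_id = router_id
        self.heard = set()    # neighbors from which a Hello has been received
        self.two_way = set()  # neighbors with confirmed bi-directional communication

    def build_hello(self):
        # Each Hello lists every neighbor this router has heard from so far.
        return Hello(self.router_id, frozenset(self.heard))

    def receive(self, hello):
        self.heard.add(hello.sender)
        # Communication is bi-directional once the router "sees" itself
        # listed in the neighbor's Hello packet.
        if self.router_id in hello.seen_neighbors:
            self.two_way.add(hello.sender)
        else:
            self.two_way.discard(hello.sender)


a = HelloState("1.1.1.1")
b = HelloState("2.2.2.2")
b.receive(a.build_hello())  # b hears a, but a has not yet listed b
a.receive(b.build_hello())  # a sees itself in b's Hello
b.receive(a.build_hello())  # b sees itself in a's Hello
```

After this exchange each router holds the other in its `two_way` set, which corresponds to the point at which an adjacency may begin to form.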
The infrastructure of a typical router comprises functional components organized as a control plane and a data plane. The control plane includes the functional components needed to manage the traffic forwarding features of the router. These features include routing protocols, configuration information and other similar functions that determine the destinations of data packets based on information other than that contained within the packets. The data plane, on the other hand, includes functional components needed to perform forwarding operations for the packets.
For a single processor router, the control and data planes are typically implemented within the single processor. However, for some high performance routers, these planes are implemented within separate devices of the intermediate node. For example, the control plane may be implemented in a supervisor processor, such as a route processor, whereas the data plane may be implemented within a hardware-assist device, such as a co-processor or a forwarding processor. In other words, the data plane is typically implemented in a specialized piece of hardware that is separate from the hardware that implements the control plane.
The control plane generally tends to be more complex than the data plane in terms of the quality and quantity of software operating on the supervisor processor. Therefore, failures are more likely to occur in the supervisor processor when executing such complicated code. In order to ensure high availability in an intermediate network node, it is desirable to configure the node such that if a failure arises with the control plane that requires restarting and reloading of software executing on the supervisor processor, the data plane continues to operate correctly. Restarting and reloading of control plane software may be necessary because of a failure with the routing protocol process, e.g., an OSPF module, or a software upgrade to the OSPF module. A router that is configured to enable its data plane to continue packet forwarding operations during restart and reload of the control plane software is referred to as a non-stop forwarding (NSF) capable router.
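The separation that makes non-stop forwarding possible can be sketched as follows. The classes `DataPlane` and `ControlPlane` and their methods are hypothetical names introduced for illustration, and the lookup is a plain exact match rather than the longest-prefix match a real forwarding engine would perform.

```python
class DataPlane:
    """Forwarding hardware: holds the forwarding table and moves packets."""

    def __init__(self):
        self.fib = {}  # destination prefix -> next hop

    def install(self, routes):
        # Keep a private copy so later control plane changes do not alias it.
        self.fib = dict(routes)

    def forward(self, destination):
        # Exact-match lookup for brevity (an assumption of this sketch).
        return self.fib.get(destination)


class ControlPlane:
    """Supervisor software: computes routes and programs the data plane."""

    def __init__(self, data_plane):
        self.data_plane = data_plane
        self.routes = {}

    def learn(self, prefix, next_hop):
        self.routes[prefix] = next_hop
        self.data_plane.install(self.routes)

    def restart(self):
        # Routing protocol state is lost on reload, but the forwarding
        # table in the data plane is deliberately left untouched.
        self.routes = {}


dp = DataPlane()
cp = ControlPlane(dp)
cp.learn("10.0.0.0/24", "192.0.2.1")
cp.restart()
next_hop = dp.forward("10.0.0.0/24")  # still resolves after the restart
```

The point of the sketch is only that the forwarding table outlives the routing process; how the restarted process rebuilds its state is a separate matter.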
Each router distributes its local state throughout the domain in accordance with an initial LSDB synchronization process and a conventional asynchronous flooding algorithm. The initial LSDB synchronization procedure is performed when the router is initially connected to the network, whereas the flooding procedure is performed to ensure continuous LSDB synchronization in the presence of topology changes after the initial procedure is completed. In order to guarantee convergence of a link state routing protocol, it should be ensured that link state protocol data units (PDUs) that originate after an initial LSDB synchronization between neighbors is completed are delivered to all routers within the flooding scope limits. These limits may comprise an area or the entire AS, depending on the protocol and the type of link-state PDU. An area is a collection or group of contiguous networks and nodes (hosts), together with routers having interfaces to any of the included networks. Each area runs a separate copy of the link state routing algorithm and, thus, has its own LSDB. In the case of OSPF, the PDU is a link state advertisement (LSA) packet comprising a unit of data describing the local state of a router or network. The collected PDUs of all routers and networks form the LSDB for the particular link state routing protocol.
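The flooding behavior described above may be sketched in simplified form. The names `LSA`, `Node` and `connect` are hypothetical, and the sketch assumes that a larger sequence number alone marks a newer LSA; the actual OSPF comparison also considers checksum and age, and real flooding uses acknowledgments and retransmission, all of which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSA:
    """Simplified link state advertisement (illustrative fields only)."""
    ls_type: int
    ls_id: str
    adv_router: str
    seq: int

    @property
    def key(self):
        # An LSA instance is identified by type, ID and advertising router.
        return (self.ls_type, self.ls_id, self.adv_router)


class Node:
    def __init__(self, name):
        self.name = name
        self.lsdb = {}       # key -> most recent LSA known to this node
        self.neighbors = []

    def receive(self, lsa, sender=None):
        current = self.lsdb.get(lsa.key)
        if current is not None and current.seq >= lsa.seq:
            return False     # duplicate or older copy: do not install or reflood
        self.lsdb[lsa.key] = lsa
        for neighbor in self.neighbors:
            if neighbor is not sender:
                neighbor.receive(lsa, sender=self)
        return True


def connect(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


r1, r2, r3 = Node("r1"), Node("r2"), Node("r3")
connect(r1, r2)
connect(r2, r3)
connect(r1, r3)
r1.receive(LSA(1, "1.1.1.1", "1.1.1.1", seq=1))  # r1 originates its router-LSA
```

Because duplicates are discarded rather than reflooded, the propagation terminates even in a topology with loops, and all three nodes end with identical databases.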
Coherency of the LSDB is needed for link state routing protocols, such as OSPF, to correctly calculate routing information. In order for a NSF-capable router to reload its OSPF routing protocol software, it must be able to download LSA packets received from the neighbors into its LSDB without destroying (“dropping”) the adjacencies with those neighbors. An OSPF router typically resynchronizes its LSDB with the LSDB of a neighbor by forcing a finite state machine (FSM) of the neighbor into a particular state, e.g., from a Full state to an ExStart state. The router provides a FSM per neighbor at each of its interfaces and the FSM implements various states of the adjacency between the router and its neighbor.
Yet, the OSPF standard RFC 2328 does not allow routers to resynchronize their LSDBs without changing the topological view of the network. That is, RFC 2328 does not define a means to resynchronize the databases between two neighbors without “flapping” (i.e., bringing down) the adjacency between the neighbors. Bringing down the adjacency generally disrupts traffic; this is particularly significant if the router supports failure recovery and is still capable of forwarding traffic. Moreover, bringing down all adjacencies of the router creates unnecessary network events, forcing all routers in the network to compute alternate paths.
According to the OSPF standard, after two routers have established an adjacency (i.e., the neighbor FSMs have reached Full state), the routers announce the adjacency states in their router-LSAs. The asynchronous flooding algorithm ensures that the LSDBs of the routers maintain synchronization in the presence of topology changes. However, if routers need to resynchronize their LSDBs, they cannot do so without placing the neighbor FSMs into the ExStart state. This effectively causes the adjacencies to be removed from the router-LSA packets, which may not be acceptable in some cases such as, e.g., when a NSF router restarts after reloading its routing protocol software.
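The effect described above can be shown with a small sketch. The state names follow the neighbor states of RFC 2328; the function `router_lsa_links` is a hypothetical simplification that advertises a link for every neighbor in the Full state, ignoring link types and the designated router rules.

```python
from enum import IntEnum


class NeighborState(IntEnum):
    """Neighbor FSM states named in RFC 2328, in order of progression."""
    DOWN = 0
    ATTEMPT = 1
    INIT = 2
    TWO_WAY = 3
    EXSTART = 4
    EXCHANGE = 5
    LOADING = 6
    FULL = 7


def router_lsa_links(neighbor_states):
    """Return the neighbors a router would list in its router-LSA.

    Simplifying assumption: only neighbors in the Full state are listed.
    """
    return sorted(
        router_id
        for router_id, state in neighbor_states.items()
        if state == NeighborState.FULL
    )


neighbors = {
    "2.2.2.2": NeighborState.FULL,
    "3.3.3.3": NeighborState.FULL,
}
before = router_lsa_links(neighbors)

# Conventional resynchronization forces the neighbor FSM back to ExStart.
neighbors["2.2.2.2"] = NeighborState.EXSTART
after = router_lsa_links(neighbors)
```

In this sketch `before` contains both neighbors while `after` contains only 3.3.3.3, which mirrors how a conventional resynchronization withdraws the adjacency from the router-LSA.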
Specifically, restarting of the NSF router should not impact forwarding operations. To that end, the router (i) relearns its prior existing neighbors in order to maintain those existing adjacencies and (ii) acquires all LSA packets of the neighbors to ensure coherency of its LSDB and, ultimately, its routing tables. These actions are preferably transparent to the neighbors so that they do not place their neighbor FSMs (and their adjacencies with the NSF router) into the ExStart state. Placing the neighbor FSMs into the ExStart state destroys (“drops”) the adjacencies with the NSF router and causes the neighbors (and other routers) to stop listing those adjacencies in their router-LSAs. This eventually leads to rerouting of traffic around the NSF router, thus making the router non-NSF capable. Yet, as noted, LSDB resynchronization typically requires that the neighbor FSMs be placed in the ExStart state.
Therefore, an object of the present invention is to provide an efficient technique whereby a NSF router may resynchronize its LSDB with the LSDB of a neighbor without destroying the OSPF adjacency with the neighbor.
Another object of the present invention is to provide a technique that enables neighbors to keep listing the NSF router in their router-LSA packets during resynchronization of the NSF router's LSDB.