This invention relates to communication network link-state routing protocols.
Link-state routing protocols, such as the Open Shortest Path First (OSPF) protocol or the ISO Intermediate System to Intermediate System (IS-IS) routing protocol, are becoming the dominant Internet technology for routing packets within Autonomous Systems. An Autonomous System ("AS" or area) is a group of routers operating a common routing protocol and exchanging routing information.
Under link-state protocols, each router maintains a database describing the Autonomous System's topology. In steady state, every router has an identical database. The information stored in this database includes the routers' local states, e.g., their usable interfaces (links), reachable neighbors, and the link parameters (metrics). The routers distribute their local states throughout the Autonomous System by flooding, i.e., sending the local states to all the routers. Each router then makes its forwarding decisions based on the complete description of the topology of the routing area. All routers operate under the same protocol.
From the topological database, each router constructs a tree of shortest paths (the "shortest path tree" or "SPT") with itself as the root. The routing information obtained from other routers appears on the tree as leaves. The tree provides routes to all destinations within the Autonomous System. Dimensionless metrics describe the costs of the separate links and complete routes.
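As an illustration of the shortest path tree construction described above, the following is a minimal sketch using Dijkstra's algorithm. The four-router topology, its metrics, and the function names `shortest_path_tree` and `next_hops` are assumptions made for this example only; links are assumed bidirectional with non-negative metrics.

```python
import heapq

def shortest_path_tree(graph, root):
    """Dijkstra: return (cost, parent) maps for one SPT rooted at `root`.

    graph maps node -> {neighbor: link metric}.
    """
    cost = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, metric in graph[u].items():
            nd = d + metric
            if v not in cost or nd < cost[v]:
                cost[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return cost, parent

def next_hops(parent, root):
    """Derive the first hop from `root` toward every other node in the tree."""
    hops = {}
    for dest in parent:
        if dest == root:
            continue
        node = dest
        while parent[node] != root:
            node = parent[node]  # climb toward the root
        hops[dest] = node
    return hops

# Hypothetical four-router area (not taken from the figures).
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
cost, parent = shortest_path_tree(graph, "A")
table = next_hops(parent, "A")
```

In this example router A reaches every destination through B, because the path A-B-C-D (cost 4) is cheaper than any path using the direct A-C link.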
Thus, in link-state protocol networks, the process of collecting topological information from the network is separated from the process of computing the correct routes. The former is performed distributively by all the routers in an area, which share state information with each other. The latter is performed locally by each router. This is the main advantage of link-state protocols, because the computation can be performed quickly and without relying on other routers.
When the cost metric of some link changes, the information need be sent only once to every router in the area; the recipients then immediately update their own routing tables under the common protocol. This is in sharp contrast to distance-vector protocols, such as RIP, where routing packets may have to be exchanged many times between the same routers before their routing tables converge to correct steady-state values.
Even though the amount of information exchanged by routers operating under link-state protocols is less than the amount of information exchanged under distance-vector protocols, it may still be large when the cost metrics of the links vary quickly, or when the number of links in an area is large. In principle, every router in the area should have at all times the latest topological information. If the topology information is not identical in all routers, routing loops may result. This means that every time that there is a cost metric change in any one of the links, all the routers in the routing area must be notified.
Flooding an entire area after a single link-state change is inefficient in terms of the bandwidth and computational overhead required. Furthermore, flooding a large routing area may take a long time. During that time, different routers will have different link-state information, and transient routing loops are possible. Many of the Internet's problems with routing instability are associated with the long delays required to propagate routing information.
One way to reduce the total amount of routing information being transmitted is to have the router responsible for a given link ignore changes in that link's metric. The new information will simply not be propagated and routing will continue along the old paths in a sub-optimal manner. After several metric changes, the router can broadcast a cumulative update packet summarizing all the topological changes that have taken place since the last update. By limiting the frequency of such updates, the amount of network information traffic can be limited at the cost of some sub-optimal routing.
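The update-limiting idea can be sketched as follows. This is only an illustration under stated assumptions: the class name `DampedAdvertiser`, the `min_interval` parameter, and the use of plain numeric timestamps are hypothetical and do not come from any particular protocol.

```python
class DampedAdvertiser:
    """Hold back metric changes and release them as one cumulative update."""

    def __init__(self, min_interval, start=0):
        self.min_interval = min_interval  # minimum time between updates
        self.last_flush = start           # time of the previous update
        self.pending = {}                 # link -> latest unadvertised metric

    def metric_changed(self, link, metric, now):
        """Record a change; return a cumulative update if one is due, else None."""
        self.pending[link] = metric  # a later change overwrites an earlier one
        if now - self.last_flush < self.min_interval:
            return None              # too soon: keep routing on the old metrics
        update, self.pending = self.pending, {}
        self.last_flush = now
        return update

adv = DampedAdvertiser(min_interval=30)
first = adv.metric_changed(("A", "B"), 5, now=1)
second = adv.metric_changed(("A", "B"), 7, now=10)
third = adv.metric_changed(("A", "C"), 2, now=31)
```

Here the first two changes are suppressed, and the third call releases a single update carrying only the latest metric of each changed link.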
This approach does not work when the link change is a link failure or extreme congestion of the link. In this case, the rest of the network must be notified immediately, because withholding link failure information will cause routing failures and loss of packets. Therefore, in order to restore the paths that previously traversed the failed link, information regarding the failure of a link must be propagated immediately to at least some routers in the network.
We now discuss three basic ideas regarding routing restoration with limited local updates. Although these ideas are either inefficient or do not work in all cases, they will illustrate the problem and provide the reader with a better insight into our local route restoration scheme.
The first idea is tunneling. As illustrated in FIG. 1, when the link between routers A and B goes down, a new path is constructed between these routers through routers N1, N2, and N3. All the traffic that should have traveled through the broken link is now diverted into this new path, which acts as a virtual link.
A tunneling scheme can be implemented as follows. After the link between A and B fails, router A detects the failure, but does not broadcast this information to the rest of the network. Instead, it uses its shortest path first (SPF) engine to compute the new shortest path to router B and records the new next hop for B. Router A then sends a special packet containing the information regarding the failed link through that next hop. When the next hop router, N1 in our example, receives the special packet, it in turn re-computes the new shortest path to B, determines and records the new next hop to B, and forwards the special packet along that next hop. This operation is repeated until router B receives the special packet, i.e., router B is the computed next hop router.
In this manner, router A and all of the routers in the restoration path (routers N1, N2, and N3 in our example) are informed of the link failure and are capable of forwarding packets to B along the new path. When a regular data packet that needs to traverse the failed link arrives at A, the packet is encapsulated in another packet with destination B and forwarded along the newly computed shortest path until it reaches B. Upon arrival at B, the original packet is decapsulated and forwarded according to the established routing table.
This scheme limits the update information that needs to be broadcast after a link failure. Only the routers that are part of the new path from A to B are informed of the changes in the topology. Even though the rest of the network does not know about the failure, global routing continues to function correctly, though possibly sub-optimally.
A major drawback of the tunneling scheme is that every single data packet that goes through the new path has to be encapsulated at router A. This requires A to be able to generate a new packet for every data packet that must be diverted, increasing the load on A and greatly limiting the efficiency of its packet forwarding function.
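The data-plane side of the tunneling scheme can be sketched as below. The `Packet` type, the `NEXT_HOP_TO_B` table, and the `tunnel` function are hypothetical names introduced for this illustration; the table simply hard-codes the restoration path A, N1, N2, N3, B from the example rather than computing it.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str
    payload: object  # application data, or an inner Packet when tunneled

# Assumed next hops toward B, as recorded by the routers on the restoration
# path after they processed the special packet.
NEXT_HOP_TO_B = {"A": "N1", "N1": "N2", "N2": "N3", "N3": "B"}

def tunnel(packet, entry="A", exit_router="B"):
    """Carry `packet` from the tunnel entry to the tunnel exit.

    Returns the decapsulated packet and the routers visited.
    """
    # Encapsulation: A must build one new outer packet per diverted data packet.
    outer = Packet(dst=exit_router, payload=packet)
    trace = [entry]
    while trace[-1] != outer.dst:
        # Routers on the path look only at the outer destination (B).
        trace.append(NEXT_HOP_TO_B[trace[-1]])
    # Decapsulation at B: the original packet is recovered unchanged.
    return outer.payload, trace

original = Packet(dst="D", payload="data")
inner, trace = tunnel(original)
```

The per-packet construction of `outer` is the cost that the drawback above refers to: it is paid at A for every diverted data packet, not once per failure.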
The second idea is a partial update. According to this approach, only the routers of the restoration path are informed of the link's failure. These routers modify their routing tables and let their forwarding engines function as usual. In other words, along some restoration path, the new topological information is broadcast and the routing tables are re-computed, whereas the other routers continue using old routing tables. In our previous example (see FIG. 1), if routers A, N1, N2, and N3 simply re-computed their routing tables using the new information about the link failure, global routing might still function correctly. Even though router S continues to use an old routing table, a packet from S to D might still be routed correctly, although possibly sub-optimally.
We say "might" because this partial update scheme is not restrictive enough to guarantee proper forwarding. Because the actual path that a packet takes is determined on a hop-by-hop basis by routers with different topological information, routing loops may occur.
Consider the example in FIG. 2, where the number next to each link is the cost metric of that link. After the link between routers A and B fails, a restoration path is established between routers A and B via router C. Only the routing tables of A and C are updated.
When a packet with destination D arrives at C, it is forwarded to E because E lies on the shortest path to D. But when the packet arrives at E, which has not been informed of the failure of the link between A and B, the packet is forwarded back to C because it appears to E that the shortest path to D is C-A-B-D. Therefore, a routing loop results between routers C and E.
The problem with this scheme is that a packet can leave the restoration path (A, C, and B in the example) too soon. When this happens, the packet will enter a region where routers do not have current routing tables; these routers can forward the packet back to an earlier part of the restoration path, causing routing loops.
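The loop between C and E can be reproduced with a small simulation. The link metrics below are hypothetical values chosen so that the behavior matches the description; they are not the metrics shown in FIG. 2. The helper names `make_graph`, `next_hop`, and `forward` are likewise assumptions of this sketch.

```python
import heapq

def make_graph(links):
    """Build {node: {neighbor: metric}} from bidirectional links."""
    graph = {}
    for (u, v), metric in links.items():
        graph.setdefault(u, {})[v] = metric
        graph.setdefault(v, {})[u] = metric
    return graph

def next_hop(graph, src, dst):
    """First hop on a Dijkstra shortest path from src to dst (None if unreachable)."""
    best, first = {src: 0}, {src: None}
    heap, done = [(0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            return first[u]
        for v, metric in graph[u].items():
            if v not in best or d + metric < best[v]:
                best[v] = d + metric
                first[v] = v if u == src else first[u]
                heapq.heappush(heap, (d + metric, v))
    return None

# Hypothetical metrics, not those of FIG. 2.
LINKS = {("A", "B"): 1, ("A", "C"): 1, ("C", "B"): 6,
         ("C", "E"): 1, ("E", "D"): 5, ("B", "D"): 1}
stale = make_graph(LINKS)                      # view before the failure
fresh = make_graph({k: m for k, m in LINKS.items() if k != ("A", "B")})

def forward(src, dst, updated, max_hops=8):
    """Hop-by-hop forwarding; routers in `updated` know about the failure."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        view = fresh if path[-1] in updated else stale
        path.append(next_hop(view, path[-1], dst))
    return path

partial = forward("C", "D", updated={"A", "C"})
flooded = forward("C", "D", updated={"A", "B", "C", "D", "E"})
```

With only A and C updated, the packet bounces between C and E until the hop limit is reached; with every router updated, it follows C, E, D.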
The third idea is a straightforward and intuitive attempt to modify the scheme discussed immediately above. Here, we force all the packets that would have had to traverse the failed link to travel through the entire restoration path. This can be achieved by modifying the routing tables in the routers belonging to the restoration path in such a way that all packets that would have had to traverse the failed link are now forwarded to the new next hop for B. All the packets that router A would have forwarded to B through the failed link will now be forwarded to B along the restoration path; these packets will not leave the restoration path until they reach B.
Unfortunately, this scheme may not work either. Unlike its predecessor, this scheme is too restrictive in the way next hops are selected, and this may lead to routing loops when a packet does not exit the restoration path soon enough.
Consider, for example, a packet traveling from node S to node E in FIG. 1. Before the failure of the link between A and B, the packet would traverse routers A, B, and N3. After the link failure, the packet is forced to traverse routers N1, N2, N3, and B. At B, the packet is forwarded back to N3, resulting in a routing loop.
Accordingly, the object of the present invention is to provide a routing algorithm that, after a link failure, restores all the paths traversing the failed link, ensures loop-free routing, and minimizes communication overhead.
To accomplish the aforementioned object, we provide an algorithm that restores loop-free routing after a single link failure by informing only some of the routers in the local neighborhood of the failed link.
According to the invention, after link L between routers A and B fails, the following steps are taken:
1. A set of nodes D0 is defined as all the nodes that are descendants of L in any current shortest path tree rooted at A;
2. The link-state database of router A is modified to incorporate the change in the metric of link L;
3. The Shortest Path First engine in router A re-computes the next hop router for B, designated N1;
4. In router A, the next hop for all destinations belonging to set D0 is set to N1;
5. A special packet identifying router B and the failed link L is sent to N1;
6. In each router Ni (i = 1, ..., n) that receives the special packet, a set Di is defined as all the nodes that are descendants of L in any current shortest path tree rooted at Ni;
7. In Ni, the link state database is modified to incorporate the change in the metric of link L;
8. In router Ni, the SPF engine re-computes the next hop for router B, designated Ni+1;
9. In router Ni, the next hop for all destination nodes in Di is set to Ni+1;
10. If router Ni+1 is not router B, send a special packet to Ni+1 identifying router B and the failed link L;
11. Steps 6-10 are repeated until the computed next hop is router B.
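Steps 1 through 11 above can be sketched in code as follows. This is a minimal illustration under several assumptions, not a definitive implementation: the topology and metrics are hypothetical, the failure is modeled by deleting link L from the local link-state database, a single Dijkstra tree stands in for "any current shortest path tree", metrics are assumed positive, and B is assumed to remain reachable. All class and function names are introduced for this example.

```python
import heapq

def make_graph(links):
    """Build {node: {neighbor: metric}} from bidirectional links."""
    graph = {}
    for (u, v), metric in links.items():
        graph.setdefault(u, {})[v] = metric
        graph.setdefault(v, {})[u] = metric
    return graph

def spt_parents(lsdb, root):
    """Dijkstra: parent pointers of one shortest path tree rooted at `root`."""
    cost, parent = {root: 0}, {root: None}
    heap, done = [(0, root)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, metric in lsdb[u].items():
            if v not in cost or d + metric < cost[v]:
                cost[v], parent[v] = d + metric, u
                heapq.heappush(heap, (d + metric, v))
    return parent

def tree_path(parent, dest):
    """Nodes from the tree root down to `dest`."""
    path = []
    while dest is not None:
        path.append(dest)
        dest = parent[dest]
    return path[::-1]

def descendants_of_link(parent, link):
    """Destinations whose tree path from the root traverses `link`."""
    found = set()
    for dest in parent:
        path = tree_path(parent, dest)
        if any({u, v} == set(link) for u, v in zip(path, path[1:])):
            found.add(dest)
    return found

class Router:
    def __init__(self, name, graph):
        self.name = name
        self.lsdb = {u: dict(nbrs) for u, nbrs in graph.items()}  # private copy
        parent = spt_parents(self.lsdb, name)
        self.table = {d: tree_path(parent, d)[1] for d in parent if d != name}

    def handle_failure(self, link, far_end):
        """Steps 1-4 at A, or 6-9 at Ni; returns the new next hop toward B."""
        affected = descendants_of_link(spt_parents(self.lsdb, self.name), link)
        u, v = link
        self.lsdb[u].pop(v, None)  # update the local database only
        self.lsdb[v].pop(u, None)
        new_hop = tree_path(spt_parents(self.lsdb, self.name), far_end)[1]
        for dest in affected:
            self.table[dest] = new_hop
        return new_hop

def restore(routers, link):
    """Propagate the failure of link (A, B) from A; returns the informed routers."""
    a, b = link
    informed = [a]
    while True:
        nxt = routers[informed[-1]].handle_failure(link, b)
        if nxt == b:          # step 11: stop once the next hop is B
            return informed
        informed.append(nxt)  # steps 5 and 10: special packet sent to nxt

def forward(routers, src, dst, max_hops=16):
    """Hop-by-hop forwarding using each router's current table."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        path.append(routers[path[-1]].table[dst])
    return path

# Hypothetical topology and metrics (not taken from the figures).
LINKS = {("A", "B"): 1, ("A", "C"): 1, ("C", "B"): 6,
         ("C", "E"): 1, ("E", "D"): 5, ("B", "D"): 1}
graph = make_graph(LINKS)
routers = {name: Router(name, graph) for name in graph}
informed = restore(routers, ("A", "B"))
```

In this example only A and C are informed. Router E still holds its old table and sends traffic for D toward C, but C now forwards every destination that was a descendant of L (here B and D) to its new next hop for B, so the packet follows E, C, B, D instead of looping.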