Routers maintain topological information about a network in order to determine the correct paths for routing packets. The routers derive this topological information by continuously communicating with other routers or other network processing devices in the network (peers). An adaptive routing algorithm is used by the routers to identify communication failures and adaptively compute new routes around the failure. An adjacency is formed whenever a router communicates with a peer and is defined as a relationship with a neighboring network processing device. The router continuously determines which adjacencies are up and which are down. The longer it takes for the router to detect that an adjacency has gone down and route around it, the greater the chance that a significant number of packets will be sent over the wrong routes, or sent to a down router and hence lost.
There are two primary reasons why packets may not be successfully sent to an adjacency. The interface or link to the adjacency may have failed, or the adjacency itself may have failed. Some links, such as digital TDM (Time Division Multiplexed) channels, SONET (Synchronous Optical Network), and direct point-to-point Ethernet connections, provide fast failure detection using hardware indications such as loss-of-light, missed heartbeat, etc. Failures can also be detected using a low-level link protocol mechanism such as OAM (Operation, Administration, and Maintenance) headers in SONET. In these detection schemes, there are direct links between the two network processing devices. This allows layer 1 physical interfaces to quickly identify failures, which are then reported to the layer 3 routing algorithms, which in turn route around the identified failure.
In some common network configurations, routers are not connected directly together but are connected through a layer 2 switch. For example, a switched LAN (Local Area Network) may use a Gigabit Ethernet switch that includes different ports connected to PCs and to layer 3 routers. The routers connected to different switch ports cannot immediately identify failures either of other routers or of the ports and links by which they are connected to the switch. The routers currently have to rely on slow timeout mechanisms, such as missed hello packets, to detect failures on other links connected to the switch.
For example, an IGP (Interior Gateway Protocol) uses “hello” message exchanges to discover and maintain link connectivity. If one of the routers fails to receive a “hello” acknowledgment message after some period of time, the router failing to acknowledge the “hello” message is assumed to have gone down. The router sending the hello message then routes around the failed link.
A substantial amount of time is required to send and wait for replies to “hello” messages. For example, in one implementation hello messages are sent once every second. A failure is assumed only after three hello messages go unacknowledged. Thus, upwards of three seconds are required to detect an adjacent link or adjacent router failure. The time required to detect failures can and often does dominate the time required for a routing algorithm to determine a new network topology around a detected failure (convergence time).
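The timing described above can be illustrated with a minimal sketch. It assumes the example values given in the text (one hello per second, failure declared after three unacknowledged hellos); the names HelloMonitor, on_hello_result, and detection_time are hypothetical and do not correspond to any particular routing protocol implementation.

```python
from dataclasses import dataclass


@dataclass
class HelloMonitor:
    """Hypothetical per-adjacency hello tracker (illustrative only)."""
    hello_interval: float = 1.0  # seconds between hellos (example value from the text)
    miss_threshold: int = 3      # unacknowledged hellos before failure is assumed
    missed: int = 0              # consecutive unacknowledged hellos so far
    up: bool = True              # current view of the adjacency

    def on_hello_result(self, acknowledged: bool) -> bool:
        """Record the outcome of one hello; return True while the adjacency is up."""
        if acknowledged:
            self.missed = 0
            self.up = True
        else:
            self.missed += 1
            if self.missed >= self.miss_threshold:
                self.up = False
        return self.up

    def worst_case_detection_time(self) -> float:
        """Seconds needed to declare failure once acknowledgments stop."""
        return self.hello_interval * self.miss_threshold


def detection_time(monitor: HelloMonitor, ack_results):
    """Step through hello outcomes, one per interval.

    Returns the elapsed time in seconds at which the adjacency is declared
    down, or None if it is never declared down.
    """
    elapsed = 0.0
    for acknowledged in ack_results:
        elapsed += monitor.hello_interval
        if not monitor.on_hello_result(acknowledged):
            return elapsed
    return None


if __name__ == "__main__":
    # The peer stops acknowledging immediately: three intervals pass before detection.
    print(detection_time(HelloMonitor(), [False, False, False]))
```

With these assumed values, the adjacency is declared down only after three full hello intervals have elapsed, and a single acknowledged hello in between resets the count, which is why the detection delay cannot be shortened without sending hellos more often or tolerating fewer misses.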
The hello message failure detection process takes much longer than the layer 1 protocols used for detecting failures, but because the routers in switched networks are not connected directly together, the layer 1 failure protocols cannot be used.
The present invention addresses this and other problems associated with the prior art.