A computer network is a collection of interconnected computing devices that can exchange data and share resources. In a packet-based network, such as the Internet, the computing devices communicate data by dividing the data into small blocks called packets, which are individually routed across the network from a source device to a destination device. The destination device extracts the data from the packets and assembles the data into its original form. Dividing the data into packets enables the source device to resend only those individual packets that may be lost during transmission.
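The packetization described above can be illustrated with a minimal sketch. The `Packet` fields, the helper names, and the five-byte packet size used in the usage note are assumptions made for illustration; they do not correspond to any particular protocol.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int        # position of this block within the original data
    total: int      # number of packets that make up the whole message
    payload: bytes  # the block of data carried by this packet


def packetize(data: bytes, size: int) -> list[Packet]:
    """Divide data into blocks of at most `size` bytes, one per packet."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def missing(received: list[Packet], total: int) -> list[int]:
    """Return the sequence numbers that have not arrived yet."""
    seen = {p.seq for p in received}
    return [seq for seq in range(total) if seq not in seen]


def reassemble(received: list[Packet]) -> bytes:
    """Restore the original data; packets may have arrived in any order."""
    ordered = sorted(received, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)
```

For example, if `packetize(data, 5)` yields four packets and the packet with `seq == 2` is lost, `missing(arrived, 4)` identifies only that packet, so the source device need resend just one packet rather than the entire data.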
Certain devices, referred to as routers, maintain routing information that describes routes through the network. A “route” can generally be defined as a path between two locations on the network. Upon receiving an incoming packet, the router examines information within the packet and forwards the packet in accordance with the routing information.
In order to maintain an accurate representation of a network, routers typically send periodic packets to each other to communicate their operational state. These periodic packets are sometimes referred to as “keepalives” or “hellos.” For example, a first router may send a packet to a second router every five seconds to verify that the second router is still operational. The first router may require the second router to respond within a certain amount of time. When a response packet is not received in the allotted time frame, the first router may conclude that a network failure has occurred, such as failure of the second router or failure of the link connecting the two routers. Consequently, the first router may update its routing information to exclude that particular link, and may issue a number of update messages to neighboring routers indicating the link failure.
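The keepalive exchange described above may be sketched as follows from the point of view of the first router. The class name `NeighborMonitor`, the two-second response timeout, and the form of the update messages are hypothetical; time is passed in explicitly so that the sketch does not depend on a real clock.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NeighborMonitor:
    link: str                        # link to the monitored neighbor
    response_timeout: float = 2.0    # assumed allotted response time, seconds
    routes: set = field(default_factory=set)
    updates_sent: list = field(default_factory=list)
    pending_since: Optional[float] = None  # time of the unanswered keepalive

    def __post_init__(self) -> None:
        # The link starts out as part of the routing information.
        self.routes.add(self.link)

    def send_hello(self, now: float) -> None:
        """Record that a periodic packet was sent at time `now`."""
        self.pending_since = now

    def on_response(self, now: float) -> None:
        """The neighbor answered, so nothing is outstanding."""
        self.pending_since = None

    def check(self, now: float) -> bool:
        """Return True if the link is declared failed at time `now`."""
        if self.pending_since is None:
            return False
        if now - self.pending_since <= self.response_timeout:
            return False
        # No response in the allotted time: exclude the link and tell neighbors.
        self.routes.discard(self.link)
        self.updates_sent.append(("down", self.link))
        self.pending_since = None
        return True
```

In this sketch, a keepalive sent at time 5.0 with no response causes `check(7.5)` to remove the link from `routes` and to queue one update message, whether the neighbor has actually failed or is merely slow to respond.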
However, a number of non-failure conditions may prevent the second router from responding to the first router within the required response time. For example, the computing resources of the second router may be consumed due to heavy network traffic loads. In other words, with the increased amount of network traffic on the Internet, many conventional routers have become so busy performing other functions, such as route resolution, that they cannot respond to periodic packets quickly enough. Furthermore, the increased complexity of current routers has increased the number of processes concurrently executing on the router, each of which requires computing resources. In addition, there has been continual demand to shorten the allowable time to respond to such periodic messages in order to accelerate the detection of network failure conditions.
Failure to respond due to these and other conditions can result in significant network thrashing and other problems. For example, a router may have a route resolution process that requires a significant period of time, e.g., ten or more seconds, for convergence due to the complexity of the network topology. This period may exceed the allowable response time to a periodic packet. By the time the router has sufficient computing resources to respond to the periodic packet, the neighboring router may already have mistakenly concluded that the router or link has failed. Consequently, the neighboring router may update its routing information to exclude the “failed” router. Furthermore, the neighboring router may send update messages to its neighboring routers indicating the failure, causing its neighboring routers to perform route resolution in similar fashion. Shortly thereafter, the “failed” router may have sufficient resources to send its neighboring router a response packet indicating that it is operational. As a result, the neighboring router again updates its routing information to include the router and sends another update message to its neighbors, causing the neighboring routers to once again perform route resolution. The unnecessary route resolution and update messages cause the network routers to thrash, creating significant network delays.
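The sequence of events described above may be approximated by the following sketch, which counts the route resolutions triggered by a single late response. The function name `simulate_flap`, the three-second response timeout, and the count of four downstream routers used in the usage note are assumptions chosen for illustration; only the ten-second route resolution period comes from the example above.

```python
from __future__ import annotations


def simulate_flap(
    busy_for: float,
    response_timeout: float,
    downstream_routers: int,
) -> tuple[list[tuple[float, str]], int]:
    """Model one keepalive sent at time 0 to a busy but operational router.

    Returns the update messages issued by the neighboring router as
    (time, state) pairs, and the number of route resolutions those
    updates cause in the downstream routers.
    """
    events: list[tuple[float, str]] = []
    resolutions = 0
    deadline = response_timeout   # latest acceptable response time
    reply_at = busy_for           # the busy router replies once it is free

    if reply_at > deadline:
        # The neighbor wrongly declares a failure when the deadline passes.
        events.append((deadline, "down"))
        resolutions += downstream_routers
        # The late reply arrives and the route is restored.
        events.append((reply_at, "up"))
        resolutions += downstream_routers

    return events, resolutions
```

Under these assumptions, `simulate_flap(10.0, 3.0, 4)` produces a "down" update at 3.0 seconds and an "up" update at 10.0 seconds, causing eight route resolutions even though no failure occurred, whereas a router that responds within the timeout produces no updates at all.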