The Internet is a collection of different packet-switched networks linked together to form an internetwork. In order to successfully send data from one node on the Internet to another, a protocol referred to as the Internet Protocol (IP) is used. This enables an IP datagram to be routed through the Internet from a transmitting or originating source node to a receiving or terminating destination node. As will be well known to persons skilled in the art of data networks, IP is a layer 3 or network layer protocol when compared with the ISO seven layer reference model of data networks. This essentially means that it is responsible for carrying data over multiple hops across a network or internetwork. Thus at each hop the ultimate IP address is read and an onward route is selected unless the data happens to have arrived at the destination node in which case it is passed up the layer stack.
Thus, IP is a data-oriented protocol used by source, destination and intermediate nodes (which might, for example, be a web server, a web client and multiple routers respectively) for communicating data across a packet-switched network (or, more usually, an internetwork). Furthermore, IP has the property that no specific set-up process is required before a source node attempts to transmit data to a destination node, irrespective of whether the nodes have previously communicated with one another before and irrespective of the type of data to be transmitted.
In order to achieve this, IP specifies that data is transmitted in IP datagrams, each of which comprises a header portion and a payload portion. The data to be transmitted (or a portion of it) is carried in the payload portion of an IP datagram whilst the header contains information which enables intermediate routers to process the datagram as a whole in an appropriate manner to try to deliver it to the destination node.
As mentioned above, IP represents only one layer of functionality out of many provided by an internetwork in order to enable data to be successfully transmitted over the internetwork; by comparison with the seven layer OSI Reference Model, it corresponds approximately to layer 3, the Network layer. “Beneath” the network layer in the OSI reference model are a link layer and a physical layer, and therefore each IP datagram is likely to be encapsulated within one or more lower layer (i.e. link layer) data packets for transmission from one node on a network to another on the same network. However, each receiving node will “strip out” the IP datagram from the received packet(s) and pass it to an IP function within that node; this happens at each intermediate node as well as at the destination node. The IP function then reads the IP header portion to determine whether the node is the destination node. If it is the destination node, it will pass the contents of the payload portion of the IP datagram to the next “higher” layer function identified in the header portion of the IP datagram (e.g. to a Transmission Control Protocol (TCP) function or to a User Datagram Protocol (UDP) function); if not, it will try to forward the IP datagram on towards the destination node. The mechanics of this are described in greater detail below.
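The deliver-or-forward decision made by the IP function can be sketched as follows. This is a minimal illustration only: the `Datagram` class, the `handle_datagram` function and the example addresses are hypothetical, and the use of longest-prefix matching to choose the next hop is an assumption not stated in the text above. The protocol numbers 6 (TCP) and 17 (UDP) are the standard IP protocol numbers.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Datagram:
    """Hypothetical, heavily simplified IP datagram (real headers carry more fields)."""
    dst: str        # destination IP address from the header
    protocol: int   # identifies the next higher-layer function (6 = TCP, 17 = UDP)
    payload: bytes


def handle_datagram(local_addrs, routes, dgram):
    """Return ("deliver", protocol name), ("forward", next hop) or ("drop", None)."""
    dst = ipaddress.ip_address(dgram.dst)
    if dst in local_addrs:
        # This node is the destination: pass the payload up the stack.
        names = {6: "TCP", 17: "UDP"}
        return ("deliver", names.get(dgram.protocol, "unknown"))
    # Otherwise pick the most specific matching route (assumed longest-prefix match).
    matches = [(net, hop) for net, hop in routes if dst in net]
    if not matches:
        return ("drop", None)
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return ("forward", hop)


# Example node with one address, one specific route and a default route.
local_addrs = {ipaddress.ip_address("192.0.2.1")}
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.254"),
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.2"),
]
to_self = handle_datagram(local_addrs, routes, Datagram("192.0.2.1", 6, b""))
to_known_net = handle_datagram(local_addrs, routes, Datagram("198.51.100.7", 17, b""))
to_elsewhere = handle_datagram(local_addrs, routes, Datagram("203.0.113.9", 17, b""))
```

In this sketch a host at the edge of the network would simply hold the single default route, whereas a router would hold many more specific entries.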
Intermediate nodes which are connected to multiple different networks and which are therefore important interconnecting nodes, often having many direct connections with other nodes, are typically known as routers or gateways and usually perform data transfer as their sole or primary purpose. In order to allow a large internetwork to continue to be able to deliver IP datagrams correctly even in the event of changes to the internetwork (such as for example links or routers going down and coming back up again, or additional links or routers being added to the network to increase capacity), routers, at least (as opposed to host computers residing at the edge of the network), will tend to use a dynamic routing protocol to maintain their routing tables up to date automatically (hosts at the edge of the network may use a very simple static routing table which passes all IP datagrams, not destined for the host, to a single IP address as the next hop over a single interface to the network).
Internetworks can generally be considered as hierarchical entities which can be viewed at different scales. At a high level scale one can consider so-called Autonomous Systems (AS's). These will generally be connected together to form an internetwork of AS's. Each AS will typically comprise a network itself or even an internetwork, itself being formed from a number of smaller networks or subnetworks. Routers which connect different AS's together are often referred to as Border Gateways. In order to route traffic over an internetwork formed from a plurality of AS's, each AS maintains a routing table setting out to which neighbouring AS traffic should be sent in order to reach any given IP destination address. In some internetworks, these routing tables may be maintained in an autonomous manner using a protocol known as Border Gateway Protocol (BGP), of which the most current version at the filing date of the present application is BGP version 4 (see IETF's RFC 1771). With BGP, Transmission Control Protocol (TCP) connections are established between BGP “speakers” (i.e. border gateway routers) in order to transfer routing information between them. Once a TCP connection has been set up with another BGP speaker, the connection is maintained indefinitely (unless one speaker or the other closes the connection, or some fault causes the connection to be broken). Once a connection has been set up and initial routing information passed between the connected BGP speakers, the speakers only send further “updates” whenever there has been some significant change in the routing information held by one party or the other. In order to enable one party to determine if the other has gone down without closing the TCP connection, the routers may agree to periodically send “KEEPALIVE messages” and to maintain a “Hold Timer” which is reset whenever a KEEPALIVE message is received.
In a typical implementation, the hold timer would time out after 3 seconds and each party would send the other a KEEPALIVE message about every 1 second (routers are not permitted to send KEEPALIVE messages over a particular TCP connection more frequently than once every second).
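The interaction between KEEPALIVE messages and the Hold Timer can be illustrated with a small simulation. This is a sketch only, using the 3 second and 1 second figures given above; the `HoldTimer` class is hypothetical and time is passed in explicitly rather than read from a clock.

```python
class HoldTimer:
    """Tracks whether a peer is still considered alive (times in seconds)."""

    def __init__(self, hold_time, now):
        self.hold_time = hold_time
        self.deadline = now + hold_time

    def on_keepalive(self, now):
        # Each received KEEPALIVE restarts the hold timer.
        self.deadline = now + self.hold_time

    def expired(self, now):
        # The peer is presumed down once the timer runs out.
        return now >= self.deadline


# The peer sends a KEEPALIVE at t = 1, 2 and 3 s, then fails silently.
timer = HoldTimer(hold_time=3, now=0)
for t in (1, 2, 3):
    timer.on_keepalive(now=t)

alive_at_5 = not timer.expired(now=5)  # last KEEPALIVE was only 2 s ago
down_at_6 = timer.expired(now=6)       # 3 s of silence: peer declared down
```

With these values a silent failure is detected one hold time after the last KEEPALIVE received, even though the TCP connection itself was never closed.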
Within an autonomous system, a similar mechanism is used to route IP datagrams through the network (or internetwork of sub-networks) from one point to another, in which each router (and each host) again maintains a routing table. However, instead of BGP, an Interior Gateway Protocol (IGP) is used. There are a number of IGP's currently in use. Dynamic routing protocols in general, and IGP's in particular, may be classified into two distinct types of protocol: distance vector routing protocols and link state routing protocols. One popular IGP for relatively small networks is Routing Information Protocol (RIP), a distance vector routing protocol which uses the User Datagram Protocol (UDP) to transmit routing information (using so-called routing-update messages) between co-operating routers instead of forming TCP connections. Using RIP, a gateway host (with a router) sends its entire routing table (which lists all the other hosts it knows about) to its neighbouring hosts every 30 seconds, as well as whenever a change in the network topology is detected. Upon receipt of a neighbour's routing table, each host checks to see if it needs to update its own routing table in view of the newly received table. This may be necessary either because a route to a particular destination, for which the sending neighbour is currently set as the first hop in the receiving host's routing table, has changed as far as the neighbour is concerned since it last broadcast its routing table, or because a destination, for which the sending neighbour is not currently set as the first hop, now appears from the received routing table to be reachable by a shorter route via the sending neighbour than via the neighbour currently set as the first hop.
Since each neighbour host passes the information about its current routing table on to all of its neighbours and so on, all hosts within the network should eventually end up having routing tables which are self-consistent and specify the best (according to whatever metric is used) routes between hosts on the network, a state known as network convergence. Generally, RIP uses hop count as a way to determine network distance. (Other protocols use more sophisticated algorithms that may, for example, include delay, cost, etc. as well.)
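The table update performed on receipt of a neighbour's routing table can be sketched as below, using hop count as the metric. The function name `merge_neighbour_table` and the table layout are hypothetical. The treatment of a hop count of 16 as "unreachable" comes from the RIP specification rather than from the text above.

```python
INFINITY = 16  # RIP treats a hop count of 16 as unreachable (from the RIP spec)


def merge_neighbour_table(table, neighbour, advertised):
    """Update `table` (dest -> (hops, first_hop)) from a neighbour's advertisement.

    `advertised` maps each destination to the neighbour's own hop count.
    Returns True if any entry changed.
    """
    changed = False
    for dest, hops in advertised.items():
        new_hops = min(hops + 1, INFINITY)  # one extra hop to reach the neighbour
        current = table.get(dest)
        if current is None:
            if new_hops < INFINITY:
                table[dest] = (new_hops, neighbour)
                changed = True
            continue
        cur_hops, cur_first_hop = current
        if cur_first_hop == neighbour and new_hops != cur_hops:
            # Our existing route goes via this neighbour, so its news always applies.
            table[dest] = (new_hops, neighbour)
            changed = True
        elif new_hops < cur_hops:
            # A strictly shorter route via this neighbour replaces the old one.
            table[dest] = (new_hops, neighbour)
            changed = True
    return changed


table = {"A": (2, "G1"), "B": (3, "G2")}
changed = merge_neighbour_table(table, "G1", {"A": 3, "B": 1, "C": 4})
```

In the example, the route to A worsens because it already runs via G1, the route to B switches to G1 because it is shorter, and C is learned for the first time.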
The discussion above assumes that the topology of the network is fixed. In practice, hosts, gateways and lines often fail and come back up. Since only the best route to any given destination is remembered by any given host or gateway, that host or gateway needs to be notified if its current best route goes down. However, if the gateway involved in that route should crash, or the network connection to it break, then it has no way of notifying its neighbours of the change.
In order to handle problems of this kind, distance vector protocols must make some provision for timing out routes. The details depend upon the specific protocol. As an example, in RIP every gateway that participates in routing sends an update message to all its neighbours once every 30 seconds. Suppose the current route for network N uses gateway G. If we don't hear from G for 180 seconds, we can assume that either the gateway has crashed or the network connecting us to it has become unusable. Thus, we mark the route as invalid. When we hear from another neighbour that has a valid route to N, the valid route will replace the invalid one. Note that we wait for 180 seconds before timing out a route even though we expect to hear from each neighbour by way of a router update message every 30 seconds. Unfortunately, messages are occasionally lost by networks. Thus, RIP does not invalidate a route based on a single missed update message.
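A minimal sketch of this timeout rule is given below, assuming each gateway records the time at which it last heard from each neighbour. The names `invalid_routes`, `routes` and `last_heard` are hypothetical; the 30 and 180 second values are those given above.

```python
UPDATE_INTERVAL = 30   # seconds between routing-update messages
ROUTE_TIMEOUT = 180    # seconds of silence before a route is marked invalid


def invalid_routes(routes, last_heard, now):
    """Return the set of networks whose gateway has been silent for too long.

    `routes` maps network -> gateway; `last_heard` maps gateway -> time of the
    most recent update received from it. A gateway never heard from is treated
    as silent forever.
    """
    return {
        net
        for net, gw in routes.items()
        if now - last_heard.get(gw, float("-inf")) >= ROUTE_TIMEOUT
    }


routes = {"N": "G", "M": "H"}
last_heard = {"G": 0, "H": 150}
at_179 = invalid_routes(routes, last_heard, now=179)
at_180 = invalid_routes(routes, last_heard, now=180)

# Number of consecutive updates that must be missed before a route times out.
missed_updates_tolerated = ROUTE_TIMEOUT // UPDATE_INTERVAL
```

With these timers six consecutive updates must go missing before the route to N via G is invalidated, which is why a single lost message has no effect.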
Another popular IGP is the Open Shortest Path First (OSPF) protocol (defined in the Internet Engineering Task Force (IETF)'s Request for Comments (RFC) 2328 [1]). Unlike RIP, OSPF is a link state routing protocol in which each router has knowledge of the whole network and uses this knowledge to calculate a routing table (using an algorithm known as the Dijkstra algorithm [2]). It has less overhead than RIP because it only transmits messages when there has been a change in a router's information about the network. However, these messages (known as Link State Updates (LSU's), each of which contains one or more Link State Advertisements (LSA's)) may contain much more information than is transmitted in each RIP routing-update message.
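The shortest path first calculation can be illustrated with a textbook form of the Dijkstra algorithm run over a router's view of the whole topology. This is a sketch only and not the procedure set out in the OSPF specification; the function name, the graph representation and the example link costs are hypothetical.

```python
import heapq


def shortest_path_first(graph, source):
    """Dijkstra's algorithm over `graph` (node -> {neighbour: link cost}).

    Returns (dist, first_hop): the cost of the best path to each reachable
    node, and the neighbour of `source` through which that path leaves.
    """
    dist = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for an already finalised node
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Paths leaving the source start at that neighbour; longer
                # paths inherit the first hop of the path they extend.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, first_hop = shortest_path_first(graph, "A")
```

The `first_hop` mapping corresponds to the next hop information a router would place in its routing table once the calculation completes.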
In addition to sending LSA's whenever there is a change in the network topology (e.g. because a link or a router has gone down), each router also periodically sends out a small “Hello” message which principally acts as a “keep alive” type message but also includes a small amount of network information. In the event that a router fails to receive a Hello message from a neighbouring router within a predetermined period known as the Router Dead Interval (RDI) (which may vary from one interface to another, although for a common network it is supposed to be the same for all links on that network according to the OSPF specification defined in [1]), the router will consider that the neighbouring router (or the link thereto) has gone down; it will adjust its internal topology “map” accordingly and then send out a Link State Update to its other neighbours detailing the change.
In typical implementations of the OSPF protocol, the Hello Interval is set to a default value of 10 seconds and the Router Dead Interval is typically set to a value of 40 seconds, or four times the Hello Interval. Once a dead router is detected by a neighbouring router (which, in normal circumstances will therefore be at least after the elapse of the Router Dead Interval) the router generates a new LSA to reflect the changed topology. If a router (as opposed to just a link to that router) has gone down, all routers affected by the dead router must calculate their own LSA's and all of these are flooded throughout the network, and cause all of the routers in the network to redo the shortest path first calculation and then accordingly update (if necessary) their internal link-state database and their “topology map” and thus, if necessary, update their next hop information contained in their routing table.
Thus the time required to recover from a router failure consists of: (1) the failure detection time, (2) the LSA flooding time and (3) the time to complete the new SPF calculations and update the various topology and routing tables accordingly. As mentioned above, the failure detection time will typically be at least 40 seconds with an RDI of 40 seconds. The LSA flooding time consists of the propagation delays and any pacing delays resulting from the rate limiting of Link State Update packets sent down an interface.
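The three contributions listed above can be summarised, as a rough sketch, by a simple sum. The symbols are introduced here purely for illustration and do not appear in the cited documents.

```latex
T_{\mathrm{recovery}} \approx T_{\mathrm{detect}} + T_{\mathrm{flood}} + T_{\mathrm{SPF}},
\qquad
T_{\mathrm{detect}} \approx \mathrm{RDI} = 40\ \mathrm{s} \quad \text{(typical default)}
```

Here $T_{\mathrm{flood}}$ covers the propagation and pacing delays and $T_{\mathrm{SPF}}$ covers the SPF calculation and table updates, so with the typical timer values the detection term is the largest of the three.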
Once a router receives an LSA, it schedules an SPF calculation. Since an SPF calculation using the Dijkstra algorithm places a significant load on the router's processor, the router waits for some time (spfDelay, which is typically set at 5 seconds) to let other LSA's arrive before doing an SPF calculation (to avoid having to redo the calculation every time a new LSA arrives, given that LSA's are likely to arrive in groups as multiple different routers are affected by a single router going down or coming back up). Moreover, the routers place a limit on the frequency with which SPF calculations may be performed (dictated by a variable spfHoldTime, which is typically set to 10 seconds and which prevents a new SPF calculation from being carried out until at least spfHoldTime has elapsed since the last SPF calculation). Both of these measures can introduce further delays in the time taken for a system to recover from a failure.
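The combined effect of the two timers can be sketched with a small scheduler. The `SpfScheduler` class and its method names are hypothetical and real router implementations differ in detail; the 5 and 10 second values are the typical settings given above.

```python
SPF_DELAY = 5       # seconds to wait after an LSA so that further LSA's can arrive
SPF_HOLD_TIME = 10  # minimum seconds between two successive SPF calculations


class SpfScheduler:
    """Decides when the next SPF calculation may run (times in seconds)."""

    def __init__(self):
        self.last_run = None      # time of the most recent SPF calculation
        self.scheduled_at = None  # time at which the pending calculation is due

    def on_lsa(self, now):
        """Schedule a calculation if none is pending; return when it is due."""
        if self.scheduled_at is None:
            due = now + SPF_DELAY
            if self.last_run is not None:
                # Never run sooner than the hold time after the previous run.
                due = max(due, self.last_run + SPF_HOLD_TIME)
            self.scheduled_at = due
        return self.scheduled_at

    def tick(self, now):
        """Run the pending calculation if it is due; return True if it ran."""
        if self.scheduled_at is not None and now >= self.scheduled_at:
            self.last_run = now
            self.scheduled_at = None
            return True
        return False


sched = SpfScheduler()
first_due = sched.on_lsa(now=0)    # first LSA: due after the SPF delay
batched_due = sched.on_lsa(now=2)  # second LSA joins the pending calculation
ran_early = sched.tick(now=4)
ran_on_time = sched.tick(now=5)
held_due = sched.on_lsa(now=6)     # hold time pushes the next run to t = 15
```

In this example an LSA arriving one second after a calculation waits nine seconds for the next one, which illustrates how these safeguards lengthen recovery.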
It has recently come to the attention of workers in this field that the time taken to recover from a network element failure in a typical implementation of an OSPF system is too long for modern requirements. This is because technology has evolved so that such networks may have very large bandwidths, and because of this a large amount of data could be lost whilst the network is in an unstable situation because it is in the process of recovering from a network element failure. A number of published documents have considered this issue and some of the most interesting of these are identified and briefly discussed below.
Alaettinoglu et al. [3] propose reducing the HelloInterval to the millisecond range to achieve sub-second recovery from network failures, but this document does not consider any side effects of HelloInterval reduction. Their processor model of a router assumes that data packets are forwarded by line cards in hardware and that control packets are handled by the routing control processor, so that there is enough computation resource in the routing control processor to deal with the huge number of Hello messages. They therefore set the HelloInterval to the minimum value possible without causing too many route flaps. These assumptions, however, are not always valid in practical network implementations.
Shaikh et al. [4] describes the use of Markov Chain based analysis of a simple network topology to obtain the expected times before high packet drop rates cause a healthy adjacency to be declared down and then back up again. The described simulation suggests that OSPF's behaviour depends only on the traffic overload factor and is insensitive to the packet size distribution, the buffer size or the packet dropping policy in effect. The paper suggests prioritising OSPF control traffic over normal data traffic in order to minimise the risk of healthy adjacencies being falsely declared as down due to congestion.
Basu and Riecke [5] study three indicators of OSPF routing stability: network convergence time, routing load on processor and the number of route flaps. They also investigate the scheme of using sub-second HelloIntervals to achieve faster recovery from network failures and conclude that 275 ms would be an optimal value for HelloInterval providing fast failure detection while not resulting in too many false alarms. The paper suggests introducing randomization into the “LSA timers” to avoid all routers issuing LSA's at the same time, causing congestion. However, the paper does not specify exactly how this could be achieved practically nor exactly what “LSA timers” they are talking about, or in what way the randomization should be introduced. Furthermore, this work still assumes that the control and data planes are physically separated.
Choudhury et al. [6] observe that reducing the HelloInterval lowers the threshold (in terms of number of LSA's) at which an LSA burst will lead to the generation of false alarms. This paper also proposes explicitly marking certain key OSPF packets and arranging for the processing of these to be prioritised over both ordinary packets and other, less key, OSPF packets, especially where there is congestion.
IETF's RFC 4222 proposes considering the receipt of any OSPF packet (e.g. an LSA) from a neighbour as an indication of the good health of the router's adjacency with the neighbour [7]. This provision can help avoid false loss of adjacency in scenarios where Hello packets get dropped because of congestion, caused by a large LSA burst, on the control link between two routers. Such mechanisms should help mitigate the false alarm problem significantly. However, in many practical OSPF networks there is no dedicated control link between routers, and therefore LSA bursts are not the only causes of congestion which might cause Hello packets to be dropped. Many different types of control traffic for routing, signalling and network management, as well as data traffic from customers, contribute to network congestion, and the solution proposed in this RFC will be less than completely successful in such circumstances.
More recently, Goyal, et al. [8] evaluate the best value for the HelloInterval that will lead to fast failure detection in the network while keeping the false alarm occurrence within acceptable limits and investigate the impact of both network congestion and the network topology on the optimal HelloInterval value. Additionally, they discuss the effectiveness of faster failure detection in achieving faster failure recovery in OSPF networks. Their work is similar to [5] in that it considers the tradeoff between faster failure detection and the increased frequency of false alarms. Unfortunately, this method relies heavily on the number of false alarms and this is only obtained from a network simulation tool. In a practical network, generally, a router can never know the number of local false alarms.