Traditional IP routing (e.g. Interior Gateway Protocols [IGPs] such as Open Shortest Path First [OSPF] or Intermediate System to Intermediate System [ISIS]) has relatively slow fail-over properties. Hence, the Internet Engineering Task Force (IETF) routing working group and the research community have been considering several alternatives for IP Fast Re-Route (IPFRR).
The basic components of almost all previously considered IPFRR proposals are the following:
    Fast local failure detection. This is assumed to exist already: mechanisms such as Bidirectional Forwarding Detection (BFD), or lower-layer upcalls when the lower layer detects the failure (e.g. loss of signal), are available. IPFRR solutions rely on fast failure detection but do not target it as a problem.
    Pre-calculated backup paths. The routing engine can prepare for failures by pre-calculating alternate paths (i.e. alternate next-hops) that should be used in case of failures.
    Pre-downloaded backup forwarding entries. The next-hops are not only pre-calculated but also pre-downloaded into the forwarding engine, i.e. the linecards, so that they can be used instantly upon a trigger.
    Switch-over to backup forwarding entries within the forwarding engine. The fast failure detection is processed in the linecards, and the FIB change is performed locally and instantly without any involvement of the control plane. (FIB stands for Forwarding Information Base, and is also known as a Forwarding Table.)
    Ensure consistent forwarding in other hops. Since no other node knows about the failure, every other node has the same FIB as before the failure. Due to IP's hop-by-hop forwarding nature, a neighbour might route the packet back towards the point of local repair (still believing that it is on the shortest path), which results in a forwarding loop, meaning that the failure was not handled. See FIG. 1 of the accompanying drawings.
    Some previous proposals suggest suppressing IGP convergence temporarily. The goal is to see whether the failure is persistent and, if so, to let the control plane IGP re-converge onto new paths globally. Otherwise, in the case of transient failures, it is possible to completely hide down and up events that quickly follow each other. When the failure disappears, the original paths can be used again, and unnecessary Control Processor (CP) reconfiguration is avoided.
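The pre-calculated, pre-downloaded backup entries and the local switch-over described above can be pictured with a minimal sketch. All names here (FibEntry, ForwardingEngine, on_local_failure and so on) are hypothetical and chosen only for illustration; they are not taken from any of the cited proposals or from a real router implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class FibEntry:
    """One forwarding entry as pre-downloaded into a linecard (assumed layout)."""
    primary: str           # primary next-hop chosen by the routing engine
    backup: Optional[str]  # pre-calculated alternate next-hop, if one exists


class ForwardingEngine:
    """Toy forwarding engine that switches to backups without the control plane."""

    def __init__(self, fib: Dict[str, FibEntry]) -> None:
        self.fib = fib
        self.failed: Set[str] = set()  # neighbours currently detected as down

    def on_local_failure(self, neighbour: str) -> None:
        # Trigger from fast failure detection (e.g. BFD or loss of signal).
        self.failed.add(neighbour)

    def on_local_recovery(self, neighbour: str) -> None:
        # A transient failure has disappeared: the original path is usable again.
        self.failed.discard(neighbour)

    def next_hop(self, prefix: str) -> Optional[str]:
        entry = self.fib[prefix]
        if entry.primary not in self.failed:
            return entry.primary
        if entry.backup is not None and entry.backup not in self.failed:
            return entry.backup
        return None  # no usable next-hop: the packet would be dropped


# Usage: the prefix and neighbour names are made up for the example.
engine = ForwardingEngine({"10.0.0.0/8": FibEntry(primary="D", backup="B")})
before = engine.next_hop("10.0.0.0/8")   # primary next-hop
engine.on_local_failure("D")
during = engine.next_hop("10.0.0.0/8")   # backup next-hop, no recalculation
engine.on_local_recovery("D")
after = engine.next_hop("10.0.0.0/8")    # primary next-hop again
```

The point of the sketch is that the switch-over is a lookup against state that is already present in the forwarding engine; no path calculation or FIB download happens at failure time.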
The above mechanisms eliminate a large portion of the tasks that traditional re-routing procedures perform: to respond to a failure there is no need to calculate new paths (a control plane task), nor to download the results to the forwarding cards.
It can be easily seen that in order to make consistent (i.e. loop-free) forwarding decisions in arbitrary failures on arbitrary topologies, remote nodes must get some form of information about the failure. This is illustrated in FIGS. 1 and 2 of the accompanying drawings.
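The need for remote nodes to learn about the failure can be illustrated with a small hop-by-hop forwarding trace. The topology and FIB contents below are hypothetical (they are not the network of FIGS. 1 and 2): node A has lost its direct link to D and locally switched to neighbour B, while B's unchanged FIB still points to A as its next-hop towards D.

```python
from typing import Dict, List, Tuple

# fibs[node][destination] -> next-hop, one FIB per node (assumed representation)
Fibs = Dict[str, Dict[str, str]]


def trace(fibs: Fibs, src: str, dst: str) -> Tuple[List[str], bool]:
    """Follow next-hops from src to dst; return (path, loop_detected)."""
    path = [src]
    seen = {src}
    node = src
    while node != dst:
        node = fibs[node][dst]
        path.append(node)
        if node in seen:        # a node was visited twice: forwarding loop
            return path, True
        seen.add(node)
    return path, False


# Only A knows about the failure of link A-D; B still forwards towards A.
stale = {"A": {"D": "B"}, "B": {"D": "A"}, "C": {"D": "D"}}
stale_path, stale_loop = trace(stale, "A", "D")

# B has also been informed and uses its alternate next-hop C.
informed = {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "D"}}
informed_path, informed_loop = trace(informed, "A", "D")
```

In the first case the packet bounces between A and B; in the second, where B has changed its forwarding decision as well, the packet reaches D via C.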
Almost all existing proposals have tried to provide this information implicitly within the re-routed data packets:
    Piggyback on user data packets
        Bits in packet header (MRC [A. Kvalbein, A. F. Hansen, T. Cicic, S. Gjessing, O. Lysne, “Fast IP network recovery using multiple routing configurations”, Network Operations and Management Symposium, 2008. NOMS 2008. IEEE, 2008])
        Encapsulation header (Not-Via [S. Bryant, M. Shand, S. Previdi, “IP fast reroute using Not-via addresses”, Internet Draft, available online: http://tools.ietf.org/html/draft-ietf-rtgwg-ipfrr-notvia-addresses-05, 2010])
    Packet direction (FIFR [J. Wang, S. Nelakuditi, “IP fast reroute with failure inferencing”, In Proceedings of ACM SIGCOMM Workshop on Internet Network Management: The Five-Nines Workshop, 2007], LFA U-turn [A. Atlas, “U-turn alternates for ip/ldp fast-reroute”, Internet Draft, available online: http://tools.ietf.org/html/draft-atlas-ip-local-protect-uturn-03, 2006])
One exception is Loop Free Alternates (LFA) [A. Atlas, A. Zinin, “Basic specification for IP Fast-Reroute: Loop-Free Alternates”, Internet Engineering Task Force: RFC 5286, 2008], but that cannot guarantee full failure coverage. LFA only handles failure situations in which the node detecting the failure can, on its own, find an alternate neighbour that provides a loop-free path under default routing.
The proposal of Hokelek et al. [“Loop-Free IP Fast Reroute Using Local and Remote LFAPs”, http://tools.ietf.org/html/draft-hokelek-rlfap-01] is another important exception. They propose to advertise the failure explicitly in a signalling message, which allows distant nodes to switch to new forwarding configurations upon reception of the notification.
The present applicant has therefore appreciated that most of the IPFRR solution proposals try to incorporate the notification implicitly into the data packets. One exception is LFA [referenced above], which tries to select safe alternative next-hops that neither loop the packet nor forward it through the failure. The drawback of LFA is that it cannot guarantee full failure coverage. For example, in FIG. 1, in the case of a failure of link A-D, node A has no loop-free alternate towards destination D.
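The coverage limitation of LFA can be checked numerically with the basic loop-free condition of RFC 5286: a neighbour N of the repairing node S is a loop-free alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D). The following is a minimal sketch under stated assumptions: the two topologies (a four-node ring with unit link costs and a triangle) are hypothetical examples and are not the topology of FIG. 1, and the function names are illustrative only.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]  # node -> {neighbour: link cost}


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest-path cost from source to every node."""
    dist: Dict[str, float] = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def loop_free_alternates(graph: Graph, s: str, primary: str, dest: str) -> List[str]:
    """Neighbours of s (other than the primary next-hop) that satisfy
    dist(N, D) < dist(N, S) + dist(S, D)."""
    dist_s = shortest_distances(graph, s)
    alternates = []
    for n in graph[s]:
        if n == primary:
            continue
        dist_n = shortest_distances(graph, n)
        if dist_n[dest] < dist_n[s] + dist_s[dest]:
            alternates.append(n)
    return alternates


# Hypothetical ring A-B-C-D-A with unit costs: A's only other neighbour is B.
RING: Graph = {
    "A": {"B": 1, "D": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"A": 1, "C": 1},
}
# Hypothetical triangle A-B-D with unit costs.
TRIANGLE: Graph = {
    "A": {"B": 1, "D": 1},
    "B": {"A": 1, "D": 1},
    "D": {"A": 1, "B": 1},
}

ring_lfas = loop_free_alternates(RING, "A", primary="D", dest="D")
triangle_lfas = loop_free_alternates(TRIANGLE, "A", primary="D", dest="D")
```

In the ring, dist(B, D) = 2 is not strictly smaller than dist(B, A) + dist(A, D) = 2, so B may forward the packet back to A and A has no loop-free alternate. In the triangle, dist(B, D) = 1 < 2, so B qualifies.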
The present applicant has appreciated that none of the existing proposals is acceptable from a practical implementation perspective.
Not-Via [referenced above] and similar solutions rely on tunnelling. However, encapsulation is not preferred, because the additional header can cause packets to exceed the Maximum Transmission Unit (MTU) and hence be fragmented. Both the fragmentation and the reassembly at the tunnel end-point decrease forwarding performance. Also, Not-Via requires special tunnel end-point addresses, the management of which is cumbersome. MRC assumes that packet marking is used to encode a new routing configuration ID. There are, however, no viable bits in the IP header for this purpose, and encapsulation would cause the same problems as for Not-Via.
FIFR, on the other hand, relies on interface-specific forwarding, i.e. remote nodes infer the failure from the direction from which a packet arrives. A typical router design, however, keeps the same replica of the forwarding table at each linecard (serving multiple interfaces/adjacencies). This assumption is embedded deep in the hardware and software and is extremely hard to change.
Explicit failure notification signalling has to be extremely fast; otherwise it suffers from the same problem as the traditional flooding mechanism of OSPF or ISIS. The draft by Hokelek et al. on “Loop-Free IP Fast Reroute Using Local and Remote LFAPs” [referenced above] does not describe how the failure notification can be flooded quickly, without additional control plane delay at each hop, or how the FIB can be updated rapidly.
The reason why the Hokelek et al. draft does not deal with these important problems is that the solution was originally developed for wireless ad-hoc routing. For ad-hoc routing, the most important goal is to minimize the protocol overhead, i.e. the area over which the notification propagates. Since this area was severely limited (only a few hops), the delay caused by the propagation is insignificant. Moreover, such networks do not have to deal with numerous prefixes, so updating the FIB is not an issue either.
Normally, in the current state of the art, if a Forwarding Processor (FP, typically a linecard) receives a notification packet of a protocol that needs to be both disseminated and processed, the notification is sent to the separate Control Processor (CP). The CP processes the packet, ensures the flooding of the information and reconfigures the FPs. This is illustrated in FIG. 3 of the accompanying drawings, which shows a process carried out by a previously-considered router. An FP receives a notification packet of a protocol in step 1, the notification packet being of a type that needs to be disseminated and processed. The notification is sent to the separate CP for processing in step 2. The CP processes the packet in step 3, and arranges for the forwarding of the packet to the FPs in step 4, which in turn flood the information to other routers (step 5). Through the processing carried out by the CP, the CP also reconfigures the FPs. However, CP interaction is not preferred if the goal is to provide instant flooding of the incoming message (and possibly even instant processing after flooding). If the control plane is involved, it is hard to guarantee sub-second reaction times, let alone reaction times in the order of milliseconds, as would be desired for carrier-grade fail-over performance.
It is desirable to find efficient ways of handling failure notifications.