§1.1 Field of the Invention
The present invention concerns IP networks. In particular, the present invention concerns failure recovery using rerouting schemes that determine backup ports within an IP network.
§1.2 Background Information
The Internet has evolved to a global information platform that supports numerous applications ranging from online shopping to worldwide business-related and science-related activities. For such a critical infrastructure, survivability is a stringent requirement in that services interrupted by equipment failures must be recovered as quickly as possible. Typically, a recovery time of tens of milliseconds satisfies most requirements (e.g., SDH/SONET automatic protection switching (“APS”) is completed within 50 ms). At the same time, it is desired that failure recovery schemes have low complexity and do not reserve redundant bandwidth.
Network failures can be caused by a variety of reasons such as fiber cut, interface malfunctioning, software bugs, misconfiguration and attacks. Despite continuous technological advances, failures have occurred even in well maintained networks.
An important issue of failure recovery is how to set up a new path to replace a damaged one. The main approaches used by today's IP networks are route recalculation and lower layer protection. Each is introduced below.
Routing protocols (such as open shortest path first (“OSPF”) and intermediate system to intermediate system intra-domain routing (“IS-IS”) are typically designed to perform failure advertising, route recalculation and routing table update to recover from failures. Although these mechanisms can deal with various types of failures, the time for the recovery process can easily reach seconds. Such delays can lead to long service disruptions, dropped packets, latency, etc., to an extent unacceptable for certain applications (such as stock trading systems, for example).
On the other hand, lower layer protection achieves fast recovery by establishing backup connections in advance (e.g., a time slot channel). These previously established backup connections are used to quickly replace damaged connections. In this case, the IP layer can be protected from failures without any modifications on the routing tables. However, this type of approach reserves redundant bandwidth (such as redundant links or channels on links, redundant ports, etc.) for the backup connections. More importantly, relying on lower layer protection means the IP layer is not independent in term of survivability. From this point of view, an original objective of packet switching—to design a highly survivable network where packet forwarding in each router is adaptive to the network status—is still not fully achieved.
The framework of IP fast rerouting (“IPFRR”) is described in a recent draft of Internet Engineering Task Force (“IETF”). (See, e.g., M. Shand and S. Bryant, “IP fast reroute framework,” Internet-Draft, October 2005. (Online) available at http://www.ietf.org/internet-drafts/draftietf-rtgwg-ipfrr-framework-04.txt.) Basically, IPFRR lets a router maintain (the identity of) a backup port for each destination and use the backup port to forward packets when the primary port fails. Since the backup ports are determined in advance and do not occupy or otherwise reserve redundant bandwidth, IPFRR can achieve fast failure recovery with great cost-efficiency.
IPFRR and the following presume that failure detection has already occurred (e.g., using a known or proprietary techniques).
§1.2.1 Previous Approaches to IP Fast Rerouting, and Perceived Limitations of Such Approaches
A simple scheme related to IPFRR is equal cost multi-paths (“ECMP”), where a number of paths with the same cost are calculated for each source/destination pair. (See, e.g., A. Iselt, A. Kirstdter, A. Pardigon, and T. Schwabe, “Resilient routing using ecmp and mpls,” IEEE High Performance Switching and Routing (HPSR) (April 2004).) A failure on a particular path can be handled by sending packets along an alternate path. This approach has been implemented in practical networks. However, equal cost paths might not exist in certain situations (such as in a ring). Thus, it has been reported that ECMP cannot guarantee 100% failure recovery.
A scheme to find loop-free alternate paths is presented in the paper, A. Atlas, “Basic specification for IP fast-reroute: loopfree alternates,” Internet-Draft, (February 2005) (Online) available at http://www3.ietf.org/proceedings/05mar/IDs/draft-ietf-rtgwg-ipfrrspec-base-03.txt. Consider the routing from S to D. If S has a neighbor X that satisfies d(X,D)<d(X,S)+d(S,D), where d(i,j) is the cost from i to j, it can send packets to X as an alternate path. The condition ensures that packets do not loop back to S. Similar to ECMP, this scheme does not guarantee 100% failure recovery since a node might not have a neighbor X that satisfies the foregoing condition.
The paper S. Bryant, M. Shand, and S. Previdi, “IP fast reroute using not-via addresses,” Internet-Draft, (October 2005) (Online) available at http://www.ietf.org/internet-drafts/draft-bryant-shand-ipfrrnotvia-addresses-01.txt, proposes a scheme to set up a tunnel from node S to node Y that is multiple hops away. The alternate path to a destination D is from S to Y then to D. This guarantees 100% failure coverage. Unfortunately, the maintenance of many tunnels imposes extra costs, and fragmentation can occur when the encapsulated IP packet is longer than the maximum transmission unit (“MTU”).
A scheme called failure insensitive routing (“FIR”) for recovering from single-link failures is presented in the paper S. Lee, Y. Yu, S. Nelakuditi, Z. Zhang, and C.-N. Chuah, “Proactive vs reactive approaches to failure resilient routing,” IEEE INFOCOM (March 2004). Given a primary path S→D, FIR identifies a number of key links such that removing any of these links forces the packets go back to S. Therefore, the failure of any key links can be inferred by S if a deflected packet occurs. To provide an alternate path, FIR removes the key links and runs shortest path routing from S to D. FIR is extended to cover single-node failures in the paper Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C.-N. Chuah, “Failure inferencing based fast rerouting for handling transient link and node failures,” IEEE Global Internet (March 2005). The scheme is also applicable to networks using ECMP. Unfortunately, it does not consider the general case of multi-path routing where the paths may not have equal cost. In addition, determining extra shortest paths can be computationally expensive.
An algorithm called multiple routing configuration (“MRC”) is presented in the paper A. Kvalbein et al., “Fast IP network recovery using multiple routing configurations,” IEEE INFOCOM (April 2006). Under MRC, each router maintains multiple routing tables (configurations). After a failure is detected, the routers search for a configuration that can bypass the failure. After that, the index of the selected configuration is inserted into packet headers to notify each router which routing table to use. MRC achieves 100% failure coverage. Unfortunately MRC has to maintain multiple routing tables and has to add an extra index to packet headers.
The paper X. Yang and D. Wetherall, “Source selectable path diversity via routing deflections,” ACM Sigcomm, (2006), discusses how to find multiple paths between source/destination pairs using routing deflection, and derives three conditions that achieve generic path diversity. Although the scheme is not designed for a specific application, it is shown to be promising for failure recovery. Unfortunately, directly using the scheme cannot guarantee 100% failure coverage.
In view of the foregoing, it would be useful to facilitate fast failure recovery in IP networks, preferably without introducing high complexity and/or high resource usage.