§1.1 Field of the Invention
The present invention concerns IP networks. In particular, the present invention concerns failure recovery from double-link failures using rerouting schemes that determine first and secondary backup ports within an IP network.
§1.2 Background Information
The Internet has evolved into a global information platform that supports numerous applications ranging from online shopping to worldwide business-related and science-related activities. For such a critical infrastructure, survivability is important in that services interrupted by equipment failures should be recovered as quickly as possible. (See, e.g., S. Rai, B. Mukherjee, and O. Deshpande, “IP Resilience within an Autonomous System: Current Approaches, Challenges, and Future Directions,” IEEE Commun. Mag., Vol. 43, No. 10, pp. 142-149 (October 2005).) Typically, a recovery time of tens of milliseconds satisfies most requirements (e.g., SDH/SONET automatic protection switching (“APS”) is completed within 50 ms). (See, e.g., T. H. Wu and R. C. Lau, “A Class of Self-Healing Ring Architectures for SONET Network Applications,” IEEE Trans. Commun., Vol. 40, No. 11, pp. 1746-1756 (November 1992).) At the same time, it is desired that failure recovery schemes have low complexity and do not reserve redundant bandwidth.
Network failures can have a variety of causes, such as fiber cuts, interface malfunctions, software bugs, misconfiguration and attacks. (See, e.g., A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and C. Diot, “Characterization of Failures in an IP Backbone,” IEEE INFOCOM (March 2004).) Despite continuous technological advances, failures have occurred even in well-maintained networks.
An important issue of failure recovery is how to set up a new path to replace a damaged one. The main approaches used by today's IP networks are route recalculation and lower layer protection. Each is introduced below.
Routing protocols (such as open shortest path first (“OSPF”) (J. Moy, “OSPF Version 2,” RFC 2328 (Standard) (April 1998)) and intermediate system to intermediate system intra-domain routing (“IS-IS”)) are typically designed to perform failure advertising, route recalculation and routing table update to recover from failures. Although these mechanisms can deal with various types of failures, the time for the recovery process can easily reach seconds. Such delays can lead to long service disruptions, dropped packets, latency, etc., to an extent unacceptable for certain applications (such as stock trading systems, for example).
On the other hand, lower layer protection achieves fast recovery by establishing backup connections in advance (e.g., a time slot channel). These previously established backup connections are used to quickly replace damaged connections. In this case, the IP layer can be protected from failures without any modifications to the routing tables. However, this type of approach reserves redundant bandwidth (such as redundant links or channels on links, redundant ports, etc.) for the backup connections. More importantly, relying on lower layer protection means the IP layer is not independent in terms of survivability. From this point of view, an original objective of packet switching (to design a highly survivable network where packet forwarding in each router is adaptive to the network status) is still not fully achieved. (See, e.g., P. Baran, “The Beginnings of Packet Switching: Some Underlying Concepts,” IEEE Commun. Mag., Vol. 40, No. 7, pp. 42-48 (July 2002).)
The framework of IP fast rerouting (“IPFRR”) is described in a recent draft of the Internet Engineering Task Force (“IETF”). (See, e.g., M. Shand and S. Bryant, “IP Fast Reroute Framework,” Internet-Draft (October 2005).) Basically, IPFRR lets a router maintain (the identity of) a backup port for each destination and use the backup port to forward packets when the primary port fails. Since the backup ports are determined in advance and do not occupy or otherwise reserve redundant bandwidth, IPFRR can achieve fast failure recovery with great cost-efficiency.
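By way of illustration only, the per-destination backup port described above might be sketched in Python as follows. The names RouteEntry and select_port, the port numbers, and the exact-match lookup by destination name (rather than longest-prefix matching) are assumptions made for this sketch and are not taken from the IETF framework.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class RouteEntry:
    """Hypothetical forwarding entry: one primary and one precomputed backup port."""
    primary_port: int
    backup_port: Optional[int]  # None if no backup port could be determined


def select_port(table: Dict[str, RouteEntry],
                destination: str,
                failed_ports: Set[int]) -> Optional[int]:
    """Return the output port for a destination, or None if the packet must be dropped.

    The backup port is consulted only when the primary port is known to have
    failed; no route recalculation happens on this path.
    """
    entry = table[destination]
    if entry.primary_port not in failed_ports:
        return entry.primary_port
    if entry.backup_port is not None and entry.backup_port not in failed_ports:
        return entry.backup_port
    return None


# Illustrative table: destination "node1" is reached via port 1, with port 3 as backup.
table = {"node1": RouteEntry(primary_port=1, backup_port=3)}

normal_port = select_port(table, "node1", failed_ports=set())
rerouted_port = select_port(table, "node1", failed_ports={1})
dropped = select_port(table, "node1", failed_ports={1, 3})
```

In this sketch a single backup port suffices for one failed port; when both the primary and backup ports have failed, the packet has no usable port, which is the situation a double-link failure scheme needs to address.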
IPFRR and the following presume that failure detection has already occurred (e.g., using known or proprietary techniques). Examples of known failure detection techniques are described in the articles, L. Fang, A. Atlas, F. Chiussi, K. Kompella, and G. Swallow, “LDP Failure Detection and Recovery,” IEEE Commun. Mag., Vol. 42, No. 10, pp. 117-123 (October 2004), and S. Q. Zhuang, D. Geels, I. Stoica, and R. H. Katz, “On Failure Detection Algorithms in Overlay Networks,” IEEE INFOCOM, Vol. 3, pp. 2112-2123 (March 2005).
IP fast rerouting (IPFRR) has gained much attention for network survivability. The idea of IPFRR is to proactively calculate backup ports that can be used to replace primary ports temporarily until the subsequent route recalculation is completed. FIGS. 1A-1C show an example with node 1 as the destination. In normal operation, each router forwards packets to its primary port. When link 1-2 fails, node 2 and node 4 switch to their backup ports immediately to resume packet forwarding. FIG. 2 shows that IPFRR resumes disrupted services immediately after a failure is detected; meanwhile, route recalculation can be performed to find optimal paths in the new topology. The main challenges of IPFRR are how to find the backup ports and how to coordinate routers during recovery to avoid forwarding loops. Several IPFRR-related schemes have been proposed. (See, for example, A. Atlas, “Basic Specification for IP Fast-Reroute: Loop-Free Alternates,” Internet-Draft (February 2005); S. Bryant, M. Shand, and S. Previdi, “IP Fast Reroute using Not-Via Addresses,” Internet-Draft (October 2005); A. Kvalbein et al., “Fast IP Network Recovery using Multiple Routing Configurations,” IEEE INFOCOM (April 2006); S. Lee, Y. Yu, S. Nelakuditi, Z. Zhang, and C.-N. Chuah, “Proactive vs Reactive Approaches to Failure Resilient Routing,” IEEE INFOCOM (March 2004); C. Perkins, “IP Encapsulation within IP,” RFC 2003 (Proposed Standard) (October 1996); M. Shand and S. Bryant, “IP Fast Reroute Framework,” Internet-Draft (October 2005); K. Xi and H. J. Chao, “IP Fast Rerouting for Single Link/Node Failure Recovery,” Polytechnic Univ. Technical Report (2006); U.S. patent application Ser. No. 11/786,417 (incorporated herein by reference), titled “DETERMINING REROUTING INFORMATION FOR SINGLE-LINK FAILURE RECOVERY IN AN INTERNET PROTOCOL NETWORK,” filed on Apr. 10, 2007, and listing Hung-Hsiang Jonathan CHAO and Kang XI as inventors; U.S. patent application Ser. No. 11/786,416 (incorporated herein by reference), titled “DETERMINING REROUTING INFORMATION FOR SINGLE-NODE FAILURE RECOVERY IN AN INTERNET PROTOCOL NETWORK,” filed on Apr. 10, 2007, and listing Hung-Hsiang Jonathan CHAO and Kang XI as inventors; X. Yang and D. Wetherall, “Source Selectable Path Diversity via Routing Deflections,” ACM Sigcomm (2006); and Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C.-N. Chuah, “Failure Inferencing Based Fast Rerouting for Handling Transient Link and Node Failures,” IEEE Global Internet (March 2005).) Each of these references is incorporated herein by reference. Almost all of the references consider single-link failures or single-node failures only.
Therefore, it would be useful to provide an IPFRR scheme that handles double-link failures. Although double-link failures have been investigated in optical networks (See, e.g., A. Chandak and S. Ramasubramanian, “Dual-Link Failure Resiliency through Backup Link Mutual Exclusion,” IEEE Broadnets, pp. 258-267 (2005); H. Choi, S. Subramaniam, and H. Choi, “Loopback Recovery from Double-Link Failures in Optical Mesh Networks,” IEEE/ACM Trans. Netw., Vol. 12, No. 6, pp. 1119-1130 (2004); and W. He and A. Somani, “Path-Based Protection for Surviving Double-Link Failures in Mesh-Restorable Optical Networks,” IEEE Globecom (2003).), the solutions suggested for optical networks cannot be used in IP networks, where routing is destination-based instead of flow-based. One may argue that multiple links usually do not fail simultaneously, and thus that the study of double-link failure recovery is of less importance. However, when an IP topology is built on top of a WDM network, the failure of a single fiber disconnects all the logical links it carries. This results in multiple simultaneous failures and is called the shared-risk link-group (“SRLG”) problem. (See, e.g., L. Shen, X. Yang, and B. Ramamurthy, “Shared Risk Link Group (SRLG)-Diverse Path Provisioning under Hybrid Service Level Agreements in Wavelength-Routed Optical Mesh Networks,” IEEE/ACM Trans. Netw., Vol. 13, No. 4, pp. 918-931 (August 2005); and D. Xu, Y. Xiong, C. Qiao, and G. Li, “Failure Protection in Layered Networks with Shared Risk Link Groups,” IEEE Netw., Vol. 18, No. 3, pp. 36-41 (May 2004).) Therefore, it would be useful to provide a double-link failure recovery scheme for IP networks or networks in which routing is destination-based.
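The shared-risk effect described above can be made concrete with a small Python sketch, offered as an illustration only. The mapping fibers_of, the fiber names "f1" to "f3", and the function failed_logical_links are hypothetical and assume that the fibers carrying each logical IP link are known.

```python
from typing import Dict, List, Set, Tuple

LogicalLink = Tuple[str, str]


def failed_logical_links(fibers_of: Dict[LogicalLink, List[str]],
                         cut_fiber: str) -> Set[LogicalLink]:
    """Return every logical IP link that traverses the cut fiber."""
    return {link for link, fibers in fibers_of.items() if cut_fiber in fibers}


# Hypothetical mapping from logical IP links to the physical fibers they ride on.
fibers_of = {
    ("A", "B"): ["f1"],
    ("A", "C"): ["f1", "f2"],  # shares fiber f1 with link A-B
    ("B", "C"): ["f3"],
}

# A single cut of fiber f1 takes down two logical links at once,
# so the IP layer sees a double-link failure.
down = failed_logical_links(fibers_of, "f1")
```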
§1.2.1 Previous Approaches to IP Fast Rerouting, And Perceived Limitations of Such Approaches
A simple scheme related to IPFRR is equal cost multi-paths (“ECMP”), where a number of paths with the same cost are calculated for each source/destination pair. (See, e.g., A. Iselt, A. Kirstädter, A. Pardigon, and T. Schwabe, “Resilient Routing using ECMP and MPLS,” IEEE High Performance Switching and Routing (HPSR) (April 2004).) A failure on a particular path can be handled by sending packets along an alternate path. This approach has been implemented in practical networks. However, equal cost paths might not exist in certain situations (such as in a ring). Thus, it has been reported that ECMP cannot guarantee 100% failure recovery.
A scheme to find loop-free alternate paths is presented in the paper, A. Atlas, “Basic Specification for IP Fast-Reroute: Loop-Free Alternates,” Internet-Draft (February 2005). Consider the routing from S to D. If S has a neighbor X that satisfies d(X,D)<d(X,S)+d(S,D), where d(i,j) is the cost of the shortest path from i to j, S can send packets to X as an alternate path. The condition ensures that packets do not loop back to S. Similar to ECMP, this scheme does not guarantee 100% failure recovery since a node might not have a neighbor X that satisfies the foregoing condition.
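The loop-free condition d(X,D)<d(X,S)+d(S,D) can be checked with a minimal Python sketch such as the following, provided for illustration only. It assumes a connected graph with symmetric, positive link costs; the function names shortest_costs and loop_free_alternates, the primary parameter, and the two example topologies are hypothetical and are not taken from the cited draft.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link cost}


def shortest_costs(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: cost of the shortest path from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def loop_free_alternates(graph: Graph, s: str, d: str, primary: str) -> List[str]:
    """Neighbors X of s, other than the primary next hop, with d(X,D) < d(X,S) + d(S,D)."""
    dist = {node: shortest_costs(graph, node) for node in graph}
    return sorted(
        x for x in graph[s]
        if x != primary and dist[x][d] < dist[x][s] + dist[s][d]
    )


# Triangle with unit costs: X can reach D without passing back through S.
triangle: Graph = {
    "S": {"X": 1, "D": 1},
    "X": {"S": 1, "D": 1},
    "D": {"S": 1, "X": 1},
}

# Four-node ring S-A-B-D-S with unit costs: for neighbor A,
# d(A,D) = 2 is not less than d(A,S) + d(S,D) = 2, so no alternate exists.
ring: Graph = {
    "S": {"A": 1, "D": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "D": 1},
    "D": {"B": 1, "S": 1},
}

triangle_alternates = loop_free_alternates(triangle, "S", "D", primary="D")
ring_alternates = loop_free_alternates(ring, "S", "D", primary="D")
```

The ring case returns an empty list, which illustrates why this scheme cannot guarantee 100% failure recovery.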
The paper S. Bryant, M. Shand, and S. Previdi, “IP Fast Reroute using Not-Via Addresses,” Internet-Draft, (October 2005) proposes a scheme to set up a tunnel from node S to node Y that is multiple hops away. The alternate path to a destination D is from S to Y then to D. This guarantees 100% failure coverage. Unfortunately, the maintenance of many tunnels imposes extra costs, and fragmentation can occur when the encapsulated IP packet is longer than the maximum transmission unit (“MTU”).
A scheme called failure insensitive routing (“FIR”) for recovering from single-link failures is presented in the paper S. Lee, Y. Yu, S. Nelakuditi, Z. Zhang, and C.-N. Chuah, “Proactive vs Reactive Approaches to Failure Resilient Routing,” IEEE INFOCOM (March 2004). Given a primary path S→D, FIR identifies a number of key links such that removing any of these links forces the packets to go back to S. Therefore, the failure of a key link can be inferred by S when a deflected packet arrives. To provide an alternate path, FIR removes the key links and runs shortest path routing from S to D. FIR is extended to cover single-node failures in the paper Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C.-N. Chuah, “Failure Inferencing Based Fast Rerouting for Handling Transient Link and Node Failures,” IEEE Global Internet (March 2005). The scheme is also applicable to networks using ECMP. Unfortunately, it does not consider the general case of multi-path routing where the paths may not have equal cost. In addition, determining extra shortest paths can be computationally expensive.
An algorithm called multiple routing configuration (“MRC”) is presented in the paper A. Kvalbein et al., “Fast IP Network Recovery using Multiple Routing Configurations,” IEEE INFOCOM (April 2006). Under MRC, each router maintains multiple routing tables (configurations). After a failure is detected, the routers search for a configuration that can bypass the failure. After that, the index of the selected configuration is inserted into packet headers to notify each router which routing table to use. MRC achieves 100% failure coverage. Unfortunately, MRC has to maintain multiple routing tables and add an extra index to packet headers.
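The configuration lookup described above might be sketched in Python as follows. This is a heavily simplified illustration and not the algorithm of the cited paper: the Configuration class, the avoided_links field, the functions pick_configuration and forward, and the three-node example are all assumptions made for this sketch, and the construction of the configurations themselves is not shown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Link = FrozenSet[str]  # undirected link, e.g. frozenset({"1", "2"})


@dataclass
class Configuration:
    """Hypothetical routing configuration: one routing table plus the links it never uses."""
    avoided_links: Set[Link]
    next_hop: Dict[Tuple[str, str], str]  # (router, destination) -> next hop


def pick_configuration(configs: List[Configuration], failed_link: Link) -> Optional[int]:
    """Return the index of the first configuration that bypasses the failed link."""
    for index, cfg in enumerate(configs):
        if failed_link in cfg.avoided_links:
            return index
    return None


def forward(configs: List[Configuration], config_index: int,
            router: str, destination: str) -> str:
    """Look up the next hop in the routing table named by the index carried in the packet."""
    return configs[config_index].next_hop[(router, destination)]


# Three routers in a triangle, destination "1". Configuration 0 is normal routing.
configs = [
    Configuration(set(), {("2", "1"): "1", ("3", "1"): "1"}),
    Configuration({frozenset({"1", "2"})}, {("2", "1"): "3", ("3", "1"): "1"}),
    Configuration({frozenset({"1", "3"})}, {("2", "1"): "1", ("3", "1"): "2"}),
]

# Link 1-2 fails: router 2 selects configuration 1 and marks packets with that index.
selected = pick_configuration(configs, frozenset({"1", "2"}))
hop_at_2 = forward(configs, selected, "2", "1")
hop_at_3 = forward(configs, selected, "3", "1")
```

The sketch makes the two costs noted above visible: every router stores one table per configuration, and every packet must carry the selected index.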
The paper X. Yang and D. Wetherall, “Source Selectable Path Diversity via Routing Deflections,” ACM Sigcomm, (2006), discusses how to find multiple paths between source/destination pairs using routing deflection, and derives three conditions that achieve generic path diversity. Although the scheme is not designed for a specific application, it is shown to be promising for failure recovery. Unfortunately, directly using the scheme cannot guarantee 100% failure coverage.
In view of the foregoing, it would be useful to facilitate fast recovery from double-link failures in IP networks, preferably without introducing high complexity and/or high resource usage.