The invention relates generally to network communications. More specifically, the invention relates to systems and methods that optimize traffic engineering for restoring traffic in IP networks constructed over a variety of physical network architectures following IP Router failures, IP link failures, or a combination of both.
FIG. 1 shows an exemplary IP network comprising six IP Routers and seven IP links. IP traffic may originate at any of the IP Routers and may travel to any of the other IP Routers over a sequence of one or more IP links. Each IP link may go over a sequence of Layer 1 links using a variety of technologies such as Gigabit Ethernet or Ultra Long Haul (ULH) technologies using Dense Wavelength Division Multiplexing (DWDM). FIG. 2 shows a backbone IP network comprising 28 Routers and 45 fiber links.
One challenge for IP service restoration is to provide sub-second or sub-100 ms single-failure restoration for real-time or near-real-time IP services such as Internet Protocol Television (IPTV), Voice over Internet Protocol (VoIP), gaming, and others, while maintaining efficient bandwidth utilization.
The two most prevalent methods for IP-layer (Layer 3) restoration are IP reroute and Multi Protocol Label Switching (MPLS) Fast Reroute (FRR). IP reroute is the default and most common restoration method in large commercial IP networks. It routes traffic along the shortest path with respect to a link weight metric such as latency or the inverse of link capacity. It relies on an Interior Gateway Protocol (IGP) such as Open Shortest Path First (OSPF) or Intermediate System to Intermediate System (IS-IS) for general topology discovery and updates, and re-computes paths upon a failure. Using default OSPF or IS-IS timer values, re-convergence may take many seconds. Through skillful tuning of the OSPF/IS-IS timer values, re-convergence time can be reduced to a few seconds, but sub-second convergence is not possible.
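The path selection performed by IP reroute can be illustrated with a minimal sketch. It assumes a hypothetical six-router, seven-link topology (the router names, link weights, and the functions shortest_path and reroute are illustrative assumptions, not taken from FIG. 1) and models only the shortest-path computation, not IGP flooding or timers.

```python
import heapq

# Hypothetical topology: six routers, seven undirected links, with
# assumed link weights (e.g. latency). Not the topology of FIG. 1.
LINKS = {
    ("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 2, ("D", "E"): 2,
    ("E", "F"): 2, ("F", "A"): 2, ("B", "E"): 1,
}

def shortest_path(links, src, dst):
    """Dijkstra over an undirected graph given as {(a, b): weight}."""
    adj = {}
    for (a, b), w in links.items():
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    best = {src: 0}          # lowest known cost to each router
    prev = {}                # predecessor on the best known path
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue         # stale heap entry
        for nbr, w in adj.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]

def reroute(links, failed, src, dst):
    """Drop the failed link (either orientation) and recompute the path."""
    a, b = failed
    surviving = {k: w for k, w in links.items() if k not in ((a, b), (b, a))}
    return shortest_path(surviving, src, dst)

print(shortest_path(LINKS, "A", "D"))        # (['A', 'B', 'E', 'D'], 4)
print(reroute(LINKS, ("B", "E"), "A", "D"))  # (['A', 'B', 'C', 'D'], 5)
```

In this sketch the recomputed path is again a true end-to-end shortest path over the surviving links; the cost of IP reroute lies in the time needed to detect the failure and re-converge, which the sketch does not model.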
MPLS Fast Reroute is an Internet Engineering Task Force (IETF) standardized protocol in which primary and backup (restoration) Label Switched Paths (LSPs) are established for next-hop or next-next-hop MPLS FRR (the former can protect against link failures and the latter can protect against link or router failures). When a failure is detected at the router immediately upstream of the failure, the MPLS forwarding label for the backup LSP is pushed onto the MPLS shim header at the upstream router and popped at the downstream router (next-hop or next-next-hop). These labels are pre-calculated and stored in the forwarding tables, so restoration is very fast (sub-100 ms restoration is achievable, and traffic switchover times below 50 ms have been measured in lab experiments). However, in this scheme, IP traffic flows stay routed over the backup paths until the failure is repaired. Because these paths are segmental patches to the primary paths, the technique uses capacity inefficiently when restoring all traffic, assuming that the backup paths follow the shortest paths. The restoration paths also have poor latency behavior. The resource utilization and latency of the technique suffer even further if a subsequent failure occurs before the original failure has been repaired, which can take several hours.
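The effect of patching a primary path with a pre-computed next-hop bypass can be illustrated with a minimal sketch. It assumes a hypothetical six-router, seven-link topology with illustrative link weights, and it models only the resulting paths, not label push and pop operations or LSP signaling. All function names are hypothetical.

```python
import heapq

# Hypothetical topology and weights, chosen only for illustration.
LINKS = {
    ("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 2, ("D", "E"): 2,
    ("E", "F"): 2, ("F", "A"): 2, ("B", "E"): 1,
}

def shortest_path(links, src, dst, excluded=frozenset()):
    """Dijkstra that skips any link listed in `excluded`."""
    adj = {}
    for (a, b), w in links.items():
        if (a, b) in excluded or (b, a) in excluded:
            continue
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    heap, done = [(0, [src])], set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return path, cost
        if node in done:
            continue
        done.add(node)
        for nbr, w in adj.get(node, []):
            if nbr not in done:
                heapq.heappush(heap, (cost + w, path + [nbr]))
    return None, float("inf")

def precompute_next_hop_backups(links):
    """For each link direction, store a bypass path that avoids the link."""
    backups = {}
    for (a, b) in links:
        for up, down in ((a, b), (b, a)):
            backups[(up, down)], _ = shortest_path(links, up, down, {(a, b)})
    return backups

def frr_path(primary, failed, backups):
    """Splice the stored bypass into the primary path at the failed hop."""
    out = [primary[0]]
    for up, down in zip(primary, primary[1:]):
        if failed in ((up, down), (down, up)):
            bypass = backups[(up, down)]
            if bypass is None:
                return None      # no bypass exists for this link
            out.extend(bypass[1:])
        else:
            out.append(down)
    return out

def path_cost(links, path):
    """Sum of link weights along a path, in either link orientation."""
    return sum(links.get((u, v), links.get((v, u)))
               for u, v in zip(path, path[1:]))

backups = precompute_next_hop_backups(LINKS)
patched = frr_path(["A", "B", "E", "D"], ("B", "E"), backups)
print(patched, path_cost(LINKS, patched))
# ['A', 'B', 'A', 'F', 'E', 'D'] 8
```

In this assumed topology the patched path traverses link A-B twice and has a total weight of 8, whereas the shortest end-to-end path that avoids link B-E (A, B, C, D) has a weight of 5. The lookup of a stored bypass is what makes the switchover fast, and the detour is what makes it costly in capacity and latency.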
The prior art has considered MPLS FRR and IP/Label Distribution Protocol (LDP) FRR, which may provide sub-100 ms failure restoration but not efficient bandwidth utilization. The prior art has also considered optimized traffic engineering for IP routing, which provides efficient bandwidth utilization but not sub-100 ms failure restoration.
What is desired is a sub-100 ms restoration system and method that maximizes sharing among single/multiple failures of links, routers, and Shared Risk Link Groups (SRLGs) while minimizing overall capacity or overall cost.