The present invention relates to data networking and more particularly to systems and methods for providing fault tolerance to data networks.
As the Internet becomes a multi-media communications medium that is expected to reliably handle voice and video traffic, network protocols must also evolve to support quality-of-service (QoS) requirements such as latency and reliability and to provide guaranteed available bandwidth. One form that this evolution is taking is the advent of MPLS (Multi-Protocol Label Switching) Traffic Engineering, which may be supplemented by DiffServ-aware Traffic Engineering. Rather than using conventional IP routing techniques, where individual packets travel through the network following paths determined individually for each packet as it progresses through the network, MPLS Traffic Engineering exploits modern label switching techniques to build guaranteed-bandwidth end-to-end circuits through a network of label switched routers (LSRs). MPLS has been found to be highly useful in establishing such circuits, also referred to as label switched paths (LSPs). MPLS networks employing LSPs can interoperate with other IP-based networks more easily than other virtual circuit-oriented networks employing, e.g., ATM or Frame Relay. Networks based on MPLS Traffic Engineering, especially those supplemented with DiffServ-aware Traffic Engineering, are very effective in handling delay- and jitter-sensitive applications such as voice over IP (VoIP) and real-time video.
Meeting the demands of businesses and consumers, however, also requires that bandwidth and latency guarantees continue to be met when links or nodes fail. When failure of a link or a node causes the failure of an LSP, standard routing techniques such as constraint-based shortest path first (CSPF) are too slow to be used for dynamic rerouting of QoS-sensitive traffic. In optical networks employing SONET, fast restoration can be provided by means of features incorporated into the SONET protocol. However, where such techniques are not available, other protection mechanisms become necessary to ensure that services are restored within a sufficiently short time, e.g., 50 ms, such that the user experience is not affected.
To address this requirement, various fast reroute techniques have been developed that provide rapid reaction to failure of a link or node such that the user experience is preserved. In one approach, a so-called “primary” LSP is protected by a series of backup LSPs bypassing individual links and nodes traversed by the primary LSP. There is potentially a separate backup LSP bypassing a given link or node for each primary LSP traversing this given link or node. In an alternate approach, links and nodes are protected by local backup tunnels that are associated with the links and nodes themselves rather than the primary LSPs traversing the links and nodes. In this alternate approach, a single backup tunnel protecting a link or node can be used for backup of all primary LSPs traversing that link or node.
To protect a link, a backup tunnel is established connecting the two nodes that the protected link connects without including the protected link in the backup tunnel. To protect a node, a backup tunnel is established for each pair of links traversing the node, connecting the nodes at the far ends of the two links without including the protected node in the backup tunnel.
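The two protection cases just described can be illustrated with a minimal sketch. It assumes an undirected topology given as a list of node pairs, uses a plain fewest-hop search that ignores bandwidth, and treats each ordered pair of neighbors of the protected node as one pair of links traversing that node. The function names and the four-node topology are hypothetical and chosen only for illustration.

```python
from collections import deque

def bypass_path(links, src, dst, excluded_links=(), excluded_nodes=()):
    """Fewest-hop path from src to dst that avoids the excluded links and nodes.

    Returns a list of nodes, or None if no such path exists.
    """
    banned_links = {frozenset(link) for link in excluded_links}
    banned_nodes = set(excluded_nodes)
    # Build an undirected adjacency map, skipping anything that is excluded.
    adjacency = {}
    for u, v in links:
        if frozenset((u, v)) in banned_links or u in banned_nodes or v in banned_nodes:
            continue
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    # Breadth-first search, remembering each node's predecessor.
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def link_backup(links, a, b):
    """Backup tunnel for link a-b: joins a and b without using that link."""
    return bypass_path(links, a, b, excluded_links=[(a, b)])

def node_backups(links, node):
    """One backup tunnel per ordered pair of neighbors of the protected node."""
    neighbors = sorted({v if u == node else u for u, v in links if node in (u, v)})
    return {(p, q): bypass_path(links, p, q, excluded_nodes=[node])
            for p in neighbors for q in neighbors if p != q}

# Hypothetical four-node ring: A-B-C-D-A.
ring = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
print(link_backup(ring, "A", "B"))   # ['A', 'D', 'C', 'B']
print(node_backups(ring, "B"))       # {('A', 'C'): ['A', 'D', 'C'], ('C', 'A'): ['C', 'D', 'A']}
```

In this sketch the link backup runs between the endpoints of the protected link, while each node backup runs between two neighbors of the protected node and never passes through it.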
To guarantee quality of service under failure conditions, it is important that the backup tunnels have sufficient capacity to support all rerouted traffic in the event of a failure. The problem of placing backup tunnels while assuring that the backup tunnels have sufficient bandwidth to maintain quality of service can be reduced to the well-known problem of placing LSPs with a given set of bandwidth requirements in a network with given link capacities. This problem is frequently referred to as the QoS-routing problem. A standard solution is to use CSPF to place the backup tunnels (LSPs) one at a time, each time finding the shortest path on which the remaining link capacity satisfies the bandwidth requirements of the backup tunnel being placed.
CSPF-based computation of the backup tunnel placement is computationally efficient. However, because each backup tunnel is placed without regard to the tunnels that remain to be placed, the algorithm may fail to find a placement for all of the needed backup tunnels even if a placement satisfying all of the capacity constraints exists.
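The one-at-a-time placement and its failure mode can be sketched as follows. This is an illustration under stated assumptions, not a definitive CSPF implementation: links are directed, the path metric is hop count, and the function names, the topology, and the capacity and bandwidth figures are all hypothetical.

```python
import heapq

def cspf_path(residual, src, dst, bandwidth):
    """Fewest-hop path using only links whose residual capacity >= bandwidth."""
    # Prune links that cannot carry the tunnel, then run Dijkstra on hop count.
    adjacency = {}
    for (u, v), capacity in residual.items():
        if capacity >= bandwidth:
            adjacency.setdefault(u, []).append(v)
    best = {src: 0}
    previous = {src: None}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        if cost > best[node]:
            continue  # stale heap entry
        for neighbor in adjacency.get(node, []):
            new_cost = cost + 1
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return None

def place_sequentially(capacity, demands):
    """Place tunnels one at a time, deducting bandwidth from each link used."""
    residual = dict(capacity)
    placement = {}
    for name, src, dst, bandwidth in demands:
        path = cspf_path(residual, src, dst, bandwidth)
        placement[name] = path  # None means no feasible path was found
        if path is not None:
            for link in zip(path, path[1:]):
                residual[link] -= bandwidth
    return placement

# Hypothetical topology: every link has capacity 10, every tunnel needs 10.
capacity = {("A", "M"): 10, ("B", "M"): 10, ("M", "D"): 10,
            ("A", "X"): 10, ("X", "Y"): 10, ("Y", "D"): 10}
t1 = ("T1", "A", "D", 10)
t2 = ("T2", "B", "D", 10)

print(place_sequentially(capacity, [t1, t2]))
# {'T1': ['A', 'M', 'D'], 'T2': None}
print(place_sequentially(capacity, [t2, t1]))
# {'T2': ['B', 'M', 'D'], 'T1': ['A', 'X', 'Y', 'D']}
```

When T1 is placed first, it takes the shortest path through M and exhausts link M-D, which is the only way for T2 to reach D, so T2 cannot be placed. Reversing the order shows that a placement satisfying every capacity constraint does exist; the sequential procedure simply did not find it.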
This disadvantage of CSPF-based backup tunnel placement techniques has motivated development of more sophisticated methods for placing backup tunnels. These methods typically attempt to determine the placement of all needed backup tunnels simultaneously, rather than one at a time as in CSPF-based methods. Unfortunately, the general problem of placing N backup tunnels satisfying capacity constraints is known to be NP-complete, i.e., no computationally efficient (polynomial-time) algorithm for solving it is known.
What is needed are systems and methods for placing backup tunnels for fast reroute protection which are more likely to find a backup tunnel placement satisfying capacity constraints than CSPF-based methods and are sufficiently efficient to be computed quickly in a dynamically changing environment.