Modern backbone and transport networks are highly complex networks that strive to carry services with quality-of-service (QoS) guarantees. These networks support general topologies and dynamic routing of bandwidth-guaranteed connections, yet at the same time they aim to provide fast recovery from network failures. Traditionally, ring-based synchronous optical networks (SONETs) have offered 50-millisecond (ms) restoration to bandwidth-guaranteed services, using pre-reserved spare protection capacity and pre-planned protection paths. Pre-planning protection in rings has been especially attractive because exactly one backup path exists between any two nodes, which leads to very simple and fast automatic protection switching mechanisms. However, in ring-based SONET networks these advantages come at the cost of reserving at least half of the total capacity for protection, which amounts to 100% redundancy.
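The 100% figure follows directly from the capacity split. As a small worked check of the even-split case, let C denote a ring's total capacity and take redundancy to mean spare (protection) capacity divided by working capacity; both the symbol C and this definition are assumptions introduced here for illustration:

```latex
\[
C_{\mathrm{work}} = C_{\mathrm{spare}} = \frac{C}{2},
\qquad
\text{redundancy} = \frac{C_{\mathrm{spare}}}{C_{\mathrm{work}}}
= \frac{C/2}{C/2} = 1 = 100\%.
\]
```

Reserving more than half of the capacity for protection would push this ratio above 100%.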
Recently, mesh-based networks have received much attention because of the increased flexibility they provide in routing connections, which leads to more efficient utilization of network resources. Mesh networks are also appealing because of the high degree of protection capacity sharing they make possible.
Designing efficient protection schemes for mesh networks that achieve the fast restoration times of ring-based SONET networks, yet do not require the overbuild generally associated with those networks, has remained a challenging problem. Overbuild generally refers to the amount of redundancy needed to support protection. In general, most protection schemes, including SONET and other ring-based schemes, have been designed to protect against a single link failure. Designing efficient protection schemes that protect against multiple link failures is also a challenging problem.
Recently, fast restoration for mesh networks has gained momentum in the context of Multiprotocol Label Switching (MPLS) networks. The MPLS fast restoration mechanism, referred to as fast reroute or local reroute, supports a local repair capability: upon a node or link failure, the first node upstream of the failure reroutes the affected Label Switched Paths (LSPs) onto bypass (backup) tunnels with equivalent bandwidth guarantees. Bandwidth guarantees are important because they are the most likely reason for setting up QoS-guaranteed LSPs. In addition, one way of incorporating other QoS constraints, such as end-to-end delay and loss, is to convert them into an effective bandwidth requirement for the LSPs. The MPLS fast reroute mechanism allows bandwidth sharing between bypass tunnels that protect independent resources, resulting in efficient capacity utilization.
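The sharing rule can be sketched in a few lines of Python. Under the usual single-failure assumption, bypass tunnels that protect different resources are never active at the same time, so a link only needs to reserve the worst case over any one failure rather than the sum of all bypass bandwidths routed over it. The tunnel list, node names, and the function `spare_reservation` are hypothetical illustrations, not part of any MPLS implementation:

```python
from collections import defaultdict


def spare_reservation(bypasses, shared=True):
    """Return the protection bandwidth each link must reserve.

    bypasses: hypothetical list of (protected_resource, bandwidth, links),
    where links are the directed links on that bypass tunnel's path.
    Assumes at most one protected resource fails at a time.
    """
    # per_link[link][resource] = bypass bandwidth on link if resource fails
    per_link = defaultdict(lambda: defaultdict(float))
    for resource, bandwidth, links in bypasses:
        for link in links:
            per_link[link][resource] += bandwidth

    reservation = {}
    for link, by_resource in per_link.items():
        if shared:
            # Only one resource fails at a time: reserve the worst case.
            reservation[link] = max(by_resource.values())
        else:
            # No sharing: every bypass tunnel keeps its own reservation.
            reservation[link] = sum(by_resource.values())
    return reservation


bypasses = [
    ("A-B", 10.0, [("A", "C"), ("C", "B")]),
    ("D-E", 6.0, [("D", "C"), ("C", "B"), ("B", "E")]),
]
print(spare_reservation(bypasses)[("C", "B")])                # 10.0
print(spare_reservation(bypasses, shared=False)[("C", "B")])  # 16.0
```

With sharing, the common link ("C", "B") reserves 10 units instead of 16; the saving grows with the number of independent resources whose bypass tunnels overlap on the same links.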
Two different techniques for local protection in MPLS networks have been proposed. The first, referred to as the one-to-one backup technique, creates bypass LSPs for each protected service-carrying LSP at each potential point of local repair (link or node). The second, referred to as the facility backup technique, creates a bypass tunnel to protect a potential failure point (link or node) such that, by taking advantage of the MPLS label stacking mechanism, a collection of LSPs with similar backup constraints can be jointly rerouted over a single bypass tunnel.
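The label-stacking step that lets the facility backup technique move many LSPs onto one bypass tunnel can be modeled as below. In this simplified sketch, which loosely follows the facility backup behavior described for MPLS fast reroute, the point of local repair rewrites each packet's top label to the label the merge point expects for that LSP and then pushes one shared bypass-tunnel label. The `Packet` class, the `facility_reroute` function, and all label values are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    lsp: str
    labels: list = field(default_factory=list)  # top of stack is labels[-1]


def facility_reroute(packets, mp_labels, bypass_label):
    """Move packets of several LSPs onto one bypass tunnel via label stacking.

    mp_labels: hypothetical map from LSP name to the label the merge point
    expects for that LSP; bypass_label: hypothetical bypass tunnel label.
    """
    for pkt in packets:
        pkt.labels[-1] = mp_labels[pkt.lsp]  # inner label the merge point understands
        pkt.labels.append(bypass_label)      # shared outer label for the bypass tunnel
    return packets


packets = [Packet("lsp1", [101]), Packet("lsp2", [202])]
facility_reroute(packets, {"lsp1": 301, "lsp2": 302}, bypass_label=900)
for p in packets:
    print(p.lsp, p.labels)  # prints "lsp1 [301, 900]", then "lsp2 [302, 900]"
```

Because every rerouted LSP carries the same outer label, the nodes along the bypass tunnel forward on that single label, regardless of how many LSPs the tunnel carries.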
In general, the one-to-one backup technique does not scale well with the number of protected LSPs, since the number of bypass tunnels can quickly become very large, not to mention the heavy signaling and routing load needed to support these extra tunnels. In addition, implementing the one-to-one backup technique requires either extensive routing extensions to propagate the set of bypass LSPs and their attribute information, which places a heavy load on the control plane, or sacrificing some of the achievable sharing of protection capacity by limiting the state propagated in routing updates, which requires large amounts of spare capacity for protection. In contrast, the facility backup technique avoids many of these drawbacks.
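The scaling argument can be made concrete with a rough count. The sketch below assumes link protection only, counts one detour per LSP per traversed link for the one-to-one technique, and one shared bypass per protected (directed) link for the facility technique; it ignores detour merging and node protection, so it is a simplified model rather than an exact accounting. The function name and paths are hypothetical:

```python
def count_bypass_tunnels(lsp_paths):
    """Return (one_to_one, facility) bypass counts under link protection.

    lsp_paths: hypothetical list of node sequences, one per protected LSP.
    """
    one_to_one = 0
    protected_links = set()
    for path in lsp_paths:
        hops = list(zip(path, path[1:]))  # directed links traversed by this LSP
        one_to_one += len(hops)           # one detour per LSP per traversed link
        protected_links.update(hops)
    facility = len(protected_links)       # one shared bypass per protected link
    return one_to_one, facility


lsps = [["A", "B", "C", "D"]] * 100
print(count_bypass_tunnels(lsps))  # (300, 3)
```

For 100 LSPs sharing the route A-B-C-D, the one-to-one count grows linearly with the number of LSPs, while the facility count depends only on the number of protected links.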
In general, protection schemes for optical and MPLS networks can be classified by whether the protection is local (link-based) or end-to-end (path-based), and by whether the backup resources are dedicated or shared. The fast or local reroute mechanisms outlined above are instances of link-based protection. In path-based protection, the entire primary service-carrying path is backed up by alternate protection paths, so that any failure on the primary path causes its traffic to be rerouted over its protection paths; the reroute is performed by the end nodes of the path. Compared to link-based protection, recovery may be slower in path-based schemes, partly because failure information must reach the end nodes before restoration can begin, and partly because even a single link failure may affect the primary paths of many different ingress-egress pairs, all of which may initiate path protection in parallel, causing high signaling loads, contention for common resources, and crankbacks.
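The difference in which nodes must react to a failure can be sketched as follows. Assuming directed primary paths and a single failed link, link-based (local) protection involves only the node immediately upstream of the failure, whereas path-based protection involves the ingress node of every affected path. The function `rerouting_nodes` and the example paths are hypothetical:

```python
def rerouting_nodes(primary_paths, failed_link, scheme):
    """Return the set of nodes that must act when failed_link (u, v) goes down.

    primary_paths: hypothetical map from (ingress, egress) pair to node list.
    scheme: "link" for local repair, "path" for end-to-end repair.
    """
    u, v = failed_link
    affected = [path for path in primary_paths.values()
                if (u, v) in zip(path, path[1:])]
    if scheme == "link":
        # Local repair: only the node immediately upstream of the failure acts.
        return {u} if affected else set()
    if scheme == "path":
        # End-to-end repair: the ingress of every affected path must act.
        return {path[0] for path in affected}
    raise ValueError(f"unknown scheme: {scheme}")


paths = {
    ("A", "E"): ["A", "B", "C", "E"],
    ("F", "E"): ["F", "B", "C", "E"],
    ("G", "D"): ["G", "B", "C", "D"],
    ("H", "E"): ["H", "C", "E"],
}
print(rerouting_nodes(paths, ("B", "C"), "link"))          # {'B'}
print(sorted(rerouting_nodes(paths, ("B", "C"), "path")))  # ['A', 'F', 'G']
```

Each ingress returned in the path-based case would also have to be notified of the failure before it can act, which is the source of the extra delay and signaling load described above.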
Protection schemes can be further classified as pre-planned (e.g., SONET) or event-driven (dynamic). The latter involve computing bypass routes and reserving protection bandwidth at the time the working path is provisioned. These schemes rely on heavy signaling to maintain the reservations and to effect the rerouting when a link fails. Although very efficient at lowering the overbuild, these schemes tend to have longer restoration times.
For pre-planned facility-based fast reroute, the main existing approaches use rings embedded in a mesh topology. Once the set of rings is identified, pre-planned protection schemes (e.g., as in SONET) are employed. In some of these approaches, the network is designed entirely or partially in terms of rings, so these schemes are applicable only to constrained topologies.
Other protection schemes provide protection by embedding rings in a mesh-based topology. In these schemes each link is covered by a cycle, yielding a cycle cover of the network. Each cycle is provisioned with enough protection capacity to cover the links that belong to it. When a link fails, its working traffic is rerouted over the protection capacity on the surviving links of the covering cycle. This approach has two drawbacks: first, the overbuild can be significant, and second, it is hard to find the smallest cycle cover of a given network.
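The rerouting and provisioning steps of a cycle cover can be sketched as below. The detour for a failed link is simply the rest of its covering cycle, and, as one simple choice, each cycle link reserves a uniform spare capacity equal to the largest working capacity on the cycle, which is sufficient if only one link fails at a time (it can over-provision slightly compared with a per-link optimum). The function names and capacities are hypothetical:

```python
def covering_backup_path(cycle, failed_link):
    """Detour around failed_link over the surviving links of its covering cycle.

    cycle: closed node sequence [n0, ..., nk-1] (the link nk-1 -> n0 is implied).
    failed_link: (u, v), two nodes adjacent on the cycle.
    """
    u, v = failed_link
    k = len(cycle)
    i, j = cycle.index(u), cycle.index(v)
    if (i + 1) % k == j:  # failed link follows cycle order: go the other way round
        return [cycle[(i - s) % k] for s in range(k)]
    if (j + 1) % k == i:  # failed link runs against cycle order: go forward
        return [cycle[(i + s) % k] for s in range(k)]
    raise ValueError("failed link is not on this cycle")


def cycle_protection_capacity(cycle, working):
    """Uniform spare capacity per cycle link, enough for any single link failure.

    working: hypothetical map from frozenset({a, b}) to working capacity of a-b.
    """
    links = zip(cycle, cycle[1:] + cycle[:1])
    return max(working[frozenset(link)] for link in links)


ring = ["A", "B", "C", "D"]
working = {
    frozenset(("A", "B")): 10,
    frozenset(("B", "C")): 4,
    frozenset(("C", "D")): 7,
    frozenset(("D", "A")): 2,
}
print(covering_backup_path(ring, ("A", "B")))    # ['A', 'D', 'C', 'B']
print(cycle_protection_capacity(ring, working))  # 10
```

The spare capacity per cycle is driven by its most heavily loaded link, which is one reason the overbuild of cycle-cover schemes can be significant.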
An improvement to these schemes is based on the notion of p-cycles. The main idea is that a cycle can protect not only the links on the cycle but also its chords (spokes), so far fewer rings may suffice to provide full protection. An alternative to cycle covers, intended to overcome the difficulty of finding good covers, is to cover every link in a network with exactly two cycles; a set of cycles meeting this requirement is called a double cycle cover. For planar graphs, double cycle covers can be found in polynomial time. For non-planar graphs, double cycle covers are conjectured to exist, and in practice they are typically found quickly. However, even for double-cycle-cover-based protection schemes, the required network overbuild can be significant. Note also that all ring-based approaches share a drawback: after any topology change, the structure of the solution may change dramatically, which limits their scalability.
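The p-cycle idea can be illustrated with a short sketch: a link on the cycle has one backup path (the rest of the cycle), while a chord whose endpoints both lie on the cycle has two (the two arcs of the cycle between its endpoints). The sketch assumes a simple graph (no parallel links) and represents the cycle as a closed node list; the function name is hypothetical:

```python
def p_cycle_backup_paths(cycle, link):
    """Backup paths a p-cycle offers for link (u, v); assumes a simple graph.

    cycle: closed node sequence [n0, ..., nk-1] (the link nk-1 -> n0 is implied).
    Returns [] if the p-cycle does not protect the link.
    """
    u, v = link
    if u not in cycle or v not in cycle:
        return []
    k = len(cycle)
    i, j = cycle.index(u), cycle.index(v)
    # Walk from u to v in each direction around the cycle.
    forward = [cycle[(i + s) % k] for s in range((j - i) % k + 1)]
    backward = [cycle[(i - s) % k] for s in range((i - j) % k + 1)]
    # For an on-cycle link one arc is the failed link itself, so drop 1-hop arcs.
    return [arc for arc in (forward, backward) if len(arc) > 2]


cyc = ["A", "B", "C", "D", "E", "F"]
print(p_cycle_backup_paths(cyc, ("A", "B")))  # [['A', 'F', 'E', 'D', 'C', 'B']]
print(p_cycle_backup_paths(cyc, ("A", "D")))  # [['A', 'B', 'C', 'D'], ['A', 'F', 'E', 'D']]
```

Because a chord is offered two internally disjoint arcs, a p-cycle protects its chords without reserving any spare capacity on the chords themselves.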
A non-ring-based approach to link restoration in mesh networks is generalized loop-back, where the main idea is to select a digraph, called the primary, such that its conjugate digraph, called the secondary, can carry backup traffic for any link failure in the primary.
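A deliberately simplified sketch of the conjugate-digraph idea is shown below: the secondary is formed by reversing every primary arc, and when primary arc (u, v) fails, a breadth-first search looks for a u-to-v route in the secondary that avoids the reversed copy of the failed link. This only illustrates why the conjugate can supply a detour; it is not the full generalized loop-back scheme, and the function names and example topology are hypothetical:

```python
from collections import deque


def conjugate(arcs):
    """Secondary digraph: every primary arc (a, b) reversed to (b, a)."""
    return {(b, a) for a, b in arcs}


def loopback_route(primary, failed_arc):
    """BFS route from u to v over the secondary, avoiding the failed link."""
    u, v = failed_arc
    secondary = conjugate(primary) - {(v, u)}  # (v, u) uses the failed physical link
    adj = {}
    for a, b in secondary:
        adj.setdefault(a, []).append(b)
    parent = {u: None}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            path = []
            while node is not None:  # walk parents back to u
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # the secondary offers no detour for this failure


primary = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")}
print(loopback_route(primary, ("A", "B")))  # ['A', 'D', 'C', 'B']
```

In this directed-ring example, the detour A, D, C, B runs over the secondary copies of primary arcs D-A, C-D, and B-C, none of which shares the failed link A-B.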
However, improved network design techniques for supporting fast restoration with minimum overbuild are needed.