Both the number of data networks and the volume of traffic these networks carry are growing at an ever-increasing rate. The network devices making up these networks generally consist of specialized hardware designed to move data at very high speeds. Typical asynchronous packet-based networks, such as Ethernet or MPLS based networks, consist mainly of end stations, hubs, switches, routers, bridges and gateways. A network management system (NMS) is typically employed to provision, administer and maintain the network.
Multiprotocol Label Switching (MPLS) based networks are becoming increasingly popular, especially in traffic-engineered IP networks. MPLS uses a label switching model to switch data over a Label Switched Path (LSP). The route of an LSP is determined by the network layer routing function from the topology of the network and the demands of the user. Any suitable link state routing protocol, such as Open Shortest Path First (OSPF) or Intermediate System to Intermediate System (IS-IS), may be used to provide the link state topology information needed by the network layer routing to engineer data traffic. LSPs may be set up using any suitable signaling protocol such as RSVP-TE or CR-LDP.
There is increasing demand by users that the IP network include a mechanism for fast repair of failed links or nodes. Since an LSP traverses a fixed path in the network, its reliability is dependent on the links and nodes along the path. It is common for many networks to provide some form of protection in the event of failure. For example, in the event of a link or node failure, the network can be adapted to switch data traffic around the failed element via a protection route.
The protection of traffic can be accomplished in several ways using the MPLS framework. Two of these ways are recovery via LSP rerouting and recovery via MPLS protection switching.
The two basic models for path recovery are path rerouting and protection switching. Protection switching and rerouting may be used in combination. For example, protection switching provides a quick switchover to a recovery path for rapid restoration of connectivity, while slower path rerouting determines a new optimal network configuration at a later time.
In recovery by path rerouting, new paths or path segments are established on demand for restoring traffic after the occurrence of a fault. The new paths may be chosen based upon fault information, network routing policies, pre-defined configurations and network topology information. Thus, upon detecting a fault, paths or path segments to bypass the fault are established using the signaling protocol. Note that reroute mechanisms are inherently slower than protection switching mechanisms, since more processing and configuring must be done following the detection of a fault. The advantage of reroute mechanisms is that they are simpler and cheaper, since no resources are committed until after the fault occurs and the location of the fault is detected. An additional advantage of reroute mechanisms is that the LSP paths they create are better optimized, and therefore consume fewer network resources.
Note also that once the network routing algorithms have converged after a fault, it may be preferable to re-optimize the network by performing a reroute based on the current state of the network and network policies in place.
In contrast to path rerouting, protection switching recovery mechanisms pre-establish a recovery path or path segment based on network routing policies and the restoration requirements of the traffic on the working path. Preferably, the recovery path is link-disjoint and node-disjoint from the working path. When a fault is detected, the protected traffic is switched over to the recovery path(s) and restored.
The resources (e.g. bandwidth, buffers and processing capacity) on the recovery path may be used to carry either a copy of the working path traffic or extra traffic that is displaced when a protection switch occurs. This leads to two subtypes of protection switching. In the first, known as 1+1 protection, the resources on the recovery path are fully reserved and carry the same traffic as the working path. Selection between the traffic on the working and recovery paths is made at the path merge LSR (PML).
In the second, known as 1:1 protection, the resources (if any) allocated on the recovery path are fully available to low priority or excess information rate (EIR) traffic except when the recovery path is in use due to a fault on the working path. In other words, in 1:1 protection, the protected traffic normally travels only on the working path, and is switched to the recovery path only when the working path has a fault. Once the protection switch is initiated, the low priority or EIR traffic being carried on the recovery path is displaced by the protected traffic. This method affords a way to make efficient use of the recovery path resources.
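The difference between the two subtypes can be summarized in a small model. The following is a minimal sketch only; the names `Mode` and `recovery_state` are hypothetical, and the choice of the working path as the default selection at the PML in the fault-free 1+1 case is an assumption made for illustration.

```python
from enum import Enum


class Mode(Enum):
    ONE_PLUS_ONE = "1+1"  # recovery path carries a copy of the protected traffic
    ONE_TO_ONE = "1:1"    # recovery path carries low priority/EIR traffic until a fault


def recovery_state(mode: Mode, working_path_faulty: bool) -> tuple:
    """Return (what the recovery path carries, path the protected traffic is taken from)."""
    if mode is Mode.ONE_PLUS_ONE:
        # Resources are fully reserved; the PML merely changes its selection.
        # Assumption: the PML prefers the working path when there is no fault.
        return ("protected copy", "recovery" if working_path_faulty else "working")
    if working_path_faulty:
        # Protection switch: extra traffic is displaced by the protected traffic.
        return ("protected", "recovery")
    # Normal 1:1 operation: recovery resources are available to extra traffic.
    return ("extra", "working")
```

In this model, only the 1:1 mode changes what the recovery path carries when a fault occurs, which is why it makes more efficient use of the recovery path resources.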
An example of protection switching in MPLS networks is described below. A diagram illustrating an example MPLS based network incorporating a bypass tunnel is shown in FIG. 1. The network, generally referenced 10, comprises a plurality of label switched routers (LSRs) 12 connected by links 14. Backup tunnels are established for protecting LSPs statically by the management station or using RSVP signaling. RSVP extensions for backup LSP tunnels have been defined. To meet the needs of real-time applications such as video on demand, voice over IP, etc., it is desirable to effect the repair of LSP tunnels within tens of milliseconds. Protection switching can provide such repair times.
For example, LSR1 creates a tunnel to LSR5 via the path [LSR1, LSR2, LSR3, LSR4, LSR5]. LSR2 can provide a repair by creating a partial backup tunnel [LSR2, LSR8, LSR9, LSR4], as shown by the dashed line 16, which merges with the original tunnel [LSR1, LSR2, LSR3, LSR4, LSR5] at LSR4. For each LSP to be backed up, another backup LSP is established.
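One way such a partial backup tunnel could be derived is a shortest-path search that excludes the protected node. The following is a minimal sketch, not the method of any particular implementation: the link list `LINKS` contains only the links implied by the paths named above (the remaining links of FIG. 1 are omitted), and the function name `shortest_path` is hypothetical.

```python
from collections import deque

# Assumption: only the links implied by the two paths named in the text.
LINKS = [
    ("LSR1", "LSR2"), ("LSR2", "LSR3"), ("LSR3", "LSR4"), ("LSR4", "LSR5"),
    ("LSR2", "LSR8"), ("LSR8", "LSR9"), ("LSR9", "LSR4"),
]


def shortest_path(links, src, dst, excluded_nodes=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids excluded_nodes."""
    adjacency = {}
    for a, b in links:
        if a in excluded_nodes or b in excluded_nodes:
            continue  # drop every link touching an excluded node
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no path avoids the excluded nodes
```

Calling `shortest_path(LINKS, "LSR2", "LSR4", {"LSR3"})` on this assumed topology yields the backup tunnel path from the point of repair to the merge point.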
The LSPs can also be protected (i.e. backed up) using the label stacking capabilities of MPLS. Instead of creating a separate LSP for every backed-up LSP tunnel, a single LSP is created which serves to back up a set of tunnels. Such a tunnel is termed a bypass tunnel. The bypass tunnel itself is established just like any other LSP tunnel. The bypass tunnel must intersect the original tunnel(s) somewhere downstream of the point of repair. Note that this implies that the set of tunnels being backed up all pass through a common downstream node. Candidates for this set of tunnels include all tunnels that pass through the point of local repair and through this common node which do not use the facilities being bypassed.
To repair the backed up tunnels, packets belonging to a repaired tunnel are redirected onto the bypass tunnel. An additional label representing the bypass tunnel is stacked onto the redirected packets. At the last LSR of the bypass tunnel, the label for the bypass tunnel is popped off the stack, revealing the label that represents the tunnel being backed up. An alternative approach is to pop the bypass-tunnel label at the penultimate LSR of the bypass tunnel.
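The label stacking operations described above can be sketched as follows. This is an illustrative simplification: the label stack is modeled as a list with the top of the stack first, the label values are hypothetical, per-hop label swapping is ignored, and the function names are not taken from any real MPLS implementation.

```python
from typing import List


def redirect_onto_bypass(label_stack: List[int], bypass_label: int) -> List[int]:
    """At the point of repair: push the bypass-tunnel label on top of the stack."""
    return [bypass_label] + label_stack


def pop_bypass_label(label_stack: List[int], bypass_label: int) -> List[int]:
    """At the end of the bypass tunnel: pop its label, revealing the
    label of the tunnel being backed up."""
    if not label_stack or label_stack[0] != bypass_label:
        raise ValueError("top label does not belong to the bypass tunnel")
    return label_stack[1:]
```

For example, a packet carrying the (hypothetical) tunnel label 100 would carry the stack `[900, 100]` while inside a bypass tunnel labeled 900, and `[100]` again after the pop.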
With reference to FIG. 1, LSR2 in this case would build a bypass tunnel [LSR2, LSR8, LSR9, LSR4] represented by the dashed line 16. The backup path for [LSR1, LSR2, LSR3, LSR4, LSR5] rejoins the original path at LSR4, but its path is now [LSR1, LSR2, LSR4, LSR5] with the bypass tunnel as the connection between LSR2 and LSR4.
Note that this bypass tunnel can also be a backup for tunnels from any of LSR1, LSR2, LSR6 or LSR8 to any of LSR4, LSR5, or LSR10 that traverse the path LSR2 to LSR3 to LSR4 in case of a failure of the link LSR2 to LSR3 or of the node LSR3. A bypass tunnel for protecting the link between LSR2 and LSR3 can also be created (this tunnel should start at LSR2 and end at LSR3).
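The candidate test and the resulting logical backup path can be sketched as follows. This is a minimal illustration; the helper names `traverses` and `logical_backup_path` are hypothetical, and the example tunnel from LSR6 to LSR10 is one of the possible tunnels mentioned above rather than a path drawn in FIG. 1.

```python
from typing import List


def traverses(path: List[str], segment: List[str]) -> bool:
    """True if the tunnel path contains the segment as consecutive hops."""
    n = len(segment)
    return any(path[i:i + n] == segment for i in range(len(path) - n + 1))


def logical_backup_path(path: List[str], repair_point: str, merge_point: str) -> List[str]:
    """Treat the bypass tunnel as a single hop from repair_point to merge_point."""
    i = path.index(repair_point)
    j = path.index(merge_point)
    if i >= j:
        raise ValueError("merge point must be downstream of the point of repair")
    return path[:i + 1] + path[j:]


protected_segment = ["LSR2", "LSR3", "LSR4"]
tunnel = ["LSR1", "LSR2", "LSR3", "LSR4", "LSR5"]
other_tunnel = ["LSR6", "LSR2", "LSR3", "LSR4", "LSR10"]  # hypothetical example
```

Both tunnels traverse the protected segment, so both are candidates for the same bypass tunnel; applying `logical_backup_path(tunnel, "LSR2", "LSR4")` gives the path [LSR1, LSR2, LSR4, LSR5] described above.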
The above describes the use of protection tunnels to quickly restore traffic after a link or a node failure. In some cases, however, the use of protection tunnels does not protect against node failures. Due to the practice of sharing the same protection bandwidth between multiple protection tunnels, the case of multiple failures may result in there not being sufficient bandwidth to restore traffic on all the working LSPs that share the tunnel, either by activating additional protection tunnels or by rerouting these LSPs. In addition, when a link fails, the protection tunnels traversing that link are not available to protect other links in the event they fail. Thus, when a node or link fails, the part of the network that has its protection bandwidth available for protection tunnels does not allow creation of protection tunnels for these links. This means that the next node or link that fails will not be protected.
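The bandwidth shortfall under multiple failures can be illustrated with simple arithmetic. The figures and names below are entirely hypothetical and serve only to show the effect of sharing protection bandwidth.

```python
from typing import Dict, Iterable


def unrestored_demand(shared_bandwidth: int,
                      demand_by_failure: Dict[str, int],
                      active_failures: Iterable[str]) -> int:
    """Bandwidth that cannot be restored when the given failures are active
    at the same time and all draw on the same shared protection bandwidth."""
    needed = sum(demand_by_failure[f] for f in active_failures)
    return max(0, needed - shared_bandwidth)


# Hypothetical figures: 100 units of shared protection bandwidth, sized so
# that either single failure can be restored in full.
demand = {"link A": 60, "link B": 70}
```

With these figures, either failure alone leaves no unrestored demand, whereas both failures together require 130 units and leave 30 units of protected traffic without restoration.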
There is therefore a need for a protection mechanism that is capable of protecting working paths against failures that are not protected by a protection tunnel. Such failures include, for example, node failures and link failures that were not originally protected by a protection tunnel, or that were originally protected but whose protection tunnel is not available.