MPLS Technology
Multiprotocol Label Switching (MPLS) is a communication technology for forwarding packet-based traffic along pre-established logical paths called label switched paths (LSPs, a.k.a. tunnels). Forwarding is based on short labels associated with the LSPs, which allow data to be identified, classified and forwarded over them. MPLS is designed to offer reliable packet delivery with predictable quality of service (QoS) guarantees, and supports traffic engineering (TE) to optimize the usage of network resources.
An LSP is used for conveying traffic from a source (a.k.a., ingress, Head) node (a.k.a., label switching router, LSR) downstream to its destination (a.k.a., egress, Tail) LSR. The LSP may traverse intermediate (a.k.a., transit) LSRs. If there are no intermediate LSRs, the LSP is referred to as a single hop LSP.
FIG. 1 illustrates an LSP that originates at ingress LSR1, traverses through intermediate LSR2 (from port “A” to port “B”) and extends to LSR3 where it is terminated. The LSP path may be summarized as 1-2-3.
Also illustrated in FIG. 1 is the MPLS label processing: ingress LSR1 pushes label 31, as allocated by LSR2, onto an arriving packet, while intermediate LSR2 swaps that label with another label, 96, as allocated by LSR3, so that the packet may be conveyed from LSR2 to LSR3.
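The push and swap operations described for FIG. 1 can be sketched as follows. This is a minimal illustration only: the table layout, the `forward` helper and the use of a Python list as the label stack are assumptions made for the example and are not part of any MPLS implementation; only the labels 31 and 96 and the path 1-2-3 are taken from FIG. 1.

```python
# Illustrative per-LSR label tables for the LSP 1-2-3 of FIG. 1.
# Each entry maps the incoming top label (None = unlabeled packet at the
# ingress) to an (operation, outgoing label, next hop) tuple.
LABEL_TABLES = {
    "LSR1": {None: ("push", 31, "LSR2")},  # label 31 was allocated by LSR2
    "LSR2": {31: ("swap", 96, "LSR3")},    # label 96 was allocated by LSR3
}


def forward(ingress, tables):
    """Follow the LSP from the ingress; return (node, next hop, stack) per hop."""
    node, stack, trace = ingress, [], []
    while node in tables:  # stops at the egress, which has no entry here
        top = stack[-1] if stack else None
        op, out_label, next_hop = tables[node][top]
        if op == "push":
            stack.append(out_label)
        elif op == "swap":
            stack[-1] = out_label
        trace.append((node, next_hop, list(stack)))
        node = next_hop
    return trace


trace = forward("LSR1", LABEL_TABLES)
for node, next_hop, stack in trace:
    print(f"{node} -> {next_hop}: label stack {stack}")
```

The trace shows the packet leaving LSR1 with label 31 and leaving LSR2 with label 96, matching the figure.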
Fast Reroute (FRR)
Fast reroute (FRR) is an MPLS resiliency mechanism that provides fast traffic restoration upon a link or node failure occurring along the LSP. With FRR, detours are pre-established along the LSP, so that an interrupted traffic stream may quickly be rerouted around a failed link or a failed node. This enables recovery to be completed within a short period of time (under 50 milliseconds), thereby minimizing the adverse impact upon the traffic being conveyed at the time the failure occurred. For the sake of simplicity, it shall be assumed hereinafter that upon failure of a link, both directions of that link are considered to be down. In other words, when the link illustrated in FIG. 1 fails in the direction extending from LSR1 to LSR2, the link in the direction extending from LSR2 to LSR1 is deemed to have failed as well.
In case of a failure, the LSR located upstream of the failure (a.k.a., point of local repair, PLR), redirects the traffic of the so-called working LSP onto a pre-established (P2P) backup LSP (a.k.a. bypass LSP), which conveys the traffic so that it is rerouted around the failure. The backup LSP brings the traffic from the PLR to an LSR located downstream of the failure (a.k.a., Merge Point, MP). Thereafter, the traffic resumes the original working LSP. The MP also serves as the egress LSR for the backup LSP.
For the sake of simplicity, it will be assumed that the MP is the closest LSR located downstream of the failure. Accordingly, with FRR link protection, the MP is the next-hop (NH) LSR, i.e., the LSR located at the far end of the protected link. Similarly, with node protection, the MP is the next-next-hop (NNH) LSR, i.e., the LSR that follows the NH along the working LSP. For FRR link protection to succeed, the backup LSP extending from the PLR to the NH must not traverse the protected link extending between the PLR and the NH. For node protection to succeed, the backup LSP extending from the PLR to the NNH must, in addition, not pass through the NH node itself.
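The constraint on the route of a backup LSP can be expressed as a simple check. The following is a hedged sketch: the function names, the representation of a path as a list of node names, and the placeholder node names ("PLR", "NH", "NNH", "X") are all assumptions introduced for illustration.

```python
def links_of(path):
    """Return the set of undirected links traversed by a path (list of nodes)."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def backup_is_valid(backup_path, plr, nh, protection):
    """Check the routing constraint for 'link' or 'node' protection."""
    if backup_path[0] != plr:
        return False
    # Both protection types must avoid the protected PLR-NH link.
    if frozenset((plr, nh)) in links_of(backup_path):
        return False
    if protection == "link":
        return backup_path[-1] == nh   # the merge point is the NH
    if protection == "node":
        return nh not in backup_path   # the NH must be bypassed entirely
    raise ValueError(f"unknown protection type: {protection}")


# Hypothetical topology in which node X offers a detour around the PLR-NH link.
link_ok = backup_is_valid(["PLR", "X", "NH"], "PLR", "NH", "link")
link_bad = backup_is_valid(["PLR", "NH"], "PLR", "NH", "link")
node_ok = backup_is_valid(["PLR", "X", "NNH"], "PLR", "NH", "node")
node_bad = backup_is_valid(["PLR", "X", "NH", "NNH"], "PLR", "NH", "node")
```

In this sketch the node protection check does not verify that the path ends at the NNH, since the function is not given the working LSP; a fuller check would take it as an additional argument.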
A penultimate hop PLR, i.e., the LSR that immediately precedes the egress LSR along the working path, can only perform link protection, since there is no NNH beyond the egress LSR.
It will also be assumed herein that a failure of a protected link triggers a switchover to the backup LSP, irrespective of whether the cause of the failure is a link failure or a node failure. This procedure provides a short detection time, as it is based upon detecting physical layer defects, which can be detected very quickly. Examples of such physical defects are loss of signal, signal quality degradation, and remote failure indications.
FIG. 2 demonstrates an example where FRR is implemented for a Working LSP which includes three LSRs. A backup LSP, B1, enables protection against a failure of the link extending from LSR2 to LSR3. This backup LSP originates at PLR LSR2 and terminates at NH (relative to the PLR LSR2 location) LSR3. On the other hand, backup LSP B2 provides protection against a failure of LSR2, and consequently also against a failure of the link extending between LSR1 and LSR2. This backup LSP B2 originates at PLR LSR1, passes through intermediate LSR4, and terminates at NNH (relative to the PLR LSR1 location) LSR3.
The MPLS label processing which is also demonstrated in this FIG. 2 includes the following: LSR1 swaps Working label 20 with Working label 40 (as allocated by LSR3), and pushes backup label 95 towards LSR4 (as allocated by LSR4). LSR4 swaps the backup label 95 with label 99 and leaves the Working label 40 unchanged. LSR3 pops the backup label 99 and swaps the Working label 40 with label 50, as allocated by the subsequent hop.
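The label stack operations along backup LSP B2 of FIG. 2 can be traced step by step. The sketch below is illustrative only: the helper functions and the convention that the last list element is the top of the stack are assumptions made for the example, while the label values (20, 40, 95, 99, 50) are those given for FIG. 2.

```python
def swap(stack, expected, new):
    """Replace the top label (last element) after checking it is as expected."""
    if not stack or stack[-1] != expected:
        raise ValueError(f"expected top label {expected}, got stack {stack}")
    return stack[:-1] + [new]


def push(stack, new):
    """Add a new top label, leaving the labels beneath it unchanged."""
    return stack + [new]


def pop(stack, expected):
    """Remove the top label after checking it is as expected."""
    if not stack or stack[-1] != expected:
        raise ValueError(f"expected top label {expected}, got stack {stack}")
    return stack[:-1]


arriving_at_lsr1 = [20]
# PLR LSR1: swap Working label 20 -> 40 (allocated by LSR3), push backup label 95.
after_lsr1 = push(swap(arriving_at_lsr1, 20, 40), 95)
# LSR4: swap backup label 95 -> 99; Working label 40 is untouched.
after_lsr4 = swap(after_lsr1, 95, 99)
# MP LSR3: pop backup label 99, then swap Working label 40 -> 50.
after_lsr3 = swap(pop(after_lsr4, 99), 40, 50)
```

The Working label 40 is the one LSR3 expects from LSR2 on the working path, which is why LSR3 can resume normal forwarding once the backup label has been popped.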
Link Protection Scenario: when there is a failure at the link extending between LSR2 and LSR3, the PLR LSR2 redirects the traffic to B1, along which traffic would be conveyed to NH LSR3. Thereafter, the traffic resumes the original Working LSP.
Node Protection Scenario: when LSR2 fails, or there is a failure at the link extending between LSR1 and LSR2, the PLR LSR1 redirects the traffic to backup LSP B2, along which traffic would be conveyed to NNH LSR3. Thereafter, the traffic resumes the original Working LSP. As noted earlier, the failure of LSR2 is detected based on the failure occurring along the link connecting LSR1 and LSR2.
Multi-Failure Protection
The FRR scheme demonstrated above protects against a failure in the Working path. However, it does not address the problem of concurrent failures occurring along both the Working LSP and the backup LSP. Consider concurrent failures in the setup demonstrated in FIG. 2, occurring both at the link of the Working LSP that extends between LSR1 and LSR2, and at the link of the backup LSP that extends between LSR4 and LSR3. In such a case, the traffic cannot be recovered.
FIG. 3 illustrates a multi-ring topology, where multi-failure protection could be required. The network depicted in FIG. 3 comprises two topological rings, 1 and 2. Ring 1 is formed by LSRs 1-2-3-4-8, while ring 2 is formed by LSRs 3-4-5-6-7. The rings are interconnected via LSR3 and LSR4 (sometimes referred to as “ring gateways”, or “gateways”), which share a common link 3-4. The links are typically realized by using optical fibers. A Working LSP W1 extends along LSRs 1-2-3-7-6 (marked with solid arrows) and is protected against the failure of NH LSR7 and of the link 3-7 via the node protection backup LSP B1, which extends along LSRs 3-4-5-6 towards NNH LSR6 (marked with dashed arrows).
Node Protection Scenario: When the link 3-7 fails (Cut 1, marked with “x”), PLR LSR3 detects that link failure, assumes that NH LSR7 is down, and therefore redirects the traffic to the backup LSP B1, which conveys the redirected traffic to NNH LSR6. The successfully recovered traffic then resumes the Working LSP at LSR6.
Dual Link Failure Scenario: When both links 3-7 and 3-4 fail, the behavior of PLR is the same as above, yet, since link 3-4 is down, the traffic cannot reach LSR4, and thus cannot be recovered.
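The single and dual failure scenarios of FIG. 3 can be reproduced with a small reachability check. The following is a simplified sketch under stated assumptions: it models only the one backup LSP B1 at PLR LSR3, treats links as bidirectional (per the assumption made earlier), and the function names are hypothetical.

```python
def link(a, b):
    """An undirected link between two LSRs."""
    return frozenset((a, b))


def path_is_up(path, failed_links):
    """True if none of the links along the path has failed."""
    return all(link(a, b) not in failed_links for a, b in zip(path, path[1:]))


W1 = [1, 2, 3, 7, 6]  # Working LSP of FIG. 3
B1 = [3, 4, 5, 6]     # node protection backup LSP from PLR LSR3 to NNH LSR6


def delivered(failed_links):
    """Return the route taken by W1 traffic, or None if it is not recovered."""
    if path_is_up(W1, failed_links):
        return W1
    if link(3, 7) in failed_links:
        head = W1[:W1.index(3)]  # [1, 2], the part of W1 upstream of the PLR
        # PLR LSR3 redirects onto B1; this only helps if B1 itself is intact.
        if path_is_up(head + [3], failed_links) and path_is_up(B1, failed_links):
            return head + B1
    return None  # other failures are outside the scope of this sketch


single_cut = delivered({link(3, 7)})
dual_cut = delivered({link(3, 7), link(3, 4)})
```

With only link 3-7 down the traffic follows 1-2-3-4-5-6, whereas with links 3-7 and 3-4 both down no route is returned, which is the unrecoverable case described above.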
IETF draft-vasseur-mpls-linknode-failure-00.txt (also described in US2003233595) uses a specific method for distinguishing between a link failure and a node failure at the PLR, and only after determining which of the two types of failure has occurred does it activate the appropriate type of protection. This requires signaling overhead over an alternate path, which does not include the directly connected link extending between the PLR and the NH, for detecting when the NH cannot be reached.
US 20110110224 discloses a dual FRR method, which uses backup LSP(s) in order to provide concurrent link and node protection (thus initiating a so-called Dual or concurrent FRR), while configuring a suitable blocking rule at the link protection merge point (the NH) to avoid the traffic duplication that would otherwise occur with standard FRR. However, the main drawback of the disclosed method is the need to replicate traffic at the NNH (called NNHOP), which consumes twice the resources at the NNHOP, where internal capacity resources are often limited. This is especially undesirable when protecting a point-to-point (“P2P”) Working path, where there is no reason to carry out packet replication.
US 20130094355 describes a fast reroute protection technique which provides both link and node protection without traffic duplication and without the need to distinguish between link and node failures. This method is based on a point-to-multipoint (P2MP) backup path, and on a special non-standard rule applied at a very specific node (the “penultimate hop”, PH) along that path, to reroute around a failure of the protection path. The main drawback of this method is that it is designed to recover from only one specific failure of the protection path, namely the failure of the last hop of the P2MP backup path.
FIG. 4 illustrates the application of the solution suggested by the disclosure of US 20130094355A1 to the network demonstrated in FIG. 3. The working LSP W1 extends along 1-2-3-7-6. B1 (B2) is a P2MP sub-tunnel that extends along 3-4-5-6-7 towards NH LSR7 (NNH LSR6), respectively. The penultimate hop of B1 is actually the NNH, and is not an effective branching point for rerouting the traffic from B1 to B2. For instance, when the link 3-7 fails (“Cut 1”, marked with “x”), PLR LSR3 would reroute the traffic towards NH LSR7. Yet, if link 3-4 is also down (“Cut 2”, marked with “x”), the traffic conveyed along B1 fails to recover.
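The weakness illustrated in FIG. 4 can be made concrete by listing the links that the two sub-tunnels have in common. This is an illustrative sketch only: it reads the route of B2 as 3-4-5-6 (B1 without its last hop), which is an interpretation of the description above, and the helper names are hypothetical.

```python
def links_of(path):
    """Return the set of undirected links traversed by a path (list of LSRs)."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


SUB_TUNNELS = {
    "B1": [3, 4, 5, 6, 7],  # towards NH LSR7
    "B2": [3, 4, 5, 6],     # towards NNH LSR6 (assumed route)
}

# Links whose failure takes down both sub-tunnels at once.
shared_links = links_of(SUB_TUNNELS["B1"]) & links_of(SUB_TUNNELS["B2"])


def surviving(failed_links):
    """Names of the sub-tunnels that traverse none of the failed links."""
    return sorted(
        name for name, path in SUB_TUNNELS.items()
        if not links_of(path) & failed_links
    )


after_cut_1 = surviving({frozenset((3, 7))})
after_cut_1_and_2 = surviving({frozenset((3, 7)), frozenset((3, 4))})
```

Under this reading, every link of B2 is also a link of B1, so a failure of link 3-4 leaves no sub-tunnel available and only a failure of the last hop 6-7 affects B1 alone.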