MPLS Technology
MPLS is a new technology for fast delivery of packet-based traffic along pre-established logical paths called label switched paths (LSPs, a.k.a. tunnels). MPLS supports traffic engineering (TE) to optimize the usage of network resources, and is designed to offer reliable traffic delivery with predictable quality of service (QoS) and guaranteed capacity (a.k.a., bandwidth). MPLS uses labels to identify, classify and forward data over LSPs.
A point-to-point (P2P) LSP delivers traffic from the access port(s) of a source (a.k.a., ingress) node (a.k.a., label switching router, LSR) downstream to the access port(s) of a destination (a.k.a., egress) LSR. The LSP may traverse intermediate (a.k.a., transit) LSRs. Access ports can also be LSR-internal, i.e., the LSR itself can generate traffic onto the LSP. An LSR may serve as the ingress node for one LSP and as a transit or egress node for another.
FIG. 1 illustrates a P2P LSP that originates at ingress LSR1, traverses transit LSRs LSR2 (from port “a” to port “b”) and LSR3, and terminates at egress LSR4. The LSP path may be summarized as 1-2-3-4.
A point-to-multipoint (P2MP) LSP delivers multicast traffic (such as IPTV) from an ingress LSR (a.k.a., root) downstream to one or more egress LSRs (a.k.a., leaves). An example of a P2MP LSP is shown in FIG. 2. It has a tree-and-branch structure, in which traffic is replicated at transit branch points and then delivered to the access port(s) at the leaves. This scheme is efficient in terms of link capacity consumption, since only one copy of the traffic is ever sent over each branch link. Multiple copies per branch (a.k.a., traffic duplication) are forbidden, as they could crash the user application.
With P2MP tunnels, an LSR may serve as both a transit and an egress LSR (abbreviated transit&leaf), in which case it forwards traffic both to its local access port(s) and to downstream LSRs. The point-to-point path from the root to each specific leaf is called a sub-LSP.
FIG. 2 illustrates a P2MP LSP. Multicast traffic enters the LSP at root LSR1 and is sent to transit&leaf LSR2, where it is replicated towards leaf LSR3 and transit LSR4. LSR4 forwards the traffic to transit&leaf LSR5, which in turn forwards it to leaf LSR6. Each of LSRs 2, 3, 5 and 6 also forwards the traffic to its local access port(s). As illustrated by dotted lines, there are 4 sub-LSPs, with paths 1-2, 1-2-3, 1-2-4-5 and 1-2-4-5-6.
Note that LSR1 sends only one packet copy to LSR2, even though the link to LSR2 carries 4 sub-LSPs. Similarly, LSR2 sends one copy on the branch to LSR3 and one copy on the branch to LSR4, even though the link to LSR4 carries 2 sub-LSPs.
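The per-link accounting above can be reproduced with a short Python sketch of the FIG. 2 tree. The encoding (a dict from each LSR to its downstream LSRs) and the function names are hypothetical, chosen only for illustration; the sketch contrasts the number of sub-LSPs on each link with the number of packet copies actually sent when LSRs replicate at branch points.

```python
from collections import defaultdict

# Hypothetical encoding of the FIG. 2 P2MP LSP: LSR -> downstream LSRs.
TREE = {1: [2], 2: [3, 4], 4: [5], 5: [6]}
LEAVES = {2, 3, 5, 6}  # LSRs that also deliver traffic to local access ports
ROOT = 1


def sub_lsps(tree, leaves, root):
    """Return the root-to-leaf path (sub-LSP) of every leaf, in depth-first order."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node in leaves:
            paths.append(path)
        for child in tree.get(node, []):
            walk(child, path)

    walk(root, [])
    return paths


def sub_lsps_per_link(paths):
    """Count how many sub-LSPs traverse each link."""
    count = defaultdict(int)
    for path in paths:
        for link in zip(path, path[1:]):
            count[link] += 1
    return dict(count)


def copies_per_link(tree, root):
    """Send one packet from the root; each LSR emits one copy per downstream branch."""
    copies = defaultdict(int)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            copies[(node, child)] += 1
            stack.append(child)
    return dict(copies)


paths = sub_lsps(TREE, LEAVES, ROOT)
print(paths)                        # [[1, 2], [1, 2, 3], [1, 2, 4, 5], [1, 2, 4, 5, 6]]
print(sub_lsps_per_link(paths))     # {(1, 2): 4, (2, 3): 1, (2, 4): 2, (4, 5): 2, (5, 6): 1}
print(copies_per_link(TREE, ROOT))  # {(1, 2): 1, (2, 3): 1, (2, 4): 1, (4, 5): 1, (5, 6): 1}
```

The link LSR1-LSR2 carries 4 sub-LSPs but only one copy, which is exactly the efficiency argument made for the tree-and-branch structure.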
Fast Reroute (FRR) and Point-to-Point (P2P) LSPs
A major MPLS feature is support for fast reroute (FRR). FRR is a mechanism for rapid traffic restoration upon a link or node failure along the LSP path. With FRR, an interrupted traffic stream can be rerouted around a failed node or link in under 50 milliseconds, thereby minimizing the impact on the traffic.
The LSR upstream of the failure (a.k.a., point of local repair, PLR) redirects the traffic of the so-called working LSP onto a pre-established P2P backup LSP (a.k.a., bypass LSP), which reroutes around the failure. The backup LSP brings the traffic from the PLR to an LSR positioned downstream of the failure (a.k.a., merge point, MP), after which the traffic returns to the working LSP. The MP is thus the egress LSR of the backup LSP.
For the sake of simplicity, it will be assumed that the MP is the closest possible LSR downstream of the failure point, counting from the PLR. Accordingly, with FRR link protection the MP is the PLR's next-hop (NHOP) LSR, i.e., the LSR at the far end of the protected link; with node protection the MP is the PLR's next-next-hop (NNHOP) LSR, i.e., the LSR that follows the NHOP along the working LSP path (in the general case, it is not necessarily a direct neighbor of the NHOP). A backup LSP may be shared, i.e., provide protection to multiple working LSPs, in which case it is known as a facility backup LSP.
It will also be assumed that a failure of the protected link triggers both the link and the node protection mechanisms. This procedure is in common use and provides fast detection, since failure detection is based on rapid physical-layer indications such as “loss of signal”, “signal quality degradation” and “remote alarm indication”.
Node protection FRR is preferable to link protection FRR, as the former provides better resiliency. However, the last hop of an LSP cannot be provided with node protection, i.e., no protection is possible when the egress LSR fails. A fully protected P2P LSP is therefore a P2P LSP that is assigned node protection at all its hops except the last, for which link protection is assigned.
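Under the simplifying assumption above, the MP of a backup LSP follows directly from the working path. A minimal Python sketch, assuming the P2P working path is given as a list of LSR numbers from ingress to egress; the function name and the "link"/"node" labels are hypothetical, and the sketch does not model a P2MP PLR with several NNHOPs.

```python
def merge_point(working_path, plr, protection):
    """Return the MP of a backup LSP that starts at `plr` on a P2P working path.

    Follows the simplifying assumption in the text: the MP is the closest LSR
    downstream of the protected element, i.e. the NHOP for link protection and
    the NNHOP for node protection.
    """
    offset = {"link": 1, "node": 2}[protection]
    i = working_path.index(plr)
    if i + offset >= len(working_path):
        # e.g. the egress LSR cannot be bypassed, so the last hop has no node protection
        raise ValueError(f"LSR{plr} has no {protection}-protection MP on this path")
    return working_path[i + offset]


path = [1, 2, 3, 4]                  # the working LSP of FIG. 1
print(merge_point(path, 1, "node"))  # 3 (NNHOP of LSR1)
print(merge_point(path, 3, "link"))  # 4 (NHOP of LSR3)
```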
FIG. 3 illustrates the FRR configuration for the P2P working LSP of FIG. 1. This LSP is fully protected as follows: (1) B1 (B2) protects against the failure of LSR2 (LSR3); it originates at PLR LSR1 (LSR2) and terminates at NNHOP LSR3 (LSR4), respectively. A backup LSP path may also include transit LSRs, as illustrated for B1, which traverses transit LSR5. (2) Backup LSP B3 protects against the failure of the link from LSR3 to LSR4; it originates at PLR LSR3 and terminates at MP NHOP LSR4.
Node FRR Scenario: if LSR2 (LSR3) goes down, as detected by PLR LSR1 (LSR2) through the failure of its outgoing link to LSR2 (LSR3), the PLR redirects the traffic to B1 (B2), which brings it to MP LSR3 (LSR4), respectively, after which the traffic returns to the working LSP.
Link FRR Scenario: if the link from LSR3 to LSR4 goes down, PLR LSR3 redirects the traffic to B3, which brings it to MP LSR4, after which the traffic is sent to the access port(s).
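The definition of a fully protected P2P LSP can be checked mechanically against the FIG. 3 configuration. In this sketch each backup is reduced to a hypothetical (PLR, MP, protection type) tuple; the dictionary and function names are illustrative assumptions, not part of any standard.

```python
# Hypothetical encoding of the FIG. 3 backups: name -> (PLR, MP, protection type).
FIG3_BACKUPS = {
    "B1": (1, 3, "node"),
    "B2": (2, 4, "node"),
    "B3": (3, 4, "link"),
}


def required_protection(working_path):
    """Protections a fully protected LSP needs: node protection at every hop
    except the last, which gets link protection (the egress cannot be bypassed)."""
    required = set()
    for i, plr in enumerate(working_path[:-1]):
        if i + 2 < len(working_path):
            required.add((plr, working_path[i + 2], "node"))
        else:
            required.add((plr, working_path[i + 1], "link"))
    return required


def is_fully_protected(working_path, backups):
    """True if every required protection is provided by some backup LSP."""
    return required_protection(working_path) <= set(backups.values())


path = [1, 2, 3, 4]
print(sorted(required_protection(path)))        # [(1, 3, 'node'), (2, 4, 'node'), (3, 4, 'link')]
print(is_fully_protected(path, FIG3_BACKUPS))  # True
without_b3 = {k: v for k, v in FIG3_BACKUPS.items() if k != "B3"}
print(is_fully_protected(path, without_b3))    # False: the last hop lacks link protection
```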
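Both scenarios splice a backup LSP into the working LSP: traffic follows the working path up to the PLR, then the backup LSP up to the MP, then the working path again. A minimal sketch of that splice, with hypothetical function names; B1's path 1-5-3 is taken from FIG. 3, while the transit LSR9 used for B3 is a hypothetical placeholder, since the text does not give B3's path.

```python
def rerouted_path(working_path, backup_path):
    """Splice a backup LSP (PLR ... MP) into a working LSP.

    Traffic follows the working LSP up to the PLR, the backup LSP up to the MP,
    and the working LSP again from the MP onward.
    """
    plr, mp = backup_path[0], backup_path[-1]
    i, j = working_path.index(plr), working_path.index(mp)
    if j <= i:
        raise ValueError("the MP must lie downstream of the PLR")
    return working_path[:i] + backup_path + working_path[j + 1:]


working = [1, 2, 3, 4]
b1 = [1, 5, 3]  # FIG. 3: B1 from PLR LSR1 via transit LSR5 to NNHOP LSR3
b3 = [3, 9, 4]  # hypothetical path for B3 (transit LSR9 is illustrative only)

print(rerouted_path(working, b1))  # [1, 5, 3, 4]: failed LSR2 is bypassed
print(rerouted_path(working, b3))  # [1, 2, 3, 9, 4]: failed link LSR3-LSR4 is bypassed
```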
Fast Reroute (FRR) and Point-to-Multipoint (P2MP) LSPs
FRR is also applicable to a working P2MP LSP. To protect against a link failure, the PLR needs a single backup LSP towards the NHOP. To fully protect a node, the PLR needs a backup LSP per NNHOP. When a backup LSP protects multiple sub-LSPs and FRR is activated, the PLR sends only one copy of the traffic over that backup LSP, thus avoiding traffic duplication.
A fully protected sub-LSP is a sub-LSP assigned node protection at all hops, except for the last hop, for which it is assigned a link protection. A fully protected P2MP LSP is a P2MP LSP for which all the sub-LSPs are fully protected.
FIG. 4 illustrates the FRR configuration for the working P2MP LSP of FIG. 2, with LSR5 and LSR6 of FIG. 2 removed and LSR4 becoming an egress LSR. FRR protection is organized as follows: (1) B1 (B2) protects against the failure of LSR2; it originates at PLR LSR1, traverses a different network node, marked LSR6, and terminates at NNHOP LSR3 (LSR4), respectively. (2) B3 (B4) protects against the failure of the link from LSR2 to LSR3 (LSR4); it originates at PLR LSR2 and terminates at MP NHOP LSR3 (LSR4), respectively.
This working P2MP LSP is not fully protected because the sub-LSP to LSR2 has no link protection at the last hop.
Thus, if link LSR1 to LSR2 fails (LSR2 stays up), no traffic will reach the access port of LSR2.
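The gap in FIG. 4 can be located by applying the fully-protected rule to every sub-LSP. This sketch reuses the same hypothetical (PLR, MP, protection type) encoding of backups; the names are illustrative, and the sub-LSPs 1-2, 1-2-3 and 1-2-4 follow from LSR2, LSR3 and LSR4 being leaves.

```python
# FIG. 4: working P2MP LSP with sub-LSPs to leaves LSR2, LSR3 and LSR4.
FIG4_SUB_LSPS = [[1, 2], [1, 2, 3], [1, 2, 4]]
# Hypothetical encoding of the FIG. 4 backups: name -> (PLR, MP, protection type).
FIG4_BACKUPS = {
    "B1": (1, 3, "node"),
    "B2": (1, 4, "node"),
    "B3": (2, 3, "link"),
    "B4": (2, 4, "link"),
}


def required_protection(sub_lsp):
    """Node protection at every hop of the sub-LSP except the last, which needs link protection."""
    required = set()
    for i in range(len(sub_lsp) - 1):
        if i + 2 < len(sub_lsp):
            required.add((sub_lsp[i], sub_lsp[i + 2], "node"))
        else:
            required.add((sub_lsp[i], sub_lsp[i + 1], "link"))
    return required


def missing_protection(sub_lsps, backups):
    """Protections needed for a fully protected P2MP LSP but not provided by any backup."""
    provided = set(backups.values())
    missing = set()
    for sub_lsp in sub_lsps:
        missing |= required_protection(sub_lsp) - provided
    return missing


print(missing_protection(FIG4_SUB_LSPS, FIG4_BACKUPS))  # {(1, 2, 'link')}
```

The only missing entry is link protection from PLR LSR1 to LSR2, i.e., the last hop of the sub-LSP to LSR2.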
Node FRR Scenario: if LSR2 goes down, as detected by PLR LSR1 via the failure of the link to LSR2, the PLR redirects the traffic to B1 and B2, which bring it to MPs LSR3 and LSR4, respectively, after which the traffic is sent to the access ports.
Link FRR Scenario: if the link from LSR2 to LSR3 (LSR4) goes down, PLR LSR2 redirects the traffic to B3 (B4), which brings it to MP LSR3 (LSR4), respectively, after which the traffic is sent to the access ports.
So far, all backup LSPs have been P2P LSPs. It is also possible to establish P2MP backup LSPs, as per draft-ietf-mpls-p2mp-te-bypass-02.txt, to optimize protection bandwidth consumption. For example, in case of a node failure, the PLR would redirect the traffic onto a P2MP backup LSP, which routes a single traffic copy towards all NNHOPs.
FIG. 5 illustrates the FRR configuration for the working P2MP LSP of FIG. 4, with B1 and B2 replaced by a P2MP backup LSP, B5, which protects against the failure of LSR2. B5 originates at PLR LSR1, traverses LSR6, and terminates at NNHOPs LSR3 and LSR4. The advantage of a P2MP backup LSP is exemplified on the link from LSR1 to LSR6, where B5 consumes half the protection bandwidth consumed by B1 and B2 together in FIG. 4.
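The bandwidth comparison between FIG. 4 and FIG. 5 amounts to counting traffic copies per link. A minimal sketch, assuming each backup copy consumes one unit of protection bandwidth and that B1 and B2 traverse only LSR6 as transit (paths 1-6-3 and 1-6-4); the variable and function names are hypothetical.

```python
from collections import Counter

# FIG. 4: P2P backups B1 (1-6-3) and B2 (1-6-4); FIG. 5: P2MP backup B5 (1-6, then 6-3 and 6-4).
P2P_BACKUPS = [[1, 6, 3], [1, 6, 4]]
P2MP_BACKUP = {1: [6], 6: [3, 4]}


def p2p_load(paths):
    """Copies per link when each P2P backup LSP carries its own copy."""
    load = Counter()
    for path in paths:
        load.update(zip(path, path[1:]))
    return load


def p2mp_load(tree):
    """Copies per link for a P2MP backup LSP: one copy per tree link."""
    return Counter((parent, child) for parent, children in tree.items() for child in children)


print(p2p_load(P2P_BACKUPS)[(1, 6)])   # 2
print(p2mp_load(P2MP_BACKUP)[(1, 6)])  # 1
```

On the links LSR6-LSR3 and LSR6-LSR4 both configurations carry one copy; the saving appears only on the shared link LSR1-LSR6.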
This working P2MP LSP is not fully protected because the sub-LSP to LSR2 has no link protection at the last hop. Thus, if link LSR1 to LSR2 fails (LSR2 stays up), no traffic will reach the access port of LSR2.
Node FRR Scenario: if LSR2 goes down, as detected by PLR LSR1 via the failure of the link to LSR2, the PLR redirects the traffic to B5, which brings it to MPs LSR3 and LSR4, after which the traffic returns to the working LSP.
Link FRR Scenario: Same as for FIG. 4.
Traffic Duplication Problem for FRR
Assume that a PLR has both link protection assigned towards its directly connected NHOP and node protection assigned towards its NNHOP(s). Since it was assumed that FRR (both the link and the node protection mechanisms) is triggered by a link failure alone, a link failure would activate both protections and result in traffic duplication. This problem is referred to hereafter as the FRR duplication problem.
FIG. 6 illustrates a portion of a P2MP working LSP flowing on path 1-2-3-4, where LSR2 is a transit&leaf node (it forwards traffic both to LSR3 and to its own access port). For this LSP to be fully protected, it ought to: (1) protect transit&leaf LSR2, e.g., via B1, which provides node protection for the sub-LSPs for which LSR2 is a transit node; and (2) protect the link from LSR1 to LSR2, e.g., via B2, which provides link protection for the sub-LSP for which LSR2 is the leaf.
Note that B1 alone cannot make the LSP fully protected, because when the link from LSR1 to LSR2 fails (and LSR2 stays up), no traffic would reach the access port at LSR2. Likewise, B2 alone is insufficient, since link protection is less resilient than node protection.
Case A: The link from LSR1 to LSR2 goes down, but LSR2 stays up. Being unable to distinguish a link failure from a node failure, LSR1 moves the traffic to both B1 and B2. LSR3 receives two copies: one from LSR2 (which is reached via B2) and one from B1.
Case B: P2P vs. P2MP Backup. The P2P backup LSPs may be replaced by a P2MP backup LSP (not shown). This can improve protection bandwidth efficiency, but the duplication problem remains.
Case C: Mixed Node and Link FRR. If additional NNHOPs exist and some of them lack node protection, they may instead benefit from link protection. The traffic duplication at LSR3 remains.
Case D: P2P vs. P2MP Working. The P2MP working LSP may be replaced by a P2P working LSP (e.g., if LSR2 has no access port). The duplication problem remains.
In summary, the FRR duplication problem may occur with both P2P and P2MP working LSPs, protected by either P2P or P2MP backup LSPs. Its main cause is that, per the earlier assumption, the PLR does not distinguish a link failure from a node failure.
A number of solutions have been proposed in the prior art to distinguish a link failure from a node failure, and could thereby avoid the described traffic duplication problem.
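Case A, and the insufficiency of B1 alone, can be reproduced with a toy copy counter for FIG. 6. This is a simplified model, not a forwarding-plane implementation: each backup is reduced to its MP (B1 to NNHOP LSR3, B2 to NHOP LSR2), backup transit LSRs are ignored, and each delivered copy continues along the rest of the working LSP until it hits a failed LSR. The names are hypothetical.

```python
from collections import Counter

# FIG. 6: working LSP 1-2-3-4; LSR1 is the PLR for the link LSR1-LSR2.
WORKING = [1, 2, 3, 4]
# Backups at PLR LSR1, encoded only by their MP (backup transit LSRs are not modelled).
BACKUP_MP = {"B1": 3, "B2": 2}  # B1: node protection -> NNHOP; B2: link protection -> NHOP


def copies_received(activated, failed_nodes=frozenset()):
    """Count the packet copies reaching each LSR after the link LSR1-LSR2 fails.

    Each activated backup delivers one copy to its MP, and the MP forwards it
    along the rest of the working LSP; a copy stops at the first failed LSR.
    """
    received = Counter()
    for name in activated:
        mp = BACKUP_MP[name]
        for lsr in WORKING[WORKING.index(mp):]:
            if lsr in failed_nodes:
                break
            received[lsr] += 1
    return received


# Case A: only the link fails, but LSR1 cannot tell, so it activates both backups.
print(copies_received(["B1", "B2"]))       # LSR3 and LSR4 each receive 2 copies
# Node protection alone on a link failure: LSR2's access port gets nothing.
print(copies_received(["B1"])[2])          # 0
# A real failure of LSR2 with both backups active: no duplication.
print(copies_received(["B1", "B2"], {2}))  # LSR3 and LSR4 each receive 1 copy
```

Whether or not LSR2 also has an access port does not change the count at LSR3, which mirrors Case D.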
The PLR could implement mechanisms to distinguish between link and node failures. Upon a link (node) failure, the PLR would then apply only link (node) protection, respectively, thus avoiding traffic duplication. Such an approach is classified as out of scope in section 4.1.1 of draft-ietf-mpls-p2mp-te-bypass-02.txt: “The PLR needs to localize the failed elements in order to activate the P2MP Bypass Tunnel(s) protecting this element. Mechanisms through which this location is retrieved are out of the scope of this document.”
A specific method for distinguishing a link failure from a node failure is proposed in draft-vasseur-mpls-linknode-failure-00.txt (also formulated in US2003233595). It is based on exchanging so-called “Hello” messages over an alternative path between the PLR and the NHOP, to detect when the NHOP is not reachable. The author of that method proposes two schemes of PLR behavior upon link failure detection. Scheme 1: Step 1, assume that a link failure occurred and switch the traffic to the link-protecting backup LSP; Step 2, if it later becomes clear (via the Hellos) that the NHOP is down, move the traffic to the node-protecting backup LSP. Scheme 2: Step 1, assume that a node failure occurred and switch the traffic to the node-protecting backup LSP; Step 2, if it later becomes clear (via the Hellos) that the NHOP is up, move the traffic to the link-protecting backup LSP.
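The two schemes can be summarized as a small decision function: Step 1 is a fixed guess, and Step 2 corrects it once the Hellos reveal the state of the NHOP. This sketch only models which backup LSP carries traffic and in what order; the Hello exchange itself is not modelled, and the enum and function names are hypothetical.

```python
from enum import Enum


class Backup(Enum):
    LINK = "link-protecting backup LSP"
    NODE = "node-protecting backup LSP"


def backups_used(scheme, nhop_up):
    """Sequence of backup LSPs the PLR uses after detecting a link failure.

    Step 1 is a fixed guess (Scheme 1: link failure, Scheme 2: node failure);
    Step 2 corrects the guess once the Hellos reveal whether the NHOP is up.
    """
    if scheme not in (1, 2):
        raise ValueError("scheme must be 1 or 2")
    step1 = Backup.LINK if scheme == 1 else Backup.NODE
    correct = Backup.LINK if nhop_up else Backup.NODE
    return [step1] if step1 is correct else [step1, correct]


for scheme in (1, 2):
    for nhop_up in (True, False):
        steps = [b.name for b in backups_used(scheme, nhop_up)]
        print(f"Scheme {scheme}, NHOP {'up' if nhop_up else 'down'}: {' -> '.join(steps)}")
```

In both schemes only one backup LSP carries traffic at any time, so duplication is avoided; the cost is that when the Step 1 guess is wrong, some traffic is misrouted or lost until Step 2 runs.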
The above method has the following disadvantages: (1) Performance: the PLR and the NHOP need to exchange the Hello messages from which a node failure is detected. These messages consume resources and could create a performance burden on the PLR, especially if it has many protected links. (2) Detection time: the PLR must wait for several Hello acknowledgements, plus the propagation delay from the NHOP, before Step 2 can be executed. (3) Availability: an alternate path is needed to carry the Hellos between the PLR and the NHOP, and such a path is not always available. The backup LSP could carry the Hellos from the PLR to the NHOP, but a path in the reverse direction is still needed. Such a complex backup path might also fail or deliver the Hellos unreliably, thereby causing false indications.
A different approach is therefore called for, i.e., one that practically addresses the FRR duplication problem while avoiding the potentially intolerable drawbacks discussed above.
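The detection-time drawback can be made concrete with a rough estimate: the delay before Step 2 is at least the number of Hello intervals the PLR must wait, times the Hello interval, plus the propagation delay from the NHOP. This is a simplified model, and the parameter values below are hypothetical, chosen only to illustrate how quickly the delay can approach the sub-50 ms FRR target.

```python
def step2_delay_ms(hello_interval_ms, hellos_needed, propagation_ms):
    """Rough estimate of the time before Step 2 can run: waiting for several
    Hello intervals plus the propagation delay from the NHOP (simplified model)."""
    return hellos_needed * hello_interval_ms + propagation_ms


FRR_BUDGET_MS = 50  # the sub-50 ms restoration target mentioned for FRR

# Hypothetical parameters: 15 ms Hello interval, 3 Hellos, 10 ms propagation delay.
delay = step2_delay_ms(hello_interval_ms=15, hellos_needed=3, propagation_ms=10)
print(delay, delay < FRR_BUDGET_MS)  # 55 False
```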