The problem of determining a protected path in a mesh communication network is relevant for various types of networks and for various data protocols applicable in such networks. For the sake of better illustration, the problem will be presented and analyzed using a modern example of Multiprotocol Label Switching (MPLS) networks.
MPLS is an emerging technology that improves the scalability and performance of IP (Internet Protocol) networks along with providing new services previously unavailable with traditional IP. In particular, MPLS enables service providers to engineer the traffic in their networks and offer quality-of-service (QoS) based services, including guaranteed capacity (bandwidth, BW). With MPLS, a source label switching router (LSR) inserts labels into packets arriving from a source system (e.g., a computer); all these labeled packets then follow the same label switched path (LSP) towards the destination LSR, which pops the label off the packet and transfers it to the destination system.
FIG. 1 illustrates an LSP that originates at a source node LSR A, travels through a transit node LSR B, and ends at a destination node LSR C. Thus, all the packets that are mapped by LSR A into the LSP follow the path A->B->C. FIG. 1 also shows that an MPLS label is added to each packet at the source LSR A and removed at the destination LSR C.
A major MPLS feature is a mechanism called fast reroute (FRR). FRR allows LSPs to be rapidly rerouted onto a preconfigured backup LSP around a failed network link or node; the former case is referred to as FRR link protection and the latter as FRR node protection. When a protected link or node fails, the traffic of the primary LSPs is switched over to a backup LSP towards the next hop (NH) LSR or the next-next-hop (NNH) LSR, respectively. The backup LSP remerges with the original path of the primary LSPs at the NH or NNH, which redirects the traffic to the primary LSP. Note that this application assumes that a node protecting backup LSP merges with the primary LSP path at the NNH and not further along the LSP path (e.g., not at NNNH). A backup LSP can be shared by multiple LSPs. With FRR, an interrupted traffic stream can be rerouted around a failed node/link in less than 50 milliseconds, thereby minimizing the impact on the traffic.
FIG. 2 illustrates FRR link protection. An LSP 1 (thick line) and an LSP 2 (thin line) normally flow along the path A->B. When the link between LSR A and LSR B fails, LSR A reroutes both LSP 1 and LSP 2 into a backup LSP (dotted line) that flows to LSR B (NH) through LSR C; as noted earlier, a backup LSP can be shared, here by LSP 1 and LSP 2. The path of the backup LSP remerges with the original path of LSP 1 and LSP 2 at NH (next hop) LSR B, which redirects the traffic back into LSP 1 and LSP 2, respectively.
FIG. 3 illustrates FRR node protection. Let LSP 1 (thick line) and LSP 2 (thin line) normally flow along the path A->B->C. When LSR B fails, LSR A reroutes both LSP 1 and LSP 2 into a backup LSP (dotted line) that flows to LSR C (NNH) through LSR D. The path of the backup LSP remerges with the original path of LSP 1 and LSP 2 at NNH (next next hop) LSR C, which redirects the traffic back into LSP 1 and LSP 2, respectively.
For the protection to be valid, the primary LSP and its backup LSPs must be diversely routed (disjoint); otherwise, the backup LSP will fail as soon as the protected link or node fails. Using the examples of FIG. 2 and FIG. 3, the path of a link protecting backup LSP (e.g., the dotted path shown in FIG. 2) must not use the protected link (link AB in FIG. 2), and the path of a node protecting backup LSP (e.g., the dotted one in FIG. 3) must use neither the protected node (node B in FIG. 3) nor any of its links.
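The disjointness rules above can be expressed as two small checks. The following is a minimal sketch, not a definitive implementation: paths are assumed to be plain lists of LSR names, and the function names (`links_of`, `valid_link_backup`, `valid_node_backup`) are hypothetical. The link check also rejects the reverse direction of the protected link, which is an assumption that a link failure affects both directions.

```python
def links_of(path):
    """Return the set of directed links (u, v) traversed by a path (list of LSR names)."""
    return set(zip(path, path[1:]))


def valid_link_backup(backup, protected_link):
    """Link protection: the backup must run from the upstream LSR to the next hop (NH)
    without using the protected link (in either direction, by assumption)."""
    u, v = protected_link
    used = links_of(backup)
    return (backup[0] == u and backup[-1] == v
            and (u, v) not in used and (v, u) not in used)


def valid_node_backup(backup, upstream, protected_node, nnh):
    """Node protection: the backup must run from the upstream LSR to the next-next hop (NNH)
    without visiting the protected node, and therefore without using any of its links."""
    return (backup[0] == upstream and backup[-1] == nnh
            and protected_node not in backup)


# FIG. 2: link A-B is protected by a backup LSP A->C->B.
print(valid_link_backup(["A", "C", "B"], ("A", "B")))      # True
# FIG. 3: node B is protected by a backup LSP A->D->C.
print(valid_node_backup(["A", "D", "C"], "A", "B", "C"))   # True
# A "backup" that passes through the protected node is rejected.
print(valid_node_backup(["A", "B", "C"], "A", "B", "C"))   # False
```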
Detection of a link failure can be based on alarms of a physical communication layer (e.g., SONET/SDH) detected on that link (link AB in FIG. 2). Detection of a node failure can be based on alarms of either a network or a link communication layer (e.g., OSPF or PPP, respectively). Sometimes, and preferably because this detection mechanism is fast, a node failure can be detected using physical communication layer alarms on the link upstream of the failed node (link AB in FIG. 3). In the latter case, an LSP may be assigned a link protecting backup LSP or a node protecting backup LSP per hop but not both, since it is not possible to distinguish between a link failure and a node failure. To be applicable to all these failure detection scenarios, this application will also assume that an LSP can be assigned either link protection or node protection per hop, but not both.
FRR protection can be defined as a segmented backup scheme, since it is based on providing a backup LSP to protect against a failure of a single segment (node or link) along the path of the primary LSP. Thus, to fully protect an LSP, multiple backup LSPs are required, one backup LSP per protected segment. Though the path of the backup LSP for each protected segment must be disjoint from the path of the primary LSP on that segment, it may share links and nodes with the primary LSP on other segments, and may share links, nodes, and also bandwidth with the backup LSPs protecting the other segments of the primary LSP. This sharing can greatly increase the chances that a set of backup LSPs that can fully protect an LSP actually exists. For example, the path of a backup LSP in FIG. 2 must not use the protected link AB, but can use any other link along the path of the primary LSPs 1 and 2, e.g., the link on which LSP 1 and LSP 2 arrive at LSR A.
An important aspect in MPLS implementation is how to compute a path that meets the LSP requirements. Path computation may be done by a Source LSR, or by a route server/network management system. Input data for a path finding algorithm includes the LSP requirements and so-called traffic-engineered (TE) network topology, where the term TE is used to indicate that the topology includes quality-of-service (QoS) related information such as available capacity of network segments.
The LSP requirements include the source LSR, destination LSR, and additional attributes, such as LSP bandwidth (capacity) and FRR protection requirements; additional requirements can be class of service (CoS), hop limit, bidirectionality, etc.
The TE topology can be represented as a weighted directed graph of nodes and interconnecting links, where each node and link is associated with resources (or TE topology constraints). In particular, each link connecting an outgoing port of one LSR with an incoming port of another LSR is associated with a weight (also known as a metric or cost), an available BW (optionally, per required class of service, CoS), and information on the available backup LSPs.
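One possible in-memory representation of such a TE topology is sketched below. It is only an illustration under stated assumptions: the class and field names (`LinkDirection`, `TETopology`, `backup_lsps`) are hypothetical, per-CoS bandwidth is omitted, a backup LSP is stored simply as a list of LSR names, and all numeric values and the backup route are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class LinkDirection:
    """One direction of a link: outgoing port of `src` to incoming port of `dst`."""
    src: str
    dst: str
    metric: int                  # weight / cost of the link direction
    available_bw: int            # available capacity (per-CoS values omitted here)
    backup_lsps: list = field(default_factory=list)  # known backup LSPs, as node lists


class TETopology:
    """Weighted directed graph: LSR name -> list of outgoing link directions."""

    def __init__(self):
        self.outgoing = {}

    def add(self, link):
        self.outgoing.setdefault(link.src, []).append(link)
        self.outgoing.setdefault(link.dst, [])   # make sure the far-end LSR is known
        return link

    def find(self, src, dst):
        for link in self.outgoing.get(src, []):
            if link.dst == dst:
                return link
        return None


topology = TETopology()
topology.add(LinkDirection("B", "D", metric=5, available_bw=100))
topology.add(LinkDirection("D", "B", metric=5, available_bw=100))
# Hypothetical backup LSP protecting the link direction B->D via some LSR C.
topology.find("B", "D").backup_lsps.append(["B", "C", "D"])
```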
Backup LSP knowledge may be maintained by a path computation server (PCS), which can be any kind of LSR or a network management system. The source LSR would then request the backup LSP information from the PCS, when it (the source LSR) needs to compute a path for an FRR protected LSP. Alternatively, the source LSR may request the PCS to compute the path of the FRR protected LSP itself.
The result of a path finding algorithm is an optimum feasible path that meets the LSP requirements without violating the TE topology constraints. There are various optimization criteria, of which the most popular is the shortest path, in which it is desired to find the feasible path with the least total metric. LSP path finding based on the shortest path criterion is usually referred to as “constrained shortest path first” (CSPF). CSPF has two heuristic phases: (1) In the first phase, all the infeasible links (“link directions”) are pruned. With respect to bandwidth availability, this means that any link direction that does not satisfy the BW requirements of the LSP must be excluded from the TE topology. (2) In the second phase, the shortest path between the source LSR and the destination LSR is computed using a standard algorithm such as Dijkstra's algorithm.
FIG. 4 illustrates a TE topology containing four LSRs (A,B,C,D) interconnected with links, where each link direction (arrow) has a preconfigured metric and available bandwidth BW. For example, there is a link direction from LSR B to LSR D, having metric=5 and available BW=100; the metric and the BW (in brackets) are shown in a rectangular label shown near the link direction. Source/destination LSRs A and C are shown as dark boxes, while transit LSRs are shown as white boxes.
Suppose that it is required to compute a path for an LSP with BW=200 from LSR A to LSR C.
Phase 1: all the infeasible links are pruned, yielding the topology shown in FIG. 5. For example, the link direction from B to D is pruned because it has BW=100<200.
Phase 2: the shortest path is computed. In this case, only two possible paths remain from LSR A to LSR C:
(1) Path A->C, with the metric equal to 2
(2) Path A->D->C, with the metric of 6+3=9.
Note that a path that traverses an LSR more than once (e.g., A->D->A->C), thereby forming a “traffic loop”, is forbidden and is thus not taken into account. The path A->C is the shortest one, and is thus selected as shown in FIG. 6. In this example, the shortest path computation is very simple; in the general case, a systematic method such as the Dijkstra or Bellman-Ford algorithm is used.
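The two CSPF phases of this example can be sketched as follows. This is a minimal illustration, not a definitive implementation. The metrics of A->C, A->D, D->C and the values of B->D are taken from the example; the bandwidths of A->C, A->D and D->C, and the link directions A->B and B->C altogether, are assumptions made only so that the sketch is runnable, since the full content of FIG. 4 is not reproduced in the text.

```python
import heapq

# (src, dst) -> (metric, available BW)
TOPOLOGY = {
    ("A", "C"): (2, 300),   # metric from the example; BW assumed
    ("A", "D"): (6, 250),   # metric from the example; BW assumed
    ("D", "C"): (3, 250),   # metric from the example; BW assumed
    ("B", "D"): (5, 100),   # metric and BW from the example
    ("A", "B"): (4, 100),   # assumed link direction
    ("B", "C"): (4, 100),   # assumed link direction
}


def prune(topology, required_bw):
    """Phase 1: drop every link direction whose available BW is below the requirement."""
    return {ld: (m, bw) for ld, (m, bw) in topology.items() if bw >= required_bw}


def dijkstra(topology, src, dst):
    """Phase 2: least-total-metric path; returns (metric, path) or None if unreachable."""
    adjacency = {}
    for (u, v), (metric, _bw) in topology.items():
        adjacency.setdefault(u, []).append((v, metric))
    heap = [(0, src, [src])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, metric in adjacency.get(node, []):
            if nxt not in settled:          # a settled LSR is never revisited, so no loops
                heapq.heappush(heap, (cost + metric, nxt, path + [nxt]))
    return None


def cspf(topology, src, dst, required_bw):
    return dijkstra(prune(topology, required_bw), src, dst)


print(cspf(TOPOLOGY, "A", "C", 200))   # (2, ['A', 'C'])
```

With these values, Phase 1 leaves only A->C, A->D and D->C, so the candidates are A->C (metric 2) and A->D->C (metric 9), matching the example.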
Path computation gets more complicated when the LSP to be established has FRR protection requirements. Such requirements may ask for: (1) link protection; (2) node protection (except for the last link direction on the path, which should have link protection); (3) node-link protection, where either node or link protection is required, and where node protection (wherever available) should be preferred over link protection.
A problem arises when the FRR protection must be guaranteed along the LSP path. That is, the primary LSP must be protected against any single segment (link or node) failure along its path. Such an LSP is considered to be a fully FRR protected primary LSP. Any path that does not satisfy this requirement must be rejected.
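To make the three requirement types concrete, the sketch below checks a given candidate path hop by hop and returns the protection chosen for each hop, or `None` when the path is not fully FRR protected. It is a hedged illustration only: the function name `protection_plan`, the mode strings, and the representation of available backups as sets of `(upstream, protected node, NNH)` triples and `(upstream, NH)` pairs are all assumptions.

```python
def protection_plan(path, mode, node_backups, link_backups):
    """Return a list with 'node' or 'link' for every hop of `path`,
    or None if some hop cannot be protected as `mode` requires."""
    plan = []
    last = len(path) - 2                      # index of the last link direction
    for i in range(len(path) - 1):
        has_link = (path[i], path[i + 1]) in link_backups
        # Node protection needs a next-next hop, so it never applies to the last hop.
        has_node = i < last and tuple(path[i:i + 3]) in node_backups
        if mode == "link":
            choice = "link" if has_link else None
        elif mode == "node":
            if i == last:
                choice = "link" if has_link else None
            else:
                choice = "node" if has_node else None
        elif mode == "node-link":
            # Node protection is preferred wherever it is available.
            choice = "node" if has_node else ("link" if has_link else None)
        else:
            raise ValueError(f"unknown mode: {mode}")
        if choice is None:
            return None                       # not fully protected: reject the path
        plan.append(choice)
    return plan


node_backups = {("A", "B", "C")}
link_backups = {("A", "B"), ("B", "C")}
print(protection_plan(["A", "B", "C"], "node", node_backups, link_backups))
# ['node', 'link']
print(protection_plan(["A", "B", "C"], "node", set(), link_backups))
# None
```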
If the desired guaranteed FRR is link protection, simple pruning of link directions that are not provided with link protection (Phase 1) can work. For example, if only the link direction from A to C in FIG. 4 has the link protection (i.e., there is a backup LSP from A to C that protects against a failure of the link direction A->C), the pruned topology would be as shown in FIG. 6. In this case, there is only one feasible path for the required LSP: A->C.
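For guaranteed link protection, the extra pruning rule is a per-link test that needs no knowledge of the final path. A minimal sketch, assuming the three link directions that survived bandwidth pruning in the earlier example and assuming, as in the example above, that only A->C has a link protecting backup LSP (the names `LINKS` and `LINK_PROTECTED` are hypothetical):

```python
# Link directions remaining after bandwidth pruning: (src, dst) -> metric
LINKS = {("A", "C"): 2, ("A", "D"): 6, ("D", "C"): 3}

# Link directions for which a link protecting backup LSP is available.
LINK_PROTECTED = {("A", "C")}


def prune_unprotected(links, protected):
    """Phase 1 extension: keep only link directions that have link protection."""
    return {ld: metric for ld, metric in links.items() if ld in protected}


print(prune_unprotected(LINKS, LINK_PROTECTED))   # {('A', 'C'): 2}
```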
The problem is that the simple pruning rule of Phase 1 sometimes does not help when a network user requires guaranteed FRR node protection or guaranteed FRR node-link protection. This happens because the LSP path of interest is not known during Phase 1 of the path finding procedure. For example, in the topology of FIG. 7 there is a backup (dotted) LSP to protect against failure of node B if the LSP path is A->B->C. However, there is no backup LSP to protect against failure of node B if the LSP path is A->B->E; therefore, the link direction A->B is feasible if the LSP path is A->B->C but is infeasible if the LSP path is A->B->E. Since the LSP path is unknown in Phase 1, pruning the link direction A->B in Phase 1 would be incorrect, because the path A->B->C might be chosen for the LSP in Phase 2.
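The path dependence can be seen by writing the feasibility test down: for node protection it takes three arguments rather than a single link direction. The sketch below is an illustration under assumptions: available node protecting backups are keyed by `(upstream LSR, protected node, NNH)`, and the backup route `A->D->C` is hypothetical, since the text does not spell out the dotted route of FIG. 7.

```python
# (upstream LSR, protected node, NNH) -> route of the node protecting backup LSP
NODE_BACKUPS = {
    ("A", "B", "C"): ["A", "D", "C"],   # route assumed for illustration
}


def link_direction_feasible(upstream, node, next_hop, backups=NODE_BACKUPS):
    """The link direction upstream->node is feasible for guaranteed node protection
    only if a backup LSP bypasses `node` towards the hop that follows it."""
    return (upstream, node, next_hop) in backups


print(link_direction_feasible("A", "B", "C"))   # True:  A->B is usable on A->B->C
print(link_direction_feasible("A", "B", "E"))   # False: A->B is unusable on A->B->E
```

Because the answer for A->B changes with the hop that follows B, no path-independent Phase 1 decision about A->B can be correct in both cases.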
A two-step approach for a segmented backup scheme is used in “A Scalable and Decentralized Fast-Rerouting Scheme with Efficient Bandwidth Sharing” by Simon Balon et al., Dec. 13, 2005: it first computes a primary LSP path using one of the available algorithms, such as the two phase CSPF described earlier. For the obtained candidate path the algorithm then tries to assign backup LSPs. The disadvantage of this approach is that it cannot assure a fully FRR protected primary LSP even if one exists. As mentioned in the article, “When the primary path is known we compute the set of backup LSPs required to prevent any possible node failure along this path. If a backup path cannot be found under the node-failure assumption, we assume that only a link failure will occur and compute a new backup path. If it fails again, the request is rejected.”
Another two-step approach for a segmented backup scheme is used in “A Distributed Primary-Segmented Backup Scheme for Dependable Real-Time Communication in Multihop Networks” by Ranjith et al., 2002: it recognizes the advantages of the segmented backup scheme (as is the FRR protection) over non-segmented backup, but it assumes that the primary path is already chosen and all that remains is to compute optimal backup paths for it. However, if the primary path is already chosen, it may traverse links and nodes with no available backup LSPs, and thus the protection cannot be guaranteed. As mentioned in the article, “The distributed algorithm has to be executed twice, i.e., the first time for finding a minimal cost primary path between the source and the destination, and the second time for finding a set of minimal total weight segmented backups for this primary path.”
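The weakness of choosing the primary path first can be reproduced on a toy network. The sketch below is a generic illustration and not the algorithm of the cited article: the topology, the metrics, and the sets of available backups are all hypothetical, and the requirement modelled is guaranteed node protection with link protection on the last hop.

```python
import heapq

METRIC = {("S", "X"): 1, ("X", "T"): 1, ("S", "Y"): 2, ("Y", "T"): 2}
NODE_BACKUPS = {("S", "Y", "T")}             # only node Y can be bypassed
LINK_BACKUPS = {("X", "T"), ("Y", "T")}      # last-hop link protection


def shortest_path(src, dst):
    """Plain Dijkstra over METRIC; returns the least-metric path or None."""
    heap, settled = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in settled:
            continue
        settled.add(node)
        for (u, v), m in METRIC.items():
            if u == node and v not in settled:
                heapq.heappush(heap, (cost + m, v, path + [v]))
    return None


def fully_protected(path):
    """Every transit node has a node backup and the last link has a link backup."""
    nodes_ok = all(t in NODE_BACKUPS for t in zip(path, path[1:], path[2:]))
    return nodes_ok and (path[-2], path[-1]) in LINK_BACKUPS


def primary_first(src, dst):
    """Step 1: pick the shortest primary path. Step 2: try to protect it, else reject."""
    path = shortest_path(src, dst)
    return path if path and fully_protected(path) else None


def simple_paths(src, dst, visited=()):
    """Enumerate all loop-free paths (exhaustive; for illustration only)."""
    visited = visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for (u, v) in METRIC:
        if u == src and v not in visited:
            yield from simple_paths(v, dst, visited)


print(primary_first("S", "T"))                                    # None (request rejected)
print([p for p in simple_paths("S", "T") if fully_protected(p)])  # [['S', 'Y', 'T']]
```

The request is rejected because the shortest path S->X->T cannot be protected, although the longer path S->Y->T is fully protected.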
US patent publication 2003/0229807 A1 describes a segment protection technique for a network, comprising two steps: calculating an optimal (shortest) active path (AP), dividing it into several active segments (AS) and then protecting each AS with a detour called backup segment or BS. Actually, the US publication 2003/0229807 describes yet another version of the known principle “active path first” (APF).
A similar approach is described by Yu Liu and David Tipper in their article “Successive Survivable Routing for Node Failures”, where working paths are given before pre-planned backup paths are routed and reserved.
Another known approach is called the K Shortest Paths (KSP) method, in which the algorithm finds K shortest active paths (APs) and then tests them one by one in increasing order of their costs until a backup path is found or all of them have been tested. Since the KSP method is based on a limited number of a priori chosen APs, it might not find a protected path even if one exists.
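The limitation of KSP can be illustrated with a toy network in which only the third-cheapest path is protectable. The sketch below is hedged accordingly: the topology and costs are hypothetical, the K shortest paths are obtained by brute-force enumeration rather than by a dedicated K-shortest-paths algorithm, and the backup search is reduced to a lookup in a hypothetical set `PROTECTABLE` of paths for which a full set of backups exists.

```python
LINKS = {
    ("S", "X"): 1, ("X", "T"): 1,    # path cost 2
    ("S", "Y"): 1, ("Y", "T"): 2,    # path cost 3
    ("S", "Z"): 2, ("Z", "T"): 2,    # path cost 4
}
PROTECTABLE = {("S", "Z", "T")}      # only the most expensive path can be protected


def simple_paths(src, dst, visited=()):
    """Enumerate all loop-free paths from src to dst."""
    visited = visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for (u, v) in LINKS:
        if u == src and v not in visited:
            yield from simple_paths(v, dst, visited)


def cost(path):
    return sum(LINKS[link] for link in zip(path, path[1:]))


def ksp_protected(src, dst, k):
    """Test the K cheapest active paths in increasing cost order; None if all fail."""
    candidates = sorted(simple_paths(src, dst), key=cost)[:k]
    for path in candidates:
        if tuple(path) in PROTECTABLE:
            return path
    return None


print(ksp_protected("S", "T", 2))   # None: the protectable path is outside the first K
print(ksp_protected("S", "T", 3))   # ['S', 'Z', 'T']
```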
Based on the above, it can be seen that the task of finding a guaranteed segment-protected path in a mesh network (say, a fully FRR protected primary LSP in an MPLS network) cannot be effectively resolved using any of the above-mentioned 2-phase path finding algorithms.
At the same time, guaranteed FRR protection is of prime importance to service providers. The essence of many Internet applications such as e-business and e-commerce is the ability to offer customers round-the-clock, uninterrupted access to the Internet. A fault or failure in a fiber optic link or in a routing/switching node of the Internet can knock out an Internet service provider's vital link to its customers. When packets are forwarded at a speed of 10 gigabits per second or more, a single second of inactivity means that millions of bytes of precious customer data are discarded and the QoS guarantees are lost with them, not to mention mission-critical applications where such failures could have catastrophic consequences.