1. Field of the Invention
The present invention generally relates to forwarding of multicast traffic. The invention relates more specifically to a method and apparatus for forwarding label distribution protocol multicast traffic along a multicast tree having a primary and a backup path.
2. Background Information
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In computer networks such as the Internet, packets of data are sent from a source to a destination via a network of elements including links (communication paths such as telephone or optical lines) and nodes (for example, routers directing the packet along one or more of a plurality of links connected to it) according to one of various routing protocols.
MPLS is a protocol that is well known to the skilled reader and which is described in document “Multi Protocol Label Switching Architecture” which is available at the time of writing on the file “rfc3031.txt” in the directory “rfc” of the domain “ietf.org” on the World Wide Web. According to MPLS, a path for a source-destination pair is established, and values required for forwarding a packet between adjacent routers in the path together with headers or “labels” are pre-pended to the packet. The labels are used to direct the packet to the correct interface and next hop. The labels precede the IP or other header allowing smaller outer headers.
The path for the source-destination pair, termed a Label Switched Path (LSP), can be established according to various different approaches. One such approach is Label Distribution Protocol (LDP) in which each router in the path sends its label to the neighbor routers according to its IP routing table. LDP labels are sent to the neighbor routers in a label mapping message which can include as one of its TLV (Type Length Value) fields a path vector specifying the LSP. For each LSP created, a forwarding equivalence class (FEC) is associated with the path specifying which packets are mapped to it. A Label Forwarding Information Base (LFIB) stores the FEC, the next-hop information for the LSP, and the label required by the next hop.
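By way of illustration only, the LFIB described above can be sketched as a lookup table keyed by incoming label. The class name, field names, label values and prefix below are hypothetical and are not taken from any particular implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfibEntry:
    """One LFIB row: the FEC, the next-hop information and the next hop's label."""
    fec: str            # hypothetical FEC, here a destination prefix
    next_hop: str       # neighbor router on the LSP
    out_interface: str  # interface leading to that neighbor
    out_label: int      # label advertised by the next hop for this FEC


# Hypothetical LFIB with a single entry, keyed by incoming label.
EXAMPLE_LFIB = {17: LfibEntry("10.0.0.0/8", "R3", "S1", 22)}


def swap_and_forward(lfib, in_label, payload):
    """Swap the incoming label for the next hop's label and pick the interface."""
    entry = lfib[in_label]
    return entry.out_interface, entry.next_hop, (entry.out_label, payload)
```

In this sketch a packet arriving with label 17 leaves on interface S1 towards R3 carrying label 22 in front of the unchanged payload.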
MPLS LDP approaches have further been applied to multicast networks. Conventionally multicast networks rely on unicast routing protocols. A unicast routing protocol relies on a routing algorithm resident at each node. Each node on the network advertises the routes throughout the network. The routes are stored in a routing information base (RIB) and based on these results a forwarding information base (FIB) or forwarding table is updated to control forwarding of packets appropriately. When there is a network change, a notification representing the change is flooded through the network by each node adjacent to the change, each node receiving a notification sending it to each adjacent node.
As a result, when a data packet for a destination node arrives at a node the node identifies the optimum route to that destination and forwards the packet via the correct interface to the next node (“NEXT_HOP”) along that route. The next node repeats this step and so forth.
Multicast networks such as point-to-multipoint (P2MP) networks are built on unicast routing protocols. However multicast allows data packets to be forwarded to multiple destinations (or “receivers”) without unnecessary duplication, reducing the amount of data traffic accordingly. All hosts wishing to become a receiver for a multicast group perform a “join” operation to join the multicast group. A multicast tree such as a shortest path tree is then created providing routes to all receivers in the group. The multicast group in a P2MP network is denoted (S,G) where S is the address of the source or broadcasting host and G is an IP multicast address taken from a reserved address space. As a result routers receiving a packet from the source S to the multicast address G send the packet down each interface providing a next hop along the route to any receiver on the tree.
During forwarding of multicast data at a router, when a packet is received at the router with a multicast address as destination address, the router consults the multicast forwarding table and sends the packet to the correct next hop via the corresponding interface. As a result, even if the path from the next hop subsequently branches to multiple receivers, only a single multicast packet needs to be sent to the next hop. If, at the router, more than one next hop is required, that is to say the multicast tree branches at the router, then the packet is copied and sent on each relevant output interface.
However it is important to ensure that looping does not take place, for example where a router forwards multicast traffic which is then returned to it such that repeat forwarding takes place. Any such loops in any multicast network will propagate very quickly and can lead to network overload.
In order therefore to avoid looping each router ensures that data is only sent away from the source and towards the receiver. In order to achieve this the router carries out a reverse path forwarding (RPF) check to ensure that the incoming packet has arrived on the appropriate input interface. If the check fails then the packet is dropped. The router uses the unicast forwarding table to identify the appropriate upstream and downstream interfaces in the tree as part of the RPF and only forwards packets arriving from the upstream direction.
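The RPF check described above can be sketched as follows. This is a minimal illustration: the table contents, the interface names and the function names are hypothetical, and it is assumed that the unicast forwarding table maps each destination to a single outgoing interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str        # address S of the multicast source
    group: str         # multicast address G
    in_interface: str  # interface on which the packet arrived


# Hypothetical unicast forwarding table: destination -> interface towards it.
UNICAST_FIB = {"S": "S2", "R1": "S0", "R2": "S1"}


def rpf_check(packet, fib):
    """Pass only if the packet arrived on the interface used to reach its source."""
    return fib.get(packet.source) == packet.in_interface


def forward(packet, fib, downstream_interfaces):
    """Return the interfaces the packet is copied to; empty if it is dropped."""
    if not rpf_check(packet, fib):
        return []  # arrived from the wrong direction: drop
    return [i for i in downstream_interfaces if i != packet.in_interface]
```

A packet from S arriving on S2 is copied to S0 and S1, whereas the same packet arriving on S0 fails the check and is dropped.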
Multicast methods which make use of existing forwarding information in this manner belong to the family of “protocol independent multicast” (PIM) methods as they are independent of the specific routing protocol adopted at each router.
More recently the use of MPLS multicast has been explored and in particular the use of LDP has been discussed for building receiver driven multicast trees. One such approach is described in “Label Distribution Protocol Extensions for Point-to-Multipoint Label Switched Paths” of I. Minei et al which is available at the time of writing on the file “draft-ietf-mpls-ldp-p2mp-00.txt” in the directory “wg/mpls” of the domain “tools.ietf.org”.
The approach described therein can be understood further with reference to FIG. 1 which is a network diagram illustrating a P2MP network and FIG. 2 which is a flow diagram illustrating the steps involved in a node joining the network. The network shown in FIG. 1 is designated generally 100 and includes nodes comprising, for example routers R1, reference numeral 102, R2, reference numeral 104, R3, reference numeral 106 and R4, reference numeral 108. Nodes R1, R2 and R4 are joined to node R3 via respective interfaces S0, S1, S2, reference numerals 110, 112, 114 respectively. Nodes R1 and R2 comprise leaf or receiver nodes which can receive multicast traffic from root node R4 via transit node R3.
Referring to FIG. 2, at step 200, receiver node R2 joins the multicast tree according to any appropriate mechanism, and obtains the relevant identifiers of the tree, namely the root node and the FEC of traffic belonging to the tree. It then creates an LDP path from the root R4. In particular, at step 202 R2 identifies its nexthop to the root of the tree for example from its IP forwarding table, in the present case, node R3. At step 204 node R2 constructs a P2MP label mapping message 116 indicating the multicast tree FEC (for example an identifier “200”), the root R4 of the multicast tree and the label it pushes to R3, label L2. In the case of a P2MP network the downstream direction for traffic is from R4 via R3 to R2 and hence the label mapping message is sent upstream from R2 to R3.
At step 206 node R3 similarly allocates a label L5 and updates its forwarding state such that incoming packets with label L5 will have the label swapped for label L2 and forwarded along interface S1 to R2. Node R3 further sends a P2MP label mapping message to node R4 indicating the FEC 200, the root R4 and its label L5 at step 208. At step 210 root node R4 updates its forwarding state with label L5 for the FEC 200. It will be noted that steps 200 to 210 are repeated for each leaf or receiver node joining the multicast tree. For example if node R1 joins the tree then it sends a P2MP label mapping message to R3 with FEC 200, root R4 and label L1. In this case, as is appropriate for multicast, R3 does not construct a further label to send to R4 but adds label L1 to the forwarding state corresponding to incoming packets with label L5.
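The handling of P2MP label mapping messages at transit node R3 (steps 206 and 208, and the later join by R1) can be sketched as below. The class, its method and the message layout are hypothetical; the sketch assumes one local label per (root, FEC) pair and simply numbers local labels from a chosen starting value.

```python
import itertools


class TransitNode:
    """Hypothetical transit node state for receiver-driven P2MP LDP."""

    def __init__(self, name, first_label):
        self.name = name
        self._labels = itertools.count(first_label)
        self.local_label = {}  # (root, fec) -> label advertised upstream
        self.mlfib = {}        # local label -> list of (out_interface, out_label)

    def on_label_mapping(self, root, fec, out_interface, out_label):
        """Install a downstream branch; return the upstream message, if any."""
        key = (root, fec)
        if key in self.local_label:
            # Tree already known: add a branch, send nothing further upstream.
            self.mlfib[self.local_label[key]].append((out_interface, out_label))
            return None
        label = f"L{next(self._labels)}"
        self.local_label[key] = label
        self.mlfib[label] = [(out_interface, out_label)]
        return {"fec": fec, "root": root, "label": label}
```

Creating `TransitNode("R3", 5)` and processing the mapping from R2 (FEC 200, root R4, interface S1, label L2) yields a message carrying label L5 for R4; the subsequent mapping from R1 (interface S0, label L1) returns `None` and only extends the entry for L5.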
P2MP LDP Multicast can be further understood with reference to FIG. 3 which shows the network of FIG. 1 with the datapath of multicast traffic, and FIG. 4 which comprises a flow diagram showing the steps performed in the forwarding operation. At step 400 the root node R4, acting as ingress node to the P2MP network, recognizes in any appropriate manner traffic, for example ingress IP traffic, for the multicast tree 100 and forwards the traffic, shown as packet 300, in which the label L5 302 is appended to an IP payload 304. The forwarding table or multicast LFIB (mLFIB) 306 maintained at R3 for traffic incoming on interface S2 is shown in FIG. 3 for “down” traffic, that is, traffic from the root to the receivers. At step 402 node R3 carries out an RPF check to ensure that the incoming packet with label L5 arrived on the correct interface S2. If so, then at step 404 label L5 is swapped for labels L1 and L2 for forwarding along respective interfaces S0 and S1. As a result packets 308, 310 are sent to the respective receivers with the appropriate label appended to the payload.
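Steps 402 and 404 can be sketched as a single table lookup. The table layout and function name are hypothetical; keying the mLFIB on both the incoming interface and the incoming label is one assumed way of folding the RPF check into the lookup.

```python
# Hypothetical mLFIB at R3 for "down" traffic:
# (incoming interface, incoming label) -> list of (outgoing interface, outgoing label)
MLFIB_R3 = {("S2", "L5"): [("S0", "L1"), ("S1", "L2")]}


def forward_down(mlfib, in_interface, in_label, payload):
    """Replicate a labelled packet along each branch of the tree at this node."""
    branches = mlfib.get((in_interface, in_label))
    if branches is None:
        return []  # unknown label or wrong interface (RPF failure): drop
    # One copy per branch, each with the incoming label swapped for the branch label.
    return [(out_if, out_label, payload) for out_if, out_label in branches]
```

A packet with label L5 arriving on S2 produces one copy with label L1 on S0 and one with label L2 on S1; the same label arriving on any other interface is dropped.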
Provision is also made for withdrawal of labels. For example referring to FIG. 5, which is a flow diagram illustrating the steps performed in a label withdrawal transaction, where a node for example node R2 wishes to leave the multicast tree then at step 500 it sends a label withdraw message to its nexthop neighbor R3. At step 502, node R3 deletes the relevant state for example label L2 and at step 504 R3 sends a label release message to R2. It will be noted that if node R1 also leaves the tree then node R3 will remove all of the state corresponding to FEC 200 and will send a label withdraw message to node R4.
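The withdrawal transaction at node R3 can be sketched as follows. The message tuples and the function name are hypothetical, and the mLFIB is assumed to map a local label to its list of downstream branches.

```python
def on_label_withdraw(mlfib, local_label, out_interface, out_label):
    """Remove one downstream branch and return the messages the node would send."""
    branches = mlfib[local_label]
    branches.remove((out_interface, out_label))
    # Acknowledge the withdrawal to the downstream neighbor.
    messages = [("label_release", out_interface, out_label)]
    if not branches:
        # Last receiver gone: remove all state and withdraw the label upstream.
        del mlfib[local_label]
        messages.append(("label_withdraw", "upstream", local_label))
    return messages
```

With branches for both R1 and R2 installed under L5, a withdrawal from R2 only removes the (S1, L2) branch, whereas a subsequent withdrawal from R1 empties the entry and triggers a label withdraw towards R4.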
FIG. 6 is a flow diagram illustrating the steps performed when a nexthop changes but without removal of any receiver node from the multicast tree. An example topology is shown in FIG. 7, which is a network diagram corresponding to FIGS. 1 and 3 but with an additional node R5 700 as node R3's nexthop to node R4, and an additional node R6 702 as an alternative nexthop for node R2 to node R4. Node R2's nexthop to node R4 will change if the link between node R5 and node R4 fails, and change to, for example, node R6.
In that case at step 600 node R2 sends a label withdraw message to node R3 and at step 602 node R2 clears the relevant entries in its mLFIB. At step 604 node R2 sends its new label for example L6 to node R6 following the label mapping procedures described above with reference to FIG. 2. At step 606 node R6 installs the label L6 and forwards a label mapping message to root R4 again in the manner described above.
It will be noted that LDP allocates a local label for every FEC it learns, and if the FEC is removed, the local label and an associated binding (i.e. remote corresponding labels) for the FEC are preserved for a timeout period. If the FEC is reinstated before the timeout expires, LDP uses the same local label binding for that FEC. Accordingly where there is a network change which changes the route of the multicast tree's unicast nexthop, the same local label binding is used and rewritten in an ingress interface independent manner such that the label rewrite is used on the data plane, i.e. in the mLFIB, before and after the network change.
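The label retention behavior described above can be sketched as below. The class and its methods are hypothetical, time is passed in explicitly to keep the sketch deterministic, and the starting label value of 16 is an assumption made for illustration.

```python
import itertools


class LabelAllocator:
    """Hypothetical allocator that preserves a removed FEC's label for a hold time."""

    def __init__(self, hold_seconds, first_label=16):
        self.hold_seconds = hold_seconds
        self._next = itertools.count(first_label)
        self.active = {}    # fec -> local label
        self.retained = {}  # fec -> (local label, expiry time)

    def learn(self, fec, now):
        """Return the local label for a FEC, reusing a retained one if still valid."""
        if fec in self.active:
            return self.active[fec]
        label, expiry = self.retained.pop(fec, (None, 0.0))
        if label is None or now >= expiry:
            label = next(self._next)  # nothing retained, or timeout expired
        self.active[fec] = label
        return label

    def remove(self, fec, now):
        """Remove a FEC but preserve its label until the timeout expires."""
        self.retained[fec] = (self.active.pop(fec), now + self.hold_seconds)
```

If FEC 200 is removed at time 10 with a 30 second hold and reinstated at time 20, it receives the same label as before; if it is reinstated only after the hold has expired, a new label is allocated.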
In the event of a network change such as removal or introduction of a network component such as a node (router) or link between routers, an MPLS multicast fast reroute technique has been proposed in U.S. patent application Ser. No. 11/336,457 entitled “Method and Apparatus for Implementing Protection for Multicast Services” of Raj et al dated Jan. 20th 2006 the contents of which are incorporated by reference as if fully disclosed herein. According to Raj et al each router in a network has a primary path to a destination and, in addition, identifies backup paths around failed components and pre-installs them. For example in the case of a potential link failure, a repairing router identifies a backup path to its nexthop node across the link. In the case of a node failure the repairing router identifies a backup path to the next nexthop node which would have been forwarded to by the nexthop node. The backup paths comprise label switched paths and an appropriate signaling mechanism is implemented to distribute the corresponding labels, the backup paths hence acting as traffic tunnels in repair mode.
The approach in Raj et al can be further understood from FIG. 8 which is a network diagram illustrating a P2MP network including a link failure and FIG. 9 which is a network diagram illustrating a P2MP network including a node failure. Referring firstly to FIG. 8 it will be seen that if interface S2 fails between nodes R4 and R3 (reference numerals 108, 106) then node R4 as repairing node can institute a repair tunnel 800 around the failed link (for example using additional nodes and links which are not shown) to node R3. Referring to FIG. 9 where node R3 itself fails then node R4 can implement repair tunnels 900, 902 to nodes R2, R1 respectively as next nexthop nodes.
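The choice of repair tunnel endpoints in FIGS. 8 and 9 can be sketched as follows. The tree representation and the function name are hypothetical, and the sketch says nothing about how the tunnels themselves are routed or signaled.

```python
# Hypothetical downstream adjacency of the multicast tree of FIG. 1.
TREE = {"R4": ["R3"], "R3": ["R1", "R2"], "R1": [], "R2": []}


def repair_targets(tree, next_hop, failure):
    """Return the nodes at which the repairing node's backup tunnels terminate."""
    if failure == "link":
        # Link protection: tunnel around the link to the nexthop itself.
        return [next_hop]
    if failure == "node":
        # Node protection: tunnel to each next nexthop behind the failed node.
        return list(tree[next_hop])
    raise ValueError(f"unknown failure type: {failure}")
```

For repairing node R4 with nexthop R3, a link failure gives the single target R3, while failure of node R3 gives the targets R1 and R2.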
However, a problem inherent in both unicast and multicast traffic is that of micro looping. In essence, micro loops occur when a network change takes place and nodes converge on the new network at different times. Whilst the nodes are not all converged, there is a risk that one node will forward according to an old topology whereas another node will forward according to a new topology such that traffic will be sent back and forth between two or more nodes in a micro loop. In IP networks, transient micro loops can occur for example because of control plane inconsistency between local and remote devices (that is, for example, inconsistencies in the RIB), control and data plane inconsistency on a local device (that is inconsistencies between the RIB and the FIB if the FIB has not yet been updated) and inconsistencies on the data plane between local and remote devices, for example where the FIB or LFIB of respective nodes are converged on different topologies.
Transient micro loops are in fact common in IP networks and in unicast IP routing the impact and number of devices affected is restricted. However in the case of multicast networks there is the risk of exponential-traffic loops during convergence. For example if there are 100,000 multicast trees through a multicast core router such as router R3 then during a network change, transient micro loops could bring down the entire network.
It will be seen that a similar transient micro loop problem can occur in the case of networks supporting multicast fast reroute as described in Raj et al. However micro loops are not acceptable during fast reroute in view of the risk of data loss. For example the problem can occur during fast reroute when a link-down event occurs, that is to say a link fails. In that case the local node, for example node R4 detects the failure and enables the backup path. The routing protocol then propagates the link failure to the remote nodes. However, each node may receive the failure notification at a different time, depending upon its location. Also each node may take a different amount of time to compute and install the path independently. Therefore there may be a period of time in which some of the nodes may have a new path installed and others may have the old path installed meaning that the link-down event can lead to the formation of transient micro loops despite the presence of fast reroute. Similar problems can arise when a new link is introduced in the network.
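The formation of a transient micro loop during convergence can be sketched with three hypothetical nodes A, B and D, which do not correspond to any node in the figures. It is assumed that A initially reaches D via B, that the link between B and D then fails, and that B installs its new path (via A) before A has installed its own.

```python
def trace(next_hop, start, destination):
    """Follow per-node next hops; return (path, looped)."""
    path, node = [start], start
    while node != destination:
        node = next_hop[node][destination]
        if node in path:
            return path + [node], True  # a node was revisited: loop
        path.append(node)
    return path, False


# Hypothetical forwarding state: node -> {destination: next hop}.
OLD = {"A": {"D": "B"}, "B": {"D": "D"}}    # before the failure
MIXED = {"A": {"D": "B"}, "B": {"D": "A"}}  # B converged, A not yet
NEW = {"A": {"D": "D"}, "B": {"D": "A"}}    # both converged
```

Tracing from A to D succeeds under `OLD` and `NEW`, but under `MIXED` the packet bounces between A and B, which is the transient micro loop described above.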
A further problem can arise when backup paths are invoked by multiple nodes. This may occur, for example, upon node failure being detected by multiple repairing nodes across respective links. According to unicast routing, multiple upstream nodes (where upstream is in the direction from the receivers to the root) can use the same downstream node as a nexthop. In this case according to multicast fast reroute approaches, each upstream node will compute the backup path for a node failure which will appear as multiple link failures. In that case, a first upstream node may attempt to repair using a backup path that includes another of the upstream repairing nodes. If this node is also repairing via a backup path using the first upstream node then there will be a loop between the backup paths. This can be termed a fast reroute loop.
Fast reroute loops can be understood further with reference to FIGS. 10 to 13 which are network diagrams illustrating a P2MP network in relation to which such a loop may be instigated. Referring to FIG. 10 a P2MP network includes a root node 1000 and receiver nodes 1012, 1014 both of which are downstream of a transit node 1010. The root node 1000 has two paths to the transit node 1010 either via nodes 1002, 1006 or via nodes 1004, 1008. Referring to FIG. 11, in the event of failure of the link 1016 between nodes 1006 and 1010, node 1006 as a repairing node or point of local repair may institute a link protection repair path 1018 via nodes 1002, 1000, 1004, 1008 to the transit node 1010 from which data is then forwarded normally. Conversely, as shown in FIG. 12, if the failure is in fact at node 1010 then both nodes 1008, 1006 will detect the failure as failure of respective links 1016 and 1022 and institute respective link protection label switched paths 1018, 1020 in opposite directions via nodes 1004, 1000, 1002 and 1006. It will be seen that if both nodes 1006 and 1008 use their link protecting backup paths at the same time it would create a fast reroute loop whereby node 1006 attempts to repair node 1008's repaired traffic back to node 1008 and so forth. This loop can be seen further in FIG. 13 as loop 1024.
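The fast reroute loop of FIGS. 12 and 13 can be sketched as below. The data layout and function name are hypothetical; each backup path is listed as the sequence of nodes visited up to the node that would normally deliver the traffic to transit node 1010 over its own link.

```python
TRANSIT = 1010

# Hypothetical encoding of the link protection backup paths 1018 and 1020.
BACKUP = {
    1006: [1002, 1000, 1004, 1008],  # path 1018, used when link 1016 is down
    1008: [1004, 1000, 1002, 1006],  # path 1020, used when link 1022 is down
}


def repair_trace(start, failed_links):
    """Follow link protection repairs towards the transit node; return (hops, looped)."""
    node, hops, repaired_at = start, [start], set()
    while (node, TRANSIT) in failed_links:
        if node in repaired_at:
            return hops, True  # traffic returned to a node that already repaired it
        repaired_at.add(node)
        hops.extend(BACKUP[node])
        node = hops[-1]
    return hops + [TRANSIT], False
```

With only the link from 1006 to 1010 marked as failed, the repaired traffic reaches 1010 via node 1008 as in FIG. 11. With both links marked as failed, as happens when node 1010 itself fails, node 1008 sends the repaired traffic back towards node 1006 and the trace reports a loop.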