1. Field of the Invention
The present invention generally relates to distribution of labels, for example, Multi Protocol Label Switching (MPLS) labels. The invention relates more specifically to a method and apparatus for distributing labels in a Label Distribution Protocol (LDP) multicast network.
2. Background Information
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In computer networks such as the Internet, packets of data are sent from a source to a destination via a network of elements including links (communication paths such as telephone or optical lines) and nodes (for example, routers directing the packet along one or more of a plurality of links connected to it) according to one of various routing protocols.
MPLS is a protocol that is well known to the skilled reader and which is described in document “Multi Protocol Label Switching Architecture” which is available at the time of writing on the file “rfc3031.txt” in the directory “rfc” of the domain “ietf.org” on the World Wide Web. According to MPLS, a path for a source-destination pair is established, and values required for forwarding a packet between adjacent routers in the path, together with headers or “labels”, are prepended to the packet. The labels are used to direct the packet to the correct interface and next hop. The labels precede the IP or other header, allowing smaller outer headers.
The path for the source-destination pair, termed a Label Switched Path (LSP) can be established according to various different approaches. One such approach is Label Distribution Protocol (LDP) in which each router in the path sends its label to the neighbor routers according to its IP routing table. LDP labels are sent to the neighbor routers in a label mapping message which can include as one of its TLV (Type Length Value) fields a path vector specifying the LSP. For each LSP created, a forwarding equivalent class (FEC) is associated with the path specifying which packets are mapped to it. A Label Forwarding Information Base (LFIB) stores the FEC, the next-hop information for the LSP, and the label required by the next hop.
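The LFIB structure described above may be sketched, purely for illustration, as follows; all names, the FEC value and the label number are invented for the example and do not correspond to any particular implementation:

```python
# Hypothetical sketch of a Label Forwarding Information Base (LFIB): each FEC
# maps to the next hop for the LSP and the label that next hop advertised.
from dataclasses import dataclass

@dataclass(frozen=True)
class LfibEntry:
    fec: str        # forwarding equivalence class, e.g. a destination prefix
    next_hop: str   # address of the next router on the LSP
    out_label: int  # label required by the next hop

lfib = {}

def install_mapping(fec, next_hop, out_label):
    """Record the label a neighbor advertised for a FEC (LDP label mapping)."""
    lfib[fec] = LfibEntry(fec, next_hop, out_label)

# A neighbor R3 advertises label 42 for the FEC "10.0.0.0/8".
install_mapping("10.0.0.0/8", "R3", 42)
```

A lookup on the FEC then yields both the next hop and the outgoing label in a single step, which is the forwarding behavior the LFIB exists to support.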
MPLS LDP approaches have further been applied to multicast networks. Conventionally multicast networks rely on unicast routing protocols. Unicast routing protocol relies on a routing algorithm resident at each node. Each node on the network advertises the routes throughout the network. The routes are stored in a routing information base (RIB) and based on these results a forwarding information base (FIB) or forwarding table is updated to control forwarding of packets appropriately. When there is a network change, a notification representing the change is flooded through the network by each node adjacent the change, each node receiving a notification sending it to each adjacent node.
As a result, when a data packet for a destination node arrives at a node, the node identifies the optimum route to that destination and forwards the packet via the correct interface to the next node (“NEXT_HOP”) along that route. The next node repeats this step and so forth.
Link state protocols can support multicast traffic comprising point to multipoint traffic (P2MP) and multipoint to multipoint traffic (MP2MP). For example IP (internet protocol) multicast is well known to the skilled reader and is described in document “Internet Protocol Multicast” which is available at the time of writing on the file “IP multi.htm” in the directory “univercd/cc/td/doc/cisintwk/ito_doc” of the domain www.cisco.com of the World Wide Web.
Multicast allows data packets to be forwarded to multiple destinations (or “receivers”) without unnecessary duplication, reducing the amount of data traffic accordingly. All hosts wishing to become a receiver for a multicast group perform a “join” operation to join the multicast group. A multicast tree such as a shortest path tree is then created providing routes to all receivers in the group. The multicast group in a P2MP group is denoted (S,G) where S is the address of the source or broadcasting host and G is an IP multicast address taken from a reserved address space. As a result routers receiving a packet from the source S to the multicast address G send the packet down each interface providing a next hop along the route to any receiver on the tree.
In the case of MP2MP multicasts, a shared group is denoted (*,G) allowing multiple sources to send to multiple receivers. The multicast tree is constructed as a shared tree including a shared root or rendezvous point (RP).
During forwarding of multicast data at a router, when a packet is received at the router with a multicast address as destination address, the router consults the multicast forwarding table and sends the packet to the correct next hop via the corresponding interface. As a result, even if the path from the next hop subsequently branches to multiple receivers, only a single multicast packet needs to be sent to the next hop. If, at the router, more than one next hop is required, that is to say the multicast tree branches at the router, then the packet is copied and sent on each relevant output interface.
In order to avoid looping, each router ensures that data is only sent away from the source and towards the receiver, as otherwise traffic would loop back, which is impermissible in multicast. In order to achieve this the router carries out a reverse path forwarding (RPF) check to ensure that the incoming packet has arrived on the appropriate input interface. If the check fails then the packet is dropped. The router uses the unicast forwarding table to identify the appropriate upstream and downstream interfaces in the tree as part of the RPF check and only forwards packets arriving from the upstream direction.
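The RPF check just described may be sketched, purely for illustration, as follows; the table contents and interface names are invented for the example:

```python
# Minimal sketch of a reverse path forwarding (RPF) check: a multicast packet
# is accepted only if it arrived on the interface the unicast table would use
# to reach its source; otherwise it is dropped to prevent looping.
unicast_table = {"10.1.1.1": "S2"}  # source address -> upstream interface

def rpf_check(source, in_interface):
    """Return True if the packet arrived from the upstream direction."""
    return unicast_table.get(source) == in_interface
```

A packet from 10.1.1.1 arriving on S2 passes the check and is replicated downstream; the same packet arriving on any other interface is dropped.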
Multicast methods which make use of existing forwarding information in this manner belong to the family of “protocol independent multicast” (PIM) methods as they are independent of the specific routing protocol adopted at each router.
More recently the use of MPLS multicast has been explored and in particular the use of LDP has been discussed for building receiver driven multicast trees. One such approach is described in “Label Distribution Protocol Extensions for Point-to-Multi-point Label Switched Paths” of I. Minei et al., which is available at the time of writing on the file “draft-minei-wijnands-mpls-ldp-p2mp-00.txt” in the directory “wg/mpls” of the domain “tools.ietf.org”.
The approach described therein can be understood further with reference to FIG. 1, which is a network diagram illustrating a P2MP network, and FIG. 2, which is a flow diagram illustrating the steps involved in a node joining the network. The network shown in FIG. 1 is designated generally 100 and includes nodes comprising, for example, routers R1, reference numeral 102, R2, reference numeral 104, R3, reference numeral 106 and R4, reference numeral 108. Nodes R1, R2 and R4 are joined to node R3 via interfaces S0, S1, S2, reference numerals 110, 112, 114 respectively. Nodes R1 and R2 comprise leaf or receiver nodes which can receive multicast traffic from root node R4 via transit node R3.
Referring to FIG. 2, at step 200, receiver node R2 joins the multicast tree according to any appropriate mechanism, and obtains the relevant identifiers of the tree, namely the root node and the FEC of traffic belonging to the tree. It then creates an LDP path from the root R4. In particular, at step 202 R2 identifies its nexthop to the root of the tree, for example from its IP forwarding table; in the present case this is node R3. At step 204 node R2 constructs a P2MP label mapping message 116 indicating the multicast tree FEC (for example an identifier “200”), the root R4 of the multicast tree and the label it advertises to R3, label L2. In the case of a P2MP network the downstream direction for traffic is from R4 via R3 to R2 and hence the label mapping message is sent upstream from R2 to R3.
At step 206 node R3 similarly allocates a label L5 and updates its forwarding state such that incoming packets with label L5 will have the label swapped for label L2 and forwarded along interface S1 to R2. Node R3 further sends a P2MP label mapping message to node R4 indicating the FEC 200, the root R4 and its label L5 at step 208. At step 210 root node R4 updates its forwarding state with label L5 for the FEC 200. It will be noted that steps 200 to 210 are repeated for each leaf or receiver node joining the multicast tree. For example if node R1 joins the tree then it sends a P2MP label mapping message to R3 with FEC 200, root R4 and label L1. In this case, as is appropriate for multicast, R3 does not construct a further label to send to R4 but adds label L1 to the forwarding state corresponding to incoming packets with label L5.
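The transit-node behavior in steps 200 to 210 may be sketched, purely for illustration, as follows; the message shape, label numbering and interface names are invented for the example and do not reproduce the draft's wire format:

```python
# Hedged sketch of a transit node processing P2MP label mapping messages: one
# upstream (local) label is allocated per (FEC, root), and each joining leaf's
# label is appended to the replication list rather than triggering a new
# upstream label.
p2mp_state = {}   # (fec, root) -> {"local_label": int, "out": [(iface, label)]}
next_label = iter(range(5, 100))  # toy label allocator

def receive_label_mapping(fec, root, leaf_label, in_iface):
    """Handle a P2MP label mapping from a downstream leaf; return the label
    this node advertises upstream for the tree."""
    key = (fec, root)
    entry = p2mp_state.setdefault(key, {"local_label": next(next_label), "out": []})
    entry["out"].append((in_iface, leaf_label))
    return entry["local_label"]

# R2 joins via S1 advertising label 2, then R1 joins via S0 advertising label 1;
# the transit node advertises the same local label upstream both times.
first = receive_label_mapping(200, "R4", 2, "S1")
second = receive_label_mapping(200, "R4", 1, "S0")
```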
P2MP LDP multicast can be further understood with reference to FIG. 3, which shows the network of FIG. 1 with the datapath of multicast traffic, and FIG. 4, which comprises a flow diagram showing the steps performed in the forwarding operation. At step 400 the root node R4, acting as ingress node to the P2MP network, recognizes in any appropriate manner traffic, for example ingress IP traffic, for the multicast tree 100 and forwards the traffic shown as packet 300, comprising an IP payload 304 to which the label L5 302 is prepended. The forwarding table or multicast LFIB (mLFIB) 306 maintained at R3 for traffic incoming on interface S2 is shown in FIG. 3 for “down” traffic, that is, traffic from the root to the receivers. At step 402 node R3 carries out an RPF check to ensure that the incoming packet with label L5 arrived on the correct interface S2. If so, then at step 404, label L5 is swapped for labels L1 and L2 for forwarding along respective interfaces S0 and S1. As a result packets 308, 310 are sent to the respective receivers with the appropriate label prepended to the payload.
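The forwarding operation of FIG. 4 may be sketched, purely for illustration, as follows; the labels and interfaces follow the figure, but the table layout and function names are invented:

```python
# Sketch of P2MP forwarding at a transit node: an RPF check on the incoming
# interface, then the incoming label is swapped for each branch's label and one
# copy of the packet is emitted per output interface.
mlfib = {5: {"in_iface": "S2", "out": [("S0", 1), ("S1", 2)]}}

def forward_p2mp(label, in_iface, payload):
    """Return a list of (interface, labeled-packet) copies, or [] on drop."""
    entry = mlfib.get(label)
    if entry is None or entry["in_iface"] != in_iface:
        return []  # RPF failure or unknown label: drop
    return [(iface, (out_label, payload)) for iface, out_label in entry["out"]]
```

A packet carrying label 5 arriving on S2 is replicated onto S0 with label 1 and S1 with label 2; the same packet arriving on any other interface fails the RPF check and is dropped.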
Provision is also made for withdrawal of labels. For example referring to FIG. 5, which is a flow diagram illustrating the steps performed in a label withdrawal transaction, where a node for example node R2 wishes to leave the multicast tree then at step 500 it sends a label withdraw message to its nexthop neighbor R3. At step 502, node R3 deletes the relevant state for example label L2 and at step 504 R3 sends a label release message to R2. It will be noted that if node R1 also leaves the tree then node R3 will remove all of the state corresponding to FEC 200 and will send a label withdraw message to node R4.
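The withdrawal transaction of FIG. 5 may be sketched, purely for illustration, as follows; state layout, labels and interfaces are invented to match the example topology:

```python
# Sketch of label withdrawal at a transit node: the leaving leaf's entry is
# removed from the replication list, and when the last leaf is gone the whole
# FEC state is deleted and the withdraw must be propagated upstream.
state = {(200, "R4"): {"local_label": 5, "out": [("S0", 1), ("S1", 2)]}}

def withdraw(state, fec_root, iface):
    """Process a label withdraw from the leaf on `iface`; return True if a
    withdraw must in turn be sent upstream (no leaves remain)."""
    entry = state[fec_root]
    entry["out"] = [(i, l) for i, l in entry["out"] if i != iface]
    if not entry["out"]:
        del state[fec_root]
        return True
    return False

# R2 (on S1) leaves first; R1 remains, so nothing is sent upstream. When R1
# (on S0) also leaves, the state is removed and a withdraw goes to the root.
r2_left = withdraw(state, (200, "R4"), "S1")
r1_left = withdraw(state, (200, "R4"), "S0")
```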
FIG. 6 is a flow diagram illustrating the steps performed when a nexthop changes but without removal of any receiver node from the multicast tree. An example topology is shown in FIG. 7, which is a network diagram corresponding to FIGS. 1 and 3 but with an additional node R5 700 as node R3's nexthop to node R4, and an additional node R6 702 as an alternative nexthop for node R2 to node R4. Node R2's nexthop to node R4 will change if the link between node R5 and node R4 fails, and change to, for example, node R6.
In that case at step 600 node R2 sends a label withdraw message to node R3 and at step 602 node R2 clears the relevant entries in its mLFIB. At step 604 node R2 sends its new label for example L6 to node R6 following the label mapping procedures described above with reference to FIG. 2. At step 606 node R6 installs the label L6 and forwards a label mapping message to root R4 again in the manner described above.
It will be noted that LDP allocates a local label for every FEC it learns, and if the FEC is removed, the local label and an associated binding (i.e., remote corresponding labels) for the FEC are preserved for a timeout period. If the FEC is reinstated before the timeout expires, LDP uses the same local label binding for that FEC. Accordingly where there is a network change which changes the route of the multicast tree's unicast nexthop, the same local label binding is used and rewritten in an ingress interface independent manner such that the label rewrite is used on the data plane, i.e., in the mLFIB, before and after the network change.
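The local-label retention behavior described above may be sketched, purely for illustration, as follows; the hold-down value and the binding layout are invented, and times are passed in explicitly to keep the example deterministic:

```python
# Illustrative sketch of LDP local-label retention: when a FEC is removed its
# binding is kept for a hold-down period, and a FEC reinstated within that
# window reuses the same local label.
HOLD_DOWN = 60.0
bindings = {"200": (7, None)}  # fec -> (local_label, removed_at or None)

def remove_fec(fec, now):
    """Mark a FEC removed, preserving its label binding with a timestamp."""
    label, _ = bindings[fec]
    bindings[fec] = (label, now)

def reinstate_fec(fec, now, alloc):
    """Reinstate a FEC: reuse the old binding if within the hold-down period,
    otherwise allocate a fresh local label via `alloc`."""
    old = bindings.get(fec)
    if old is not None and old[1] is not None and now - old[1] < HOLD_DOWN:
        bindings[fec] = (old[0], None)
        return old[0]  # same local label binding reused
    label = alloc()
    bindings[fec] = (label, None)
    return label

# The FEC is removed at t=100 and reinstated at t=130, inside the hold-down
# window, so the original label 7 is reused.
remove_fec("200", 100.0)
reused = reinstate_fec("200", 130.0, alloc=lambda: 99)
```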
In the case of an MP2MP multicast network, this is effectively treated as M individual P2MP networks in which each leaf can either be a receiver from the root node as with a P2MP network, or a sender of multicast traffic to the other leaves on the network. Because of this bi-directionality it will be noted that traffic can be considered as either “down traffic”, i.e., from the root to the leaves acting as receivers, or “up traffic” in the form of traffic from the leaves, acting as senders, towards the root. Accordingly the direction of “upstream” and “downstream” traffic depends on whether it is “up traffic”, in which case the downstream direction is towards the root, or “down traffic”, in which case the downstream direction is away from the root. Further discussion of MP2MP multicast with LDP is provided in “Multicast Extensions for LDP” of Wijnands et al., which is available at the time of writing on the file “draft-wijnands-mpls-ldp-mcast-ext-00.txt” in the directory “pub/id” of the domain “watersprings.org” on the World Wide Web.
FIG. 8 is a network diagram showing an MP2MP network. The network shown is different from that shown in FIGS. 1 and 3 and hence different numbering is used, although the nodes are named similarly. In particular the network is designated generally 800 and includes receiver/sender nodes R1, R2, reference numerals 802, 806, a transit node R3, reference numeral 808, a root node R4, reference numeral 810 and a further receiver/sender node R5, reference numeral 812. Nodes R1, R2 and R4 are joined to node R3 via respective interfaces S0, 814, S1, 816, and S2, 818. Node R4 is joined to node R5 by a further interface S3. It will be noted that the root node R4 is a shared root although it may in addition be an ingress or receiver or sender node as appropriate.
FIG. 9 is a flow diagram illustrating the manner in which a receiver/sender node, for example node R2, joins an MP2MP multicast tree. At step 900 node R2 joins the tree and at step 902 node R2 identifies its nexthop to the root node R4, namely node R3, in the manner described above with respect to P2MP. At step 904 node R2 sends a “pseudo label request” label mapping message 820 to node R3. The pseudo label request includes identification of the FEC 200, the root R4, and R2's ingress label L2. Accordingly the message is generally similar in form to a P2MP label mapping message; however, it is termed here a pseudo label request as it must be distinguishable from a standard P2MP label mapping message, as described in more detail below. In practice, of course, the message can be recognizable as a pseudo label request message in any appropriate manner.
At step 906, node R3 recognizes the message as a pseudo label request message and sends a return MP2MP label mapping 822 to node R2 identifying the FEC 200 and providing its own ingress label L3. As a result node R3 provides a label to node R2 for use with “up traffic” from R2 towards the root. At step 908 node R3 sends a pseudo label request message 824 to node R4 indicating the FEC 200, root R4 and node R3's ingress label L5. At step 910, node R4 sends its MP2MP label mapping 826 for up traffic to node R3 indicating FEC 200 and its ingress label L6.
It will be noted that each additional receiver/sender carries out the same procedure; for example node R1 will send a pseudo label request message 828 to node R3 indicating FEC 200, root R4 and label L1 and will receive a label mapping 830 from R3 indicating FEC 200 and label L4 for up traffic.
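The bidirectional exchange in FIG. 9 may be sketched, purely for illustration, as follows; the function and field names are invented and do not reproduce the draft's message format:

```python
# Hedged sketch of the MP2MP pseudo label request exchange: the request carries
# (FEC, downstream label) upstream, and the upstream node records that label
# for down traffic and replies with its own label for up traffic, so each link
# ends up with one label in each direction.
def handle_pseudo_request(down_labels, alloc, fec, sender, sender_label):
    """Upstream node: record the sender's label for down traffic and return
    the label the sender must use for up traffic."""
    down_labels[(fec, sender)] = sender_label
    return alloc()

# R2 sends its label 2 for FEC 200; the upstream node allocates label 3 for
# R2's up traffic in return.
down_labels = {}
labels = iter([3])
up_label = handle_pseudo_request(down_labels, lambda: next(labels), 200, "R2", 2)
```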
FIG. 10 is a network diagram corresponding to FIG. 8 and showing some of the forwarding state or mLFIBs constructed following the transactions described with reference to FIG. 9. In particular for down traffic at node R3, that is traffic arriving from root node R4 on interface S2, the forwarding table is shown at 840. Referring to FIG. 11, which is a flow diagram illustrating forwarding of MP2MP multicast traffic, at step 1100 traffic arriving with label L5 is RPF checked to ensure that it arrived on ingress interface S2. Then at step 1102 label L5 is replaced by label L1 and the traffic is forwarded on interface S0 to node R1. At step 1104 label L2 is applied to a further copy and the traffic forwarded on interface S1 to node R2.
For up traffic from node R1 towards the root arriving on interface S0, forwarding table 842 is shown, and forwarding of such traffic at node R3 can be understood with reference to FIG. 12, which is a flow diagram illustrating forwarding of incoming traffic on interface S0. At step 1200 an RPF check is carried out on traffic with label L4 to ensure that it arrives on interface S0. At step 1202 traffic to node R4 is forwarded on interface S2 with label L6. It will be noted that this label is learnt from the MP2MP label mapping from node R4. At step 1204, label L2 is added for traffic on interface S1 for node R2. It will be noted that this forwarding information can be inherited from the downstream state table 840.
Table 844 shows the forwarding state for up traffic received at node R3 on interface S1 from node R2. FIG. 13 is a flow diagram illustrating the steps in forwarding said up traffic. At step 1300 an RPF check is carried out on traffic carrying label L3 to ensure that it arrived on interface S1. At step 1302 traffic towards the root R4 is forwarded on interface S2 with label L6 which again is learnt from the MP2MP label mapping from node R4. At step 1304 traffic for node R1 is forwarded on interface S0 with label L1 which again is inherited from the downstream state.
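The up-traffic forwarding of FIGS. 12 and 13 may be sketched, purely for illustration, as follows; labels and interfaces follow the figures, while the table layout is invented:

```python
# Sketch of MP2MP up-traffic forwarding at a transit node: a packet from one
# leaf is sent upstream toward the root with the root's label, and is also
# replicated to every other leaf using entries inherited from the downstream
# state, so it need not travel to the root and back.
down_state = {("S0", 1), ("S1", 2)}  # (iface, label) per leaf, from table 840
root_hop = ("S2", 6)                 # label L6 learnt from R4's MP2MP mapping

def forward_up(in_iface):
    """Return (interface, label) copies for up traffic arriving on in_iface."""
    out = [root_hop]  # always forward toward the root
    out += [(i, l) for i, l in sorted(down_state) if i != in_iface]
    return out
```

Up traffic from R1 on S0 is thus forwarded toward the root with label 6 and directly to R2 on S1 with label 2, without revisiting the arrival interface.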
It will be noted that as a result of this arrangement, only a restricted label space is required and labels are reused where possible. In addition, information can be inherited from appropriate routing tables. Yet further, it will be seen that up traffic does not need to proceed all of the way to the root before it can be multicast to all other receivers, but can be forwarded at transit nodes as appropriate. For example traffic from node R1, acting as sender, to node R2, acting as receiver, is sent to R3 which then forwards it directly to node R2 rather than up to node R4 and back again.
A problem inherent in both unicast and multicast traffic is that of micro looping. In essence, micro loops occur when a network change takes place and nodes converge on the new network at different times. While the nodes are not all converged, there is a risk that one node will forward according to an old topology whereas another node will forward according to a new topology such that traffic will be sent back and forth between two or more nodes in a micro loop. In IP networks, transient micro loops can occur for example because of control plane inconsistency between local and remote devices (that is, for example, inconsistencies in the RIB), control and data plane inconsistency on a local device (that is, inconsistencies between the RIB and the FIB if the FIB has not yet been updated), and inconsistencies on the data plane between local and remote devices, for example where the FIB or LFIB of respective nodes are converged on different topologies.
Transient micro loops are in fact common in IP networks, and in unicast IP routing the impact and number of devices affected is restricted. However, in the case of multicast networks there is the risk of exponential traffic loops during convergence. For example, if there are 100,000 multicast trees through a multicast core router such as R3, then during a network change transient micro loops could bring down the entire network.
Other problems can arise as a result of re-using labels. Typically, on the control plane, a local label withdraw message is sent to the old nexthop and the same label may be distributed to the new nexthop. There is no strict timing for sending the label withdraws and releases. Even if the withdraw message is sent to the old nexthop before the label mapping message is sent to the new nexthop, because of the asynchronous nature of the communication and processing of the node or router, the old nexthop may not have been updated before the new nexthop uses the label, which can lead to a FIB/mLFIB inconsistency between local and remote devices for a period of time. In particular, because the local label is the same for the old and new trees, ingress traffic from the new tree could be forwarded to the old tree, and traffic from the old tree could be forwarded to the new tree, forming a transient micro loop. Similarly the reverse can take place, whereby traffic from the old tree is forwarded to the new tree and then forwarded back to the old tree to form a transient micro loop.
In fact, the problem is exacerbated as nodes do not withdraw or release their labels immediately but wait for the label hold down timer to expire, which again slows down the convergence process and increases the window in which errors can occur.