Network operators and carriers are deploying packet-switched communications networks in place of circuit-switched networks. In packet-switched networks such as Internet Protocol (IP) networks, IP packets are routed according to routing state stored at each IP router in the network. Similarly, in Ethernet networks, Ethernet frames are forwarded according to forwarding state stored at each Ethernet switch in the network. The present invention applies to any communications network based on Protocol Data Units (PDUs), and in this document the terms “packet” and “packet-switched network”, “routing”, “frame” and “frame-based network”, “forwarding” and cognate terms are intended to cover any PDUs, communications networks using PDUs and the selective transmission of PDUs from network node to network node.
Multicast forwarding of data packets (where packets are sent from a source node to multiple destination nodes more or less simultaneously) is of increasing importance as demand for services such as Internet Protocol Television (IPTV) and Video on Demand (VoD) grows.
Protocols such as Intermediate System to Intermediate System (IS-IS), Open Shortest Path First (OSPF) and Multicast OSPF are used to distribute topology information, permitting distributed calculation of paths that interconnect multiple nodes and the installation of the forwarding state required to implement those paths. OSPF and IS-IS run in a distributed manner across the nodes of the network so that, for example, when a topology change such as a node or link failure occurs, the protocol floods this information to all nodes, and each node locally recomputes paths to circumvent the failure based on a consistent view of the network topology.
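To illustrate the flooding step, the following is a minimal Python sketch of sequence-number-based flooding; it is not taken from any IS-IS or OSPF implementation. A node that receives a newer copy of an advertisement records it and re-floods it to every neighbour except the one it arrived from, while duplicate or older copies are discarded. The graph and link-state database structures and the `flood` function are hypothetical simplifications.

```python
from collections import deque

def flood(graph, origin, seq, lsdb):
    """Flood one advertisement (origin, seq) through the network.

    graph: dict node -> list of neighbours (hypothetical format).
    lsdb:  dict node -> dict origin -> highest sequence number seen
           (a simplified, hypothetical link-state database).
    Returns the number of advertisement transmissions.
    """
    sent = 0
    lsdb.setdefault(origin, {})[origin] = seq
    queue = deque([(origin, None)])  # (node that accepted the copy, neighbour it came from)
    while queue:
        node, came_from = queue.popleft()
        for nbr in graph[node]:
            if nbr == came_from:
                continue  # never send a copy back where it came from
            sent += 1
            db = lsdb.setdefault(nbr, {})
            if db.get(origin, -1) >= seq:
                continue  # duplicate or stale copy: discard, do not re-flood
            db[origin] = seq  # newer copy: record it and re-flood onward
            queue.append((nbr, node))
    return sent

# Hypothetical topology: R-B, B-C, C-D, D-E and R-D.
graph = {"R": ["B", "D"], "B": ["R", "C"], "C": ["B", "D"],
         "D": ["C", "E", "R"], "E": ["D"]}
lsdb = {}
flood(graph, "R", 1, lsdb)
print(sorted(n for n in lsdb if lsdb[n]["R"] == 1))  # ['B', 'C', 'D', 'E', 'R']
```

Because every node ends up holding the same latest copy, each can run the same path computation over a consistent view of the topology, which is the property the distributed computation relies on.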
In Ethernet networks, Provider Backbone Transport (PBT), also known as Provider Backbone Bridging-Traffic Engineering (PBB-TE), as described in Applicant's British patent number GB 2422508 is used to provide a unicast Ethernet transport technology. Provider Link State Bridging (PLSB) as described in Applicant's co-pending U.S. patent application Ser. No. 11/537,775 will be used to provide a multicast transport capability for Ethernet networks using IS-IS to set up both unicast paths and multicast trees in the network. Both above patent documents are hereby incorporated by reference.
While the present invention is not limited to the application of a routing system to Ethernet bridging, Ethernet terminology is used in this disclosure where possible. So, for example, the term filtering database (FDB) can be considered interchangeable with any term for an information repository of packet forwarding information, such as forwarding information base or label information base.
FIG. 1 is a flowchart illustrating the principal steps in an all-pairs shortest path multicast route computation algorithm (known, for example, from Applicant's co-pending U.S. Patent Application Publication No. 20070165657), which is normally implemented in each node. In this example it is assumed that multicast group membership information is included in routing system advertisements, although multiple systems could equally be combined to achieve the same result.
As shown in FIG. 1, upon receipt of either a multicast group membership change or a network topology change (for example, via a Link State Packet (LSP)), the node employs an algorithm such as Dijkstra's algorithm to compute both unicast connectivity (at S2) and the set of pairs of network nodes for which the computing node lies on the shortest path between the pair. For that set of node pairs, the node determines where intersections of multicast group membership occur; these intersections define the FDB entries required to instantiate its portion of the multicast paths. Unicast and multicast forwarding state implementing the computed paths is then installed in the node's filtering database (FDB) at S4, so that received packets can be forwarded to the appropriate output port(s) of the node based on the destination address in the frame.
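The computation of FIG. 1 can be sketched in Python as follows. This is a simplified illustration, not the algorithm of the cited application: rather than enumerating every node pair, it runs Dijkstra's algorithm from each group's source (a single source per group is assumed) and keeps the (source, receiver) pairs whose shortest path crosses the computing node. The graph format, the `groups` table, tie-breaking by node name and all function names are hypothetical.

```python
import heapq

def dijkstra(graph, src):
    """Shortest path tree rooted at src.

    graph: dict node -> dict neighbour -> link cost (hypothetical format).
    Returns a parent map. Nodes are settled in (cost, node name) order, so
    ties are broken deterministically for a given graph.
    """
    dist, parent = {src: 0}, {src: None}
    heap, done = [(0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, cost in sorted(graph[u].items()):
            if v not in dist or d + cost < dist[v]:
                dist[v], parent[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    return parent

def tree_path(parent, node):
    """Walk parent pointers back to the root; returns [root, ..., node]."""
    hops = []
    while node is not None:
        hops.append(node)
        node = parent[node]
    return hops[::-1]

def multicast_fdb(graph, me, groups):
    """Multicast FDB entries at node `me`.

    groups: dict group -> (source, set of receivers) (hypothetical input).
    Returns dict group -> (upstream neighbour or None, set of downstream neighbours).
    """
    fdb = {}
    for group, (src, receivers) in groups.items():
        parent = dijkstra(graph, src)
        for rx in receivers:
            if rx == me or rx not in parent:
                continue  # local delivery, or receiver unreachable
            path = tree_path(parent, rx)
            if me in path:  # me lies on the shortest path src -> rx
                i = path.index(me)
                upstream = path[i - 1] if i > 0 else None
                fdb.setdefault(group, (upstream, set()))[1].add(path[i + 1])
    return fdb

# Hypothetical link costs (not those of any figure).
graph = {"R": {"B": 1, "D": 5}, "B": {"R": 1, "C": 1}, "C": {"B": 1, "D": 1},
         "D": {"C": 1, "E": 1, "R": 5}, "E": {"D": 1}}
print(multicast_fdb(graph, "C", {"G1": ("R", {"C", "E"})}))  # {'G1': ('B', {'D'})}
```

Node C is itself a receiver, so it needs no entry on its own behalf, but it lies on the path from R to E and therefore installs state that accepts G1 from B and replicates it towards D.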
As is known in the art, network nodes can be implemented with either a single common FDB which is used to control forwarding of traffic received through all input ports (interfaces), or a respective different FDB for each input port or subsystem. In the case of a node having a respective different FDB associated with each input port, multicast forwarding state can be installed in the respective FDB of the appropriate input port, which may be identified using the computed unicast path to the root node of the multicast tree.
Typically, changes in the network topology, whether detected directly by a node (e.g. a failure of a physical link connected directly to the node) or indirectly (e.g. via receipt of a Link State Advertisement, LSA) will be reflected in changes in a Network Topology Database. Accordingly, recomputation of forwarding state in response to changes in network topology may be triggered by a change in the network topology database. In any event, following a network topology change, the (old) forwarding state will remain in effect until new forwarding state is installed in the FDB.
In a network where path computation is distributed, there is always a danger that the routing databases from which each local FDB is derived are only loosely synchronized, compounded by variations in individual node implementations such as compute capacity and the speed with which internal state can be synchronized. This loose synchronization can result in transient loops. In short, transient loops occur because it is physically impossible to distribute, and act upon, state change information instantaneously across multiple nodes of the network. Looping of packets is at best wasteful of network resources and at worst may result in congestive network failure. Looping is significantly more serious for multicast forwarding than for unicast forwarding, because packets may be replicated out of, and forwarded around, such a loop, resulting in an explosion of packet creation and forwarding.
There are various approaches to mitigating the problems of loops appearing in a network. In IP networks, IP packets have a Time To Live (TTL) counter which is decremented at each hop and will eventually cause looping packets to be discarded: routers will not forward packets whose TTL counter has been decremented to zero. However, this merely “limits the size of the blast crater” created by the loop. In Ethernet networks, the Spanning Tree Protocol blocks ports during periods of network instability and unblocks them only when the network has converged on a new loop-free solution. This shuts down all traffic, not just the traffic whose forwarding paths were directly affected by the network change. It prevents loops, but it is wasteful of network resources in reasonably sized networks, disrupts traffic out of proportion to the topology change, and is incompatible with technologies that exploit Ethernet mesh connectivity, such as PBT and PLSB. Other mitigating approaches include ordering the installation of forwarding state in a controlled manner, as described by Pierre Francois and Olivier Bonaventure in “Avoiding Transient Loops During the Convergence of Link-State Routing Protocols”, IEEE/ACM Transactions on Networking, 15(6):1280-1292, December 2007. However, this slows fault recovery, which is unattractive to network operators.
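The bounding effect of a TTL counter can be illustrated with a minimal sketch that follows a next-hop table containing a loop and decrements the TTL at each router, discarding the packet once the decremented value reaches zero. The function and table names are hypothetical. The packet still circulates many times before it is discarded, which is why a TTL only limits the damage rather than preventing the loop.

```python
def forward_with_ttl(next_hop, start, ttl):
    """Follow next_hop (dict router -> next router) from start.

    Each router decrements the TTL before forwarding and discards the
    packet instead of forwarding it once the TTL reaches zero.
    Returns (routers that forwarded the packet, outcome).
    """
    node, forwarded = start, []
    while True:
        if node not in next_hop:
            return forwarded, "delivered at " + node
        ttl -= 1
        if ttl <= 0:
            return forwarded, "discarded at " + node
        forwarded.append(node)
        node = next_hop[node]

# Hypothetical looping table B -> C -> D -> E -> B.
loop = {"B": "C", "C": "D", "D": "E", "E": "B"}
hops, outcome = forward_with_ttl(loop, "B", 10)
print(len(hops), outcome)  # 9 discarded at C
```

For multicast, each of those forwarding events may also replicate the packet off the loop, so even a bounded number of trips can produce a large burst of traffic.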
The application of a Reverse Path Forwarding Check (RPFC) to packets is a well known technique that reduces the probability of packet looping by eliminating promiscuous packet receipt at intermediate nodes (i.e. arrival on an arbitrary port is no longer acceptable), converting the forwarding to what is known as a directed tree. This is accomplished by ensuring that any packet received from a given source arrives on an expected port for that source at each intermediate node. In the case of an Ethernet bridge, there will be only one expected port. When a packet sent from a given source node arrives at an intermediate node on a particular port or interface, a check is performed to see whether there is a matching entry for the source address of the packet in the intermediate node's filtering database for that port or interface. If there is, the packet is forwarded as normal. If not, the packet is dropped. In other words, a check is performed to see whether the packet came in on a port or interface that the intermediate node would itself use for forwarding a packet on the “reverse” unicast path to the source node. For some packet forwarding paradigms (e.g. equal cost multipath), there may be more than one valid port that can be used to reach a given source, in which case the degree of robustness provided by RPFC is diminished. For PLSB, in any given Backbone VLAN Identifier (B-VID), there is a one-to-one correspondence between the partial multicast tree from the source node to the intermediate node and the reverse unicast path from the intermediate node back to the source node. Accordingly, if a packet from the source node is received via any port other than the one port that corresponds to the reverse unicast path, an inference can be made that a loop may exist.
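The explicit check described above can be sketched as follows, assuming a hypothetical unicast FDB that maps each node address to the single port used to reach it, as in an Ethernet bridge. The function names, addresses and port numbers are illustrative only.

```python
def rpfc_accept(unicast_fdb, src, in_port):
    """Explicit RPFC: accept a frame from src only if it arrived on the
    port this node itself uses on the reverse unicast path to src."""
    return unicast_fdb.get(src) == in_port

def forward_multicast(unicast_fdb, mcast_fdb, src, dest, in_port):
    """Return the output ports for a multicast frame, or an empty set if
    the frame fails the RPFC and is dropped."""
    if not rpfc_accept(unicast_fdb, src, in_port):
        return set()
    return set(mcast_fdb.get(dest, ()))

# Hypothetical node: root R is reached via port 1; group G1 is replicated to ports 2 and 3.
unicast_fdb = {"R": 1, "E": 3}
mcast_fdb = {"G1": {2, 3}}
print(forward_multicast(unicast_fdb, mcast_fdb, "R", "G1", 1))  # {2, 3}
print(forward_multicast(unicast_fdb, mcast_fdb, "R", "G1", 3))  # set()
```

With equal cost multipath, the unicast FDB would map a source to several ports and the check would have to accept any of them, which is the loss of robustness noted above.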
When constructing multicast trees, it may be necessary or desirable to construct individual source-specific point-to-multipoint trees (known as (S, G) trees). In such trees, the source is encoded as part of the destination address. As a result, an explicit Reverse Path Forwarding Check (RPFC) is not required if the (S, G) tree multicast address is only installed on ports facing the tree root, because an implicit RPFC is performed by the presence of the multicast address on the port. Throughout this description, the term “RPFC” is used to cover both explicit and implicit versions of the technique.
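A sketch of the implicit form follows, under the assumption that each input port has its own FDB and that an (S, G) address can be modelled as a (source, group) pair. The data layout and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

SG = Tuple[str, str]                      # (source, group): the source is part of the address
PortFdbs = Dict[int, Dict[SG, Set[int]]]  # input port -> {(S, G) address: output ports}

def install_sg(fdbs: PortFdbs, root_port: int, sg: SG, out_ports: Set[int]) -> None:
    """Install the (S, G) entry only in the FDB of the port facing the tree root."""
    fdbs.setdefault(root_port, {})[sg] = set(out_ports)

def forward_sg(fdbs: PortFdbs, in_port: int, sg: SG) -> Set[int]:
    """Look up the frame in its arrival port's FDB. A frame arriving on any
    other port finds no entry and is dropped, so the lookup itself acts as
    an implicit RPFC."""
    return set(fdbs.get(in_port, {}).get(sg, ()))

fdbs: PortFdbs = {}
install_sg(fdbs, 1, ("R", "G1"), {2, 3})
print(forward_sg(fdbs, 1, ("R", "G1")))  # {2, 3}
print(forward_sg(fdbs, 3, ("R", "G1")))  # set(): wrong port, implicitly dropped
```

No separate source-address lookup is needed: the arrival port's FDB either holds the (S, G) address or it does not.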
RPFC eliminates most circumstances in which looping may occur. However, there remain circumstances in which a transient loop may occur. Specifically, it can be shown that, even when using RPFC, a transient loop may occur when two or more topology changes occur more or less simultaneously. A number of permutations of two simultaneous topology changes, each only partially disseminated, could produce the same result; the example considered here is of interest because neither change is immediately adjacent to the nodes that will ultimately break the loop once they have completed computation and installation of their forwarding tables.
FIGS. 2a-d illustrate a simple scenario in which a transient loop may occur. In these figures, a network fragment is shown, which comprises nodes B, C, D, E and R, where R is the source or root node for the multicast tree considered in this example. In the illustrated network, physical links are shown by lines between respective nodes, along with the respective cost of each link (indicated by the value of c). The route followed by packets being forwarded through the multicast tree is shown by arrows, which trace the least-cost route through the network. Thus, in the network state illustrated in FIG. 2a, forwarding state is installed in node R for forwarding packets to node B; and in nodes B, C and D for forwarding packets to nodes C, D and E, respectively. FIGS. 2b-d illustrate state transitions that occur in the network as a result of two topology changes in the network; in this example, the physical link between nodes R and B is broken, and a new low-cost link becomes available between nodes E and B, so that this new link is part of the lowest-cost route between nodes D and C.
Referring to FIGS. 2b and 2c, when the physical link between nodes R and B is broken (indicated by a cross in the figures), this topology change will be propagated through the network (initially from nodes R and B), for example using a conventional Link State Advertisement (LSA) process. Consequently, nodes B, C, D and E will become aware of the topology change, and will begin re-computing the multicast tree to utilize the physical link from R to D. Similarly, when the new link between nodes E and B becomes available, this topology change will be propagated through the network (initially from nodes E and B). Consequently, nodes B, C, D and E will become aware of the topology change, and will begin re-computing the multicast tree to utilize the new link between nodes E and B. If these two topology changes occur sufficiently far apart in time, then the recomputation of the multicast tree in response to failure of the physical link between nodes R and B, and installation of new forwarding state in all of the affected nodes, will have time to finish before the recomputation of the multicast tree to utilize the new link between nodes B and E begins.
However, if both changes occur close enough in time (that is, they are approximately simultaneous), as shown in FIG. 2b, then the two multicast tree recomputations will overlap in time. For example, in response to the topology change due to failure of link R-B, nodes R, B, C, D and E begin recomputing the multicast path to utilize the link between nodes R and D. While this recomputation is proceeding, the previous forwarding state installed in each node remains in place, so node B continues to forward queued packets to node C. Nodes C and D, in turn, continue forwarding packets to nodes D and E, respectively.
Meanwhile, nodes B and E will be the first nodes to become aware of the availability of the new link, and so will be the first nodes to begin recomputing the multicast tree to use this link. When they complete their respective path recomputations and install the new forwarding state, node E will begin forwarding packets to node B, and node B will continue forwarding packets to node C. If this occurs before nodes C or D have completed their respective path recomputations, the scenario illustrated in FIG. 2c can occur. In this scenario, node E has installed forwarding state for forwarding packets to node B, but node C is still forwarding packets to node D (in accordance with its previous forwarding state, which has not yet been updated), resulting in a loop around nodes B, C, D and E. This loop will persist until either node C or node D has recomputed the multicast tree and installed forwarding state that accounts for at least one of the two topology changes. When this latter recomputation is completed, the network will transition to the loop-free state illustrated in FIG. 2d, in which the loop is broken. However, during the transient period while the network is in the intermediate state shown in FIG. 2c, the loop may cause significant congestion or damage to the network.
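The intermediate state of FIG. 2c can be checked mechanically. The sketch below encodes the forwarding next hops described in the text and, for each node, the neighbour from which it expects frames from root R under its installed state. The expected neighbours for B and E (already updated) and for C and D (not yet updated) are inferred from the scenario and are assumptions, as are all names. Following only forwarding edges that pass the RPFC at the receiving node still yields the loop B, C, D, E, which illustrates how a transient loop can survive RPFC.

```python
def find_loop(next_hops, rpf_neighbour, start):
    """Depth-first search over forwarding edges u -> v that pass the RPFC
    at v (v expects frames from the root to arrive from u). Returns the
    first cycle found as a list of nodes ending where it began, or None."""
    path, on_path, finished = [], set(), set()

    def dfs(u):
        path.append(u)
        on_path.add(u)
        for v in next_hops.get(u, []):
            if rpf_neighbour.get(v) != u:
                continue  # frame would fail the RPFC at v and be dropped
            if v in on_path:
                return path[path.index(v):] + [v]
            if v not in finished:
                cycle = dfs(v)
                if cycle:
                    return cycle
        path.pop()
        on_path.discard(u)
        finished.add(u)
        return None

    return dfs(start)

# FIG. 2c as described: B and E have new state, C and D still have old state.
next_hops = {"B": ["C"], "C": ["D"], "D": ["E"], "E": ["B"]}
# Neighbour each node expects frames from R to arrive from (inferred, an assumption).
rpf_neighbour = {"B": "E", "C": "B", "D": "C", "E": "D"}
print(find_loop(next_hops, rpf_neighbour, "B"))  # ['B', 'C', 'D', 'E', 'B']

# Once D installs new state it expects frames from R over the R-D link.
print(find_loop(next_hops, dict(rpf_neighbour, D="R"), "B"))  # None
```

Alternatively, if C installs state for both changes and becomes a leaf of the new tree (no longer forwarding to D), the cycle also disappears; either update corresponds to the transition to FIG. 2d.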
Techniques for reducing the probability of transient loops in packet switched networks remain highly desirable.