Large communication systems often comprise a plurality of networks which may be connected with each other via a network interconnect solution. Usually, each network of the communication system comprises a plurality of network nodes which are interconnected through internal links, whereas the networks as a whole are interconnected via external links. The network nodes that interconnect the networks of the system may be referred to as “interconnect nodes” or “edge nodes”.
As an example of interconnect nodes, Distributed Resilient Network Interconnect (DRNI) nodes can be mentioned. Standardization of DRNI is ongoing in IEEE. DRNI may be defined as an extension to the existing IEEE link aggregation standard. DRNI nodes that belong to the same provider may use the Inter-Chassis Communication Protocol (ICCP) to communicate with each other.
Node failures, or faults, may occur in one or more of the interconnect nodes for a variety of reasons. Recovery of a node from a node fault state is therefore an important issue for network management and maintenance. DRNI node fault management operation rules may be implemented using a linear protection switching approach. As an example, network interconnect nodes may implement the International Telecommunication Union Telecommunication Standardization Sector Automatic Protection Switching (ITU-T APS) or IEEE Provider Backbone Bridge Traffic Engineering (PBB-TE) protection switching protocol over a tunnel or physical link between each other, which, in case of node (including link or tunnel) faults, triggers the node fault management actions.
FIGS. 1A and 1B show possible forwarding errors due to status collisions among two interconnect nodes of one network. In FIGS. 1A and 1B, a communication system 100 comprises a first network 101, a second network 103 and an interconnect interface 102 between the first and second networks 101, 103. The interconnect interface 102 comprises four interconnect nodes, that is, a first node 1011, a second node 1012, a third node 1031 and a fourth node 1032. The first and second interconnect nodes 1011, 1012 belong to the first network 101, whereas the third and fourth interconnect nodes 1031, 1032 belong to the second network 103.
In FIGS. 1A and 1B, the first to third nodes 1011, 1012, 1031 are pre-configured with an active data plane (or active status; depicted by “A”) for any given service, whereas the fourth node 1032 is pre-configured with a passive data plane (or passive status; depicted by “P”) for any given service. It should be noted that an individual interconnect node 1011, 1012, 1031, 1032, when operational, can assume either an active status or a passive status with respect to an individual service. Only the interconnect nodes 1011, 1012, 1031 assuming an active status for a given service are enabled to transfer the associated service-related data via an internal link from and towards the associated network 101, 103. The interconnect node 1032 assuming a passive status is only allowed to transfer data to another interconnect node 1011, 1012, 1031.
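The forwarding rule described above can be summarized in a minimal sketch. The names `Status`, `Target` and `may_forward` are hypothetical and introduced only for illustration; the sketch also assumes that an active node may additionally send towards other interconnect nodes, which the text implies but does not state as a rule.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "A"    # active data plane for a given service
    PASSIVE = "P"   # passive data plane for a given service


class Target(Enum):
    INTERNAL_LINK = "internal link of the node's own network"
    INTERCONNECT_NODE = "another interconnect node"


def may_forward(status: Status, target: Target) -> bool:
    """Return True if a node with this per-service status may send to target."""
    if status is Status.ACTIVE:
        # Active nodes may use internal links; sending to other interconnect
        # nodes is an assumption of this sketch.
        return True
    # Passive nodes are only allowed to transfer data to another interconnect node.
    return target is Target.INTERCONNECT_NODE
```

Under these assumptions, the passive node 1032 would be refused on an internal link of the second network 103 but allowed to relay towards the active node 1031.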
In FIG. 1A, there are accordingly two active nodes (first and second nodes) 1011, 1012 for a given service at the same time. This situation may cause problems with forwarding, such as duplicate frame delivery of broadcast and unknown frames (see forked double arrow in FIG. 1A), as internal network nodes within the first network 101 (not shown) rely on the fact that only one active interconnect node is present at a time. Thus, the same frame may be relayed to both the first and second nodes 1011, 1012, and these two nodes 1011, 1012 then transmit the same frame in duplicate to the third and fourth nodes 1031, 1032. The passive node 1032 will simply relay the received frame to the active node 1031. In turn, the active node 1031 may have no means to check whether the frame received from the active node 1011 and the frame relayed from the passive node 1032 actually are identical. Although such a check is theoretically possible, it would impose an excessive and steadily growing workload on the active node 1031, which would have to check whether an N-th received frame is identical to any of the N−1 stored frames. But even if such identity between two frames is found, the active node 1031 cannot be sure whether the identity is actually erroneous, or whether a recipient node (not shown) in the second network 103 has requested a re-transmission of that (identical) frame.
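The duplicate delivery of FIG. 1A can be illustrated with a small sketch. The external link layout (node 1011 facing node 1031, node 1012 facing node 1032) and the names `EXTERNAL_LINK`, `STATUS_103` and `deliver_broadcast` are assumptions made only for this illustration and are not taken from the figures.

```python
# Assumed external links: each node of network 101 faces one node of network 103.
EXTERNAL_LINK = {"1011": "1031", "1012": "1032"}
# Status of the nodes of network 103 for the given service.
STATUS_103 = {"1031": "A", "1032": "P"}
ACTIVE_NODE_103 = "1031"


def deliver_broadcast(active_in_101: list[str]) -> list[tuple[str, str]]:
    """Return (sender, final receiver) pairs for one broadcast frame.

    The internal nodes of network 101 relay the frame to every interconnect
    node that is active for the service; each of these sends one copy across
    the interconnect.
    """
    deliveries = []
    for sender in active_in_101:
        receiver = EXTERNAL_LINK[sender]
        if STATUS_103[receiver] == "P":
            # A passive node simply relays the frame to the active node.
            receiver = ACTIVE_NODE_103
        deliveries.append((sender, receiver))
    return deliveries


# Two active nodes in network 101: node 1031 ends up with two copies.
copies = deliver_broadcast(["1011", "1012"])
```

With a single active node in the first network 101, the same function yields exactly one delivery, which is the case the internal network nodes rely on.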
In the scenario depicted in FIG. 1B, basically the same situation as in FIG. 1A arises. In FIG. 1B, broadcast frames and/or unknown frames may be turned back over the DRNI (see “round-trip” arrow in FIG. 1B).
FIG. 2 shows a sequence of events that may lead to the forwarding problems illustrated in FIGS. 1A and 1B. When starting on the left portion of the time axes for both the first node 1011 and the second node 1012, both nodes 1011, 1012 exchange No Request (NR) signals to assure one another that the first and second nodes 1011, 1012 are both operational.
At time “Node 1011 down”, the first node 1011 experiences a node fault (including a tunnel or link fault), and thus turns non-operational. Shortly afterwards, for example by means of a network surveillance tool, the second node 1012 is informed of the fault of the first node 1011 at time “Node down detected”. Accordingly, the second node 1012 sets its data plane from passive to active so as to back up the one or more services for which the first node 1011 had an active status. As shown in FIG. 2, a first Wait-to-Restore (WTR) indication/signal is sent by the second node 1012, but cannot be received by the first node 1011, which is still non-operational.
Then, at time “Node 1011 up”, the first node 1011 (including an associated link or tunnel) recovers from its fault to the operational state. As soon as the first node 1011 recovers, a local WTR timer is started and the data plane of the first node 1011 is set to passive. Shortly afterwards, at time “Node up detected”, the second node 1012 is informed of the first node 1011 having recovered. Likewise, the second node 1012 starts its own local WTR timer. However, not having received a confirmation of the recovery from the first node 1011, the second node 1012 maintains its data plane as active.
As soon as the local WTR timer of the first node 1011 expires, the first node 1011 will set its data plane as active for dedicated services, and substantially at the same time will clear the WTR indication/signal (e.g., from the APS channel). The second node 1012 receives the following NR signal from the first node 1011 with some delay, during which delay the second node 1012 keeps its data plane active for the same services, since prior to reception of the NR signal, the second node 1012 cannot ascertain that the first node 1011 has already set its data plane to an active status.
Hence, during the temporary period marked with “Forwarding problems” in FIG. 2, the first and second nodes 1011, 1012 both have their data planes set as active for the same services. Within this period, the problems shown in FIG. 1A and FIG. 1B may arise. As soon as the second node 1012 receives the NR signal from the recovered first node 1011, the second node 1012 sets its data plane to passive, and the period of forwarding problems ends.
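The timing of the period of forwarding problems can be sketched as follows. This is only an illustration of the sequence described for FIG. 2 after the recovery of the first node 1011; the names `Timeline`, `statuses_at` and `forwarding_problem_window`, as well as all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    node_up: float       # time "Node 1011 up"
    wtr_duration: float  # duration of the local WTR timer of node 1011
    nr_delay: float      # delay until node 1012 receives the NR signal


def statuses_at(t: float, tl: Timeline) -> tuple[str, str]:
    """Return (status of 1011, status of 1012) for a time t >= tl.node_up."""
    wtr_expiry = tl.node_up + tl.wtr_duration
    nr_received = wtr_expiry + tl.nr_delay
    # Node 1011 stays passive until its local WTR timer expires.
    status_1011 = "A" if t >= wtr_expiry else "P"
    # Node 1012 stays active until it receives the NR signal from node 1011.
    status_1012 = "P" if t >= nr_received else "A"
    return status_1011, status_1012


def forwarding_problem_window(tl: Timeline) -> tuple[float, float]:
    """Return the interval during which both nodes are active."""
    start = tl.node_up + tl.wtr_duration
    return start, start + tl.nr_delay


# Hypothetical values in arbitrary time units.
tl = Timeline(node_up=10.0, wtr_duration=5.0, nr_delay=0.5)
window = forwarding_problem_window(tl)
```

In this sketch the length of the window equals the delay of the NR signal, which reflects the observation that the two nodes act on local information only.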
The problem with the solution illustrated in FIG. 2 resides inter alia in that neither the ITU-T APS nor the IEEE PBB-TE protection switching protocol provides means to coordinate the sequence of actions between the two participating interconnect nodes 1011, 1012. In other words, in existing protocols, the actions of the two active interconnect nodes 1011, 1012 are not coordinated, which results in the above-described problems with respect to frame forwarding (duplicate frame delivery, turn back of frames, etc.).