The following abbreviations are herewith defined, at least some of which are referred to within the following description of the state-of-the-art and the present invention.
APS Automatic Protection Switching
CC Continuity Check
CFM Connectivity Fault Management
ERP Ethernet Ring Protection
ETH Ethernet
FDB Forwarding Database
IEEE Institute of Electrical and Electronics Engineers
ITU International Telecommunication Union
LAN Local Area Network
MAC Media Access Control
MAN Metropolitan Area Network
NR No Request
OAM Operation, Administration and Maintenance
RB RPL Blocked
RPL Ring Protection Link
SF Signal Failure
STP Spanning Tree Protocol
TTL Time to Live
TLV Type Length Value
VLAN Virtual Local Area Network
WTR Wait to Restore
Computers are often connected together through a network (e.g., LAN, MAN) that is made up of nodes (bridges, switches, routers) in which it is desirable for data that is being transmitted from one bridge to be constrained to follow a loop-free path. Unfortunately, the previous draft standard of ITU-T G.8032 Ethernet Ring Protection Switching allowed data loops to be created when old information circulates within the ring. The most critical problem arises when a node's interpretation of old information creates a loop of data traffic that may last several minutes. This problem, in which a data loop is formed because a node wrongly interprets an old message, is demonstrated in an exemplary scenario discussed in detail below with respect to FIGS. 1A-1K (PRIOR ART).
Prior to describing this exemplary scenario, a brief discussion is provided next to promote an understanding of some of the main terms and concepts associated with the ITU-T G.8032 standard (the contents of which are incorporated by reference herein) that may be relevant to the present discussion. Of course, those people who are skilled in the art will already be well aware of these main terms and concepts commonly associated with the protocol of ITU-T G.8032.
The ITU-T G.8032 standard's Objectives and Principles are highlighted here:
    Use of standard 802 MAC and OAM frames around the ring.
    Uses standard 802.1Q (and amended Q bridges), but with the xSTP disabled.
    Ring nodes support standard FDB MAC learning, forwarding, flush behavior and port blocking/unblocking mechanisms.
    Prevents loops within the ring by blocking one of the links (either a pre-determined link or a failed link).
    Monitoring of the ETH layer for discovery and identification of SF conditions.
    Protection and recovery switching within 50 ms for typical rings.
    Total communication for the protection mechanism should consume a very small percentage of total available bandwidth.
The ITU-T G.8032 standard's Terms and Concepts are highlighted here:
    ERP: The common name for the ITU-T G.8032 draft standard.
    RPL: Link designated by the mechanism that is blocked during the idle state to prevent a loop on the bridged ring.
    RPL Owner: Node connected to the RPL that blocks traffic on the RPL during the idle state and unblocks it during the protection state.
    Link Monitoring: Links of the ring are monitored using standard ETH CC OAM messages (CFM).
    SF: Signal Fail is declared when an ETH trail signal fail condition is detected.
    NR: No Request is declared when there are no outstanding conditions (e.g., SF, etc.) on the node.
    Ring APS (R-APS) Messages: Protocol messages defined in G.8032 and ITU-T Y.1731 entitled “OAM Functions and Mechanisms for Ethernet Based Networks” (the contents of which are incorporated by reference herein).
    APS Channel: Ring-wide VLAN used exclusively for transmission of OAM messages including R-APS messages.
    TLV: Optional information that may be encoded as a type-length-value (TLV) element and used within data communication protocols, and particularly within ITU-T Y.1731, on which ITU-T G.8032 has based its frame format for R-APS. The type and length fields are fixed in size (typically 1-4 bytes), and the value field is of variable size. These fields are used as follows:
        Type: a numeric code which indicates the kind of field that this part of the message represents.
        Length: the size of the value field (typically in bytes).
        Value: variable sized set of bytes which contains data for this part of the message.
Some of the advantages of using a TLV representation are:
    TLV sequences are easily searched using generalized parsing functions.
    New message elements which are received at an older node can be safely skipped and the rest of the message can be parsed.
    TLV elements are typically used in a binary format which makes parsing faster and the data smaller.
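The TLV handling described above can be illustrated with a minimal Python sketch. The field widths used here (a 1-byte type and a 2-byte big-endian length) and the names encode_tlv and parse_tlvs are assumptions made for illustration only; they are not taken from ITU-T Y.1731 or ITU-T G.8032. The sketch shows how a generalized parser can skip an element of unknown type and still parse the rest of the message.

```python
import struct

# Assumed layout (illustrative only): 1-byte type, 2-byte big-endian length.
HEADER = "!BH"
HEADER_SIZE = struct.calcsize(HEADER)  # 3 bytes


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one element as type, length, then the value bytes."""
    return struct.pack(HEADER, tlv_type, len(value)) + value


def parse_tlvs(data: bytes, known_types: set) -> list:
    """Walk a TLV sequence, keeping known types and skipping unknown ones."""
    elements = []
    offset = 0
    while offset < len(data):
        if offset + HEADER_SIZE > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from(HEADER, data, offset)
        offset += HEADER_SIZE
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        value = data[offset:offset + length]
        offset += length
        if tlv_type in known_types:
            elements.append((tlv_type, value))
        # An unknown type is skipped using its length field alone.
    return elements
```

For example, a node that only understands types 1 and 2 can parse a message that also carries a hypothetical newer type 99, because the length field tells it how many bytes to skip.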
The ITU-T G.8032 standard specifies the use of different timers to avoid race conditions and unnecessary switching operations. These timers are highlighted here:
    WTR Timer: Used by RPL Owner to verify that the ring has stabilized before blocking the RPL after SF Recovery. The WTR timer may be configured by the operator in 1 minute steps between 5 and 12 minutes; the default value is 5 minutes.
    Hold-off Timers: Used by underlying ETH layer to filter out intermittent link faults, where faults will only be reported to the ring protection mechanism if this timer expires.
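The timer behavior described above can be sketched as follows. This is a simplified illustration: the function names and the modelling of a fault by its duration are assumptions, and the hold-off value is left to the caller because no range for it is given here.

```python
WTR_MIN_MINUTES = 5
WTR_MAX_MINUTES = 12
WTR_DEFAULT_MINUTES = 5


def validate_wtr_minutes(minutes: int = WTR_DEFAULT_MINUTES) -> int:
    """Accept a WTR setting only within the 5 to 12 minute range (1 minute steps)."""
    if not WTR_MIN_MINUTES <= minutes <= WTR_MAX_MINUTES:
        raise ValueError("WTR timer must be between 5 and 12 minutes")
    return minutes


def fault_is_reported(fault_duration_ms: int, hold_off_ms: int) -> bool:
    """A fault reaches the ring protection mechanism only if it is still
    present when the hold-off timer expires (assumed model: the timer
    starts when the fault is first detected)."""
    return fault_duration_ms >= hold_off_ms
```

Under this model, an intermittent fault that clears before the hold-off timer expires is filtered out and never triggers protection switching.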
The ITU-T G.8032 standard's Controlling the Protection Mechanism is highlighted here:
    Protection switching triggered by:
        Detection/clearing of Signal Failure (SF) by ETH CC OAM.
        Remote requests over R-APS channel (Y.1731).
        Expiration of G.8032 timers.
    R-APS requests control the communication and states of the ring nodes:
        Two basic R-APS messages specified: R-APS(SF) and R-APS(NR).
        RPL Owner may modify the R-APS(NR) indicating the RPL is blocked: R-APS(NR,RB).
    Ring nodes may be in one of two states:
        Idle: normal operation, no link/node faults detected in ring.
        Protecting: Protection switching in effect after identifying a signal fault.
The ITU-T G.8032 standard's link failure scenario is highlighted here:
    1. Link/node failure is detected by the nodes adjacent to the failure.
    2. The nodes adjacent to the failure will block the failed link and report this failure to the ring using R-APS(SF) message.
    3. R-APS(SF) message triggers:
        RPL Owner unblocks the RPL.
        All nodes perform FDB flushing.
    4. Ring is in protection state.
    5. All nodes remain connected in the logical topology.
The ITU-T G.8032 standard's link failure recovery scenario is highlighted here:
    1. When the failed link recovers, the traffic is kept blocked on the nodes adjacent to the recovered link.
    2. The nodes adjacent to the recovered link transmit an R-APS(NR) message indicating they have no local request present.
    3. When the RPL Owner receives the R-APS(NR) message it starts the WTR timer.
    4. Once the WTR timer expires, the RPL Owner blocks the RPL and transmits an R-APS(NR,RB) message.
    5. Nodes receiving the message perform an FDB flush and unblock their previously blocked ports.
    6. Ring is now returned to Idle state.
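The link failure and link failure recovery scenarios above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the state machine of the standard: the class RingNode, its attribute names, and the representation of R-APS messages as plain strings are all hypothetical, and message forwarding around the ring is not modelled.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PROTECTION = "protection"


class RingNode:
    """Hypothetical ring node that reacts to local events and R-APS messages."""

    def __init__(self, name, is_rpl_owner=False):
        self.name = name
        self.is_rpl_owner = is_rpl_owner
        self.state = State.IDLE
        self.rpl_blocked = is_rpl_owner   # the owner blocks the RPL when idle
        self.failed_port_blocked = False
        self.wtr_running = False
        self.fdb_flushes = 0
        self.sent = []                    # messages this node has transmitted

    def local_signal_fail(self):
        # Failure step 2: block the failed link and report R-APS(SF).
        self.failed_port_blocked = True
        self.state = State.PROTECTION
        self.sent.append("R-APS(SF)")

    def local_clear_signal_fail(self):
        # Recovery steps 1-2: keep the port blocked, report R-APS(NR).
        self.sent.append("R-APS(NR)")

    def receive(self, message):
        if message == "R-APS(SF)":
            # Failure step 3: flush the FDB; the owner unblocks the RPL.
            self.state = State.PROTECTION
            self.fdb_flushes += 1
            if self.is_rpl_owner:
                self.rpl_blocked = False
        elif message == "R-APS(NR)":
            # Recovery step 3: the owner starts the WTR timer.
            if self.is_rpl_owner and self.state is State.PROTECTION:
                self.wtr_running = True
        elif message == "R-APS(NR,RB)" and not self.is_rpl_owner:
            # Recovery step 5: flush and unblock previously blocked ports.
            self.fdb_flushes += 1
            self.failed_port_blocked = False
            self.state = State.IDLE

    def wtr_expired(self):
        # Recovery step 4: the owner blocks the RPL and sends R-APS(NR,RB).
        if self.is_rpl_owner and self.wtr_running:
            self.wtr_running = False
            self.rpl_blocked = True
            self.state = State.IDLE
            self.sent.append("R-APS(NR,RB)")
```

Note that in this sketch a non-owner node unblocks its port on any R-APS(NR,RB) message, regardless of when that message was originally sent.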
Other useful information: the ERP uses the R-APS messages to manage and coordinate the protection switching. The R-APS messages (which are continuously repeated) and the OAM common fields are well known to those skilled in the art and are defined in ITU-T Y.1731.
Referring to FIGS. 1A-1K (PRIOR ART), there are illustrated several diagrams of an exemplary network 100 at different steps 1A-1K which are used to help describe how a node (e.g., bridge, switch, router) can wrongly interpret an old message, which leads to the formation of an undesirable data loop. The discussion below first describes how the node can wrongly interpret an old message and thereby form the undesirable data loop. A discussion is then provided to explain the deficiencies of the current ITU-T G.8032 standard, which proposes to use a guard timer in an attempt to prevent the formation of the undesirable data loop. The different steps 1A-1K respectively correspond to FIGS. 1A-1K.
1A. Assume the exemplary network 100 has a ring of six nodes that are numbered from 1 to 6 and called node 1 to node 6, respectively. The node 1 is the RPL owner.
1B. Assume node 1 periodically sends R-APS1 (NR,RB) messages reflecting its idle state, across the ring (as per standard). Assume node 1 is blocking a port 102 to RPL link 104 to prevent a loop (as per standard). Assume all nodes 1-6 are in idle states.
1C. Assume there is a failure 106 on link 108 between node 5 and node 6.
1D. Node 5 and node 6 respectively block ports 110 and 112 on failed link 108 and send R-APS(SF) messages when they transition from the idle state to the protection state (as per standard).
1E. Assume the link 108 is up again between node 5 and node 6. Node 5 and node 6 send R-APS(NR) messages and remain in the protection state (as per standard).
1F. Assume the RPL owner (node 1) receives the R-APS(SF) message sent by node 5 or node 6 during step 1D. The RPL owner (node 1) unblocks the non-failed RPL port 102 and goes from the idle state into the protection state.
1G. Assume the RPL owner (node 1) receives an R-APS(NR) message sent from node 5 or node 6 during step 1E. The RPL owner (node 1) starts the WTR timer 114 and remains in the protection state (as per standard).
1H. Assume that the WTR timer 114 expires. The RPL owner (node 1) blocks the RPL port 102 again and goes back to the idle state. The RPL owner (node 1) periodically sends R-APS2(NR,RB) messages.
1I. Node 5 and node 6 receive the R-APS2(NR,RB) message from step 1H, unblock the non-failed ports 110 and 112 and transition from the protection state to the idle state (as per standard).
Steps 1H and 1I are the expected sequence of steps, but an unexpected sequence of steps 1J and 1K could occur after step 1G, which would lead to the undesirable creation of the data loop in the network 100. The problematic, unexpected sequence of steps 1J and 1K is as follows:
1J. The WTR timer 114 is still running at RPL owner (node 1).
1K. Node 5 and node 6 receive the R-APS1(NR,RB) message from step 1B, unblock the non-failed ports 110 and 112 and transition from the protection state to the idle state (as per standard).
Steps 1J and 1K are possible if messages from node 1 to node 5 are delayed because of, for example, congestion/queueing or software processing (if frames are trapped and forwarded in software). In this situation, the R-APS1(NR,RB) message from step 1B could still be transiting the ring while the RPL owner (node 1) is already in the protection state. In this case, the RPL owner (node 1) would have RPL port 102 forwarding and nodes 5 and 6 would have their ports 110 and 112 forwarding at the same time, which means there would be an undesirable loop 116 (see FIG. 1K). Unfortunately, this loop 116 cannot be characterized as “transient”, a term that implies a very short duration, probably less than 500 ms. Instead, the loop 116 can last for as long as the WTR timer is configured, meaning 10 minutes at worst. Of course, this type of situation should be prevented at all costs because TTL is not implemented at layer 2 in the network 100, which means that a layer 2 loop 116 would allow some packets to loop forever.
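The difference between the expected sequence and the unexpected sequence can be replayed with a minimal Python sketch. The sketch treats the ring as a set of links and declares a loop whenever no link is blocked. The link labels, the event names, and the choice of the link between node 1 and node 2 as the RPL are assumptions made for illustration; the description above only states that node 1 is the RPL owner.

```python
RPL = "1-2"      # assumption: the RPL is the link between node 1 and node 2
FAILED = "5-6"   # link 108 between node 5 and node 6


def ring_has_loop(blocked_links: set) -> bool:
    """A single ring forwards in a loop when none of its links is blocked."""
    return len(blocked_links) == 0


def replay(events: list) -> list:
    """Apply events in order and record whether a loop exists after each."""
    blocked = {RPL}  # idle state: the RPL owner blocks the RPL
    history = []
    for event in events:
        if event == "link_5_6_fails":
            blocked.add(FAILED)          # step 1D
        elif event == "owner_receives_SF":
            blocked.discard(RPL)         # step 1F
        elif event == "wtr_expires":
            blocked.add(RPL)             # step 1H
        elif event == "nodes_5_6_receive_NR_RB":
            blocked.discard(FAILED)      # step 1I or step 1K
        history.append((event, ring_has_loop(blocked)))
    return history


expected = ["link_5_6_fails", "owner_receives_SF",
            "wtr_expires", "nodes_5_6_receive_NR_RB"]
# The stale R-APS1(NR,RB) message arrives while the WTR timer is still running.
stale = ["link_5_6_fails", "owner_receives_SF", "nodes_5_6_receive_NR_RB"]
```

In the expected sequence at least one link is blocked after every event. In the stale sequence the last event leaves no link blocked, which corresponds to loop 116, and the loop persists until the WTR timer expires.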
The current solution to this problem is described in the ITU-T G.8032 draft standard. ITU-T G.8032 attempts to solve this problem by configuring and using a guard timer to ignore certain messages that may be too old. To clarify the goal of the guard timer, the standard states the following:
“R-APS messages are continuously repeated with an interval of 5 seconds. This, combined with the R-APS messages forwarding method, in which messages are copied and forwarded at every ring node around the ring, can result in a message corresponding to an old request, which is no longer relevant, being received by ring nodes. The reception of messages with outdated information could result in erroneous interpretation of the existing requests in the ring and lead to erroneous protection switching decisions.
The guard timer is used to prevent ring nodes from receiving outdated R-APS messages. During the duration of the guard timer, all received R-APS messages are ignored by the ring protection control process. This allows that old messages still circulating on the ring may be ignored. This, however, has the side effect that, during the period of the guard timer, a node will be unaware of new or existing ring requests transmitted from other nodes.
The period of the guard timer may be configured by the operator in 10 ms steps between 10 ms and 2 seconds, with a default value of 500 ms. This time should be greater than the maximum expected forwarding delay for which one R-APS message circles around the ring.
The guard timer may be started and stopped. While the guard timer is running the received R-APS Request/State and Status information is not forwarded. If the guard timer is not running, the R-APS Request/State information is forwarded unchanged.” See Section 10.1.5 in ITU-T G.8032 (June 2008).
Also, the guard timer is started on detection of a “local clear SF”, meaning that a failure condition that was detected earlier (a link down, for example) is no longer present and has therefore been cleared. Applied to the present scenario, this means that the user must configure the guard timer by estimating the worst-case delay of a frame, so that no earlier R-APS(NR,RB) message is wrongly interpreted, since a wrong interpretation can create a very long-lasting loop 116. The guard timer solution is inadequate for the following reasons (for example):
1. A node may discard a valid message, and this may create other kinds of problems. For example, the discarding of an R-APS message carrying a flush (in its status field) can create a loss of connectivity between elements connected through this ring, since the node does not react to this important instruction allowing flooding and addresses to be relearned. This loss of connectivity may last 5 seconds assuming that this is the chosen interval for sending the R-APS message.
2. The guard timer relies on user configuration (because the guard timer is not mandated by the ITU-T G.8032 draft standard), so the user needs to understand this kind of complex problem. Even if the user configures the guard timer, the problem can still arise if the timer is too short, and some valid frames can be lost if it is too long.
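Both shortcomings can be seen in a minimal Python sketch of a guard timer. The class GuardTimer and the function process_r_aps are hypothetical names, time is modelled as integer milliseconds supplied by the caller, and the configuration limits are those quoted above from the standard (10 ms steps between 10 ms and 2 seconds, default 500 ms).

```python
class GuardTimer:
    """Hypothetical guard timer; time is passed in as integer milliseconds."""

    def __init__(self, period_ms: int = 500):
        if period_ms % 10 != 0 or not 10 <= period_ms <= 2000:
            raise ValueError("period must be 10 ms to 2 s in 10 ms steps")
        self.period_ms = period_ms
        self.expires_at_ms = None

    def start(self, now_ms: int) -> None:
        # Started on detection of a local clear SF.
        self.expires_at_ms = now_ms + self.period_ms

    def stop(self) -> None:
        self.expires_at_ms = None

    def is_running(self, now_ms: int) -> bool:
        return self.expires_at_ms is not None and now_ms < self.expires_at_ms


def process_r_aps(guard: GuardTimer, now_ms: int, message: str):
    """Return the message if it is processed, or None if it is ignored."""
    if guard.is_running(now_ms):
        return None  # every R-APS message is ignored, valid or not
    return message
```

With the default period, a valid message that arrives 100 ms after the timer starts is discarded (reason 1), while an old message that was delayed by 600 ms arrives after the timer has expired and is processed as if it were current (reason 2).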
Accordingly, there has been and still is a need to address the aforementioned shortcomings and other shortcomings associated with the creation of undesirable loops and the proposed guard timer. These needs and other needs are satisfied by the present invention.