Computer networks that forward data frames to end stations are known in the art. Computer networks may be organized as local area networks (LANs), with bridges allowing communication between end stations attached to separate LANs just as if the stations were attached to the same LAN. A bridge is typically a computer with a plurality of ports that couple the bridge to other entities. The bridging function includes receiving data from one of the ports and transferring the data to other ports for receipt by other entities in the network. The bridge is able to move data frames from one port to another very quickly because its decision is generally based on end station information, such as the media access control (MAC) address information contained in the header of such frames. Bridges typically utilize one of a number of potential protocols for the movement of data as set out in industry standards. One such standard is IEEE 802.1D-2004, entitled “IEEE Standard for Local and Metropolitan Area Networks Media Access Control (MAC) Bridges”, published 3 Jun. 2004 by the IEEE, which is incorporated herein by reference. Other protocols are also available at present and may be possible in the future.
When a computer network is formed, the network will generally have a redundant and usually random communication path between each of the bridges. This arises from various bridges in the networks having their ports connected to other bridges in the network in a redundant manner. Furthermore, bridges may be added or removed periodically to the existing network. In addition, bridges may fail during the operation of the network. This is particularly the case if the bridges are used in harsh environments, such as may be found in industrial applications and/or power generating stations and/or other harsh environments. Furthermore, the network connections between the bridges could fail for a number of reasons. In general, redundant paths in the network are desirable in order to improve the robustness of the network and prevent failure of the network if any one specific connection between two bridges fails or an entire bridge fails. In this way, redundant paths, where two different paths connect the same bridges, can be used to overcome link failures and bridge failures in the network.
However, redundant paths also raise ambiguity in the network. In other words, there is the possibility of a circuitous or “loop” path being formed in the network, such that a frame could travel in the loop continuously and never reach the end user for which the frame is destined. The creation of a loop in a bridge network therefore raises the possibility that data frames continuously traverse the loop without reaching the end user until the network saturates. The creation of loops in a bridge network also raises ambiguities in the address table within each specific bridge, decreasing the efficiency of the network.
To permit the existence of redundant communication paths, but to avoid looping problems, various methods of “pruning” a network into a loop-free or tree configuration have been proposed in the past. One such protocol is the Rapid Spanning Tree Algorithm and Protocol (“RSTP”) described in the IEEE 802.1D-2004 standard, which is incorporated herein by reference. Previous protocols, such as the “Spanning Tree Algorithm and Protocol” (STP), have been proposed in the past but have now been superseded by RSTP. A commonality of these protocols is that the resulting topology has a root or root bridge from which the loop-free topology spans forth in a non-redundant, loop-free manner.
Difficulties arise, however, in these types of protocols when the root bridge fails. These types of failures, commonly known as “root bridge failures”, are particularly problematic because various bridges within the network will continue to assert the failed root bridge as the current root even if they receive information to the contrary from other bridges in the network. Therefore, recovering from a root bridge failure can be more problematic and have a higher reconfiguration time than the original configuration of the spanning tree protocol because in the original configuration, none of the bridges have a predetermined value identifying which bridge is the current root bridge.
The problem arises, in part, because when a root bridge fails, the other bridges identifying the failure of the previous root bridge will asynchronously assert themselves as the new root bridge, but bridges that are not aware of the root bridge failure will continue to assert the original root bridge. In one embodiment, to obtain information necessary to run a spanning tree protocol, bridges will exchange special configuration messages, often called bridge protocol data units (BPDU). More specifically, upon start up of the network, each bridge initially assumes itself to be the root bridge and transmits BPDUs reflecting this. Upon failure of the root bridge, the bridges adjacent to the original root bridge will initially assume themselves to be the new root and transmit BPDUs reflecting this assumption. Upon receipt of a BPDU from a neighbouring device, the bridge will examine the contents of the BPDU and if the root bridge identified in the received BPDU is “better”, based on predetermined criteria, than the stored root node identifier in the receiving bridge, the bridge adopts the better information and uses it in its own BPDUs that it sends to other bridges from its ports.
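The comparison and adoption step described above can be sketched as follows. This is a minimal illustration only, not the full priority-vector comparison of the standard: it assumes that “better” means a numerically lower bridge identifier (priority first, then MAC address), and the names `BridgeId`, `Bridge` and `receive_bpdu` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeId:
    """Hypothetical bridge identifier; fields compare in declaration order."""
    priority: int  # compared first; lower is assumed to be better
    mac: int       # tie-breaker; lower is assumed to be better


class Bridge:
    def __init__(self, own_id: BridgeId) -> None:
        self.own_id = own_id
        # At start-up each bridge assumes itself to be the root.
        self.root_id = own_id

    def receive_bpdu(self, advertised_root: BridgeId) -> bool:
        """Adopt the advertised root if it is better than the stored one.

        Returns True when the stored root identifier was replaced.
        """
        if advertised_root < self.root_id:
            self.root_id = advertised_root
            return True
        return False


a = Bridge(BridgeId(priority=32768, mac=0x0A))
b = Bridge(BridgeId(priority=4096, mac=0x0B))

adopted = a.receive_bpdu(b.root_id)   # b's root is better, so a adopts it
ignored = b.receive_bpdu(a.own_id)    # a's own identifier is worse, so b keeps its root
```

Under these assumptions, bridge `a` ends up advertising bridge `b` as root in its own BPDUs, while `b` continues to assert itself.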
While this process works well at start up, if the original root bridge fails, some of the bridges in the network may continue to send BPDUs identifying the original, now failed, root bridge. This arises for a number of reasons. For example, each bridge will become aware of the potential failure of the original root bridge and asynchronously send BPDUs asserting itself as the new root bridge. Furthermore, in large networks, some bridges located remotely from the root bridge may not become aware of the failure of the root bridge and may reassert the root bridge identifier of the original, now failed, root bridge. It is important to note that the root bridge identifier of the failed root bridge will be the “better” selection which is why the original, now failed, root bridge was selected as the root bridge in the original configuration.
This may increase the time required to converge to a new loop-free topology after the failure of a root bridge. Furthermore, while the original root bridge information will eventually be timed out, this may not occur for a significant amount of time, such as a few seconds, because the bridges will be periodically receiving information from some of the other bridges in the network identifying the old, now failed, root bridge as still being active, even though that information is outdated and no longer correct. Such a problem has colloquially been referred to as “counting to infinity”, which refers to the seemingly endless process in which the failure of a root bridge is not identified by all of the bridges in a network and the bridges continuously advise each other, through different BPDU messages, of various potential root bridges including the original, now failed, root bridge, thereby erroneously refreshing the original, now failed, root bridge information.
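A toy model may help illustrate why the stale information lingers. The sketch below is not taken from the standard's state machines; it assumes that a lower identifier is “better”, that the surviving bridges form a simple ring, and that stale information carries a message age that each relaying bridge increments by one until it reaches a maximum age (20 is used here, mirroring the default Max Age value). The function name and the integer identifiers are hypothetical.

```python
def stale_root_lifetime(survivor_ids, failed_root_id, initial_age=1, max_age=20):
    """Circulate stale root information around a ring of surviving bridges.

    Returns the number of hops the stale information survives and the list
    of bridges that, at some point, adopted the failed root as their root.
    """
    order = list(survivor_ids)
    # After detecting the failure, each survivor initially asserts itself.
    believed = {bid: bid for bid in order}
    age = initial_age
    hops = 0
    while age < max_age:
        receiver = order[hops % len(order)]
        # The failed root's identifier still wins the comparison.
        if failed_root_id < believed[receiver]:
            believed[receiver] = failed_root_id
        age += 1   # each relay ages the information by one
        hops += 1
    fooled = [bid for bid in order if believed[bid] == failed_root_id]
    return hops, fooled


hops, fooled = stale_root_lifetime([10, 20, 30], failed_root_id=5)
```

In this model every surviving bridge is persuaded to re-adopt the failed root, and the stale information is only discarded once its age reaches the maximum, which is the delay the passage above describes.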
The problem is further complicated because, when a bridge neighbouring the root bridge detects a failure, that neighbouring bridge cannot always determine whether the failure results from a root bridge failure or from a failure in the link between the root bridge and the neighbouring bridge. If it is a link failure, then eventually one of the bridges connected to the root bridge will identify an alternative path to the original root bridge. However, if it is a root bridge failure, rather than a root link failure, the above difficulties may arise. Therefore, a failure between a root bridge and a neighbouring bridge raises an ambiguity as to whether the failure arose due to a failure in the link, a failure in a port of either the root bridge or the neighbouring bridge, or an actual root bridge failure.
Because root bridge failures are not that common, many networks can simply tolerate a temporary shut down of the network due to a root bridge failure while the network reconfigures. Unfortunately, in critical networks, such as industrial applications and power generating stations, a failure of a network, even for a relatively short period of time, such as one second, could result in catastrophic effects. Moreover, a root bridge failure may occur when a small portion of the network has been damaged, such as through an electrical failure or an explosion and it is crucial to have the entire network reconfigure itself to a new loop-free topology quickly to avoid the spread of the catastrophic event throughout the system.
Therefore, while it may take seconds to configure a new root, these seconds can be critical when the reason for the root failing may be a systemic or network wide failure such that the longer the network is down, reconfiguring a new loop-free topology, the more likely it is that the effects of a catastrophic event may spread. Also, it is important that all communications on the network be completed quickly and efficiently. In other words, it is important that all BPDUs are a single frame in length, or 60 bytes in the case of an Ethernet frame, to avoid needless network traffic and the potential for BPDUs to be lost or damaged during transmission, particularly if a portion of the network has been damaged.
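As an illustration of the single-frame constraint, the sketch below packs a 36-byte RST BPDU behind an IEEE 802.3 header and LLC header and pads the result to the 60-byte minimum Ethernet frame size (excluding the frame check sequence). The field layout follows the RST BPDU format of IEEE 802.1D-2004 as understood here; the function name, the example addresses, and the default timer values are assumptions for illustration.

```python
import struct

BRIDGE_GROUP_ADDRESS = bytes.fromhex("0180c2000000")  # destination MAC for BPDUs
LLC_HEADER = b"\x42\x42\x03"                          # DSAP, SSAP, control
MIN_FRAME_LEN = 60                                    # without the 4-byte FCS


def build_rst_bpdu_frame(src_mac, root_id, root_path_cost, bridge_id, port_id, flags=0):
    """Return a padded Ethernet frame carrying one RST BPDU (hypothetical helper)."""
    bpdu = struct.pack(
        "!HBBB8sI8sHHHHHB",
        0x0000,          # protocol identifier
        2,               # protocol version identifier (RSTP)
        0x02,            # BPDU type (RST BPDU)
        flags,
        root_id,         # 8 bytes: 2-byte priority + 6-byte MAC
        root_path_cost,
        bridge_id,       # 8 bytes: 2-byte priority + 6-byte MAC
        port_id,
        0 * 256,         # message age, in units of 1/256 s
        20 * 256,        # max age (assumed default of 20 s)
        2 * 256,         # hello time (assumed default of 2 s)
        15 * 256,        # forward delay (assumed default of 15 s)
        0,               # version 1 length
    )
    payload = LLC_HEADER + bpdu
    header = BRIDGE_GROUP_ADDRESS + src_mac + struct.pack("!H", len(payload))
    # Pad with zeros up to the minimum Ethernet frame length.
    return (header + payload).ljust(MIN_FRAME_LEN, b"\x00")


mac = bytes.fromhex("020000000001")            # example locally administered MAC
own_id = struct.pack("!H6s", 32768, mac)       # example bridge identifier
frame = build_rst_bpdu_frame(mac, own_id, 0, own_id, 0x8001)
```

The unpadded header, LLC header and BPDU occupy 53 bytes, so the whole message fits in one minimum-size frame with padding to spare.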
In the past, other solutions were proposed. For instance, European patent application EP 1 722 518 A1 to Siemens Aktiengesellschaft provided a modified Root Failure Notification (RFN) BPDU, which did not exist in any current STP, RSTP or MSTP standard. This modified root failure notification first propagates throughout the system, causing a restart of the state machines, and then a subsequent configuration BPDU is sent to configure a new topology. The difficulty with this solution is that RFN BPDUs must be sent and received by the bridges, the bridges must then restart their state machines, and further configuration BPDUs must then be sent. This increases the time for reconfiguration to a new topology. This solution also lacks any control over false positive root failure notifications, which could cause a “count to infinity” dilemma of RFN BPDU notifications falsely asserting a failure of a root bridge when only a link or port of the root bridge has failed.
Accordingly, there is a need in the art for an improved method and system for reconfiguring a new loop-free topology with a new root bridge after a root bridge failure. Also, there is a need in the art to provide a method and system to avoid a “counting to infinity” dilemma where bridges in the network asynchronously assert themselves as the new root bridge and no one bridge is identified as the new root bridge because various bridges continuously reassert the previous, now failed, root bridge. Also, there is a need in the art to provide a method and system to avoid a “false positive” counting to infinity dilemma where a link failure to a root bridge is misinterpreted as a root failure and the original root bridge has difficulties reasserting itself as the root bridge in the original non-meshed topology. There is also a need in the art for an improved method and system which at least partially satisfies these needs without deviating from existing and accepted IEEE standards, such as the RSTP including BPDUs used in the RSTP.