The present invention relates to communication networks and in particular to spanning tree algorithms for local networks.
A local communication network comprises a plurality of bridging-devices and communication links. Each communication link connects between two or more bridging-devices or between a bridging-device and a non-bridging device, such as an end-station (e.g., a computer), a router or a server. Each bridging-device comprises a plurality of ports which serve as interfaces between the bridging-device and the links to which it is connected. Each port may be active (referred to also as forwarding), blocking or disconnected, for reasons described below. When a source station sends a message to a destination station, the source station sends the message to a nearest bridging-device which sends the message to one of its neighboring bridging-devices (bridging-devices which are directly connected to a common link are referred to herein as neighbors). The neighboring bridging-device passes the message to another bridging-device until the message finally reaches the bridging-device connected to the destination station. In many cases, messages are broadcast to all the bridging-devices in a local network. When a message is broadcast, each bridging-device passes the message through all of its active ports, except for the port through which it was received. This broadcast scheme operates properly only if the active ports do not form a loop in the network. If the network includes a loop of active ports, a single message may be repeatedly sent through the network and the network will fail. A topology of active ports which connects all the bridging-devices in a network without forming loops is referred to as a spanning tree.
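The effect of a loop on broadcast flooding can be illustrated with a short Python sketch. It is only a simplified model: it assumes point-to-point links, every listed link has active ports at both ends, and the names `flood`, `links` and `max_hops` are hypothetical. Each bridging-device forwards a received broadcast on every link except the one it arrived on; the hop limit exists solely so that the simulation terminates when a loop is present, and it must exceed the depth of any loop-free topology being tested.

```python
from collections import deque

def flood(links, source, max_hops=20):
    """Simulate one broadcast from `source` over active point-to-point links.

    links: list of (device_a, device_b) pairs.
    Returns the number of frame transmissions, or None if frames are still
    circulating after `max_hops` hops (which indicates a loop of active ports).
    """
    adjacency = {}
    for index, (a, b) in enumerate(links):
        adjacency.setdefault(a, []).append((index, b))
        adjacency.setdefault(b, []).append((index, a))

    # Each queue entry is one frame in flight: (link used, receiving device, hop count).
    in_flight = deque((index, nbr, 1) for index, nbr in adjacency.get(source, []))
    transmissions = 0
    while in_flight:
        link, device, hops = in_flight.popleft()
        transmissions += 1
        if hops >= max_hops:
            return None  # the broadcast never dies out
        for index, nbr in adjacency[device]:
            if index != link:  # never send back on the arrival link
                in_flight.append((index, nbr, hops + 1))
    return transmissions

tree = [("A", "B"), ("B", "C"), ("B", "D")]
loop = [("A", "B"), ("B", "C"), ("C", "A")]
print(flood(tree, "A"))  # one transmission per link
print(flood(loop, "A"))  # None: frames circulate indefinitely
```

On the loop-free topology the broadcast crosses each link exactly once and stops; on the triangle the same frame keeps circulating, which is the failure the spanning tree is meant to prevent.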
In many cases redundant links are added to networks to be used in case one or more of the bridging-devices and/or links fail. To properly use these redundant links instead of the bridging-devices and/or links which failed there is a need for a method for blocking and activating the ports of the various bridging-devices of the network. The method must ensure that a loop is never formed in the network and a spanning tree of active ports is available as often as possible. One common algorithm which performs these tasks is the 802.1D standard spanning tree algorithm (STA) which is described in "Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Media access control (MAC) bridges", International Standard ISO/IEC 15802-3, 1998, ANSI/IEEE Std 802.1D, 1998 edition, the disclosure of which is incorporated herein by reference.
The 802.1D STA is a distributed algorithm, i.e., it is performed separately by a STA software package in each of the bridging-devices of the network. In most cases, no single bridging-device knows the entire topology of the spanning tree. Rather, each bridging-device decides which of its local ports are part of the spanning tree according to predetermined rules and information received from neighboring bridging-devices. Each bridging-device activates its ports accordingly.
According to the 802.1D STA each bridging-device has a unique identifier which represents the priority of the bridging-device. A root bridging-device is chosen as the bridging-device with the lowest priority. The spanning tree is built as a distance-vector tree around the root, according to link costs associated with the links of the network. Each bridging-device designates one of its ports, which leads to the root along a lowest cost path, as a root port. If two paths to the root have the same cost, the path leading through the neighboring bridging-device with the lowest priority determines the root port. In addition, for each link, one of the ports leading to the link is chosen as a designated port of the link. The designated port of the link is chosen as the port of the bridging-device which has a shortest path from the root. Therefore, the designated ports are never root ports. The bridging-devices activate their designated ports and root port and keep all their other ports blocked. It is noted that messages (except control messages described below) pass from a first bridging-device to a second bridging-device over a link only if the ports of both the first and second bridging-devices leading to the link are active.
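The root and root-port selection described above can be sketched in Python as follows. This is a centralized illustration only, not the distributed 802.1D procedure: it assumes point-to-point links, integer identifiers in which the lowest value wins, and hypothetical names (`compute_root_paths`, `bridge_ids`, `links`). Ties between equal-cost paths are broken in favor of the neighbor with the lowest identifier, as stated in the text.

```python
import heapq

def compute_root_paths(bridge_ids, links):
    """Return (root, best) for a network of point-to-point links.

    bridge_ids: iterable of integer identifiers (lowest value becomes root).
    links: list of (bridge_a, bridge_b, cost) tuples.
    best maps each bridge to (cost_to_root, upstream_neighbor); the port
    towards upstream_neighbor plays the role of the root port.
    """
    root = min(bridge_ids)
    adjacency = {b: [] for b in bridge_ids}
    for a, b, cost in links:
        adjacency[a].append((b, cost))
        adjacency[b].append((a, cost))

    best = {root: (0, None)}
    heap = [(0, root)]
    while heap:
        cost, bridge = heapq.heappop(heap)
        if cost > best[bridge][0]:
            continue  # stale heap entry
        for neighbor, link_cost in adjacency[bridge]:
            if neighbor == root:
                continue
            candidate = (cost + link_cost, bridge)
            # Tuple comparison: lower cost first, then lower upstream identifier.
            if neighbor not in best or candidate < best[neighbor]:
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate[0], neighbor))
    return root, best

links = [(1, 2, 10), (1, 3, 10), (2, 4, 5), (3, 4, 5), (2, 3, 1)]
root, best = compute_root_paths([1, 2, 3, 4], links)
print(root, best[4])  # bridge 4 has two equal-cost paths and picks the one via bridge 2
```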
The operation of the algorithm is based on exchanging STA update messages (referred to as Bridge Protocol Data Units, or BPDUs) on the state of the network between bridging-devices which are neighbors. The STA BPDUs are also sent through blocking ports, unlike all other messages, which are not passed through blocking ports. The BPDUs are identified by the receiving bridging-devices, either in hardware or in software, by their special destination address. The receiving bridging-device passes the BPDUs to the STA software within the bridging-device and does not forward the BPDU to any other port. Thus, it is ensured that BPDUs are exchanged only between neighboring bridging-devices.
The STA software in each bridging-device keeps track of the following parameters:
1) a current supposed ID of the root,
2) a current cost of the shortest path to the current supposed root,
3) a current supposed root port, and
4) a list of local ports which serve as designated ports for their associated links.
These parameters are updated according to received BPDUs, and are used to send updated BPDUs to neighboring bridging-devices. With time, information on the network propagates throughout the bridging-devices of the network and the tree is properly formed. It is noted that between sending a BPDU and sending out an updated BPDU (as a result of new information, for example), the bridging-device waits for a hold-time of a second in order to prevent inaccurate information from spreading throughout the network before the information is corrected. It is possible to change the hold-time to shorter or longer periods, for example to half a second, in some or all of the bridging-devices.
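A minimal Python sketch of how the first three parameters might be updated on receipt of a BPDU is given below. It is a simplification under stated assumptions: the class and field names (`BridgeState`, `port_costs`, `receive`) are hypothetical, identifiers are integers in which a lower value wins, ports are numbered from 1, and message ageing, the hold-time and the designated-port list are all omitted.

```python
class BridgeState:
    """Tracks the supposed root, path cost and root port of one bridging-device."""

    def __init__(self, bridge_id, port_costs):
        self.bridge_id = bridge_id
        self.port_costs = port_costs    # {port number: cost of the attached link}
        self.root_id = bridge_id        # until told otherwise, assume we are the root
        self.root_cost = 0
        self.root_port = None
        self._upstream_id = bridge_id   # sender of the best BPDU seen so far

    def receive(self, port, bpdu_root_id, bpdu_root_cost, sender_id):
        """Process one BPDU; return True if the stored parameters changed."""
        candidate = (bpdu_root_id,
                     bpdu_root_cost + self.port_costs[port],
                     sender_id,
                     port)
        current_port = self.root_port if self.root_port is not None else 0
        current = (self.root_id, self.root_cost, self._upstream_id, current_port)
        # Lower root ID wins, then lower cost, then lower sender ID, then lower port.
        if candidate >= current:
            return False
        self.root_id, self.root_cost, self._upstream_id, self.root_port = candidate
        return True

state = BridgeState(bridge_id=5, port_costs={1: 10, 2: 10})
state.receive(port=1, bpdu_root_id=3, bpdu_root_cost=0, sender_id=3)
print(state.root_id, state.root_cost, state.root_port)
```

A return value of True corresponds to the case in which the bridging-device has new information and would, after the hold-time, send updated BPDUs to its neighbors.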
The time required by the 802.1D STA to converge after a change in the network (e.g., failing of a link) is relatively long (many seconds). The convergence time is dependent on the diameter of the network, i.e., the largest number of bridging-devices a message passes in passing between two bridging-devices. With default time-out parameters, the standard 802.1D STA is also limited to networks with a diameter smaller than or equal to seven.
A manager of a network may set a port to a disconnected state, in which the port does not forward any messages, and does not participate in a spanning tree. Usually, a port is set as disconnected by shutting down its hardware. Some bridging devices automatically set a port to the disconnected state if they sense that the port is not connected to any other device and/or if the port is faulty or is connected to a faulty link or device. When a disconnected port begins to operate, it is set to blocking state, and the STA adjusts accordingly.
Use of the standard 802.1D STA allows a user to connect bridging-devices from different manufacturers to a single network. Any deviations from the standard algorithm must be transparent to the bridging-devices of the network in which the changes were not performed.
Many modern LAN bridging-devices support a feature named virtual local area networks (VLANs). Some or all of the messages sent through the network are given a VLAN ID which represents the VLAN to which the messages belong. The ports of the bridging-devices of the network are configured as active or blocking for each VLAN separately. VLANs allow a single physical network to operate as a plurality of independent networks. For example, a station may be connected to a network through a port in which only a VLAN X is enabled. The station therefore can only forward packets to, and receive packets from, stations which are connected to VLAN X. An emerging standard for VLANs is described in "Draft Standard P802.1Q/D9, IEEE Standards for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks", 1998, the disclosure of which is incorporated herein by reference.
In some cases it is desired to define a cluster of bridging-devices which perform some tasks as if the bridging-devices of the cluster comprise a single bridging-device. For example, instead of using a single large switch, a user may use a stacked switch which is formed of a cluster of switches. The cluster of switches is more modular and flexible in its attributes as compared to a single switch. In the eyes of the user, who is not interested in the number of switches through which a packet passes, the stacked switch operates like a single switch.
A simple application of the 802.1D STA to a network which includes one or more clusters ignores the clustering and treats the bridging-devices of the clusters like all other bridging-devices. However, ignoring the clustering enlarges the diameter of the network and therefore lengthens the convergence time of the algorithm. The extra time required for convergence may require changes in the time-out parameters of the algorithm in all the bridging-devices of the network. In some bridging-devices it may be impossible to perform these changes. Furthermore, the 802.1D STA may create a spanning tree in which one or more of the links internal to the cluster are blocked. Such a spanning tree defeats the purpose of clustering and is therefore undesirable.
One solution to this problem is to have a single bridging-device represent all the bridging-devices of the cluster in performing the algorithm. This requires a method of assigning the single bridging-device which represents the cluster. The assigning method must take into account the possibility that the assigned bridging-device may fail and another bridging-device must be assigned. This may require reinitiating the entire spanning tree algorithm, although from the point of view of the bridging-devices outside of the cluster nothing has changed. In addition, the assigned bridging-device must receive the BPDUs from all the bridging-devices in the network and must send the BPDUs it generates to specific ports of specific bridging-devices of the cluster. Furthermore, the assigned bridging-device must have control of the status of all the bridging-devices in the cluster and must receive operational status information from all the bridging-devices in the cluster. Therefore, this solution is very complicated and undesirable.
It is an object of some preferred embodiments of the invention to provide a method for implementing a spanning tree algorithm (STA) in each of the bridging-devices of a cluster, such that the algorithm converges in substantially the same amount of time as it would if the cluster were a single bridging-device. Preferably, the implementation of the present invention is totally compatible with other implementations which appear in other bridging-devices of the network.
It is an object of some preferred embodiments of the invention to provide a method for running a STA in a network including a cluster such that the algorithm converges in substantially the same amount of time as it would if the cluster were a single bridging-device, without altering the software implementing the STA.
It is an object of some preferred embodiments of the invention to provide a method for implementing a spanning tree algorithm (STA) in each of the bridging-devices of a cluster, such that the algorithm does not block internal links of the cluster. Stated otherwise, the method does not allow formation outside of the cluster of an unblocked path between two bridging-devices of the cluster.
One aspect of some preferred embodiments of the present invention relates to having the STA code in bridging-devices within a cluster (referred to herein as cluster bridging-devices) operate as if the cluster bridging-devices are connected via a single emulated link. In addition, all the cluster bridging-devices are preferably forced to choose the same lowest cost path to the root so that none of the cluster bridging-devices chooses to block its port to the emulated link.
Preferably, the cluster bridging-devices are led to act as if they are connected by a single emulated link, by having each cluster bridging-device send BPDUs to all the cluster bridging-devices and not only to those cluster bridging-devices which are actually neighbors. The BPDUs received by a cluster bridging-device from another cluster bridging-device are provided to the STA code in the receiving cluster bridging-device (or are treated by the STA code) as arriving through a single emulated port. A convenient method for performing the above process is to define a Virtual LAN (VLAN), which includes all the bridging-devices of the cluster, and to send the internal BPDUs along the VLAN, with an altered destination MAC-address. Preferably, the altered address comprises a broadcast or multicast address. Alternatively, the altered address comprises an unknown unicast address which does not belong to any of the devices in the network, and therefore the BPDU message is handled like a broadcast message.
By having all the bridging-devices of the cluster operate as if they are connected to a single link, the decisions made by the STA software in each of the bridging-devices of the cluster are performed under the (incorrect) assumption that all the members of the cluster are mutual neighbors.
Preferably, the cluster bridging-devices are forced to choose the same root path by assigning a zero cost to the emulated link. In addition, in case there are equal-cost paths to the root from two or more bridging-devices of the cluster, the STA code of all the bridging-devices is forced to choose the same path. Preferably, when two or more paths have equal cost, the STA chooses the path through the bridging-device which has the designated port of the emulated link.
By forcing the cluster bridging-devices to choose the same root path, it is ensured that the emulated port of each cluster bridging-device is always part of the spanning-tree. Thus, the STA code does not set the emulated port to blocking state, except possibly for a short period at startup.
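The following Python sketch illustrates why a zero-cost emulated link, combined with the tie-breaking rule above, leaves every emulated port in the spanning tree. It rests on several assumptions that are not taken from the text: the root lies outside the cluster, every member has at least one external path to the root, the designated port of the emulated link belongs to the member nearest the root (lowest identifier on a tie), and the names `choose_root_ports` and `external_costs` are hypothetical.

```python
EMULATED_LINK_COST = 0  # zero cost assigned to the emulated link

def choose_root_ports(external_costs):
    """Decide, for each cluster member, which kind of port becomes its root port.

    external_costs: {member_id: lowest cost to the root via the member's own
    external ports}.
    Returns (designated_member, {member_id: "external" or "emulated"}).
    """
    # Member holding the designated port of the emulated link.
    designated = min(external_costs, key=lambda m: (external_costs[m], m))
    cost_via_emulated = external_costs[designated] + EMULATED_LINK_COST

    choice = {}
    for member, own_cost in external_costs.items():
        if member == designated:
            choice[member] = "external"
        else:
            # With a zero-cost emulated link this path is never worse; on a tie
            # the path through the designated port of the emulated link is chosen.
            assert cost_via_emulated <= own_cost
            choice[member] = "emulated"
    return designated, choice

designated, choice = choose_root_ports({10: 20, 11: 15, 12: 15})
print(designated, choice)
```

In this model exactly one member reaches the root through an external port, and every other member uses the emulated port as its root port, so no emulated port is left as a blocked, non-designated port.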
In a preferred embodiment of the present invention, the cluster bridging-devices activate their ports which lead to other cluster bridging-devices immediately at startup. Thus, the ports may be used to send and receive BPDUs although the BPDUs do not have a BPDU destination address.
In some preferred embodiments of the present invention, the hold-time kept by cluster bridging-devices between sending consecutive BPDUs is reduced to half a second, rather than the standard full second. Information propagating through the network and passing through a cluster is delayed at most twice within the cluster. The information is delayed for a first hold-time at the cluster bridging-device which receives the information and for a second hold-time at any other bridging-device of the cluster, since information received by a cluster bridging-device is passed to all the other cluster bridging-devices. Using a hold-time of half a second in the cluster bridging-devices results in a total delay in the cluster of up to a second, substantially the same as the hold-time in a regular bridging-device.
Alternatively or additionally, the cluster bridging-devices use different hold-times for different BPDUs they generate and/or receive. Preferably, BPDUs sent to and/or received from other cluster bridging-devices are delayed for a hold-time of half a second while other BPDUs are delayed for a full second.
In some preferred embodiments of the present invention, the above required changes are performed without altering the software which performs the STA. Preferably, an intermediate software changes the contents of the BPDUs received by the STA software so that the software operates as desired. Alternatively, the implementation of the STA in the cluster bridging-devices is altered.
There is therefore provided in accordance with a preferred embodiment of the present invention, a method of implementing a distributed algorithm which is based on sending Bridge Protocol Data Units (BPDUs) only between neighboring bridging-devices in a network, including sending BPDUs from a first bridging-device of the network to at least one non-neighboring second bridging-device, and determining a characteristic of the network responsive to the BPDUs.
Preferably, the network includes at least one cluster having cluster member bridging-devices and sending the BPDUs includes sending BPDUs from a cluster member bridging-device to substantially all the bridging-devices in the cluster.
Preferably, sending the BPDUs includes sending the BPDUs via an emulated port leading to an emulated link which is connected to substantially all the bridging-devices in the cluster. Preferably, the emulated link has a zero cost.
Preferably, determining the characteristic of the network includes determining information on a path to a root bridging-device. Preferably, determining the information on the path to the root includes selecting the emulated port as a root port if the emulated port is not a designated port of the emulated link. Alternatively or additionally, determining the information on the path to the root includes choosing a path common to substantially all the bridging-devices in the cluster.
Preferably, sending the BPDUs includes defining a VLAN and sending the BPDUs as a broadcast along the VLAN. Further preferably, sending the BPDUs includes sending the BPDUs without substantial delay between sending by the first bridging-device and receiving by the non-neighboring bridging-device. Preferably, sending the BPDUs includes sending BPDUs with a multicast destination address. Preferably, sending the BPDUs includes sending BPDUs substantially compatible with the 802.1D standard spanning tree algorithm.
There is further provided in accordance with a preferred embodiment of the present invention, a method of activating links which form a spanning tree in a network formed of bridging-devices and links, the network including at least one cluster of cluster-member bridging-devices, external bridging-devices not included in the cluster and external links which directly connect to at least one external bridging-device, the method including sending messages between bridging-devices of the network, determining a link suitable for being part of the spanning tree which may be activated without forming a path of activated external links between two cluster-member bridging-devices of the at least one cluster, and activating the determined link.
Preferably, determining the link includes determining a root bridging-device and a lowest cost path to the root bridging-device from each of the bridging-devices in the network, the determined link being along a lowest cost path. Preferably, determining the lowest cost path includes assuming a zero cost path between any two cluster-member bridging-devices belonging to the same cluster.
There is further provided in accordance with a preferred embodiment of the present invention, a method of activating links of a network, including determining a plurality of links which form a spanning tree of the network, and activating at least one link irrespective of the determined plurality of links.
Preferably, activating the at least one link irrespective of the determined plurality of links includes activating the at least one link before the determining of the plurality of links. Alternatively or additionally, activating the at least one link irrespective of the determined plurality of links includes activating internal links of a cluster. Further alternatively or additionally, activating the at least one link irrespective of the determined plurality of links includes activating a link which connects two different clusters. Preferably, activating the link which connects two different clusters includes activating the link although it forms a loop in the network. Alternatively or additionally, activating the at least one link irrespective of the determined plurality of links includes activating the at least one link only for some types of messages. Preferably, activating the at least one link only for some types of messages includes activating the link for messages of a specific VLAN.
Preferably, activating the at least one link only for some types of messages includes activating the link for only some types of messages for a predetermined period and thereafter activating the at least one link for substantially all types of messages.
There is further provided in accordance with a preferred embodiment of the present invention, a method of activating links which form a spanning tree in a network formed of bridging-devices and links, the network including a cluster of cluster-member bridging-devices, external bridging-devices not included in the cluster and external links which directly connect to at least one external bridging-device, including sending messages between bridging-devices of the network, waiting in each bridging-device a hold-time between sending successive messages from the bridging-device, and activating a plurality of links forming the spanning tree, wherein the total time until the spanning tree is formed is substantially equal to the time required if the cluster were replaced by a single bridging-device.
Preferably, waiting the hold-time includes waiting in at least one of the bridging-devices, different hold-times dependent on an identity of the bridging-device to which the successive messages are sent.
Further preferably, waiting the hold-time includes waiting in cluster member bridging-devices, a first hold-time for messages sent to another cluster member bridging-device and a second, different, hold-time for messages sent to bridging-devices which are not cluster members.
Alternatively or additionally, sending the messages includes sending at least some of the messages by a first bridging device responsive to receiving information in messages from other bridging devices which information induces sending the messages, and waiting the hold-time includes waiting in the first bridging-device, different hold-times for different messages dependent on the identity of the bridging-device from which the information inducing sending a particular message was received.
There is further provided in accordance with a preferred embodiment of the present invention, a method of implementing a distributed spanning tree algorithm in a first bridging-device, including receiving a spanning-tree-algorithm message from a second bridging device, generating at least one message, including a message to a third bridging device, responsive to the received message, determining a hold-time to wait before sending the generated message to the third bridging device from a plurality of available hold-times, and sending the message after the hold-time.
Preferably, generating the message includes generating a BPDU message. Preferably, determining the hold-time includes determining the hold-time responsive to the identity of the second bridging-device. Further preferably, determining the hold-time includes determining the hold-time responsive to whether the second bridging-device belongs to a common cluster with the first bridging-device.
Alternatively or additionally, determining the hold-time includes determining the hold-time responsive to the identity of the third bridging-device. Preferably, determining the hold-time includes determining the hold-time responsive to whether the third bridging-device belongs to a common cluster with the first bridging-device.
In a preferred embodiment of the present invention, determining the hold-time includes determining a standard hold-time if both the second and third bridging-devices do not belong to a common cluster with the first bridging-device. Preferably, determining the hold-time includes determining a shortened hold-time if either the second or third bridging-devices belong to a common cluster with the first bridging-device.
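The hold-time selection described above can be sketched in Python as follows. The half-second and one-second values follow the figures given earlier in this document; the function and parameter names (`select_hold_time`, `cluster_peers`) are hypothetical, and bridging-devices are identified here by plain integers for illustration only.

```python
STANDARD_HOLD_TIME = 1.0   # seconds, the standard hold-time
SHORTENED_HOLD_TIME = 0.5  # seconds, used for traffic involving the cluster

def select_hold_time(cluster_peers, second_id, third_id):
    """Choose the hold-time applied by a first bridging-device.

    cluster_peers: identifiers of the other members of the first device's cluster.
    second_id: device from which the triggering message was received.
    third_id: device to which the generated message will be sent.
    """
    if second_id in cluster_peers or third_id in cluster_peers:
        return SHORTENED_HOLD_TIME
    return STANDARD_HOLD_TIME

peers = {11, 12}
print(select_hold_time(peers, second_id=30, third_id=11))  # shortened
print(select_hold_time(peers, second_id=30, third_id=40))  # standard
```

With the shortened value applied at most twice inside the cluster, the worst-case delay added by the cluster is 2 x 0.5 s = 1 s, which matches the hold-time of a single regular bridging-device.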
There is further provided in accordance with a preferred embodiment of the present invention, a cluster-member switch, including a forwarding circuit, and a processor which runs a spanning tree algorithm code which generates and receives Bridge Protocol Data Units (BPDUs) in order to configure the forwarding circuit, and an intermediate software which alters at least some of the generated or received BPDUs.
Preferably, the intermediate software changes a destination address of the generated BPDUs to a broadcast, multicast or unknown unicast address. Alternatively or additionally, the intermediate software changes a VLAN field of the generated BPDUs to a predetermined VLAN identity.
Preferably, the intermediate software changes a port indication of some received BPDUs to an emulated port identity. Further preferably, the intermediate software reports a zero cost for the emulated port.
Preferably, the intermediate software changes an indication of the identity of a bridging-device sending at least one received BPDU. Further preferably, the intermediate software changes the indication of the identity of the sending bridging-device responsive to a required selection of a root port. Preferably, the intermediate software changes the indication of the identity of the sending bridging-device to a minimal or maximal value.
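A minimal Python sketch of the address, VLAN and port rewriting performed by such intermediate software is shown below. It is an illustration under assumptions, not the actual implementation: the `Bpdu` record, the function names, the VLAN number 4000 and the use of port number 0 for the emulated port are all hypothetical, and the rewriting of the sender identity is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional

BROADCAST_ADDRESS = "ff:ff:ff:ff:ff:ff"
CLUSTER_VLAN = 4000        # hypothetical VLAN reserved for intra-cluster BPDUs
EMULATED_PORT = 0          # hypothetical port number presented to the STA code
EMULATED_PORT_COST = 0     # zero cost reported for the emulated port

@dataclass(frozen=True)
class Bpdu:
    dest_mac: str
    vlan: Optional[int]
    port: int              # port on which the BPDU is sent or was received
    root_id: int
    root_cost: int
    sender_id: int

def rewrite_outgoing(bpdu):
    """BPDU generated by the STA code for the emulated port: flood it in the cluster VLAN."""
    return replace(bpdu, dest_mac=BROADCAST_ADDRESS, vlan=CLUSTER_VLAN)

def rewrite_incoming(bpdu):
    """BPDU arriving on the cluster VLAN: present it as received on the emulated port."""
    if bpdu.vlan != CLUSTER_VLAN:
        return bpdu        # ordinary BPDU from an external neighbor, left untouched
    return replace(bpdu, port=EMULATED_PORT)

def port_cost(port, physical_costs):
    """Cost reported to the STA code for a port."""
    if port == EMULATED_PORT:
        return EMULATED_PORT_COST
    return physical_costs[port]

generated = Bpdu("01:80:c2:00:00:00", None, EMULATED_PORT, root_id=1, root_cost=15, sender_id=11)
on_wire = rewrite_outgoing(generated)
seen_by_sta = rewrite_incoming(replace(on_wire, port=7))  # arrived on physical port 7
print(on_wire.dest_mac, on_wire.vlan, seen_by_sta.port)
```

Keeping the rewriting in a separate layer of this kind is what allows the STA code itself to remain a standard implementation.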
Preferably, the spanning tree algorithm (STA) code includes a standard STA code.
Preferably, the cluster bridging-device includes a switch-module of a modular switch.