The present invention relates generally to computer networks, and more specifically, to a method and apparatus for quickly identifying and selecting loop-free topologies in computer networks.
A computer network typically comprises a plurality of interconnected entities. An entity may consist of any device, such as a computer or end station, that “sources” (i.e., transmits) or “sinks” (i.e., receives) data frames. A common type of computer network is a local area network (“LAN”) which typically refers to a privately owned network within a single building or campus. LANs typically employ a data communication protocol (LAN standard), such as Ethernet, FDDI or token ring, that defines the functions performed by the data link and physical layers of a communications architecture (i.e., a protocol stack). In many instances, several LANs may be interconnected by point-to-point links, microwave transceivers, satellite hook-ups, etc. to form a wide area network (“WAN”) or intranet that may span an entire country or continent.
One or more intermediate network devices are often used to couple LANs together and allow the corresponding entities to exchange information. For example, a bridge may be used to provide a “bridging” function between two or more LANs. Alternatively, a switch may be utilized to provide a “switching” function for transferring information between a plurality of LANs or end stations. Typically, the bridge or switch is a computer and includes a plurality of ports that couple the device to the LANs or end stations. Ports used to couple switches to each other are generally referred to as trunk ports, whereas ports used to couple a switch to LANs or end stations are generally referred to as access ports. The switching function includes receiving data from a sending entity at a source port and transferring that data to at least one destination port for forwarding to the receiving entity.
Switches and bridges typically learn which destination port to use in order to reach a particular entity by noting on which source port the last message originating from that entity was received. This information is then stored by the bridge in a block of memory referred to as a filtering database. Thereafter, when a message addressed to a given entity is received on a source port, the bridge looks up the entity in its filtering database and identifies the appropriate destination port to reach that entity. If no destination port is identified in the filtering database, the bridge floods the message out all ports, except the port on which the message was received. Messages addressed to broadcast or multicast addresses are also flooded.
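The learning and flooding behavior described above can be illustrated with a minimal Python sketch. The class name `Bridge`, the method `receive`, and the `BROADCAST` constant are hypothetical names chosen for illustration only. Multicast handling is omitted for brevity, and suppressing a frame whose destination lies on the receiving port is ordinary bridge filtering that the passage above does not spell out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"  # illustrative broadcast address


class Bridge:
    """Toy bridge that learns source addresses and floods unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.filtering_db = {}  # entity address -> port on which it was last seen

    def receive(self, src, dst, in_port):
        """Return the sorted list of ports on which the frame is sent."""
        # Learn: note the source port of the last message from this entity.
        self.filtering_db[src] = in_port
        out_port = self.filtering_db.get(dst)
        if dst == BROADCAST or out_port is None:
            # Flood out all ports except the one the frame arrived on.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Destination is on the arrival segment: nothing to forward
            # (standard filtering, added here as an assumption).
            return []
        return [out_port]
```

For example, a three-port bridge floods the first frame addressed to an unknown entity and forwards later frames on a single port once that entity has been heard from.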
Additionally, most computer networks are either partially or fully meshed. That is, they include redundant communications paths so that a failure of any given link or device does not isolate any portion of the network. The existence of redundant links, however, may cause the formation of circuitous paths or “loops” within the network. Loops are highly undesirable because data frames may traverse the loops indefinitely. Furthermore, because switches and bridges replicate (i.e., flood) frames whose destination port is unknown or which are directed to broadcast or multicast addresses, the existence of loops may cause a proliferation of data frames that effectively overwhelms the network.
Spanning Tree Algorithm
To avoid the formation of loops, most bridges and switches execute a spanning tree algorithm which allows them to calculate an active network topology that is loop-free (i.e., a tree) and yet connects every pair of LANs within the network (i.e., the tree is spanning). The Institute of Electrical and Electronics Engineers (IEEE) has promulgated a standard (the 802.1D standard) that defines a spanning tree protocol to be executed by 802.1D compatible devices. In general, by executing the IEEE spanning tree protocol, bridges elect a single bridge to be the “root” bridge. Since each bridge has a unique numerical identifier (bridge ID), the root is typically the bridge with the lowest bridge ID. In addition, for each LAN coupled to more than one bridge, only one (the “designated bridge”) is elected to forward frames to and from the respective LAN. The designated bridge is typically the one closest to the root. Each bridge also selects one port (its “root port”) which gives the lowest cost path to the root. The root ports and designated bridge ports are selected for inclusion in the active topology and are placed in a forwarding state so that data frames may be forwarded to and from these ports and thus onto the corresponding paths or links of the network. Ports not included within the active topology are placed in a blocking state. When a port is in the blocking state, data frames will not be forwarded to or received from the port. A network administrator may also exclude a port from the spanning tree by placing it in a disabled state.
To obtain the information necessary to run the spanning tree protocol, bridges exchange special messages called configuration bridge protocol data unit (BPDU) messages. FIG. 1 is a block diagram of a conventional BPDU message 100. The BPDU message 100 includes a message header 102 compatible with the Media Access Control (MAC) layer of the respective LAN standard. The message header 102 comprises a destination address (DA) field 104, a source address (SA) field 106, and a Service Access Point (SAP) field 108, among others. The DA field 104 carries a unique bridge multicast destination address assigned to the spanning tree protocol. Appended to header 102 is a BPDU message area 110 that also contains a number of fields, including a root identifier (ROOT ID) field 112, a root path cost field 114, a bridge identifier (BRIDGE ID) field 116, a port identifier (PORT ID) field 118, a message age (MSG AGE) field 120, a maximum age (MAX AGE) field 122, a hello time field 124, and a forward delay (FWD DELAY) field 126, among others. The root identifier field 112 typically contains the identifier of the bridge assumed to be the root and the bridge identifier field 116 contains the identifier of the bridge sending the BPDU. The root path cost field 114 contains a value representing the cost to reach the assumed root from the port on which the BPDU is sent and the port identifier field 118 contains the port number of the port on which the BPDU is sent.
Upon start-up, each bridge initially assumes itself to be the root and transmits BPDU messages accordingly. Upon receipt of a BPDU message from a neighboring device, its contents are examined and compared with similar information (e.g., assumed root and lowest root path cost) stored by the receiving bridge in non-recoverable memory. If the information from the received BPDU is “better” than the stored information, the bridge adopts the better information and uses it in the BPDUs that it sends (adding the cost associated with the receiving port to the root path cost) from its ports, other than the port on which the “better” information was received. Although BPDU messages are not forwarded by bridges, the identifier of the root is eventually propagated to and adopted by all bridges as described above, allowing them to select their root port and any designated port(s).
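The comparison of received and stored BPDU information can be sketched as a lexicographic comparison in which lower values win. The following Python fragment is an illustration only: the names `Bpdu`, `is_better` and `advertise` are hypothetical, identifiers are modeled as plain integers, and the field ordering (root identifier, root path cost, bridge identifier, port identifier) follows the usual IEEE 802.1D ordering rather than anything stated in the paragraph above.

```python
from typing import NamedTuple


class Bpdu(NamedTuple):
    """Subset of the BPDU fields used when comparing information."""
    root_id: int         # identifier of the assumed root
    root_path_cost: int  # cost to reach the assumed root
    bridge_id: int       # identifier of the sending bridge
    port_id: int         # port on which the BPDU was sent


def is_better(received: Bpdu, stored: Bpdu) -> bool:
    # Tuples compare field by field, so a lower root identifier wins first,
    # then a lower root path cost, then the remaining tie-breakers.
    return received < stored


def advertise(best: Bpdu, receiving_port_cost: int,
              own_bridge_id: int, out_port: int) -> Bpdu:
    # The bridge re-announces the adopted root, adding the cost of the
    # port on which the better information was received.
    return Bpdu(best.root_id,
                best.root_path_cost + receiving_port_cost,
                own_bridge_id,
                out_port)
```

A bridge with identifier 32 that initially assumes itself to be the root would, under these assumptions, adopt a received BPDU naming root 10 and advertise that root from its other ports with an increased root path cost.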
In order to adapt the active topology to failures, the root periodically (e.g., every hello time) transmits BPDU messages. The hello time utilized by the root is also carried in the hello time field 124 of its BPDU messages. The default hello time is 2 seconds. In response to receiving BPDUs on their root ports, bridges transmit their own BPDUs from their designated ports, if any. Thus, every two seconds BPDUs are propagated throughout the bridged network, confirming the active topology. As shown in FIG. 1, BPDU messages stored by the bridges also include a message age field 120 which corresponds to the time since the root instigated the generation of this BPDU information. That is, BPDU messages from the root have their message age field 120 set to “0”. Thus, every hello time, BPDU messages with a message age of “0” are propagated to and stored by the bridges.
After storing these BPDU messages, bridges proceed to increment the message age value every second. When the next BPDU message is received, the bridge examines the contents of the message age field 120 to determine whether it is smaller than the message age of its stored BPDU message. Assuming the received BPDU message originated from the root and thus has a message age of “0”, the received BPDU message is considered to be “better” than the stored BPDU information (whose message age has presumably been incremented to “2” seconds) and, in response, the bridge proceeds to re-calculate the root, root path cost and root port based upon the received BPDU information. The bridge also stores this received BPDU message and proceeds to increment its message age field 120. If the message age of a stored BPDU message reaches a maximum age value, the corresponding BPDU information is considered to be stale and is discarded by the bridge.
Normally, each bridge replaces its stored BPDU information every hello time, thereby preventing it from being discarded and maintaining the current active topology. If a bridge stops receiving BPDU messages on a given port (indicating a possible link or device failure), it will continue to increment the respective message age value until it reaches the maximum age threshold. The bridge will then discard the stored BPDU information and proceed to re-calculate the root, root path cost and root port by transmitting BPDU messages utilizing the next best information it has. The maximum age value used within the bridged network is typically set by the root, which enters the appropriate value in the maximum age field 122 of its transmitted BPDU messages. Neighboring bridges similarly load this value in their BPDU messages, thereby propagating the selected value throughout the network. The default maximum age value under the IEEE standard is twenty seconds.
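The interplay of the hello time, the message age and the maximum age can be illustrated with a short simulation. This is a sketch under simplifying assumptions: the function name `seconds_until_discard` is hypothetical, time advances in whole seconds, and every refreshing BPDU is assumed to arrive with a message age of 0.

```python
from typing import Optional

HELLO_TIME = 2   # seconds, default hello time
MAX_AGE = 20     # seconds, default maximum age


def seconds_until_discard(duration: int,
                          refresh_interval: Optional[int],
                          max_age: int = MAX_AGE) -> Optional[int]:
    """Return the second at which stored BPDU information is discarded,
    or None if it survives for the whole simulated duration."""
    message_age = 0
    for t in range(1, duration + 1):
        message_age += 1                  # incremented every second
        if refresh_interval and t % refresh_interval == 0:
            message_age = 0               # replaced by a fresh BPDU
        if message_age >= max_age:
            return t                      # stale: information is discarded
    return None
```

With refreshes every hello time the stored information is never discarded, whereas with no refreshes at all (for example, after a link failure) it is discarded once the default maximum age of twenty seconds is reached.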
As BPDU information is up-dated and/or timed-out and the active topology is re-calculated, ports may transition from the blocking state to the forwarding state and vice versa. That is, as a result of new BPDU information, a previously blocked port may learn that it should be in the forwarding state (e.g., it is now the root port or a designated port). Rather than transition directly from the blocking state to the forwarding state, ports transition through two intermediate states: a listening state and a learning state. In the listening state, a port waits for information indicating that it should return to the blocking state. If, by the end of a preset time, no such information is received, the port transitions to the learning state. In the learning state, a port still blocks the receiving and forwarding of frames, but received frames are examined and the corresponding location information is stored in the filtering database, as described above. At the end of a second preset time, the port transitions from the learning state to the forwarding state, thereby allowing frames to be forwarded to and from the port. The time spent in each of the listening and the learning states is referred to as the forwarding delay and is entered by the root in field 126.
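The port state transitions described above can be modeled as a small timer-driven state machine. The sketch below is illustrative only: the names `PortState` and `Port` and the methods `select`, `block` and `tick` are hypothetical, time advances in one-second ticks, and the 15-second value is the default forwarding delay of the IEEE standard.

```python
from enum import Enum


class PortState(Enum):
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


class Port:
    def __init__(self, forward_delay: int = 15):
        self.forward_delay = forward_delay
        self.state = PortState.BLOCKING
        self.timer = 0

    def select(self):
        """Port has become a root port or designated port."""
        if self.state is PortState.BLOCKING:
            self.state = PortState.LISTENING
            self.timer = 0

    def block(self):
        """Information arrived indicating the port should be blocked."""
        self.state = PortState.BLOCKING
        self.timer = 0

    def tick(self):
        """Advance time by one second."""
        if self.state not in (PortState.LISTENING, PortState.LEARNING):
            return
        self.timer += 1
        if self.timer >= self.forward_delay:
            # Listening -> learning -> forwarding, one forwarding delay each.
            self.state = (PortState.LEARNING
                          if self.state is PortState.LISTENING
                          else PortState.FORWARDING)
            self.timer = 0
```

Under these assumptions a selected port reaches the forwarding state only after two forwarding delays, i.e., thirty seconds with the default value.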
As ports transition between the blocked and forwarding states, entities may appear to move from one port to another. To prevent bridges from distributing messages based upon incorrect information, bridges quickly age-out and discard the “old” information in their filtering databases. More specifically, upon detection of a change in the active topology, bridges transmit Topology Change Notification Protocol Data Unit (TCN-PDU) frames toward the root. The format of the TCN-PDU frame is well known (see IEEE 802.1D standard) and, thus, will not be described herein. The TCN-PDU is propagated hop-by-hop until it reaches the root which confirms receipt of the TCN-PDU by setting a topology change flag in all BPDUs subsequently transmitted by the root for a period of time. Other bridges, receiving these BPDUs, note that the topology change flag has been set, thereby alerting them to the change in the active topology. In response, bridges significantly reduce the aging time associated with their filtering databases which, as described above, contain destination information corresponding to the entities within the network. Specifically, bridges replace the default aging time of 5 minutes with the forwarding delay time, which by default is fifteen seconds. Information contained in the filtering databases is thus quickly discarded.
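The effect of the shortened aging time on the filtering database can be shown with a minimal sketch. The function name `live_entries` and the representation of the database as a dictionary of (port, last-seen time) pairs are assumptions made for illustration.

```python
DEFAULT_AGING_TIME = 300  # seconds (5 minutes)
FORWARD_DELAY = 15        # seconds, default forwarding delay


def live_entries(filtering_db, now, topology_change):
    """Return the entries that have not yet aged out.

    filtering_db maps an entity address to (port, time last seen).
    """
    # While the topology change flag is set, the forwarding delay
    # replaces the default aging time.
    aging_time = FORWARD_DELAY if topology_change else DEFAULT_AGING_TIME
    return {addr: (port, seen)
            for addr, (port, seen) in filtering_db.items()
            if now - seen < aging_time}
```

For instance, an entry last refreshed 100 seconds ago survives under the default aging time but is discarded as soon as the topology change flag is observed.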
Although the spanning tree protocol is able to maintain a loop-free topology despite network changes and failures, re-calculation of the active topology can be a time consuming and processor intensive task. For example, re-calculation of the spanning tree following an intermediate device crash or failure can take approximately thirty seconds. In particular, a crash or failure typically wipes out the BPDU information stored by a bridge. Upon re-start, the bridge assumes itself to be the root, places all of its ports in the listening state and proceeds to transmit BPDU messages accordingly. It thus takes approximately thirty seconds for a bridge to recover from a crash or failure (e.g., fifteen seconds in the listening state and another fifteen seconds in the learning state). During this time, message delivery is often delayed as ports transition between states, because ports in the listening and learning states do not forward or receive messages. Such delays can have serious consequences on time-sensitive traffic flows, such as voice or video streams.
Furthermore, short-duration failures or crashes of the spanning tree protocol at a given bridge are not infrequent. For example, failures or crashes can occur due to power fluctuations, glitches in the running of the spanning tree protocol software modules, glitches in running other bridge processes that cause the spanning tree protocol to fail, etc. Even if a bridge or just the spanning tree protocol is only “down” for a few seconds and thus no change in port states may be warranted, re-calculation of the spanning tree still requires on the order of thirty seconds. Accordingly, significant time is wasted re-calculating the spanning tree following re-starts, even though no change in network topology has occurred and the ports are ultimately returned to their original states.
It is an object of the present invention to provide a method and apparatus for enhancing the operation of the spanning tree protocol in computer networks.
It is a further object of the present invention to provide a method and apparatus for providing fast spanning tree re-starts following intermediate device or software failures or crashes.
It is a still further object of the present invention to provide a method and apparatus for determining whether pre-crash port state information is still valid.
Briefly, the invention relates to a method and apparatus for rapidly re-starting a spanning tree protocol. According to the invention, a spanning tree entity running one or more instances of a spanning tree protocol stores a record of spanning tree parameter information and port states in a non-volatile memory. The entity may be operating at an intermediate network device having a plurality of ports. Following a re-start, the spanning tree entity verifies whether the parameter information in the non-volatile memory is still valid. If so, the spanning tree entity adopts the corresponding port states and the intermediate device resumes forwarding messages accordingly. If the spanning tree parameter information in the non-volatile memory is no longer valid, the spanning tree entity discards it along with the port states and proceeds to re-calculate the spanning tree and to transition its port states in a conventional manner. Thus, in situations where the spanning tree entity is re-started before any change in port states is warranted, the network device is able to resume the forwarding of messages without having to spend any time recalculating the spanning tree.
In the preferred embodiment, the spanning tree entity includes at least one state machine engine configured to transition the ports of the device among a plurality of states. In addition, the spanning tree entity records the state of each port and certain parameter information per instance of the spanning tree protocol in one or more novel spanning tree data structures stored at the non-volatile memory. In particular, for each instance of the spanning tree protocol, the spanning tree data structures contain at least the identifier for the root, the identifier of the root port and the root path cost. Furthermore, for each port per spanning tree instance, the data structures also contain at least the port state, the identifier of the designated bridge and designated bridge port and the root path cost from the designated bridge port. To verify the spanning tree parameter information stored at the non-volatile memory following a re-start, the spanning tree entity transmits test bridge protocol data unit (BPDU) messages from each port in order to trigger the receipt of reply BPDU messages from its neighboring devices. The entity then compares the contents of the received BPDU messages with the spanning tree parameters in the spanning tree data structures. If the information matches, then the entity “knows” that no change in its port states is warranted. If the compared information conflicts, then the entity discards the stored information and proceeds to re-calculate the spanning tree in a conventional manner.
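One possible shape for the stored record and the post-restart comparison is sketched below in Python. All names (`Bpdu`, `PortRecord`, `InstanceRecord`, `record_is_valid`) are hypothetical, and the matching rule is an assumption rather than the rule of the preferred embodiment: the sketch treats the record as valid only if a reply arrives on the stored root port of a non-root bridge and every reply agrees with the stored root, designated bridge, designated port and root path cost for the port on which it was received.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class Bpdu(NamedTuple):
    root_id: int
    root_path_cost: int
    bridge_id: int
    port_id: int


@dataclass
class PortRecord:
    state: str              # e.g. "forwarding" or "blocking"
    designated_bridge: int
    designated_port: int
    designated_cost: int    # root path cost from the designated bridge port


@dataclass
class InstanceRecord:
    root_id: int
    root_port: int
    root_path_cost: int
    ports: Dict[int, PortRecord]


def record_is_valid(record: InstanceRecord, replies: Dict[int, Bpdu]) -> bool:
    """Compare reply BPDUs (keyed by receiving port) with the stored record."""
    if record.root_port not in replies:
        return False  # assumption: the root port neighbor must answer
    for port, bpdu in replies.items():
        stored = record.ports.get(port)
        if stored is None:
            return False
        expected = Bpdu(record.root_id, stored.designated_cost,
                        stored.designated_bridge, stored.designated_port)
        if bpdu != expected:
            return False  # conflict: discard the record and re-calculate
    return True
```

If `record_is_valid` returns true, the stored port states could be adopted immediately; otherwise the entity would fall back to the conventional re-calculation.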
In another aspect of the invention, the network device also includes a filtering database for associating network entity addresses with destination ports. In response to an impending crash or failure, the spanning tree entity places a time stamp in the non-volatile memory as one of its last acts before crashing or failing. Upon re-start, the entity examines the time stamp and compares it with the current time to thereby determine how long the entity was down. If the entity was down longer than the time that a root bridge would leave a Topology Change Notification (TCN) flag set in its BPDU messages, the entity realizes that it may have missed a topology change. In response, the entity causes the filtering database to shorten its age-out time so as to quickly discard possibly stale information. If the entity determines that it was down for a period of time less than the time for which TCN flags would remain set, the entity leaves the age-out time associated with the filtering database unchanged.
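The time stamp check can be reduced to a single comparison, as the following sketch suggests. The function name and parameters are hypothetical. The 35-second default for the period during which the root keeps the topology change flag set is an assumption based on the IEEE 802.1D defaults (maximum age plus forwarding delay), and the shortened age-out time is assumed to equal the 15-second forwarding delay.

```python
def aging_time_after_restart(saved_time_stamp: float,
                             now: float,
                             tc_flag_period: float = 35.0,
                             normal_aging_time: float = 300.0,
                             short_aging_time: float = 15.0) -> float:
    """Choose the filtering database age-out time after a re-start."""
    downtime = now - saved_time_stamp
    if downtime > tc_flag_period:
        # A topology change may have been missed while the entity was down,
        # so possibly stale entries are aged out quickly.
        return short_aging_time
    # Any topology change flag would still be visible; leave aging unchanged.
    return normal_aging_time
```

For example, an entity that was down for ten seconds keeps the normal age-out time, while one that was down for one hundred seconds switches to the shortened value.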