1. Field of the Invention
The present invention relates generally to computer networks, and more specifically, to a method and apparatus for quickly resuming the forwarding of network messages despite failures.
2. Background Information
A computer network typically comprises a plurality of interconnected entities. An entity may consist of any device, such as a computer or end station, that “sources” (i.e., transmits) or “sinks” (i.e., receives) data frames. A common type of computer network is a local area network (“LAN”) which typically refers to a privately owned network within a single building or campus. LANs typically employ a data communication protocol (LAN standard), such as Ethernet, FDDI or token ring, that defines the functions performed by the data link and physical layers of a communications architecture (i.e., a protocol stack). In many instances, several LANs may be interconnected by point-to-point links, microwave transceivers, satellite hook-ups, etc. to form a wide area network (“WAN”) or intranet that may span an entire country or continent.
One or more intermediate network devices are often used to couple LANs together and allow the corresponding entities to exchange information. For example, a bridge may be used to provide a “bridging” function between two or more LANs. Alternatively, a switch may be utilized to provide a “switching” function for transferring information between a plurality of LANs or end stations. Typically, the bridge or switch is a computer and includes a plurality of ports that couple the device to the LANs or end stations. Ports used to couple switches to each other are generally referred to as trunk ports, whereas ports used to couple a switch to LANs or end stations are generally referred to as access ports. The switching function includes receiving data from a sending entity at a source port and transferring that data to at least one destination port for forwarding to the receiving entity.
Switches and bridges typically learn which destination port to use in order to reach a particular entity by noting on which source port the last message originating from that entity was received. This information is then stored by the bridge in a block of memory referred to as a filtering database. Thereafter, when a message addressed to a given entity is received on a source port, the bridge looks up the entity in its filtering database and identifies the appropriate destination port to reach that entity. If no destination port is identified in the filtering database, the bridge floods the message out all ports, except the port on which the message was received. Messages addressed to broadcast or multicast addresses are also flooded.
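The learning and flooding behavior described above can be illustrated with a minimal sketch. The class name `LearningBridge`, the string addresses, and the integer port numbers are hypothetical and chosen only for illustration; the handling of a destination located on the receiving port is an assumption that the text does not address.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"  # Ethernet broadcast address


class LearningBridge:
    """Toy model of source-address learning and flooding (illustrative only)."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = list(ports)
        # Filtering database: entity address -> port on which it was last seen.
        self.filtering_db: Dict[str, int] = {}

    def receive(self, src: str, dst: str, in_port: int) -> List[int]:
        """Learn the source address, then return the ports the frame goes out."""
        self.filtering_db[src] = in_port
        out_port = None if dst == BROADCAST else self.filtering_db.get(dst)
        if out_port is None:
            # Unknown or broadcast destination: flood out all ports
            # except the port on which the message was received.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Assumption (not stated in the text): the destination is on the
            # receiving port's own segment, so nothing is forwarded.
            return []
        return [out_port]
```

In this sketch, a first frame from entity A to an unknown entity B is flooded, while the reply from B is sent only to the port on which A was learned.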
Additionally, most computer networks are either partially or fully meshed. That is, they include redundant communications paths so that a failure of any given link or device does not isolate any portion of the network. The existence of redundant links, however, may cause the formation of circuitous paths or “loops” within the network. Loops are highly undesirable because data frames may traverse the loops indefinitely. Furthermore, because switches and bridges replicate (i.e., flood) frames whose destination port is unknown or which are directed to broadcast or multicast addresses, the existence of loops may cause a proliferation of data frames that effectively overwhelms the network.
Spanning Tree Algorithm
To avoid the formation of loops, most bridges and switches execute a spanning tree algorithm which allows them to calculate an active network topology that is loop-free (i.e., a tree) and yet connects every pair of LANs within the network (i.e., the tree is spanning). The Institute of Electrical and Electronics Engineers (IEEE) has promulgated a standard (the 802.1D standard) that defines a spanning tree protocol to be executed by 802.1D compatible devices. In general, by executing the IEEE spanning tree protocol, bridges elect a single bridge within the bridged network to be the “root” bridge. Since each bridge has a unique numerical identifier (bridge ID), the root is typically the bridge with the lowest bridge ID. In addition, for each LAN coupled to more than one bridge, only one (the “designated bridge”) is elected to forward frames to and from the respective LAN. The designated bridge is typically the one closest to the root. Each bridge also selects one port (its “root port”) which gives the lowest cost path to the root. The root ports and designated bridge ports are selected for inclusion in the active topology and are placed in a forwarding state so that data frames may be forwarded to and from these ports and thus onto the corresponding paths or links of the network. Ports not included within the active topology are placed in a blocking state. When a port is in the blocking state, data frames will not be forwarded to or received from the port. A network administrator may also exclude a port from the spanning tree by placing it in a disabled state.
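As a rough illustration of the two selections described above, the following sketch picks the root as the lowest bridge ID and the root port as the port with the lowest cost path to the root. The function names and the plain-integer identifiers are hypothetical, and breaking cost ties by the lowest port number is an assumption made only for this example.

```python
from typing import Dict, Iterable


def elect_root(bridge_ids: Iterable[int]) -> int:
    """The root is typically the bridge with the lowest bridge ID."""
    return min(bridge_ids)


def select_root_port(path_cost_by_port: Dict[int, int]) -> int:
    """Pick the port giving the lowest cost path to the root.

    Assumption for this sketch: ties are broken by the lowest port number.
    """
    return min(path_cost_by_port, key=lambda port: (path_cost_by_port[port], port))
```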
To obtain the information necessary to run the spanning tree protocol, bridges exchange special messages called configuration bridge protocol data unit (BPDU) messages. FIG. 1 is a block diagram of a conventional BPDU message 100. The BPDU message 100 includes a message header 102 compatible with the Media Access Control (MAC) layer of the respective LAN standard. The message header 102 comprises a destination address (DA) field 104, a source address (SA) field 106, and a Service Access Point (SAP) field 108, among others. The DA field 104 carries a unique bridge multicast destination address assigned to the spanning tree protocol. Appended to header 102 is a BPDU message area 110 that also contains a number of fields, including a Topology Change Acknowledgement (TCA) flag 112, a Topology Change (TC) flag 114, a root identifier (ROOT ID) field 116, a root path cost field 118, a bridge identifier (BRIDGE ID) field 120, a port identifier (PORT ID) field 122, a message age (MSG AGE) field 124, a maximum age (MAX AGE) field 126, a hello time field 128, and a forward delay (FWD DELAY) field 130, among others. The root identifier field 116 typically contains the identifier of the bridge assumed to be the root and the bridge identifier field 120 contains the identifier of the bridge sourcing (i.e., sending) the BPDU 100. The root path cost field 118 contains a value representing the cost to reach the assumed root from the port on which the BPDU is sent and the port identifier field 122 contains the port number of the port on which the BPDU is sent.
Upon start-up, each bridge initially assumes itself to be the root and transmits BPDU messages accordingly. Upon receipt of a BPDU message from a neighboring device, its contents are examined and compared with similar information (e.g., assumed root and lowest root path cost) stored by the receiving bridge in non-recoverable memory. If the information from the received BPDU is “better” than the stored information, the bridge adopts the better information and uses it in the BPDUs that it sends (adding the cost associated with the receiving port to the root path cost) from its ports, other than the port on which the “better” information was received. Although BPDU messages are not forwarded by bridges, the identifier of the root is eventually propagated to and adopted by all bridges as described above, allowing them to select their root port and any designated port(s).
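The comparison of received and stored BPDU information may be sketched as follows. The text states only that “better” information is adopted; the field-by-field ordering used here (root identifier, then root path cost, then bridge identifier, then port identifier, lower values winning) follows the usual 802.1D convention and should be read as an assumption of this sketch. The names `BpduInfo`, `is_better` and `info_to_send` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BpduInfo:
    root_id: int         # identifier of the bridge assumed to be the root
    root_path_cost: int  # cost to reach the assumed root
    bridge_id: int       # identifier of the bridge sending the BPDU
    port_id: int         # port on which the BPDU is sent

    def priority(self) -> Tuple[int, int, int, int]:
        # Lower tuples are "better" (assumed ordering, see lead-in).
        return (self.root_id, self.root_path_cost, self.bridge_id, self.port_id)


def is_better(received: BpduInfo, stored: BpduInfo) -> bool:
    """True if the received information should replace the stored information."""
    return received.priority() < stored.priority()


def info_to_send(adopted: BpduInfo, own_bridge_id: int,
                 out_port_id: int, receiving_port_cost: int) -> BpduInfo:
    """Build the information a bridge sends after adopting better information.

    The cost associated with the receiving port is added to the root path cost.
    """
    return BpduInfo(
        root_id=adopted.root_id,
        root_path_cost=adopted.root_path_cost + receiving_port_cost,
        bridge_id=own_bridge_id,
        port_id=out_port_id,
    )
```

For example, a bridge with ID 50 that initially assumes itself to be the root would adopt information naming bridge 10 as the root, and would then advertise bridge 10 as the root from its other ports with the increased root path cost.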
In order to adapt the active topology to failures, the root periodically (e.g., every hello time) transmits BPDU messages. The hello time utilized by the root is also carried in the hello time field 128 of its BPDU messages. The default hello time is 2 seconds. In response to receiving BPDUs on their root ports, bridges transmit their own BPDUs from their designated ports, if any. Thus, every two seconds BPDUs are propagated throughout the bridged network, confirming the active topology. As shown in FIG. 1, BPDU messages stored by the bridges also include a message age field 124 which corresponds to the time since the root instigated the generation of this BPDU information. That is, BPDU messages from the root have their message age field 124 set to “0”. Thus, every hello time, BPDU messages with a message age of “0” are propagated to and stored by the bridges.
After storing these BPDU messages, bridges proceed to increment the message age value every second. When the next BPDU message is received, the bridge examines the contents of the message age field 124 to determine whether it is smaller than the message age of its stored BPDU message. Assuming the received BPDU message originated from the root and thus has a message age of “0”, the received BPDU message is considered to be “better” than the stored BPDU information (whose message age has presumably been incremented to “2” seconds) and, in response, the bridge proceeds to recalculate the root, root path cost and root port based upon the received BPDU information. The bridge also stores this received BPDU message and proceeds to increment its message age timer. If the message age of a stored BPDU message reaches a maximum age value, as specified in the MAX AGE field 126, the corresponding BPDU information is considered to be stale and is discarded by the bridge.
Normally, each bridge replaces its stored BPDU information every hello time, thereby preventing it from being discarded and maintaining the current active topology. If a bridge stops receiving BPDU messages on a given port (indicating a possible link or device failure), it will continue to increment the respective message age value until it reaches the maximum age threshold. The bridge will then discard the stored BPDU information and proceed to re-calculate the root, root path cost and root port by transmitting BPDU messages utilizing the next best information it has. The maximum age value used within the bridged network is typically set by the root, which enters the appropriate value in the maximum age field 126 of its transmitted BPDU messages. Neighboring bridges similarly load this value in their BPDU messages, thereby propagating the selected value throughout the network. The default maximum age value under the IEEE standard is twenty seconds.
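The interaction of the hello time, the message age and the maximum age can be shown with a small simulation. This is a simplified sketch: it assumes one-second ticks, that every received BPDU carries a message age of “0”, and that the information is discarded as soon as the stored age reaches the maximum age. The function name and its parameters are hypothetical.

```python
from typing import Optional


def second_info_is_discarded(duration: int, hello_time: int = 2,
                             max_age: int = 20,
                             bpdus_stop_at: Optional[int] = None) -> Optional[int]:
    """Return the second at which stored BPDU information is discarded.

    Returns None if the information is never discarded within `duration`.
    `bpdus_stop_at` is the last second at which a BPDU can still arrive
    (None means BPDUs keep arriving every hello time).
    """
    age = 0  # message age of the stored BPDU information, in seconds
    for t in range(1, duration + 1):
        age += 1  # the bridge increments the message age every second
        arrives = t % hello_time == 0 and (bpdus_stop_at is None or t <= bpdus_stop_at)
        if arrives:
            age = 0  # fresh information from the root replaces the stored copy
        if age >= max_age:
            return t  # stale information is discarded
    return None
```

With the default values, information refreshed every two seconds is never discarded, whereas information last refreshed at second 10 is discarded at second 30, i.e., twenty seconds after the last BPDU.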
As BPDU information is updated and/or timed-out and the active topology is recalculated, ports may transition from the blocking state to the forwarding state and vice versa. That is, as a result of new BPDU information, a previously blocked port may learn that it should be in the forwarding state (e.g., it is now the root port or a designated port). Rather than transition directly from the blocking state to the forwarding state, ports typically transition through two intermediate states: a listening state and a learning state. In the listening state, a port waits for information indicating that it should return to the blocking state. If, by the end of a preset time, no such information is received, the port transitions to the learning state. In the learning state, a port still blocks the receiving and forwarding of frames, but received frames are examined and the corresponding location information is stored in the filtering database, as described above. At the end of a second preset time, the port transitions from the learning state to the forwarding state, thereby allowing frames to be forwarded to and from the port. The time spent in each of the listening and the learning states is referred to as the forwarding delay and is entered by the root in the FWD DELAY field 130.
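The sequence of port states may be sketched as a function of elapsed time. The sketch assumes that the port leaves the blocking state at time zero, that no information arrives instructing it to return to the blocking state, and that the forwarding delay is the default of fifteen seconds. The names `PortState` and `state_after` are hypothetical.

```python
from enum import Enum


class PortState(Enum):
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


def state_after(elapsed_seconds: int, forward_delay: int = 15) -> PortState:
    """State of a port `elapsed_seconds` after it leaves the blocking state."""
    if elapsed_seconds < forward_delay:
        return PortState.LISTENING   # waiting for information to return to blocking
    if elapsed_seconds < 2 * forward_delay:
        return PortState.LEARNING    # populating the filtering database only
    return PortState.FORWARDING      # frames may now be forwarded to and from the port
```

Because one forwarding delay is spent in each intermediate state, a port in this sketch begins forwarding only after twice the forwarding delay, or thirty seconds with the default value.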
As ports transition between the blocked and forwarding states, entities may appear to move from one port to another. To prevent bridges from distributing messages based upon incorrect address information, bridges quickly age-out and discard the “old” information in their filtering databases. More specifically, upon detection of a change in the active topology, a bridge periodically transmits a Topology Change Notification Protocol Data Unit (TCN-PDU) frame on its root port. The format of the TCN-PDU frame is well known (see IEEE 802.1D standard) and, thus, will not be described herein. A bridge receiving a TCN-PDU sends a TCN-PDU of its own from its root port, and sets the TCA flag 112 in BPDUs that it sends on the port from which the TCN-PDU was received, thereby acknowledging receipt of the TCN-PDU. By having each bridge send TCN-PDUs from its root port, the TCN-PDU is effectively propagated hop-by-hop from the original bridge up to the root. The root confirms receipt of the TCN-PDU by setting the TC flag 114 in the BPDUs that it subsequently transmits for a period of time. Other bridges, receiving these BPDUs, note that the TC flag 114 has been set by the root, thereby alerting them to the change in the active topology. In response, bridges significantly reduce the aging time associated with their filtering databases which, as described above, contain destination information corresponding to the entities within the bridged network. Specifically, bridges replace the default aging time of 5 minutes with the forwarding delay time, which by default is fifteen seconds. Information contained in the filtering databases is thus quickly discarded.
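The effect of the reduced aging time on a filtering database can be illustrated as follows. The database layout (address mapped to a port and a last-seen timestamp) and the function names are assumptions made only for this sketch; the two aging values are the defaults given above.

```python
from typing import Dict, Tuple

DEFAULT_AGING_TIME = 300  # seconds (5 minutes)
FORWARD_DELAY = 15        # seconds (default forwarding delay)

# Hypothetical layout: address -> (port, time the address was last seen)
FilteringDb = Dict[str, Tuple[int, float]]


def aging_time(tc_flag_seen: bool, forward_delay: int = FORWARD_DELAY) -> int:
    """Aging time to apply, depending on whether the root has set the TC flag."""
    return forward_delay if tc_flag_seen else DEFAULT_AGING_TIME


def age_out(db: FilteringDb, now: float, aging: int) -> FilteringDb:
    """Keep only the entries seen within the last `aging` seconds."""
    return {addr: (port, seen) for addr, (port, seen) in db.items()
            if now - seen <= aging}
```

An entry last seen 200 seconds ago survives under the default aging time but is discarded once the TC flag causes the forwarding delay to be used instead.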
Although the spanning tree protocol is able to maintain a loop-free topology despite network changes and failures, re-calculation of the active topology can be a time consuming and processor intensive task. For example, re-calculation of the spanning tree following an intermediate device crash or failure can take approximately thirty seconds. In particular, a crash or failure typically wipes out the BPDU information stored by a bridge. Upon re-start, the bridge assumes itself to be the root, places all of its ports in the blocking and/or listening states and proceeds to transmit BPDU messages accordingly. It thus takes at least thirty seconds for a bridge to recover from a crash or failure (e.g., fifteen seconds in the listening state and another fifteen seconds in the learning state). During this time, message delivery is often delayed as ports transition between states, because ports in the listening and learning states do not forward or receive messages. Such delays can have serious consequences on time-sensitive traffic flows, such as voice or video traffic streams.
Furthermore, short duration failures or crashes of the spanning tree protocol at a given bridge are not infrequent problems. For example, failures or crashes can occur due to power fluctuations, glitches in the running of the spanning tree protocol software modules, glitches running other bridge processes that cause the spanning tree process to fail, etc. Even if a bridge or just the spanning tree process is only “down” for a few seconds and thus no change in port states may be warranted, re-calculation of the spanning tree still requires on the order of thirty seconds. Accordingly, significant time is wasted recalculating the spanning tree following re-starts, even though no change in network topology has occurred and the ports are ultimately returned to their original states.
Virtual Local Area Networks
It is also known to segregate a computer network into a series of logical network segments. U.S. Pat. No. 5,394,402, issued Feb. 28, 1995 (the “'402 patent”), for example, discloses an arrangement for associating any port of a switch with any particular segregated network group. Specifically, according to the '402 patent, any number of physical ports of a particular switch may be associated with any number of groups within the switch by using a virtual local area network (VLAN) arrangement that virtually associates the port with a particular VLAN designation. More specifically, the '402 patent discloses a switch or hub that associates VLAN designations with its ports and further associates those VLAN designations with messages transmitted from any of the ports to which the VLAN designation has been assigned.
The VLAN designation for each port is stored in a memory portion of the switch such that every time a message is received on a given access port the VLAN designation for that port is associated with the message. Association is accomplished by a flow processing element which looks up the VLAN designation in the memory portion based on the particular access port at which the message was received. In many cases, it may be desirable to interconnect a plurality of these switches in order to extend the VLAN associations of ports in the network. The '402 patent, in fact, states that an objective of its VLAN arrangement is to allow all ports and entities of the network having the same VLAN designation to exchange messages by associating a VLAN designation with each message. Thus, those entities having the same VLAN designation function as if they are all part of the same LAN. Message exchanges between parts of the network having different VLAN designations are specifically prevented in order to preserve the boundaries of each VLAN segment or domain. For convenience, each VLAN designation is often associated with a different color, such as red, blue, green, etc.
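The port-based association described above may be sketched as a simple table lookup. The table contents, the color names and the function names are hypothetical; the sketch is not intended to represent the arrangement of the '402 patent, only the general idea of associating a VLAN designation with a message based on the port at which it was received.

```python
from typing import Dict, List, Tuple

# Hypothetical per-port VLAN designations stored in the switch's memory.
PORT_VLAN: Dict[int, str] = {1: "red", 2: "blue", 3: "red", 4: "blue"}


def associate(in_port: int, payload: bytes) -> Tuple[str, bytes]:
    """Associate the receiving port's VLAN designation with the message."""
    return (PORT_VLAN[in_port], payload)


def eligible_ports(in_port: int) -> List[int]:
    """Ports sharing the receiving port's VLAN designation, excluding that port.

    Exchanges between ports with different designations are not permitted.
    """
    vlan = PORT_VLAN[in_port]
    return sorted(p for p, v in PORT_VLAN.items() if v == vlan and p != in_port)
```

In this sketch, a message received on port 1 is associated with the “red” designation and may only be delivered to port 3.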
In addition to the '402 patent, the Institute of Electrical and Electronics Engineers (IEEE) has promulgated the 802.1Q standard for Virtual Bridged Local Area Networks. The 802.1Q standard, among other things, defines a specific VLAN-tagged message format.
To provide redundancy, it is also known to install at least two bridge processing cards in an intermediate network device. The Catalyst 5500 and 6000 series of network devices from Cisco Systems, Inc. of San Jose, Calif., for example, include two bridge processing cards. Each of these cards, moreover, includes facilities for running the spanning tree protocol, including processing and memory components. If a crash or failure occurs on the currently active processing card, the back-up card takes over and begins running the spanning tree protocol. The back-up card, however, starts calculating the spanning tree protocol as if the device were just activated. That is, the back-up card transitions all ports to the blocking state and begins transmitting BPDU messages assuming it is the root. Accordingly, it typically takes on the order of 30 seconds or more for the device to begin forwarding messages again. As indicated above, such delays can seriously affect audio, video and other types of network traffic.