1. Field of the Invention
The present invention relates generally to computer networks, and more specifically to the prevention, reduction, and elimination of count-to-infinity problems in Ethernet networks.
2. Brief Description of the Related Art
Computer networks typically comprise a plurality of interconnected computers, computer systems or other devices. The term “device” in the present application is used broadly to refer to all devices, components, entities, or anything else connected to a computer network. A common type of computer network is referred to as a local area network, or “LAN.” A LAN typically is a private network within a building, campus, etc. Computer networks such as LANs typically employ a data communication protocol using protocol messages. Multiple LANs may be interconnected with one another via, for example, point-to-point links, microwave transceivers, satellite hookups, or other known methods. One or more intermediate network devices, such as bridges or switches, may be used to couple LANs together and allow the devices on the LANs to exchange information with each other. A bridge or a switch may, for example, be a computer having a plurality of ports that couple the device to the LAN or to an end station. The switching function includes receiving data from a sending device at a source port and transferring that data to at least one destination port for forwarding to the receiving device.
Switches and bridges typically learn which destination port to use in order to reach a particular device by noting on which source port the last message originating from that device was received. This information is then stored by the bridge in a block of memory. Thereafter, when a message addressed to a given device is received on a source port, the bridge looks up the device in its memory and identifies the appropriate destination port to reach that entity. If no destination port is identified from memory, the bridge floods the message out to all ports, except the port on which the message was received.
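By way of illustration only, the learn-and-forward behavior described above may be sketched as follows. The class name `LearningBridge`, the method `receive`, and the port numbering are hypothetical and are not part of any standard. The rule of discarding a frame whose learned destination port equals its arrival port is an assumption added for completeness.

```python
class LearningBridge:
    """Toy model of a bridge that learns device locations from source ports."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # device address -> port on which it was last seen

    def receive(self, src, dst, in_port):
        """Learn the sender's location, then return the list of output ports."""
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: flood to every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Assumption: destination is on the arrival segment, nothing to forward.
            return []
        return [out_port]


bridge = LearningBridge(ports=[1, 2, 3])
first = bridge.receive(src="A", dst="B", in_port=1)  # B unknown, so flood
reply = bridge.receive(src="B", dst="A", in_port=3)  # A was learned on port 1
later = bridge.receive(src="A", dst="B", in_port=1)  # B was learned on port 3
```

In this sketch the first message is flooded to ports 2 and 3, while the reply and all later messages use a single learned port.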
Most computer networks include redundant communication paths so that failure of any given link does not isolate any portion of the network. The existence of such redundant communication paths, however, may cause the formation of circular paths referred to as “loops” within the network and may result in count-to-infinity problems. Loops can cause problems in networks because data frames caught in a loop may circulate indefinitely. Additionally, because switches and bridges replicate data frames whose destination port is unknown or which are directed to multicast addresses, the existence of loops may cause proliferation of data frames to such an extent that the network becomes overwhelmed.
At the present time, Ethernet is the dominant networking technology in environments ranging from home and office networks to data center and campus networks, and it is becoming more popular in metropolitan-area networks as well. By far the most important reasons for Ethernet's dominance are its high performance-to-cost ratio and its ubiquity. Virtually all computer systems today have an Ethernet interface built in. Ethernet is also fully plug-and-play, requiring no error-prone manual configuration. Moreover, because Ethernet is a layer 2 technology, many layer 3 protocols can easily co-exist on Ethernet networks. Despite these compelling benefits, mission-critical applications also demand high network dependability.
Ethernet has a unique combination of features enabling plug-and-play operation. First, Ethernet requires no manual interface address configuration for switches or end systems. Ethernet addresses are simple globally unique identifiers, usually assigned by hardware manufacturers, that do not have any special hierarchical structure for packet forwarding. To deliver a packet from a source to an unknown destination address, Ethernet switches flood the packet throughout the network to ensure it reaches its destination. However, flooding is highly inefficient. Fortunately, an Ethernet switch can observe the flooding of a packet to determine the switch port at which a packet from a particular source address S arrives. This switch port then becomes the outgoing port for packets destined for S and so no flooding is required to deliver future packets to S. Thus, an Ethernet network dynamically discovers the topological locations of interface addresses and dynamically builds packet forwarding tables accordingly. This mechanism is called address learning.
To support the flooding of packets for unknown destinations and address learning, an Ethernet network also dynamically computes, in a distributed manner, a cycle-free active forwarding topology using the Rapid Spanning Tree Protocol (RSTP). This active forwarding tree is a logical overlay on the underlying physical topology. Cycles in the underlying physical topology provide redundancy in the event of a link or switch failure. It is critical, however, that the active forwarding topology contain no cycles. Otherwise, flooded packets will persist indefinitely in the network cycle, causing congestion. In addition, address learning will not function correctly because a switch may receive packets from a source S via multiple switch ports, making it impossible to build the forwarding table correctly.
RSTP is the current standard Ethernet spanning tree protocol. The Spanning Tree Protocol (STP) is the predecessor of RSTP. The spanning tree protocols are link management protocols that are designed to allow for redundancy while preventing loops in the active topology. Redundancy is important for fault tolerance to link or bridge failures. However, having loops in the active topology can result in packets persisting in the network as Ethernet packets do not have a time-to-live field. The Spanning Tree Algorithm (STA) builds a unique spanning tree out of the network of bridges. The tree is rooted at the bridge with the lowest ID in the network and spans all bridges in the network. A path from any bridge to the root bridge is guaranteed to be of minimum cost. Traffic is forwarded along these paths within the tree. Since the active topology is a tree, it is by definition loop free. Redundant links are kept in a standby mode (blocked). The STA enables these standby links whenever it detects some failure or a change in the cost of some tree path motivating a reconfiguration of the tree.
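The outcome of the tree construction described above may be illustrated with the following sketch. It is a centralized computation offered only as an aid to understanding; the actual protocols reach the same kind of result in a distributed manner by exchanging protocol messages. The function name `spanning_tree`, the example topology, the link costs, and the tie-breaking rule (lowest upstream bridge ID) are assumptions made for illustration.

```python
import heapq


def spanning_tree(links):
    """links maps (bridge_a, bridge_b) -> cost for each undirected link.

    Returns (root, parent, cost_to_root), where parent[b] is the next hop
    of bridge b toward the root along a minimum-cost path.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    root = min(graph)  # the bridge with the lowest ID becomes the root
    cost_to_root = {}
    parent = {}
    heap = [(0, root, root)]  # (path cost, upstream bridge, bridge)
    while heap:
        cost, upstream, bridge = heapq.heappop(heap)
        if bridge in cost_to_root:
            continue  # already reached by a cheaper (or tied, lower-ID) path
        cost_to_root[bridge] = cost
        if bridge != root:
            parent[bridge] = upstream
        for neighbor, link_cost in graph[bridge]:
            if neighbor not in cost_to_root:
                heapq.heappush(heap, (cost + link_cost, bridge, neighbor))
    return root, parent, cost_to_root


# Hypothetical four-bridge topology with two redundant links.
links = {(1, 2): 10, (1, 3): 10, (2, 3): 10, (3, 4): 10, (2, 4): 10}
root, parent, cost_to_root = spanning_tree(links)
active = {tuple(sorted((b, p))) for b, p in parent.items()}
blocked = set(links) - active  # redundant links kept in standby
```

In this example bridge 1 is the root, and the links between bridges 2 and 3 and between bridges 3 and 4 remain blocked, so the active topology is loop free.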
Protocol messages such as Bridge Protocol Data Units (BPDUs) are used by bridges to exchange information regarding their state. The STA uses the BPDU information to elect the root bridge. Each bridge uses the information conveyed in BPDUs to choose the port that lies on the shortest path to the root bridge (its root port) and the ports that connect it to its children in the spanning tree (its designated ports). The root port is the port that has received the best information for a path to the root. Other ports in the bridge send BPDUs with their path cost to the root to other bridges in the network. Ports that receive information inferior to what they are sending are chosen to be designated ports. Bridges send a BPDU every HelloTime, which acts as a heartbeat. A BPDU has a message age that is capped by a MaxAge value; when the message age exceeds the MaxAge value, the message is dropped. Each bridge port caps the number of BPDUs it can transmit every second. It has a counter (TxCount) that keeps track of the transmitted BPDUs; if the counter reaches the Transmit Hold Count (TxHoldCount), no more BPDUs can be transmitted during the current second. The counter is decremented by one every second.
A topology change can invalidate a bridge's learned address location information. This is because a topology change can result in the reconfiguration of the spanning tree, which may make some network segments appear, from a bridge's perspective, to have moved. This requires flushing the forwarding database that caches stations' locations. The STA implements this by making a bridge send a Topology Change (TC) message whenever one of its ports becomes part of the active topology; the bridge sends this message on all of its ports participating in the active topology. A bridge receiving a TC message forwards it on all its ports participating in the active topology except the one on which it received the TC message. Whenever a bridge sends a TC message on one of its ports, it flushes the cached forwarding information at that port.
The following two sections present the differences between the two spanning tree protocols, the Spanning Tree Protocol (STP) and its successor, the Rapid Spanning Tree Protocol (RSTP), that are relevant here.
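The per-port transmission cap and the message age check described above may be sketched as follows. The class name `PortTxLimiter`, the function `bpdu_is_fresh`, and the TxHoldCount value of 3 are illustrative assumptions; the sketch models only the counter behavior stated in the text, not a complete port state machine.

```python
class PortTxLimiter:
    """Caps the BPDUs a port may transmit, as tracked by TxCount."""

    def __init__(self, tx_hold_count=3):  # 3 is an illustrative value
        self.tx_hold_count = tx_hold_count
        self.tx_count = 0

    def try_send(self):
        """Return True if a BPDU may be transmitted now."""
        if self.tx_count >= self.tx_hold_count:
            return False  # cap reached, transmission is held back
        self.tx_count += 1
        return True

    def tick(self):
        """Called once per second: the counter is decremented by one."""
        if self.tx_count > 0:
            self.tx_count -= 1


def bpdu_is_fresh(message_age, max_age):
    """A BPDU is dropped once its message age exceeds MaxAge."""
    return message_age <= max_age


port = PortTxLimiter(tx_hold_count=3)
burst = [port.try_send() for _ in range(5)]  # only the first three succeed
port.tick()                                  # one second elapses
after_tick = port.try_send()                 # one more BPDU is allowed
```

Because the counter drops by only one per second, a port that has exhausted its allowance can send only about one BPDU per second until the burst of protocol activity subsides.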
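The propagation and flushing rule described above may be sketched as follows. The function name `propagate_topology_change`, the data layout, and the three-bridge chain are hypothetical. The sketch assumes the active topology is a tree, so that the propagation terminates, and it models only inter-bridge ports.

```python
from collections import deque


def propagate_topology_change(active_links, tables, origin):
    """Spread a TC message from `origin` over the active topology.

    active_links: bridge -> {port: (neighbor bridge, neighbor port)}
    tables:       bridge -> {station address: port where it was learned}
    """
    queue = deque([(origin, None)])  # (bridge, port the TC arrived on)
    while queue:
        bridge, arrival_port = queue.popleft()
        for port, (neighbor, neighbor_port) in active_links[bridge].items():
            if port == arrival_port:
                continue  # never send the TC back where it came from
            # Sending a TC on a port flushes what was learned at that port.
            tables[bridge] = {
                addr: p for addr, p in tables[bridge].items() if p != port
            }
            queue.append((neighbor, neighbor_port))
    return tables


# Hypothetical chain A - B - C.
active_links = {
    "A": {1: ("B", 1)},
    "B": {1: ("A", 1), 2: ("C", 1)},
    "C": {1: ("B", 2)},
}
tables = {"A": {"h1": 1}, "B": {"h1": 2, "h3": 1}, "C": {"h1": 1}}
result = propagate_topology_change(active_links, tables, origin="A")
```

In this example bridge B keeps the entry learned on the port where the TC message arrived and flushes only the entry learned on the port on which it forwards the message.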
Spanning Tree Protocol (STP)
In the event of a topology change, STP relies on timers before switching ports to the forwarding state. This is to ensure that the new information has been spread across the network. The total waiting time can reach 50 seconds. This conservative value protects against prematurely switching a port to the forwarding state, which would result in a forwarding loop. Whenever a bridge gets disconnected from the root bridge, it waits until the information cached at its root port is aged out, and then it starts accepting BPDUs from other bridges to discover another path to the root.
In STP the root bridge sends a hello message every HelloTime. Other bridges relay such messages to their children after adjusting the appropriate fields (e.g., message age, path cost, . . . ). The loss of a hello message at a bridge could therefore be due to a problem anywhere along the path to the root.
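The 50 second figure can be reproduced from the default timer values of IEEE 802.1D, which are stated here as background knowledge rather than taken from the present text: a MaxAge of 20 seconds to age out stale information, followed by one ForwardDelay of 15 seconds in each of two intermediate port states. The constant names below are chosen for illustration.

```python
MAX_AGE = 20        # seconds, default MaxAge (assumed from IEEE 802.1D)
FORWARD_DELAY = 15  # seconds, default ForwardDelay (assumed from IEEE 802.1D)

# Stale information ages out, then the port waits one ForwardDelay in each
# of two intermediate states before it begins forwarding.
worst_case_wait = MAX_AGE + 2 * FORWARD_DELAY
```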
Rapid Spanning Tree Protocol (RSTP)
RSTP tries to overcome STP's long convergence time by introducing a few optimizations intended to reduce convergence time without affecting the functionality of the protocol. For the purpose of understanding the present invention, a subset of these optimizations is presented. RSTP relies on a handshake between bridges to transition a designated port into the forwarding state rather than waiting for timers. Unlike in STP, where a bridge just forwards the root's BPDU messages, in RSTP every bridge sends a BPDU every HelloTime that acts as a heartbeat indicating the liveness of that bridge. This allows for better detection of failed components. If a bridge misses three consecutive BPDU messages on some port, it assumes that the connection has failed and ages out the information at that port. Physical link failures are detected even faster. If a bridge detects failure at its root port, it falls back immediately to an alternate port if it has one. An alternate port is a port with an alternate path to the root bridge (Cisco Systems, Inc., “Understanding Rapid Spanning Tree Protocol (802.1w),” http://www.cisco.com/warp/public/473/146.html). A port is chosen to be either an alternate port or a backup port if it is not the root port and receives information superior to what it is transmitting. In a switched Ethernet, a backup port is a port directly connected to another port on the same bridge. For RSTP, a topology change event occurs when a port that was not forwarding switches to the forwarding state.
The dependability of Ethernet therefore heavily relies on the ability of RSTP to quickly recompute a cycle-free active forwarding topology upon a partial network failure. Some pathological causes for forwarding loops in RSTP have been previously documented by Cisco (Cisco Systems, Inc., “Spanning Tree Protocol Problems and Related Design Considerations,” http://www.cisco.com/warp/public/473/16.html). However, even under normal operation, RSTP may exhibit a “count-to-infinity” problem which can allow a temporary forwarding cycle to exist in the network for tens of seconds. During this period, network congestion may sharply increase and packets may be forwarded incorrectly. This highly unacceptable behavior was mentioned by Myers et al. (A. Myers, T. S. E. Ng, and H. Zhang, “Rethinking the Service Model: Scaling Ethernet to a Million Nodes,” Third Workshop on Hot Topics in Networks (HotNets-III), March 2004).
A temporary forwarding loop may form when there is a cycle in the physical topology and this cycle loses connectivity to the root bridge due to a network failure. FIG. 1 gives a simple example of a vulnerable topology having bridges 110, 120, . . . 170. The path between bridge 110 (the root) and bridge 120 does not have to be a direct link. A failure in this path can result in a count-to-infinity situation in RSTP that may create a temporary forwarding loop.
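The heartbeat-based failure detection and the fallback to an alternate port described above may be sketched as follows. The class name `RstpBridge`, its methods, the HelloTime of 2 seconds, and the use of explicit timestamps are illustrative assumptions. The handshake, port roles other than root and alternate, and the behavior of a bridge with no alternate port are not modeled.

```python
HELLO_TIME = 2.0   # seconds, an assumed HelloTime
MISSED_HELLOS = 3  # consecutive missed BPDUs before information is aged out


class RstpBridge:
    """Tracks BPDU arrivals and falls back to an alternate port on failure."""

    def __init__(self, root_port, alternate_ports):
        self.root_port = root_port
        self.alternate_ports = list(alternate_ports)
        self.last_bpdu = {}  # port -> time the last BPDU was received

    def on_bpdu(self, port, now):
        self.last_bpdu[port] = now

    def on_link_down(self, port):
        """Physical failures are acted on immediately, without waiting."""
        if port == self.root_port:
            self._fail_over()

    def check(self, now):
        """Age out the root port after three missed hellos; return root port."""
        last = self.last_bpdu.get(self.root_port)
        if last is not None and now - last >= MISSED_HELLOS * HELLO_TIME:
            self._fail_over()
        return self.root_port

    def _fail_over(self):
        self.last_bpdu.pop(self.root_port, None)
        # Not modeled: what a bridge does when it has no alternate port.
        self.root_port = (
            self.alternate_ports.pop(0) if self.alternate_ports else None
        )


bridge = RstpBridge(root_port=1, alternate_ports=[2])
bridge.on_bpdu(1, now=0.0)
still_ok = bridge.check(now=4.0)   # only two hello intervals have elapsed
failed_over = bridge.check(now=6.0)  # three intervals missed, use port 2
```

In this sketch the bridge keeps port 1 as its root port while fewer than three hello intervals have elapsed and switches to alternate port 2 as soon as the third interval passes without a BPDU.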