1. Field
This application relates to communication networks and, more particularly, to a method and apparatus for layer 2 loop prevention in a multi-node switch cluster.
2. Description of the Related Art
Data communication networks may include various computers, servers, hubs, switches, nodes, routers, and other devices coupled to and configured to pass data to one another. These devices will be referred to herein as “network elements”. Data is communicated through the data communication network by passing protocol data units, such as frames, packets, cells, or segments, between the network elements by utilizing one or more communication links. A particular protocol data unit may be handled by multiple network elements and cross multiple communication links as it travels between its source and its destination over the network.
One way to make networks more reliable is to provide redundant connections between network elements using multiple physical links. In this scenario, although physically the links are separate, logically they may be viewed as a single trunk by upper layers of the networking stack so that a failure of one of the links forming the logical trunk will not require corrective action at the link or networking layers. Rather, the network is able to accommodate a failure of one of the physical links by causing traffic to be shifted to one of the other links interconnecting the network elements. A link that is implemented in this manner will be referred to herein as a “multi-link trunk”.
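The failover behavior of a multi-link trunk can be illustrated with a small model. This is a minimal sketch under stated assumptions: the class `MultiLinkTrunk`, its methods, and the link names are hypothetical, and spreading flows over surviving links with a simple modulo on a flow identifier is an assumption standing in for whatever hash a real switch uses. The point it shows is that a member failure changes only the trunk's internal state, while callers keep addressing the same logical trunk.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLinkTrunk:
    """Hypothetical model: several physical links exposed as one logical trunk."""
    members: list[str]                            # physical member links, in order
    down: set[str] = field(default_factory=set)   # members that have failed

    def active_members(self) -> list[str]:
        # Links that are still usable, preserving configuration order.
        return [m for m in self.members if m not in self.down]

    def link_failed(self, link: str) -> None:
        # Only the trunk's internal state changes; upper layers keep
        # addressing the same logical trunk.
        self.down.add(link)

    def select_link(self, flow_id: int) -> str:
        # Assumption: flows are spread over surviving links with a simple
        # modulo; real switches use their own (unspecified) hash.
        active = self.active_members()
        if not active:
            raise RuntimeError("all member links of the trunk are down")
        return active[flow_id % len(active)]


trunk = MultiLinkTrunk(["link1", "link2"])
print(trunk.select_link(1))   # link2
trunk.link_failed("link2")
print(trunk.select_link(1))   # link1
```

After `link2` fails, the same flow is simply carried on `link1`; no corrective action is visible above the trunk.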
To further increase reliability, it is possible to cause the physical links implementing a multi-link trunk to be connected to different switches forming a switch cluster. A link implemented in this manner will be referred to herein as a “split multi-link trunk” or SMLT. The switches at the split end of the SMLT are interconnected using a subset of their normal Input/Output (I/O) ports. The connection between the switches of the switch cluster is referred to herein as an Inter Switch Trunk (IST) and the ports that are used to communicate between the switches of the switch cluster are referred to as IST ports. The IST may be implemented using one or more physical links or may be implemented as a logical connection over one or more intervening nodes.
All I/O ports that are not IST ports are referred to as User I/O ports or user ports. Endpoint devices connect to the switch cluster using user ports. An endpoint device can connect to the switch cluster via either a single physical user port or a set of user ports bundled as one logical port. Such a bundle of ports is referred to as a link aggregation group (LAG) or Multi-Link Trunk (MLT). When a single physical user port is used, the endpoint is connected to only one of the switches within the switch cluster. When a LAG/MLT is used, the endpoint can be connected to one or more switches within the switch cluster. For optimal resiliency, each port member of the LAG/MLT is connected to a separate switch within the cluster. This type of LAG/MLT connectivity is referred to as Split MLT or SMLT.
When a switch member within the switch cluster receives a broadcast packet or a unicast packet with an unknown destination address from a user port that belongs to a split LAG/MLT, the receiving switch broadcasts the packet to all the IST ports as well as to all the ports that are members of the VLAN ID of the incoming packet. Broadcasting the packet on the IST ports allows other nodes of the switch cluster to receive a copy of the packet, so that they likewise may broadcast the packet to all members of the VLAN ID of the incoming packet.
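The flooding rule for a packet arriving on a user port can be sketched as a set computation. This is a minimal sketch, assuming hypothetical port names (`ist1`, `p1`, ...) and a hypothetical function `flood_from_user_port`; it also assumes, as in ordinary bridging, that the packet is not sent back out of the port it arrived on, which the text above does not state explicitly.

```python
def flood_from_user_port(ingress: str, ist_ports: set[str],
                         vlan_user_ports: set[str]) -> set[str]:
    """Egress ports for a broadcast or unknown-unicast frame that arrived
    on a user port belonging to a split LAG/MLT (hypothetical port names)."""
    # Every IST port, so that peer switches in the cluster receive a copy...
    targets = set(ist_ports)
    # ...plus every user port that is a member of the frame's VLAN.
    targets |= vlan_user_ports
    # Assumption: as in ordinary bridging, the frame is never sent back
    # out of the port it arrived on.
    targets.discard(ingress)
    return targets


print(sorted(flood_from_user_port("p1", {"ist1"}, {"p1", "p2", "p3"})))
# ['ist1', 'p2', 'p3']
```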
However, every other switch that receives a copy of the broadcast packet via its IST ports must forward that copy without creating a loop. Because at least some members of the switch cluster will also have user ports connected to links that form part of the LAG/MLT on which the packet was originally received, these switches must be prevented from transmitting a copy of the packet out of those ports. Specifically, to prevent loop formation, the switches of the switch cluster must not forward the packet back toward the endpoint device over any other user port that is a member of the LAG/MLT on which the packet was received.
One common way to prevent this is for each switch member to maintain two Multicast Group IDs (MGIDs) per VLAN ID. An MGID may be thought of as a bitmap in which each bit corresponds to an outgoing port. When a broadcast, multicast, or unknown-unicast packet is received, an MGID is applied to the packet, and the switch uses it to determine which output ports should receive a copy of the packet for forwarding. To prevent loops, one MGID is used to forward packets received from a user port and a second MGID is used to forward packets received from the IST ports. In particular, a packet received from a user port is assigned a first MGID that includes all of the VLAN's user port members as well as the IST ports, whereas a packet received from an IST port is assigned a second MGID that includes all of the IST ports but only those of the VLAN's user port members that are not members of any split LAG/MLT. By using two MGIDs in this manner, traffic received from a user port will be forwarded to all other user ports associated with the VLAN ID as well as over the IST, while traffic received from the IST will only be forwarded over other IST ports and over user ports that are not part of a split LAG/MLT.
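The two-MGID scheme can be sketched with integers used as port bitmaps. This is a minimal sketch under stated assumptions: the `Port` record, the helper names `to_bitmap`, `build_mgids`, and `flood`, and the port layout in the example are all hypothetical, and masking out the ingress port is an assumption borrowed from ordinary bridging rather than something the text specifies. It shows how the MGID is selected by ingress port type and how the second MGID omits split LAG/MLT members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    index: int                # bit position of this port in an MGID bitmap
    is_ist: bool = False      # True for an IST port
    split_lag: bool = False   # True for a user port in a split LAG/MLT


def to_bitmap(ports) -> int:
    """Set one bit per port, as in the MGID-as-bitmap view."""
    mask = 0
    for p in ports:
        mask |= 1 << p.index
    return mask


def build_mgids(vlan_ports: list[Port]) -> tuple[int, int]:
    """Return (user_mgid, ist_mgid) for one VLAN."""
    ist = [p for p in vlan_ports if p.is_ist]
    user = [p for p in vlan_ports if not p.is_ist]
    # First MGID: all IST ports plus all of the VLAN's user ports.
    user_mgid = to_bitmap(ist + user)
    # Second MGID: all IST ports plus only user ports outside any split LAG/MLT.
    ist_mgid = to_bitmap(ist + [p for p in user if not p.split_lag])
    return user_mgid, ist_mgid


def flood(vlan_ports: list[Port], ingress: Port) -> list[int]:
    """Pick the MGID by ingress port type and expand it to egress port indices."""
    user_mgid, ist_mgid = build_mgids(vlan_ports)
    mgid = ist_mgid if ingress.is_ist else user_mgid
    # Assumption: the ingress port itself is masked out, as in ordinary bridging.
    mask = mgid & ~(1 << ingress.index)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


ist = Port(0, is_ist=True)
smlt_a = Port(1, split_lag=True)
single = Port(2)
smlt_b = Port(3, split_lag=True)
vlan = [ist, smlt_a, single, smlt_b]

print(build_mgids(vlan))     # (15, 5)
print(flood(vlan, smlt_a))   # [0, 2, 3]
print(flood(vlan, ist))      # [2]
```

A packet from the split LAG port reaches the IST and every other user port, while the copy arriving over the IST reaches only the single-homed port, so it never returns toward the originating endpoint. The cost is visible in `build_mgids`: every VLAN consumes two bitmaps.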
While this solution prevents loops by preventing traffic received over the IST from being passed back out over one of the links associated with a split MLT, it has two drawbacks. First, the number of MGIDs available within a given switch is typically limited. Using two MGIDs per VLAN ID doubles MGID usage, which may adversely affect scalability. Second, in the event of a port failure, the MGID memberships may need to be modified. For example, assume two links form a split LAG/MLT across two switches of a switch cluster. Failure of the port on one of the switches will cause the other link to change status from a LAG/MLT port to a normal user port, which must then be added to the second MGID on its switch. Accordingly, when two MGIDs are used, port failure and other failure information may require inter-switch control plane synchronization and can cause significant packet loss during failover and recovery. Likewise, during recovery it can be very complex to eliminate every time window in which a loop may form. Accordingly, it would be advantageous to provide a method and apparatus for loop prevention in a multi-node switch cluster.
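The membership change described in the failure example can be sketched from one switch's point of view. This is a minimal sketch under stated assumptions: the `ClusterSwitch` class, its fields and methods, and the port and LAG names are hypothetical, and the notifications `on_peer_lag_down` and `on_peer_lag_up` stand in for whatever inter-switch control-plane messages a real cluster would exchange. It shows why the second MGID depends on state learned from the peer: until the notification is processed, copies arriving over the IST are withheld from the surviving link, and on recovery the port must be removed again before the peer's link resumes forwarding.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClusterSwitch:
    """Hypothetical model of one switch's view of a two-switch cluster."""
    ist_ports: set[str]
    # user port -> split LAG/MLT id, or None for a single-homed user port
    user_ports: dict[str, Optional[str]]
    # LAG ids whose member link on the peer switch is currently up; in a
    # real cluster this must be learned via inter-switch control messages
    peer_lag_up: set[str] = field(default_factory=set)

    def is_split(self, port: str) -> bool:
        lag = self.user_ports[port]
        return lag is not None and lag in self.peer_lag_up

    def ist_mgid_members(self) -> set[str]:
        # Second MGID: IST ports plus user ports not currently in a split LAG/MLT.
        return self.ist_ports | {p for p in self.user_ports if not self.is_split(p)}

    def on_peer_lag_down(self, lag: str) -> None:
        # Peer reports its member of `lag` failed: the local member becomes
        # an ordinary user port and therefore joins the second MGID.
        self.peer_lag_up.discard(lag)

    def on_peer_lag_up(self, lag: str) -> None:
        # Recovery: re-marking the local member as split removes it from the
        # second MGID; if this happens after the peer's link resumes
        # forwarding, IST copies could loop back toward the endpoint.
        self.peer_lag_up.add(lag)


a = ClusterSwitch({"ist1"}, {"p1": "lag1", "p2": None}, {"lag1"})
print(sorted(a.ist_mgid_members()))   # ['ist1', 'p2']
a.on_peer_lag_down("lag1")
print(sorted(a.ist_mgid_members()))   # ['ist1', 'p1', 'p2']
```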