1. Field of the Invention
The present disclosure relates generally to packet network devices such as switches, and more particularly to methods for configuring and controlling the operation of a Link Aggregation Group on two or more switches that are connected together in a stacked arrangement.
2. Description of Related Art
Link aggregation refers to a process for operating a group of physical links as if they were a single link. At least one standard for link aggregation has been promulgated by the Institute of Electrical and Electronics Engineers, e.g., in the IEEE 802.1AX-2008 standard, incorporated herein by reference.
There are a number of motivations for implementing Link Aggregation on network switches. One motivation is to increase bandwidth by combining the capacity of multiple physical links into one logical link. FIG. 1 shows two network switches, switch 100 and switch 102, connected to each other by four physical links, Links 1-4. If each of the physical links can support network traffic at a rate of 10 Mb/s, then configuring the four physical links as a single, logical link can result in a combined link bandwidth of up to 40 Mb/s. The single, logical link is referred to as a Link Aggregation Group (LAG).
Another motivation for implementing Link Aggregation in network switches is to provide link redundancy. In the event that a physical link between two network switches fails, the flow of network traffic assigned to this link can be interrupted with the loss of some or all of the packets in the flow. Referring again to FIG. 1, if link 1 fails, some or all of the traffic that was being transmitted over this physical link can be transmitted over links 2-4 subsequent to the failure of link 1.
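The two motivations above can be illustrated with a short Python sketch. The function name and the list-of-rates representation are illustrative assumptions and are not part of any standard; the computed value is an upper bound, since the rate actually achieved depends on how traffic is distributed across the member links.

```python
def lag_capacity_mbps(link_rates_mbps):
    """Upper bound on LAG bandwidth: the sum of the member link rates."""
    return sum(link_rates_mbps)

# Links 1-4 of FIG. 1, each supporting 10 Mb/s
links = [10, 10, 10, 10]
capacity = lag_capacity_mbps(links)

# If link 1 fails, links 2-4 remain available to carry the traffic
degraded = lag_capacity_mbps(links[1:])
```

With all four links up the bound is 40 Mb/s; after the failure of link 1 the LAG remains operational with a bound of 30 Mb/s.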
Some basic link aggregation terminology and concepts from the IEEE 802.1AX-2008 standard are useful to the understanding of the embodiments. Referring to FIG. 2, several logical components of a packet network device (switch) 200 are shown, including a Media Access Control (MAC) client 210, a link aggregation sublayer 220, four individual MACs MAC1-MAC4, and four individual physical layer transponders (PHYs) PHY1-PHY4. The purpose of the link aggregation sublayer 220 is to combine a number of physical ports (represented by MACn/PHYn) logically for presentation to MAC client 210 as a single logical MAC. More or fewer than four physical ports are supportable by the framework, with up to the same number of MAC clients as physical ports supportable as well.
Link aggregation sublayer 220 is further subdivided into several logical components, including control parser/multiplexers (muxes) CPM1-CPM4, an aggregator 230, and aggregation control 260. Each control parser/mux CPMn couples to a corresponding MAC MACn across an IEEE 802.3 MAC service interface. For egress frames (transmitted by one of the PHYs), each control parser/mux passes frame transmission requests from aggregator 230 and aggregation control 260 to the appropriate port. For ingress frames (received by one of the PHYs), each control parser/mux distinguishes Link Aggregation Control (LAC) Protocol Data Units (PDUs) from other frames, and passes the LACPDUs to aggregation control 260, with all other frames passing to aggregator 230. It is noted that although one aggregator 230 is shown, in the particular implementation shown in FIG. 2 there could be up to four aggregators—each control parser/mux CPMn passes its non-LACPDU ingress traffic to a particular aggregator bound to MACn, or discards the non-LACPDU traffic when MACn is not bound to an aggregator.
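The ingress behavior of a control parser/mux described above can be sketched in Python as follows. The sketch assumes an untagged Ethernet frame, in which the EtherType occupies bytes 12-13 and the Slow Protocols subtype occupies byte 14; the function name and the returned strings are hypothetical labels chosen for illustration only.

```python
SLOW_PROTOCOLS_ETHERTYPE = 0x8809  # EtherType used by LACP and Marker frames
LACP_SUBTYPE = 0x01                # Slow Protocols subtype identifying an LACPDU

def classify_ingress(frame: bytes, bound_to_aggregator: bool) -> str:
    """Return the destination a control parser/mux selects for an ingress frame."""
    if len(frame) >= 15:
        ethertype = int.from_bytes(frame[12:14], "big")
        if ethertype == SLOW_PROTOCOLS_ETHERTYPE and frame[14] == LACP_SUBTYPE:
            # LACPDUs always go to aggregation control
            return "aggregation_control"
    # All other frames go to the bound aggregator, or are discarded
    # when the MAC is not bound to any aggregator
    return "aggregator" if bound_to_aggregator else "discard"

# Example frames (addresses zeroed for brevity)
lacpdu = bytes(12) + b"\x88\x09\x01" + bytes(10)
ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
```

Note that an LACPDU is delivered to aggregation control whether or not the port is currently bound, which is what allows a port to negotiate its way into an aggregation in the first place.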
Aggregator 230 comprises a frame collection block 240, a frame distribution block 250, and up to four (in this embodiment) aggregator parser/muxes APM1-APM4. Aggregator 230 communicates with MAC client 210 across an IEEE 802.3 MAC service interface. Aggregator 230 also communicates with each control parser/mux CPMn that corresponds to a MAC MACn bound to aggregator 230.
Frame collection block 240 comprises a frame collector 242 and a marker responder 244. The frame collector 242 receives ordinary traffic frames from each bound MAC MACn and passes these frames to MAC client 210. Frame collector 242 is not constrained as to how it multiplexes frames from its bound ports, other than that it is not allowed to reorder frames received on any one port. The marker responder 244 receives marker frames (as defined in IEEE 802.3-2005) from each bound port and responds with a return marker frame to the port that received the ingress marker frame.
Frame distribution block 250 comprises a frame distributor 252 and an optional marker generator/receiver 254. The frame distributor 252 receives ordinary traffic frames from MAC client 210, and employs a frame distribution algorithm to distribute the frames among the ports bound to the aggregator. Frame distributor 252 is not constrained as to how it distributes frames to its bound ports, other than that it is expected to supply frames from the same “conversation” to the same egress port. Marker generator/receiver 254 can be used, e.g., to aid in switching a conversation from one egress port to another egress port. Frame distribution 250 holds or discards any incoming frames for the conversation while marker generator/receiver 254 generates a marker frame on the port handling the conversation. When a return marker frame is received, all in-transit frames for the conversation have been received at the far end of the aggregated link, and frame distribution may switch the conversation to a new egress port.
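One way to satisfy the requirement that frames of the same conversation use the same egress port is to hash fields that identify the conversation. The following Python sketch assumes, for illustration only, that a conversation is identified by its source and destination MAC addresses and that a CRC-32 hash is used; the standard does not mandate any particular distribution algorithm, and the function name is hypothetical.

```python
import zlib

def select_egress_port(src_mac: bytes, dst_mac: bytes, bound_ports: list) -> int:
    """Map a conversation (here, a source/destination MAC pair) to one bound port."""
    if not bound_ports:
        raise ValueError("no ports are bound to the aggregator")
    conversation_hash = zlib.crc32(src_mac + dst_mac)
    # The same address pair always yields the same index, so frame
    # order within a conversation is preserved on a single link
    return bound_ports[conversation_hash % len(bound_ports)]

ports = [1, 2, 3, 4]
src = bytes.fromhex("0200000000aa")
dst = bytes.fromhex("0200000000bb")
first = select_egress_port(src, dst, ports)
second = select_egress_port(src, dst, ports)
```

Because the selected index depends on the number of bound ports, a change in LAG membership can move a conversation to a different port; this is the situation in which the marker exchange described above is useful.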
Aggregator parser/muxes APM1-APM4, when bound to one of the physical ports, transfer frames with their corresponding control parser/mux CPM1-CPM4. On transmit, aggregator parser/muxes APM1-APM4 take egress frames (ordinary traffic and marker frames) from frame distribution 250 and marker responder 244 and supply them to their respective bound ports. For ingress frames received from their bound port, each aggregator parser/mux distinguishes ordinary MAC traffic, marker request frames, and marker response frames, passing each to frame collector 242, marker responder 244, and marker generator/receiver 254, respectively.
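The three-way ingress dispatch performed by an aggregator parser/mux can be sketched as follows. The sketch assumes an untagged Ethernet frame carrying a Marker PDU in which byte 14 is the Slow Protocols subtype, byte 15 is the version number, and byte 16 is the first TLV type; the function name and returned labels are hypothetical.

```python
MARKER_SUBTYPE = 0x02        # Slow Protocols subtype identifying a Marker PDU
TLV_MARKER_INFO = 0x01       # TLV type of a marker request
TLV_MARKER_RESPONSE = 0x02   # TLV type of a marker response

def route_aggregator_ingress(frame: bytes) -> str:
    """Return the block to which an aggregator parser/mux passes an ingress frame."""
    is_slow_protocol = len(frame) >= 17 and frame[12:14] == b"\x88\x09"
    if is_slow_protocol and frame[14] == MARKER_SUBTYPE:
        tlv_type = frame[16]
        if tlv_type == TLV_MARKER_INFO:
            return "marker_responder"
        if tlv_type == TLV_MARKER_RESPONSE:
            return "marker_generator_receiver"
    # Everything else is treated as ordinary MAC traffic
    return "frame_collector"

marker_header = bytes(12) + b"\x88\x09\x02\x01"   # EtherType, subtype, version
marker_request = marker_header + b"\x01" + bytes(10)
marker_response = marker_header + b"\x02" + bytes(10)
ordinary = bytes(12) + b"\x08\x00" + bytes(20)
```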
Aggregation control 260 is responsible for configuration and control of link aggregation for its assigned physical ports. Aggregation control 260 comprises a link aggregation control protocol (LACP) handler 262 that is used for automatic communication of aggregation capabilities and status among systems, and a link aggregation controller 264 that allows automatic control of aggregation and coordination with other systems. The aggregation control 260 also functions to maintain per-link information in forwarding tables on each of the line cards comprising a switch. This information can include such things as the identity of a LAG to which the link belongs, the identity of the aggregator to which the LAG belongs, and the status of interaction between the frame collection function or the frame distribution function of the aggregator and the link. More specifically, the aggregation control 260 communicates the state information to an aggregator on a line card, and the aggregator on the line card uses this information to maintain information in the forwarding tables necessary to forward network information over a LAG. The aggregation control 260 also receives information from a layer-3 routing protocol (OSPF, for instance) running in the control module relating to the state of each link assigned to the LAG. So, in the event that one link assigned to the LAG fails, the aggregation control 260 becomes aware of this failure and communicates this to the aggregator, which then reprograms the forwarding tables accordingly.
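The reprogramming step described above can be sketched with a minimal data structure. The class name, its fields, and the on_link_state method are hypothetical and chosen only for illustration; an actual line card would hold this state in packet-processor tables rather than in Python objects.

```python
class LagForwardingEntry:
    """Hypothetical per-LAG forwarding state kept on a line card."""

    def __init__(self, lag_id, member_links):
        self.lag_id = lag_id
        self.member_links = tuple(member_links)  # configured membership
        self.active_links = list(member_links)   # links currently usable

    def on_link_state(self, link, is_up):
        """Update the active set when aggregation control reports a change."""
        if link not in self.member_links:
            raise ValueError(f"link {link} is not a member of {self.lag_id}")
        if is_up and link not in self.active_links:
            self.active_links.append(link)
            self.active_links.sort()
        elif not is_up and link in self.active_links:
            self.active_links.remove(link)

entry = LagForwardingEntry("LAG-1", [1, 2, 3, 4])
entry.on_link_state(1, is_up=False)   # link 1 fails
after_failure = list(entry.active_links)
entry.on_link_state(1, is_up=True)    # link 1 recovers
after_recovery = list(entry.active_links)
```

Keeping the configured membership separate from the active set allows a recovered link to be restored without renegotiating the LAG.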
The frames exchanged between LACP 262 and its counterparts in peer systems each contain a LAC PDU (Protocol Data Unit), e.g., with a format 300 as shown in FIG. 3. The actor information and partner information contained in the LACPDU structure are used to establish and break link aggregations, with the “actor information” pertaining to the system sending the LACPDU, and the “partner information” indicating the state of the system receiving the LACPDU, as understood by the system sending the LACPDU.
The actor and partner information include a system ID, system priority, key, port ID, port priority, and state flags. The system ID is a globally unique identifier such as a MAC address assigned to the system. The system priority is a priority value assigned by the system administrator to the system. The key is a value assigned to the port by its system, and may be static or dynamic. The key is the same for each port on the system that is capable of aggregation with other ports transmitting that key. The port ID is a port number assigned by the system administrator to each port, and should be unique on the system. The port priority is a priority value assigned by the system administrator to the port, and should be unique among ports that are potentially aggregable. The state flags include LACP_Activity, LACP_Timeout, Aggregation, Synchronization, Collecting, Distributing, Defaulted, and Expired, and are defined as specified in IEEE 802.3-2005. In particular, the Synchronization bit is set TRUE when the link has been allocated to the correct Link Aggregation Group (LAG), the group has been associated with a compatible Aggregator, and the identity of the LAG is consistent with the System ID and Key transmitted by the port.
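The actor or partner information described above can be modeled as a small record, with the state flags carried in a single octet. In the sketch below, the bit assignments follow the order in which the flags are listed (LACP_Activity in the least significant bit through Expired in the most significant bit); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntFlag

class LacpState(IntFlag):
    LACP_ACTIVITY = 0x01
    LACP_TIMEOUT = 0x02
    AGGREGATION = 0x04
    SYNCHRONIZATION = 0x08
    COLLECTING = 0x10
    DISTRIBUTING = 0x20
    DEFAULTED = 0x40
    EXPIRED = 0x80

@dataclass(frozen=True)
class PortInfo:
    """Actor or partner information carried in an LACPDU."""
    system_priority: int
    system_id: str      # e.g., a MAC address assigned to the system
    key: int
    port_priority: int
    port_id: int
    state: LacpState

def in_sync(info: PortInfo) -> bool:
    """True when the Synchronization bit is set."""
    return bool(info.state & LacpState.SYNCHRONIZATION)

# 0x3D: Activity, Aggregation, Synchronization, Collecting, Distributing
actor = PortInfo(32768, "00:01:e8:00:00:01", 10, 128, 1, LacpState(0x3D))
# 0x45: Activity, Aggregation, Defaulted (no partner information yet)
defaulted = PortInfo(32768, "00:01:e8:00:00:01", 10, 128, 2, LacpState(0x45))
```

The system ID and priority values shown are arbitrary example values.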
In operation, peered systems exchange LACPDUs to determine whether multiple ports that are aggregable to each other appear on both ends of the same link. To accomplish this, both endpoints calculate a Link Aggregation Group Identifier (LAG ID) for each participating port. The LAG ID combines actor and partner system priorities, system IDs, and keys. When the LAG IDs on two or more aggregable ports match, those ports are automatically assigned to the same LAG, as long as both link endpoint systems make the aggregation.
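The grouping of ports by LAG ID can be sketched as follows. For illustration, each end of a link is represented as a (system priority, system ID, key) tuple, and the two ends are sorted so that both systems derive the same identifier regardless of which one is the actor. This simplified form omits the port components that distinguish links which cannot aggregate; all names are hypothetical.

```python
from collections import defaultdict

def lag_id(actor, partner):
    """Combine two (system_priority, system_id, key) tuples into a LAG ID.

    Sorting makes the result identical at both ends of the link.
    """
    return tuple(sorted((actor, partner)))

def group_ports(ports):
    """Map each LAG ID to the list of port IDs that share it.

    `ports` maps port_id -> (actor, partner).
    """
    groups = defaultdict(list)
    for port_id, (actor, partner) in sorted(ports.items()):
        groups[lag_id(actor, partner)].append(port_id)
    return dict(groups)

local = (32768, "00:00:00:00:00:01", 10)
peer_a = (32768, "00:00:00:00:00:02", 20)
peer_b = (32768, "00:00:00:00:00:03", 20)

# Ports 1 and 2 face peer_a; port 3 faces a different system
ports = {1: (local, peer_a), 2: (local, peer_a), 3: (local, peer_b)}
groups = group_ports(ports)
```

In this example ports 1 and 2 fall into one group and port 3 into another, because port 3 reports a different partner system ID.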
Single chassis packet switches, similar to the switch 200 described above with reference to FIG. 2, can only support a limited number of line cards and ports. Some vendors provide special link cards or a “back-end” port that can be used to connect or stack two separate switches together to form a stacked system that in at least some ways acts with peer devices like a single larger chassis. With two chassis stacked in this manner, when a packet arrives at one of the switches that must egress on the other switch, instead of processing the packet normally the first switch places the packet in a special proprietary wrapper and hands the packet off to the other switch using the proprietary connection. The second switch reads the wrapper, removes it, and processes the packet.
U.S. patent application Ser. No. 12/828,514 entitled “Multiple Chassis Stacking using Front-End Ports” which was filed Jul. 1, 2010 and assigned to Force 10 Networks, Inc. describes a method and apparatus for creating a single, logical chassis out of two fully functional physical chassis, linked only through their normal front-end traffic ports. A link aggregation group (LAG) with enough member ports to support anticipated cross-platform traffic is set up between the two chassis, and route processing managers on the two chassis negotiate to determine a stack master. The stack master configures line cards on each chassis for appropriate behavior in each traffic situation, as is described in the application. Such behavior generally uses the same types of lookup and forwarding operations already employed in single-chassis operation, but with instructions that vary, sometimes per line card, depending on the ingress and egress chassis of a packet. Extra processing is largely avoided, and some unique features, such as a single LAG with member ports on both chassis, further reduce cross-chassis traffic and reduce the likelihood that the entire stacked system will fail due to the failure of any one line card.
One motivation for connecting two or more network switches in a stacked switch configuration can be to increase the number of ports that can be controlled by a single management/control plane. From the perspective of the packet network, the stacked switches operate as a single, logical switch.
Another motivation for connecting two or more network switches in a stacked configuration is to provide link redundancy to network devices (switches) that are neighbors to the stacked switches. FIG. 3 illustrates a packet network topology comprised of two switches 300 and 302 in a stacked arrangement 308. Each of the stacked switches 300 and 302 is connected to each of two other network switches 304 and 306 by different physical links. Network switch 300 is connected to switch 304 and to switch 306 over links 1 and 3 respectively, and network switch 302 is connected to switch 304 and to switch 306 over links 2 and 4 respectively. In order to provide for link redundancy between the stacked switches 308 and each of the network switches 304 and 306, a LAG-A is established between switch 304 and the stacked switches 308 that includes links 1 and 2, and a LAG-B is established between switch 306 and the stacked switches 308 that includes links 3 and 4.
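The redundancy provided by the topology of FIG. 3 can be checked with a short sketch. The dictionaries below encode only the link-to-switch and LAG-to-link relationships stated above; the variable and function names are hypothetical.

```python
# Stack member (300 or 302) on which each physical link terminates
LINK_TO_STACK_MEMBER = {1: 300, 2: 302, 3: 300, 4: 302}

# LAG membership as described for FIG. 3
LAGS = {"LAG-A": [1, 2], "LAG-B": [3, 4]}

def survives_member_failure(lag_links, failed_switch):
    """True if at least one member link terminates on a surviving stack member."""
    return any(LINK_TO_STACK_MEMBER[link] != failed_switch for link in lag_links)

results = {
    (name, failed): survives_member_failure(links, failed)
    for name, links in LAGS.items()
    for failed in (300, 302)
}
```

Every entry in results is True, because each LAG has one member link on each stack member. A LAG formed from links 1 and 3 alone, by contrast, would not survive the failure of switch 300.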
Typically, each network switch is limited in the number of LAGs it can support. This limitation can be different with each switch and is generally determined by the architecture of a device (packet processor) that operates to process network information ingressing to and egressing from the switch. Specifically, packet processors offered by different vendors typically include more or less memory dedicated to storing information used by the packet processor to forward the network information to its correct destination. Such forwarding information can include, among other things, destination address information, egress port information, VLAN ID information, ingress port information, MAC address information, LAG ID information and LAG membership information. The limit on the number of LAGs supported by a switch cannot be increased by merely arranging multiple switches in a stacked configuration. The total number of LAGs that can be supported by a stacked switch is not the sum of the number of LAGs that can be supported by each of the switches comprising the stacked switch. For instance, if each of the switches 300 and 302 in FIG. 3 includes 300 ports and can support up to 128 LAGs, then the stacked switch arrangement 308 has a total of 600 ports, but still can only support up to 128 LAGs. Merely adding additional switches to a stacked switch does not result in the stacked switch being able to support additional LAGs beyond the number that any single switch can support. This unfortunately limits the number of redundant links that can be supported in the network topology of FIG. 3. This limitation in the number of LAGs supported by a stacked switch is a particular problem in Data Center architectures, as Top of Rack (TOR) switches can connect to two or more core switches using a LAG.
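The scaling behavior described above, in which ports add across stack members but the LAG limit does not, can be expressed in a few lines. The function name is hypothetical, and taking the minimum of the per-switch limits is an assumption made here to cover stack members with differing limits, a case the example above does not address.

```python
def stacked_capacity(switches):
    """Return (total_ports, lag_limit) for a stack.

    `switches` is a list of (port_count, lag_limit) tuples.
    Ports add up across members; the LAG limit does not.
    """
    total_ports = sum(ports for ports, _ in switches)
    # Assumption: the stack is bounded by its most constrained member
    lag_limit = min(limit for _, limit in switches)
    return total_ports, lag_limit

# Switches 300 and 302 of FIG. 3: 300 ports and 128 LAGs each
stack_308 = stacked_capacity([(300, 128), (300, 128)])
```

For the two-switch stack of the example, the result is 600 ports but only 128 LAGs.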