Link aggregation refers to combining multiple network connections in parallel to increase throughput beyond what a single connection could sustain and to provide redundancy among the links. Link aggregation, including the Link Aggregation Control Protocol (LACP) for Ethernet, is defined in IEEE 802.1AX, IEEE 802.1aq, and IEEE 802.3ad, as well as in various proprietary solutions. IEEE 802.1AX-2008 and IEEE 802.1AX-2014 are entitled Link Aggregation, the contents of which are incorporated by reference. IEEE 802.1aq-2012 is entitled Shortest Path Bridging, the contents of which are incorporated by reference. IEEE 802.3ad-2000 is entitled Link Aggregation, the contents of which are incorporated by reference. A Multi-Chassis Link Aggregation Group (MC-LAG) is a type of Link Aggregation Group (LAG) with constituent ports that terminate on separate chassis, primarily for the purpose of providing nodal redundancy in the event one of the chassis fails. The relevant standards for LAG do not mention MC-LAG, but do not preclude it. MC-LAG implementation varies by vendor.
LAG is a technique for inverse multiplexing over multiple Ethernet links, thereby increasing bandwidth and providing redundancy. IEEE 802.1AX-2008 states “Link Aggregation allows one or more links to be aggregated together to form a Link Aggregation Group, such that a MAC (Media Access Control) client can treat the Link Aggregation Group as if it were a single link.” This layer 2 transparency is achieved by the LAG using a single MAC address for all of the device's ports in the LAG. LAG can be configured as either static or dynamic. Dynamic LAG uses a peer-to-peer control protocol called the Link Aggregation Control Protocol (LACP). LACP is also defined within the IEEE 802.1AX-2008 standard, the entirety of which is incorporated herein by reference.
LAG can be implemented in multiple ways, namely LAG N and LAG N+N/M+N. LAG N is the load-sharing mode of LAG, and LAG N+N/M+N provides redundancy. In LAG N, traffic is automatically distributed and load balanced across the working links within a LAG, thus maximizing the use of the group as Ethernet links go down or come back up, and providing improved resilience and throughput. For a different style of resilience between two nodes, a complete implementation of LACP supports separate worker/standby LAG subgroups. For LAG N+N, the worker links as a group will fail over to the standby links if any one or more or all of the links in the worker group fail. Note that LACP marks links as in standby mode using an “out of sync” flag.
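By way of illustration only, the two modes described above can be sketched as follows. This is a minimal sketch and not an implementation of any standard: the helper names `pick_link` and `active_group`, the link names, and the use of a CRC32 hash over a flow key are all assumptions made for the example, since the distribution algorithm is left to the implementation.

```python
import zlib
from typing import Dict, List


def pick_link(flow_key: str, working_links: List[str]) -> str:
    """LAG N sketch: map a flow onto one of the currently working links.

    Hypothetical helper; CRC32 is an illustrative hash choice only.
    """
    if not working_links:
        raise ValueError("no working links in the LAG")
    index = zlib.crc32(flow_key.encode()) % len(working_links)
    return working_links[index]


def active_group(worker: Dict[str, bool], standby: Dict[str, bool]) -> List[str]:
    """LAG N+N sketch: carry traffic on the worker group only while every
    worker link is up; otherwise fail over, as a group, to the standby links
    that are up. Each dict maps a link name to its up/down state.
    """
    if all(worker.values()):
        return list(worker)
    return [name for name, up in standby.items() if up]


# Usage: one worker link fails, so the whole group fails over to standby.
worker = {"w1": True, "w2": False}
standby = {"s1": True, "s2": True}
links = active_group(worker, standby)
chosen = pick_link("00:11:22:33:44:55->66:77:88:99:aa:bb", links)
```

Hashing on a per-flow key keeps the frames of a given flow on one link, which preserves frame ordering within that flow; when the set of working links changes, flows are simply remapped over the links that remain.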
Advantages of link aggregation include increased throughput/bandwidth (physical link capacity × number of physical links), load balancing across aggregated links, and link-level redundancy (failure of a link does not result in a traffic drop; rather, standby links can take over the active role for traffic distribution). One limitation of link aggregation is that it does not provide node-level redundancy. If one end of a LAG fails, the result is a complete traffic drop, as there is no other data path available for the data traffic to be switched to the other node. To solve this problem, the “Multi-Chassis” Link Aggregation Group (MC-LAG) was introduced, which provides node-level redundancy in addition to the link-level redundancy and other merits provided by LAG.
MC-LAG allows two or more nodes (referred to herein as a Redundant Group (RG)) to share a common LAG endpoint, referred to as a Dual Homing Device (DHD). The multiple nodes present a single logical LAG to the remote end. Note that MC-LAG implementations are vendor-specific, but cooperating chassis remain externally compliant with the IEEE 802.1AX-2008 standard. Nodes in an MC-LAG cluster communicate to synchronize and negotiate automatic switchovers (failover). Some implementations may support administrator-initiated (manual) switchovers.
The multiple nodes in the redundant group maintain some form of adjacency with one another, such as via the Inter-Chassis Communication Protocol (ICCP). Since the redundant group requires the adjacency to operate the MC-LAG, a loss in the adjacency (for any reason, including a link fault, a nodal fault, etc.) results in a so-called split-brain problem, in which every peer in the redundant group attempts to take the active role because it considers its corresponding peers to be operationally down. This can lead to the introduction of loops in the MC-LAG network and result in the rapid duplication of packets.
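The split-brain condition described above can be illustrated with a small model. This is a sketch under stated assumptions and does not represent ICCP or any vendor implementation: the `Peer` class, its `on_adjacency_lost` rule, and the `is_split_brain` check are hypothetical names introduced only for the example.

```python
class Peer:
    """Hypothetical model of one redundant group member."""

    def __init__(self, name, starts_active):
        self.name = name
        self.active = starts_active
        self.peer_reachable = True

    def on_adjacency_lost(self):
        # Naive rule assumed for illustration: a peer that cannot be
        # reached is treated as operationally down, so this node takes
        # the active role.
        self.peer_reachable = False
        self.active = True


def is_split_brain(peers):
    """True when more than one peer believes it holds the active role."""
    return sum(1 for p in peers if p.active) > 1


# Usage: the inter-chassis link fails while both nodes are still running.
rg = [Peer("RG-1", starts_active=True), Peer("RG-2", starts_active=False)]
before = is_split_brain(rg)
for peer in rg:
    peer.on_adjacency_lost()
after = is_split_brain(rg)
```

In this model a fault on the inter-chassis link is indistinguishable, from each node's point of view, from a failure of the peer node itself, which is why both nodes end up active.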
Thus, there is a need for a solution to the split-brain problem that is implemented solely between the RG members, that is interoperable with any vendor supporting standard LACP on the DHD, and that does not increase switchover time.