In packet data networks, Link Aggregation Groups (LAGs), and in particular Multi-Chassis LAGs (MC-LAGs), are widely used to improve network resiliency and/or bandwidth availability between network devices. A Layer 2 (L2) LAG comprises multiple links directly connecting physical network interfaces of two network devices in a network. A load balancing decision is performed at the forwarding plane of a network device to distribute traffic across the different links of the LAG. Thus, a LAG combines a number of physical network interfaces (or ports) into a single high-bandwidth data path, so as to share the traffic load among the member network interfaces of the group and to enhance connection reliability. An MC-LAG refers to a LAG that directly connects one network device with two or more other network devices.
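The load balancing decision described above can be illustrated with a minimal sketch. It assumes a hash over the source and destination addresses of a flow, which is one common way to spread traffic across member links while keeping each flow on a single link; the text does not specify the method. The `Lag` class, the `select_member` method, and the port names are hypothetical and chosen only for illustration.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Lag:
    """Hypothetical model of a LAG: a named bundle of active member links."""
    name: str
    members: list = field(default_factory=list)  # active member port names

    def select_member(self, src_mac: str, dst_mac: str) -> str:
        """Pick one member link for a flow by hashing its addresses.

        Assumption: a CRC32 hash of the address pair, reduced modulo the
        number of active members. Real forwarding planes may hash other
        fields or use a different function.
        """
        if not self.members:
            raise RuntimeError("no active member links in " + self.name)
        key = f"{src_mac}->{dst_mac}".encode()
        return self.members[zlib.crc32(key) % len(self.members)]
```

Because the selection depends only on the flow key and the current member list, packets of the same flow stay on the same link until the member list changes, for example when a failed link is removed from the group.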
Internet Engineering Task Force (IETF) Request for Comments (RFC) 7130, entitled “Bidirectional Forwarding Detection (BFD) on Link Aggregation Group (LAG) Interfaces,” defines a protocol enabling failure detection of a member link of a LAG. The use of BFD for failure detection over a LAG provides fast failure detection even in the absence of Link Aggregation Control Protocol (LACP), which is part of the Institute of Electrical and Electronics Engineers (IEEE) 802.3ad specification and is typically the protocol used to detect failures in a LAG. IETF RFC 7130 enables the verification of link continuity for every member link of the LAG using BFD. The approach taken in IETF RFC 7130 is to run an Asynchronous mode BFD session over each LAG member link and to use BFD to control whether a LAG member link should be part of the Layer 2 load-balancing table of the LAG interface. Each Asynchronous mode BFD session that runs over a LAG member link can be referred to as a “micro-BFD session.”
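The per-member-link approach described above can be sketched as follows. This is a simplified illustration, not the procedure of IETF RFC 7130: it omits the BFD state machine and packet format, and it assumes that a session is considered down once no BFD control packet has been received within a detection time equal to the detection multiplier times the receive interval. The names `MicroBfdSession` and `load_balancing_table`, and all interval values used with them, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MicroBfdSession:
    """Simplified micro-BFD session for one LAG member link."""
    link: str            # member port name (hypothetical)
    rx_interval_ms: int  # assumed negotiated receive interval
    detect_mult: int     # assumed detection multiplier
    last_rx_ms: int = 0  # time the last BFD control packet arrived

    def detection_time_ms(self) -> int:
        # Assumed rule: multiplier times receive interval.
        return self.detect_mult * self.rx_interval_ms

    def on_packet(self, now_ms: int) -> None:
        # Record the arrival time of a BFD control packet on this link.
        self.last_rx_ms = now_ms

    def is_up(self, now_ms: int) -> bool:
        # Up only while the silence is shorter than the detection time.
        return now_ms - self.last_rx_ms < self.detection_time_ms()


def load_balancing_table(sessions, now_ms):
    """Return the member links whose micro-BFD session is still up."""
    return [s.link for s in sessions if s.is_up(now_ms)]
```

In this sketch the table is recomputed on demand from the session states, so a member link drops out of the load-balancing table as soon as its own session times out, independently of the other member links.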
However, IETF RFC 7130 does not address the case of applying BFD to an MC-LAG environment, in which a network device is coupled with at least two separate network devices through the MC-LAG. Current approaches rely on LACP to detect link failure in an MC-LAG environment. However, because the minimum LACP timer is 1 second and a failure is typically declared only after three consecutive intervals pass without a received LACP message, the failure detection convergence time in a system that runs LACP is generally at least 3 seconds. Thus, at best, failure detection performed through LACP is in the range of single-digit seconds.