A. Technical Field
The present invention relates to communication networks and devices and, more particularly, to systems, devices, and methods of configuring and controlling the operation of a link fallback within a network.
B. Background of the Invention
A blade switch device known as an I/O aggregator (IOA) is a zero-touch, plug-and-play type of switch. It allows administrators and users to connect a device within a server chassis and expect the device to obtain network connectivity without any further intervention by the administrator. Once the device is connected to the chassis, the desired connectivity is established without necessitating the configuration of any additional protocols.
In an IOA configuration, Link Aggregation Control Protocol (LACP) link fallback is a useful feature that helps server administrators bring up server ports during installation and when performing troubleshooting tasks. In addition, a server administrator can, for example, verify network connectivity and server parameters without requiring input from a network administrator.
Typically, during a start-up procedure, a boot protocol will automatically provision all uplink ports of the IOA into a Link Aggregation Group (LAG). However, in scenarios where no Link Aggregation Control Protocol Protocol Data Units (LACPDUs) are received on these ports, for example because the uplink Top-of-Rack (TOR) switch has not yet been configured for LAG operation, the LAG session is not established, and the LAG remains in an inactive state. As a consequence, based on Uplink Fault Detection (UFD), the uplink ports on the IOA are not activated, such that the state of a corresponding downlink server port interface also remains inactive. In other words, if the uplink LAG is operationally inactive, the UFD feature of the IOA negatively impacts the connectivity from the IOA to the outside world and brings down the downlink ports of the servers as well. Since the condition of the server ports is thus decided by the state of the uplink LAG, once the uplink ports are inactive, none of the downlink servers will have network connectivity to communicate with other network devices.
FIG. 1 shows an example of a general network operating in IOA mode. System 100 comprises server chassis 102, servers 106, network blade switch (IOA) 108, and TOR 112. Server chassis 102 typically comprises up to 32 servers 106 and IOA 108. Network connectivity between servers 106 and TOR 112 is achieved through IOA 108. Typically, four or eight uplink ports 110 are connected to TOR 112. Uplink ports 110 that connect IOA 108 to TOR 112 constitute a logical entity in which a set of links is grouped and serves as a gateway to the outside world. Downlink ports 120 provide connectivity between IOA 108 and downstream servers 106.
Server chassis 102 is typically maintained by a server administrator, while TOR 112 is maintained by a network administrator. In operation, once the server administrator connects IOA 108 between server 106 and TOR 112, and the network administrator configures TOR 112, e.g., by connecting links 110 accordingly, network connectivity is established and links 110 are, at an L2 link level, considered to be in an operationally active condition, such that the status of links 110 is discoverable by devices such as IOA 108.
By default, IOA 108 treats uplink ports 110 as LAG 114. For LAG 114 to reach an active status, a matching LAG configuration on TOR 112 is required. If an LACP configuration is present only on IOA 108 and no corresponding configuration exists on TOR 112, no LACPDUs are received from TOR 112 and no LAG session can be established, resulting in LAG 114 remaining in an inactive state. Then, if uplink ports 110 on IOA 108 are inactive, for example based on UFD, the corresponding connections between the ports of downlink servers 106 and IOA 108 also remain in an inactive state, such that none of servers 106 has network connectivity to communicate with the outside world. Numerous attempts have been made to overcome this problem. However, each approach has significant shortcomings.
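The dependency described above, in which the state of every downlink server port follows the state of the uplink LAG, can be illustrated with a minimal Python sketch. The class and function names (`UplinkLag`, `downlink_states`) and the simplified activation rule are hypothetical and serve only to model the described behavior; they do not represent actual IOA firmware.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UplinkLag:
    """Hypothetical model of LAG 114 formed from uplink ports 110."""
    # L2 link state of each member uplink port, keyed by port number.
    link_up: Dict[int, bool]
    # Member ports on which LACPDUs from the TOR have been received.
    lacpdu_seen: Set[int] = field(default_factory=set)

    def is_active(self) -> bool:
        # Simplifying assumption: the LAG is active only if at least one
        # member port is up at L2 and has received LACPDUs from its peer.
        return any(up and port in self.lacpdu_seen
                   for port, up in self.link_up.items())


def downlink_states(lag: UplinkLag, server_ports: List[str]) -> Dict[str, bool]:
    """UFD-style tracking: each downlink port mirrors the uplink LAG state."""
    active = lag.is_active()
    return {port: active for port in server_ports}


# Links are up at L2, but the TOR has no LAG configuration and sends no LACPDUs.
lag = UplinkLag(link_up={1: True, 2: True, 3: True, 4: True})
print(downlink_states(lag, ["server-1", "server-2"]))
```

In this model, all server ports stay inactive even though every physical uplink is up, which is the situation the link fallback feature is meant to address.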
One traditional approach provides an LACP link fallback option that encompasses an internal implementation that brings down uplink port channel 110, removes one of links 110 (e.g., port 1) from LAG 114 on IOA 108, and then configures it as a separate, plain L2 port in order to provide network connectivity with TOR 112. However, this approach suffers from various limitations and has additional requirements that system 100 must satisfy. First, elected port 110 has to be part of all the 4K Virtual Local Area Networks (VLANs) for L2 connectivity from the server to TOR 112. Second, elected port 110 is to be made part of the UFD group to monitor and modify the operational status of the ports of server 106 based on the current uplink connectivity to TOR 112. Third, elected port 110 must be programmed as a multicast router port for IGMP snooping. Fourth, election of the fallback link and L2 port can occur only after a number of trial attempts and expiration of a timeout period before confirmation can be obtained that LACPDUs are no longer received, all of which causes undesired network delays.
Finally, since the uplink port channel is down, i.e., LACP LAG 114 goes inactive, while the port is removed, the ports of downlink server 106 will experience a flap, i.e., a change in activity state that temporarily halts or drops traffic until link 110 is re-activated. In fact, due to UFD, a drop in network connectivity occurs on each flap; port 110 will need to be moved back as part of the port-channel; and IGMP and 4K configurations will need to be removed from elected port 110, further adding to the delay and slowing down convergence.
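The sequence of steps in this traditional approach may be sketched as follows. This is an illustrative model only: the function name `prior_art_fallback`, the step descriptions, and the number of trial attempts are hypothetical, and the step strings paraphrase the text above rather than actual device commands.

```python
from typing import List

# Reconfiguration performed once the fallback port is elected
# (paraphrased from the description; not device commands).
FALLBACK_STEPS = [
    "bring down uplink port channel (downlink server ports flap due to UFD)",
    "remove elected port from the LAG",
    "add elected port to all VLANs",
    "add elected port to the UFD group",
    "program elected port as multicast router port for IGMP snooping",
]

# Work required later to return the elected port to the LAG.
REVERT_STEPS = [
    "remove IGMP and VLAN configuration from elected port",
    "move elected port back into the port channel",
]


def prior_art_fallback(lacpdu_seen_per_attempt: List[bool]) -> List[str]:
    """Return the ordered actions taken for a series of LACP trial attempts.

    Each list entry states whether an LACPDU arrived before that
    attempt's timeout expired.
    """
    log: List[str] = []
    for attempt, seen in enumerate(lacpdu_seen_per_attempt, start=1):
        if seen:
            log.append(f"attempt {attempt}: LACPDU received, LAG established")
            return log
        log.append(f"attempt {attempt}: timeout expired, no LACPDU")
    # Only after every attempt has timed out does the fallback begin.
    log.extend(FALLBACK_STEPS)
    return log


for entry in prior_art_fallback([False, False, False]):
    print(entry)
```

The model makes the cost visible: connectivity is delayed by every timeout, and the fallback itself adds several reconfiguration steps that must later be undone.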
One existing approach, known as LACP “force-up,” is a mechanism that allows administrators to statically choose a particular link. However, in IOA mode, IOA 108, which is plugged into server chassis 102, has neither preexisting information about nor control over which specific uplink could be operationally active with TOR 112, such that the static approach of designating a particular port fails in circumstances in which the port is inactive or simply not connected.
In yet another existing approach, uplink LAG 114 is configured as a static LAG. This approach is not viable either, because IOA 108 has multiple uplink ports 110, and if all are made operationally active within LAG 114, downstream server 106 may receive multiple copies of a packet in the case of Broadcast, Unknown unicast, and Multicast (BUM) traffic.
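The duplication risk can be illustrated with a short Python sketch. It rests on an assumption not stated above: a TOR without a matching LAG configuration treats each uplink as an independent L2 port and floods a BUM frame on every one of them, whereas a TOR with a matching LAG sends the frame on a single member link. The function name `bum_copies_at_server` is hypothetical.

```python
def bum_copies_at_server(active_uplinks: int, tor_has_matching_lag: bool) -> int:
    """Count copies of one flooded BUM frame that reach a downstream server.

    Assumption: without a matching LAG on the TOR, the frame is flooded
    on every active uplink, and the IOA forwards each received copy.
    """
    if active_uplinks <= 0:
        return 0  # no active uplink, so nothing is delivered
    if tor_has_matching_lag:
        return 1  # a LAG carries a flooded frame on one member link only
    return active_uplinks  # one copy arrives per independent uplink


print(bum_copies_at_server(4, tor_has_matching_lag=False))
print(bum_copies_at_server(4, tor_has_matching_lag=True))
```

Under these assumptions, a statically active four-port LAG facing an unconfigured TOR would deliver four copies of each flooded frame.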
What is needed are tools for network architects and administrators to overcome the above-described limitations.