The present disclosure relates generally to information handling systems, and more particularly to managing failures of inter-chassis links between information handling systems.
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Some information handling systems such as, for example, switch devices, use aggregation protocols that allow for the aggregation of links between multiple switch devices. For example, Virtual Link Trunking (VLT) is a proprietary, layer-2 aggregation protocol utilized by switch devices available from DELL® Inc. of Round Rock, Tex., United States, and provides for the aggregation of links to multiple logical switch devices. In some configurations, switch devices (also referred to as VLT peer devices in VLT systems) may be coupled together by an Inter-Chassis Link (ICL) (also referred to as a VLT interconnect (VLTi) in VLT systems) that may be an aggregation of links (e.g., a Link Aggregation Group (LAG)) between those switch devices and that may be used to exchange control information (e.g., VLT control information). In addition, each of the VLT peer devices may be coupled via their “VLT ports” to Top Of Rack (TOR) switch devices using port channel interfaces (also referred to as VLT LAGs) that span across the VLT peer devices, as well as coupled via “orphan ports” (non-VLT ports) to host devices in some situations. The failure of the ICL between VLT peer devices can raise several issues.
For example, when an ICL between VLT peer devices fails, the VLT peer devices are isolated from each other because the ICL is no longer available for exchanging VLT control information between the VLT peer devices. As such, functionality associated with the Address Resolution Protocol (ARP), Media Access Control (MAC), Spanning Tree Protocol (STP), and/or other control operations will be unavailable. In a specific example, ARP learning failures can lead to new layer-3 streams being blocked, as the control information exchange enabled by the ICL is needed to learn addresses associated with those layer-3 streams so that they can be forwarded properly (e.g., when a first VLT peer device receives a packet that has been incorrectly hashed and needs to be forwarded to a second VLT peer device). In another specific example, MAC synchronization failure can lead to new layer-2 streams being flooded instead of unicast, because when a first VLT peer device cannot access a second VLT peer device via the ICL to unicast a received packet, it may flood that packet to the network. In yet another specific example, the STP may be unable to detect loops in the VLT fabric without the control communication enabled by the ICL.
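The flooding behavior described above may be illustrated with a minimal sketch. It is a simplified model rather than an actual VLT implementation, and it assumes that a MAC address learned on one VLT peer device reaches the other VLT peer device's MAC table only while the ICL is available. The function names `sync_mac_entry` and `egress_ports`, the port names, and the MAC address are hypothetical and are used for illustration only.

```python
def sync_mac_entry(peer_table, mac, port, icl_up):
    """Copy a learned MAC entry into the peer's table (hypothetical helper).

    The entry reaches the peer only while the ICL is up; the return value
    reports whether the synchronization took place.
    """
    if icl_up:
        peer_table[mac] = port
    return icl_up


def egress_ports(mac_table, dst_mac, ports, ingress_port):
    """Return the ports on which a frame would be sent.

    A known destination is unicast on its single learned port; an unknown
    destination is flooded on every port except the ingress port.
    """
    known = mac_table.get(dst_mac)
    if known is not None:
        return [known]
    return [p for p in ports if p != ingress_port]


if __name__ == "__main__":
    ports = ["vlt1", "vlt2", "orphan1"]   # illustrative port names
    host_mac = "00:11:22:33:44:55"        # illustrative MAC address

    table_icl_up = {}
    sync_mac_entry(table_icl_up, host_mac, "vlt1", icl_up=True)
    print(egress_ports(table_icl_up, host_mac, ports, "orphan1"))

    table_icl_down = {}
    sync_mac_entry(table_icl_down, host_mac, "vlt1", icl_up=False)
    print(egress_ports(table_icl_down, host_mac, ports, "orphan1"))
```

With the ICL available, the frame in this model is sent on the single learned port, while with the ICL failed the same frame is sent on every port other than the one on which it arrived.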
Conventional solutions to these issues associated with ICL failure typically operate to disable any VLT ports on a secondary VLT peer device when the ICL between that secondary VLT peer device and a primary VLT peer device fails. While this solution avoids some of the issues discussed above, such as those due to incorrect hashing and those due to the formation of temporary loops, it results in a reduction of the availability of the VLT fabric (e.g., by 50% due to the unavailability of the VLT ports on the VLT peer device that is made unavailable), as well as a reduction in the overall bandwidth of the VLT fabric that can lead to traffic loss. Furthermore, any “east-west” traffic (e.g., traffic between the host devices connected to the VLT peer devices) may be blocked by such solutions as well.
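The conventional handling described above, and the resulting 50% reduction in availability, may be sketched as follows. The sketch is an assumption-laden model and not any vendor's implementation: it assumes two VLT peer devices with an equal number of VLT ports, and the names `Port`, `VltPeer`, `handle_icl_failure`, and `vlt_availability` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    is_vlt: bool          # True for a VLT port, False for an orphan port
    enabled: bool = True


@dataclass
class VltPeer:
    name: str
    role: str             # "primary" or "secondary"
    ports: list = field(default_factory=list)


def handle_icl_failure(peers):
    """Conventional handling: disable every VLT port on the secondary peer."""
    for peer in peers:
        if peer.role == "secondary":
            for port in peer.ports:
                if port.is_vlt:
                    port.enabled = False


def vlt_availability(peers):
    """Fraction of VLT ports in the fabric that are still enabled."""
    vlt_ports = [p for peer in peers for p in peer.ports if p.is_vlt]
    return sum(p.enabled for p in vlt_ports) / len(vlt_ports)


def build_fabric():
    """Two peers, each with two VLT ports and one orphan port (assumed layout)."""
    return [
        VltPeer(name, role, [
            Port(f"{name}-vlt1", True),
            Port(f"{name}-vlt2", True),
            Port(f"{name}-orphan1", False),
        ])
        for name, role in (("peer1", "primary"), ("peer2", "secondary"))
    ]


if __name__ == "__main__":
    fabric = build_fabric()
    print(vlt_availability(fabric))   # all VLT ports enabled
    handle_icl_failure(fabric)
    print(vlt_availability(fabric))   # secondary peer's VLT ports disabled
```

In this model, the orphan ports on the secondary VLT peer device remain enabled, but half of the VLT ports in the fabric are no longer available to carry traffic.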
Accordingly, it would be desirable to provide an improved Inter-Chassis Link (ICL) failure management system.