As is known in the art, a “stackable switch” is a network switch that can operate independently as a standalone device or in concert with one or more other stackable switches in a “stack” or “stacking system.” FIG. 1A illustrates the front face of an exemplary stackable switch 100. As shown, the front face includes a set of data ports 102 (denoted by the letter “D”), a set of stacking ports 104 (denoted by the letter “S”), and an out-of-band management port 106 (denoted by the letter “M”). Data ports 102 are operable for connecting stackable switch 100 to one or more hosts and/or data networks. Stacking ports 104 are operable for linking stackable switch 100 to other stackable switches in the same stacking system/topology. Stacking ports 104 can be dedicated ports (i.e., ports designed specifically for stacking) or high bandwidth data uplink ports that operate in a stacking mode. Out-of-band management port 106 is operable for connecting stackable switch 100 to a separate terminal device, such as a laptop or desktop computer. Once connected, an administrator can use the terminal device to access the management console of stackable switch 100 and perform various switch management functions.
FIG. 1B illustrates an exemplary stacking system 150 comprising stackable switches 100(1), 100(2), and 100(3), each of which is substantially similar to stackable switch 100 of FIG. 1A. As shown, stackable switches 100(1)-100(3) are linked together via their respective stacking ports, thereby establishing a data path between the switches for forwarding network traffic. With this stack configuration, switches 100(1)-100(3) can behave as a single, logical switch having the combined data port capacity of the individual switches.
In a system of interconnected devices like stacking system 150, port failures can occasionally occur that affect the ability of system members to communicate with each other. For instance, in FIG. 1B, a failure may occur with respect to stacking port 152 of stackable switch 100(1) that prevents port 152 from sending data packets to and/or receiving data packets from port 154 of stackable switch 100(2). Generally speaking, if this failure causes stacking port 152 to transition from an “UP” status to a “DOWN” status, stackable switch 100(1) can detect that the port is down and can re-route traffic for the port on an alternative link/path to stackable switch 100(2) (e.g., through stackable switch 100(3)).
However, in some failure scenarios, a port may fail in a manner that does not cause its status to change. For example, ports that support speeds of 10 Gigabits per second (Gbps) or higher typically have sophisticated electronic and/or optical components and firmware logic. Further, such ports are internally connected to a packet processor that handles queuing, makes wire-speed forwarding decisions, and so on. A failure that arises from a component/firmware problem or from an issue with the connected packet processor may prevent the affected port from sending or receiving packets, yet leave the port in an UP status. This, in turn, can prevent the switch that owns the port from detecting the failure, potentially leading to packet mis-forwarding, packet black holes, and other conditions that can result in a partial or complete network breakdown.
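The difference between a failure that changes a port's status and one that does not can be illustrated with a small model of stacking system 150. The sketch below is a hypothetical illustration, not a description of any actual switch software. It assumes that the three switches form a ring (consistent with re-routing through stackable switch 100(3)), and it uses a breadth-first shortest-path search as a stand-in for whatever forwarding logic a real stack applies. The names `StackLink`, `select_route`, `deliver`, `build_stack`, and the `forwards` flag (whether packets really cross a link, which the port status does not reveal) are all invented for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StackLink:
    """One stacking link between two switches (hypothetical model)."""
    a: str                   # switch at one end
    b: str                   # switch at the other end
    a_port_up: bool = True   # status reported by the stacking port on switch a
    b_port_up: bool = True   # status reported by the stacking port on switch b
    forwards: bool = True    # whether packets actually cross; NOT visible via status


def usable_neighbors(links, switch):
    """Yield (peer, link) for links whose end ports both report UP."""
    for link in links:
        if not (link.a_port_up and link.b_port_up):
            continue  # a DOWN port is visible, so the link is avoided
        if link.a == switch:
            yield link.b, link
        elif link.b == switch:
            yield link.a, link


def select_route(links, src, dst):
    """Breadth-first search over status-UP links; returns (switches, links) or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for peer, link in usable_neighbors(links, node):
            if peer not in prev:
                prev[peer] = (node, link)
                queue.append(peer)
    if dst not in prev:
        return None
    switches, hops, node = [dst], [], dst
    while prev[node] is not None:
        node, link = prev[node]
        switches.append(node)
        hops.append(link)
    return switches[::-1], hops[::-1]


def deliver(links, src, dst):
    """Return (chosen route, whether the packet actually arrives)."""
    route = select_route(links, src, dst)
    if route is None:
        return None, False
    switches, hops = route
    return switches, all(link.forwards for link in hops)


def build_stack():
    # Assumed ring; link 0 joins port 152 on 100(1) to port 154 on 100(2).
    return [StackLink("100(1)", "100(2)"),
            StackLink("100(2)", "100(3)"),
            StackLink("100(3)", "100(1)")]


# Port 152 fails AND reports DOWN: traffic is re-routed through 100(3).
links = build_stack()
links[0].a_port_up, links[0].forwards = False, False
print(deliver(links, "100(1)", "100(2)"))  # (['100(1)', '100(3)', '100(2)'], True)

# Port 152 fails but stays UP: the dead direct link is still chosen.
links = build_stack()
links[0].forwards = False
print(deliver(links, "100(1)", "100(2)"))  # (['100(1)', '100(2)'], False)
```

In the second case the route selection has no way to know that the direct link is broken, because it can only consult reported port statuses; the packet is effectively black-holed.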
There are certain existing protocols, such as Unidirectional Link Detection Protocol (UDLD), that can mitigate the issue above by determining when a bidirectional link has become unidirectional or nonfunctional and marking the end ports of the link as logically down. However, these existing protocols generally operate on a single link at a time. For example, in stacking system 150 of FIG. 1B, a separate instance of UDLD would need to run on each stacking link interconnecting stackable switches 100(1)-100(3), and each protocol instance could only exchange information about that link's end ports between the two switches at either end of the link. As a result, UDLD and similar protocols lack a holistic view of the port statuses of all of the devices in a system, which limits their ability to detect different types of port/link problems. For instance, UDLD cannot distinguish between (1) a scenario where a bidirectional link has become unidirectional (due to, e.g., a failure of a single end port) and (2) a scenario where the same bidirectional link has become completely nonfunctional (due to, e.g., failures of both end ports or a cable failure). Further, UDLD cannot detect a problem where one end port of a link remains up while the other end port has gone down (referred to as the “one-end-up, one-end-down” problem).
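To make the distinctions above concrete, the following sketch classifies a single link into the conditions named in this paragraph. It is a hypothetical illustration only. It assumes that the reported status of both end ports and whether each direction of the link actually delivers packets are all available in one place; how such observations would be gathered is not described here and is left as an assumption. The names `LinkCondition` and `classify_link` are invented for this example.

```python
from enum import Enum


class LinkCondition(Enum):
    """Hypothetical labels for the link conditions discussed above."""
    HEALTHY = "healthy"
    DOWN_BY_STATUS = "both ends report DOWN"
    ONE_END_UP_ONE_END_DOWN = "one-end-up, one-end-down"
    UNIDIRECTIONAL = "unidirectional"
    NONFUNCTIONAL = "nonfunctional"


def classify_link(a_up, b_up, a_to_b_ok, b_to_a_ok):
    """Classify a link from BOTH ends' observations (assumed to be available).

    a_up / b_up: reported status of the end ports on switches a and b.
    a_to_b_ok / b_to_a_ok: whether packets actually arrive in each direction.
    """
    if a_up != b_up:
        # Statuses disagree: one end port is UP while the other is DOWN.
        return LinkCondition.ONE_END_UP_ONE_END_DOWN
    if not a_up:
        # Both ends report DOWN, so ordinary status checks already see it.
        return LinkCondition.DOWN_BY_STATUS
    # Both ends report UP; only observed traffic can reveal a problem.
    if a_to_b_ok and b_to_a_ok:
        return LinkCondition.HEALTHY
    if a_to_b_ok or b_to_a_ok:
        return LinkCondition.UNIDIRECTIONAL  # exactly one direction works
    return LinkCondition.NONFUNCTIONAL       # neither direction works


print(classify_link(True, True, True, False).value)   # unidirectional
print(classify_link(True, True, False, False).value)  # nonfunctional
print(classify_link(True, False, False, False).value) # one-end-up, one-end-down
```

The point of the sketch is that the unidirectional and nonfunctional cases differ only in the per-direction observations, and the one-end-up, one-end-down case differs only in how the two ends' statuses compare, so each classification depends on information from both ends of the link.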