A network segment typically consists of multiple interconnected switches arranged in some sort of hierarchy. Switches at different points in the segment must meet different requirements, and are thus typically of different types. At the core of the segment, high-throughput modular switches with high-speed interfaces may be required. At the edges, the most important factors may be reduced physical size, lower power consumption, tolerance for a wider range of environmental conditions, or support for a variety of interface types for connecting to a variety of external devices. Tradeoffs between redundancy and cost are often different at different locations in the network segment.
Conventionally, each switch in the network segment was an independently managed standalone device, with its own standalone redundancy. Making the switches standalone and fully manageable increases the cost and complexity of each switch. Managing the network segment requires managing not just the segment's external ports (through which the network segment provides the services that give it value) but also the switches and the interconnections between those switches. Network segment expansion requires more switches, more interconnections, and more network management overhead.
In an effort to contain this growth in costs, various architectures have been developed to aggregate multiple switches into a single logical switch, managed as a single entity and with a collective redundancy design. Among these architectures are port extenders, Virtual Switching System (VSS) and switch stacks.
Port extension is a hierarchical (controller/satellite) aggregation technique in which certain switches are treated as distributed or “extended” line cards. Conventional modular switch architectures relied on the use of one supervisor module, or redundant supervisor modules, coupled to one or more port termination modules (i.e., “line cards”) through specialized (and often proprietary) backplane circuitry. With port extension, conventional line card functionality is provided by a separate “satellite” switch that is remotely connected back to the “controller” switch, e.g., over one or more Ethernet links or over some underlying network. The satellite switch is treated as a remote line card of the controller switch. The satellite switch and its connectivity with the controller switch are handled automatically, just as is done with a local line card and its connection to the backplane.
VSS, switch stacks, and the like are symmetrical (primary/secondary) aggregation techniques, in which two or more physical switches are joined together by some set of links or some underlying network, and where one of the switches acts as the “primary” switch, representing the logical switch to external entities and managing the other physical switches. These approaches may differ in the details of how the physical switches are connected, as the capacity, scalability, reliability and other aspects of this interconnection are key aspects of the aggregated switch (e.g., stackable switches are often connected in a ring topology, possibly with special hardware), and may also differ in terminology (e.g., “stack master” may be used for the current primary switch and “ordinary stack member” may be used for a current secondary switch). These symmetrical approaches have in common, however, that multiple, often all, of the physical switches are capable of acting as the primary switch, providing a level of redundancy to the logical switch design. In the event the current primary switch experiences some event (power outage, fault) that prevents its operation, the other physical switches can automatically detect this outage and one of them can automatically assume the role of primary physical switch, allowing the logical switch to continue to function.
For the symmetrical aggregation techniques (VSS, stacking, etc.) described here, a subtlety in their redundancy architecture must be considered. Typically, the switches are connected via a dedicated communication link, which provides an interface for the switches to detect failure events in the others. If the primary switch fails, one of the secondary switches can detect the failure and maintain network operation by substituting as the new primary switch. In some cases, irregular conditions that do not amount to a failure of the primary switch may cause the secondary switch to attempt to assert itself as the primary switch. For example, in some situations a failure in the communication connection between the primary switch and one of the secondary switches will be perceived by the secondary switch as a failure in the current primary switch, potentially resulting in the secondary switch asserting itself to its neighbor switches as the new primary switch. Because the triggering event was a communication link failure rather than a failure in the original primary switch, the original primary switch still asserts itself as the primary switch. The condition where two physical switches from the same logical switch both present themselves as the primary switch may create a “split-node” conflict (known variously as “stack split”, “split brain”, “dual master”, and similar terms, depending on the aggregation technique in use) in the network segment. The failure in the logical switch's internal communication connection need not be of extended duration to trigger this condition: in order to minimize network interruption, the failover process is typically configured to be very sensitive. That is, the system is configured to provide redundancy when almost any irregularity in the primary switch is detected on the network.
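By way of illustration only, the following minimal Python sketch models how the failover behavior described above may produce a split-node condition. It assumes a simplified two-switch logical switch and a naive rule under which any missed heartbeat is treated as a failure of the primary switch. The names `Switch`, `active_primaries`, and `scenario` are hypothetical and do not correspond to any particular product or protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Switch:
    name: str
    is_primary: bool = False
    alive: bool = True


def active_primaries(primary: Switch, secondary: Switch, link_up: bool) -> List[str]:
    """Return the names of all running switches that claim the primary role."""
    # The secondary receives a heartbeat only if the primary is running
    # AND the inter-switch communication link is working.
    heartbeat_received = primary.alive and link_up
    # Assumed naive rule: a missed heartbeat is interpreted as a primary failure,
    # so the secondary promotes itself.
    if secondary.alive and not heartbeat_received:
        secondary.is_primary = True
    return [s.name for s in (primary, secondary) if s.alive and s.is_primary]


def scenario(primary_alive: bool, link_up: bool) -> List[str]:
    """Build a fresh two-switch logical switch and apply the failover rule once."""
    sw1 = Switch("sw1", is_primary=True, alive=primary_alive)
    sw2 = Switch("sw2")
    return active_primaries(sw1, sw2, link_up)


healthy = scenario(primary_alive=True, link_up=True)        # one primary
real_failure = scenario(primary_alive=False, link_up=True)  # secondary takes over
link_failure = scenario(primary_alive=True, link_up=False)  # both claim primary
split_node = len(link_failure) > 1
```

In this sketch the secondary switch cannot distinguish a failed primary switch from a failed link, because both appear to it as a missing heartbeat. Only the link-failure scenario leaves two switches asserting the primary role.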
Finally, these aggregation techniques are not mutually exclusive, and may be combined. The hierarchical port extender architecture describes a logical port-extended switch that aggregates a controller and multiple satellites, but the controller node may itself be a logical switch (that aggregates multiple symmetric physical switches and looks like a single controller to the satellites), and/or any satellite node may itself be a logical switch (that aggregates multiple symmetric physical switches and looks like a single satellite node to the controller node). Thus, a controller node consists of possibly multiple controller switches and a satellite node consists of possibly multiple satellite switches.
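The nesting of aggregation techniques described above may be pictured as a simple data model. The following Python sketch is illustrative only. The class and field names (`PhysicalSwitch`, `Node`, `PortExtendedSwitch`, `split_node_candidates`) are hypothetical. The sketch assumes that a node must aggregate at least two physical switches before it can exhibit a split-node condition, since the condition requires two switches of the same logical switch to claim the primary role.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalSwitch:
    name: str


@dataclass
class Node:
    """A logical node: one physical switch, or several aggregated symmetrically."""
    name: str
    role: str  # "controller" or "satellite"
    members: List[PhysicalSwitch]

    @property
    def is_aggregated(self) -> bool:
        return len(self.members) > 1


@dataclass
class PortExtendedSwitch:
    """One controller node plus any number of satellite nodes."""
    controller: Node
    satellites: List[Node] = field(default_factory=list)

    def physical_switch_count(self) -> int:
        return len(self.controller.members) + sum(len(s.members) for s in self.satellites)

    def split_node_candidates(self) -> List[str]:
        # Assumption: only nodes with two or more members can split.
        nodes = [self.controller] + self.satellites
        return [n.name for n in nodes if n.is_aggregated]


logical_switch = PortExtendedSwitch(
    controller=Node("ctrl", "controller", [PhysicalSwitch("c1"), PhysicalSwitch("c2")]),
    satellites=[
        Node("sat-a", "satellite", [PhysicalSwitch("a1")]),
        Node("sat-b", "satellite", [PhysicalSwitch("b1"), PhysicalSwitch("b2")]),
    ],
)
total_switches = logical_switch.physical_switch_count()
candidates = logical_switch.split_node_candidates()
```

Here the controller node and one satellite node each aggregate two physical switches, while the other satellite node is a single switch. The satellites see `ctrl` as a single controller, and the controller sees `sat-b` as a single satellite, regardless of how many physical switches each contains.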
In the port-extended network described here, split-node conditions, such as those described above, may occur either in the controller node (split-controller) or in a satellite node (split-satellite). In order for the port-extended logical system to function properly, such conditions should be resolved in a timely and orderly fashion. The presently disclosed systems and methods for detecting and resolving split-node conditions in port-extended networks of aggregated nodes are directed to overcoming one or more of the problems set forth above and/or other problems in the art.