Switches used as part of a network are well known. A switch is defined as any entity in the network that provides some ability to transfer input traffic to some outputs, possibly performing actions on the traffic such as merging, dropping, or policing. Switches include routers, ATM switches, Fibre Channel switches, switched Gigabit Ethernet, SONET switches, and optical switches. The term “cabling” as used herein denotes any means of connecting entities in the network, be they fibers, coax cables, wireless links, or cables within a single hardware unit that encompasses several logical units. The term “connectors” denotes any means of transmitting on cables. These connectors may themselves be part of switches, may be located at different nodes, and may have a variety of functions and capabilities beyond mere transmission.
The problem of recovery with redundant switches arises in most networks that seek to be failure-redundant, from enterprise networks to backbone networks. Redundancy is necessary to maintain operation in the event of a failure, and it is desirable for cabling failures, connector failures, and switch failures alike. Several approaches to recovery are known for enterprise networks and for networks with similar architectures. These architectures are generally characterized by single point-to-point links from the nodes to switches, in configurations akin to stars or combinations of stars. Also known is a general method of recovering from failures in SONET networks and networks with related architectures, such as optical networks. These networks are generally arranged as rings or other mesh topologies, in which the nodes may themselves be switches.
In networks using point-to-point links, cabling runs from connectors at nodes in the network to switches. A common configuration is shown in FIG. 1. The nodes may be servers that use network interface cards (NICs) as connectors. The cables may be fiber cables, and the switch may be, for instance, a Gigabit Ethernet or Fibre Channel switch. Each node that must be provided with recovery ability has two or more NICs. The connectors at the switch are generally referred to as ports. If a connector fails, another connector on the same node is used to provide recovery. Typically the second NIC connects to a second switch, referred to as the secondary switch, which provides redundancy for the first switch, referred to as the primary switch. After failure of a NIC or a cable, the secondary switch connects to the primary switch through a connection between the two switches, as shown in FIG. 2 for a NIC failure and in FIG. 3 for a cable failure.
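The failover path for a single dual-homed node can be sketched as follows. This is a minimal illustration, assuming one node with two NICs; the component names (`nic1`, `cable2`, `inter_switch_link`, and so on) and the function `route_to_primary` are hypothetical labels for the elements of FIGS. 1 to 3, not part of any real product or API.

```python
from typing import AbstractSet, List

# Hypothetical component names for one dual-homed node (illustrative only).
PRIMARY_LEG = ["nic1", "cable1"]  # node -> primary switch, directly
BACKUP_LEG = ["nic2", "cable2", "secondary_switch", "inter_switch_link"]


def route_to_primary(node: str, failed: AbstractSet[str]) -> List[str]:
    """Return the hops from `node` to the primary switch.

    Uses the primary NIC and cable when both are intact; otherwise falls back
    to the secondary NIC, which reaches the primary switch through the
    secondary switch and the inter-switch connection.
    """
    if not failed.intersection(PRIMARY_LEG):
        return [node, *PRIMARY_LEG, "primary_switch"]
    if not failed.intersection(BACKUP_LEG):
        return [node, *BACKUP_LEG, "primary_switch"]
    raise RuntimeError(f"{node}: no surviving path to the primary switch")


# A cable failure (FIG. 3) moves the node onto the backup leg.
print(route_to_primary("server1", frozenset({"cable1"})))
```

Note that every recovered node in this sketch consumes capacity on the same `inter_switch_link`, which is the bottleneck discussed next.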
The capacity of the connection between the two switches (which may be colocated in the same chassis) limits the recovery capability of the network. This limitation is significant because cabling practices seldom allow for physical separation of cables, since such separation would require diverse physical paths and would greatly impair the network manager's ability to oversee the network layout. In certain cases, as shown in FIG. 4, cables from several nodes are brought together, either by physical attachment or through multiplexing at a concentrator, and a connection from the concentrator to the switch completes the link between the nodes and the switch. A single cut or disconnection in the connection between the concentrator and the switch may then entail the concurrent failure of several cables, as shown in FIG. 4, and the ability to recover the connections between the nodes and the switch may be limited by the inter-switch connection. Recovering from multiple cable failures, possibly caused by a single failure as shown in FIG. 4, would require the number of ports dedicated to the inter-switch connection to be as high as the number of possible concurrent failures. Since ports are generally a dominant portion of the cost of a switch, extending the recovery shown in FIG. 3 to the case of several concurrent cable failures (as would occur in FIG. 4 if a breach occurred in the connection between the concentrator and the switch) would be costly.
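The port-count argument can be made concrete with a small calculation. This sketch assumes that cutting the link from a concentrator to the primary switch fails every node cable aggregated at that concentrator, that a directly cabled node forms a group of one, and that each rerouted node needs one full inter-switch port of capacity. The dictionary layout and the function names are hypothetical.

```python
from typing import Dict, List


def nodes_lost_by_cut(attachments: Dict[str, List[str]], link: str) -> List[str]:
    """Nodes whose cables fail together when `link` (a concentrator-to-switch
    connection, or a direct node cable) is cut."""
    return list(attachments[link])


def inter_switch_ports_needed(attachments: Dict[str, List[str]]) -> int:
    """Worst-case inter-switch ports needed to recover from any single cut,
    assuming one port per rerouted node."""
    return max(
        (len(nodes_lost_by_cut(attachments, link)) for link in attachments),
        default=0,
    )


# Four nodes behind one concentrator (FIG. 4) plus one directly cabled node.
attachments = {"concentrator_A": ["n1", "n2", "n3", "n4"], "direct_n5": ["n5"]}
print(inter_switch_ports_needed(attachments))  # 4
```

With direct cabling only (every group of size one), a single failure needs just one inter-switch port, as in FIG. 3; a single concentrator cut instead drives the requirement up to the size of the largest group.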
An alternative to dedicating several ports to the inter-switch connection is to have all nodes use their secondary NICs and the secondary switch. However, this option generally requires the network to become temporarily logically disconnected and then be reconnected through a cold start, which significantly disrupts services. While such a wholesale shift from the primary switch to the secondary switch is required when the primary switch fails, it is generally desirable to prevent a single failure from disrupting service to a large number of nodes.
In an architecture using rings or a general mesh configuration, the nodes are generally also switches, and the connections are arranged as rings or interconnections of rings. SONET rings are an example. Within a single ring, recovery from a node failure is provided by path protection in unidirectional path-switched rings (UPSRs) or by loopback in bidirectional line-switched rings (BLSRs). When two rings are interconnected, the usual means of recovering from failures of the nodes that interconnect two or more SONET rings is the use of matched nodes.
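The difference between the two single-ring mechanisms can be illustrated on a toy ring. This is a simplified sketch, assuming a ring of n ≥ 3 nodes numbered 0 to n-1, working traffic travelling clockwise (increasing index), and one failed span; it models only the resulting paths, not SONET signaling or switching times. All names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (a, b) with b == (a + 1) % n: the clockwise link a -> b


def ring_path(n: int, src: int, dst: int, step: int) -> List[int]:
    """Walk the ring from src to dst in one direction (+1 clockwise, -1 counterclockwise)."""
    path = [src]
    node = src
    while node != dst:
        node = (node + step) % n
        path.append(node)
    return path


def uses_span(path: List[int], span: Span) -> bool:
    """True if consecutive hops in `path` traverse the span in either direction."""
    return any({x, y} == set(span) for x, y in zip(path, path[1:]))


def upsr_select(n: int, src: int, dst: int, failed: Span) -> List[int]:
    """UPSR-style path protection: the signal is sent both ways around the
    ring and the receiver keeps the copy whose path avoids the failure."""
    for step in (1, -1):
        candidate = ring_path(n, src, dst, step)
        if not uses_span(candidate, failed):
            return candidate
    raise RuntimeError("both directions traverse the failed span")


def blsr_path(n: int, src: int, dst: int, failed: Span) -> List[int]:
    """BLSR-style loopback: the node just before the failed span sends the
    traffic back the other way around the ring to the node just after the
    span, where it resumes its clockwise route."""
    a, b = failed
    if b != (a + 1) % n:
        raise ValueError("failed span must be given as (a, (a + 1) % n)")
    path = [src]
    node = src
    while node != dst:
        if node == a:
            # Loopback: go counterclockwise around the rest of the ring to b.
            path.extend(ring_path(n, a, b, -1)[1:])
            node = b
        else:
            node = (node + 1) % n
            path.append(node)
    return path


print(upsr_select(6, 0, 3, (1, 2)))  # [0, 5, 4, 3]
print(blsr_path(6, 0, 3, (1, 2)))    # [0, 1, 0, 5, 4, 3, 2, 3]
```

The loopback path passes the destination once on the way around and reaches it again after the loop, which is the characteristic cost of recovering at the nodes adjacent to the failure rather than end to end.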
As shown in FIG. 5, with matched nodes, one switch or node acts as the primary means of interconnection between two rings, and a secondary node acts as such an interconnection only if the primary node fails. The primary and secondary nodes are usually referred to as matched nodes 1 and 2, respectively. The nodes generally operate as follows. Matched node 1 houses an add-drop multiplexer (ADM) that performs a drop-and-continue operation, in which it transfers signals from one node to another and also sends a replica of those signals to matched node 2. If matched node 1 fails, matched node 2 acts as the means of interconnection between rings 1 and 2. This technique has several drawbacks. Failures at matched node 1 may be partial, including failure of the ADM itself, and recovery in that case is complicated. Further, a wholesale failure of matched node 1 may require loopback to occur in each ring, in addition to matched node 2 becoming the new interconnection between the rings. The timing of such a three-step recovery is generally difficult to coordinate; in particular, distributed scheduling over the two rings may cause failures and makes recovery heavily dependent on timing. Finally, a matched node may itself have connections to other nodes outside the SONET rings, such as routers. In that case each of the matched nodes requires ports to those routers or other nodes, which increases the number of ports.
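The matched-node behavior described above can be summarized as a small decision function. This is a simplified sketch, assuming only two states per matched node (up or wholly failed); partial failures such as a failed ADM inside matched node 1 are deliberately not modeled. The labels `MN1` and `MN2`, the step strings, and the function name are hypothetical.

```python
from typing import List, Optional, Tuple


def interconnect(mn1_up: bool, mn2_up: bool) -> Tuple[Optional[str], List[str]]:
    """Return (node carrying ring 1 -> ring 2 traffic, recovery steps taken).

    Normal operation: matched node 1 (MN1) drops the signal into ring 2 and
    continues a replica along ring 1 to matched node 2 (MN2).
    """
    if mn1_up:
        # Drop-and-continue: MN1 delivers; MN2 only holds the replica.
        return "MN1", []
    if not mn2_up:
        # Both matched nodes down: rings 1 and 2 can no longer exchange traffic.
        return None, []
    # Wholesale MN1 failure: three steps at different places must be
    # coordinated, which is where the timing difficulties arise.
    steps = [
        "loopback in ring 1 around MN1",
        "loopback in ring 2 around MN1",
        "MN2 becomes the interconnection between rings 1 and 2",
    ]
    return "MN2", steps


node, steps = interconnect(mn1_up=False, mn2_up=True)
print(node, len(steps))  # MN2 3
```

Even in this idealized form, the takeover by matched node 2 depends on two loopbacks in separate rings, whereas in normal operation no recovery action is needed at all.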
In view of the foregoing, it would be desirable to provide a multiple switch protection architecture. It would further be desirable to provide such an architecture in which recovery is possible in the event of cable failures, link failures, and partial failures, and when several rings fail simultaneously.