Computers and other devices are commonly interconnected to facilitate communication with one another using any one of a number of available standard network architectures and any one of several corresponding and compatible network protocols. The nature of standard architectures and their topologies is typically dictated at the first two layers of the OSI (Open Systems Interconnection) Basic Reference Model for networks, which are the physical layer (layer-1) and the data link layer (layer-2). One of the most commonly employed of such standard architectures is the Ethernet® network architecture. Other types of network architectures that are less widely used include ARCnet, Token Ring and FDDI.
Packet switched network protocols are commonly employed with a number of architectures such as the Ethernet® standard. These protocols are typically defined by layers 3 and 4 of the OSI model and dictate the manner in which data to be transmitted between devices coupled to the network are formatted into packets for transmission. These protocols are independent of the architecture and topology by virtue of their separation as hierarchical layers of the OSI model. TCP/IP is one example of a layer-4/layer-3 protocol combination typically used in Internet applications, or in intranet applications such as a local area network (LAN).
One of the most basic and widely implemented network types is the Local Area Network (LAN). In its simplest form, a LAN is a number of devices (e.g. computers, printers and other specialized peripherals) connected to one another by some form of signal transmission medium such as coaxial cable to facilitate direct peer-to-peer communication therebetween. A common network paradigm, often employed in LANs as well as other networks, is known as the client/server paradigm. This paradigm involves coupling one or more large computers (typically having very advanced processing and storage capabilities), known as servers, to a number of smaller computers (such as desktops or workstations), known as clients, and to other peripheral devices shared by those computers. The clients send requests over the network to the one or more servers to facilitate centralized information storage and retrieval through programs such as database management and application programs stored on the server(s).
Network resources are required to couple computers and other devices to a network. These network resources are sometimes referred to as network adapter cards or network interface cards (NICs). An adapter card or NIC typically has at least one port through which a physical link is provided between the network transmission medium and the processing resources of the network device. Data from the processing resources of one network device is formatted (as packets in the case of packet switched networks) by its resident protocol layer and communicated through its network resources, over the coupling media to the network resources of a second network device. The received data is then deformatted by the protocol layer of the second network device and then presented to the processing resources of the second device. Network resources such as adapter cards or NICs are commercially available and are designed to support one or more variations of standard network architectures and known topologies.
It is now common to couple some or all of the devices of a network to a single network or subnet through redundant (i.e. teamed) network interface resources to improve the reliability and throughput of the network. These redundant links to the network may be provided as a plurality of single-port NICs, one or more NICs each having more than one port, or a combination thereof. Teaming of network interface resources is particularly common for servers, as the demand for throughput and reliability is normally greatest for servers on a network. Resource teams are typically two or more NICs or NIC ports logically coupled in parallel to appear as a single virtual network adapter to the other devices on the network. These resource teams can provide aggregated throughput of data transmitted to and from the network device employing the team and/or fault tolerance (i.e. resource redundancy to increase reliability). Throughput aggregation can be optimized by employing load-balancing between the teamed NICs or NIC ports.
Fault tolerant teams of network resources (network fault tolerant, or NFT, teams) commonly employ two or more network adapters or NIC ports, with one port being “active” and designated as the “primary” while each of the other members of the team is placed in a “standby” or “inactive” mode and designated as a “secondary” member of the team. A NIC or NIC port in standby mode remains largely idle (it is typically only active to the limited extent necessary to respond to system test inquiries to indicate to the team that it is still operational) until activated to replace the current primary adapter when that adapter has failed. In this way, interruption of a network connection to a critical server may be avoided notwithstanding the existence of a failed network adapter card or port.
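The primary/standby behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `TeamMember` and `FaultTolerantTeam`, the `link_up` flag and the `check_and_fail_over` method are hypothetical and do not correspond to any particular teaming driver.

```python
from dataclasses import dataclass


@dataclass
class TeamMember:
    name: str
    link_up: bool = True      # hypothetical flag: physical link state of the port
    role: str = "secondary"   # "primary" (active) or "secondary" (standby)


@dataclass
class FaultTolerantTeam:
    members: list

    def __post_init__(self):
        # Assumption: the first member starts as the active primary.
        self.members[0].role = "primary"

    @property
    def primary(self):
        return next((m for m in self.members if m.role == "primary"), None)

    def check_and_fail_over(self):
        """Promote an operational standby if the current primary has lost link."""
        current = self.primary
        if current is not None and current.link_up:
            return current  # primary is healthy, nothing to do
        standby = next(
            (m for m in self.members if m.link_up and m is not current), None
        )
        if standby is None:
            return None  # no operational member is left
        if current is not None:
            current.role = "secondary"
        standby.role = "primary"
        return standby
```

For example, after `team = FaultTolerantTeam([TeamMember("nic0"), TeamMember("nic1")])`, setting `team.members[0].link_up = False` and calling `team.check_and_fail_over()` makes `nic1` the primary.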
Load-balanced teams of network resources aggregate two or more network adapters or NIC ports to increase the throughput capacity of data traffic between the network and the device. In the case of “transmit” load-balancing (TLB) teams, all members of the team are typically in the active mode, and data transmitted from the device to the network is typically aggregated and balanced over all of the team members in accordance with some load-balancing policy. One of the active team members is still designated as the primary for the team, however, and it handles all of the data received by the team. In the case of “switch-assisted” load-balanced (SLB) teams, throughput is balanced over all active team members for data transmitted to the network, as in TLB teams, as well as for data received by the team from the network. Typically, the received data is balanced with the support of a switch that is capable of performing load-balancing of data destined for the team in accordance with some load-balancing policy. Load-balanced teams also provide fault tolerance by default, as team members that cease to function as a result of a fault will be inactivated and the aggregated throughput of the team will be reduced as a result.
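As a rough illustration of transmit load-balancing, the sketch below spreads outgoing traffic over the active members by hashing the destination address. The text does not specify a load-balancing policy, so destination hashing is an assumption chosen for simplicity, and the function names `transmit_member` and `distribute` are hypothetical.

```python
import zlib


def transmit_member(active_members, dest_addr):
    """Pick the member that transmits to dest_addr (assumed hash policy)."""
    if not active_members:
        raise RuntimeError("no active team members")
    # A stable hash keeps each destination on one member while spreading
    # different destinations across the team.
    index = zlib.crc32(dest_addr.encode("ascii")) % len(active_members)
    return active_members[index]


def distribute(active_members, destinations):
    """Count how many destinations each active member would transmit to."""
    counts = {member: 0 for member in active_members}
    for dest in destinations:
        counts[transmit_member(active_members, dest)] += 1
    return counts
```

If a member fails and is dropped from `active_members`, the same functions simply spread traffic over the remaining members, which mirrors the reduced aggregate throughput noted above. Received traffic is not modeled here because, in a TLB team, it all arrives on the primary.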
Certain network configurations are designed to achieve redundancy of devices such as routers and switches in the network. For a network device such as a server system employing a TLB or NFT team, such configurations can cause members of the team to be coupled to the network through a different one of the redundant devices and thus through separate paths of the network or subnet. To ensure that all team members are coupled to the same broadcast domain (i.e. same layer-2 network or subnet), these device-redundant configurations require that all of the redundant devices (and therefore the team members) ultimately be interconnected in some way—either directly or by way of uplinks—to a common third device (e.g. a backbone or core switch).
If one of the redundant devices (e.g. switches) coupling a team member to the network fails in such a configuration, the team will typically detect the resulting loss of connectivity to the network based on the resulting loss of link (among other detected conditions) that ensues for those NIC(s) of the team coupled to the network through that failed device. If the team member losing link is the primary of an NFT team or a TLB team, the entire team (and therefore the network device employing the team) loses communication with the network. When the team detects this loss of link to the network, it will typically fail over automatically in a manner which designates a different NIC of the team to be the primary and thereby restores team connectivity to the network. If the team member losing link with the network is an active but secondary member of a TLB team, no failover is required, but the team member will be placed in an inactive mode and will no longer be able to transmit load-balanced packets to the network until the failed switch has been restored.
It is also possible for this type of redundant configuration to suffer a failure in, for example, an uplink to the backbone or common core switch, rather than in one of the redundant devices that couple the NICs of the team to the network. In this type of failure, various team members can become isolated from one another on newly created LAN segments such that they are no longer contiguous with the segment to which the primary NIC of the team is coupled. Thus, a simple failover mechanism such as that described above will typically not serve to restore full connectivity to the team (and therefore the server) for all clients on the network. Moreover, automatic failover mechanisms such as the one described above typically require that a loss of physical link to the network be detected for at least one of the team members as a condition for the mechanism to even be triggered. Although failure in an uplink to the core can isolate various team members from one another on newly created segments and thereby degrade the team's connectivity with the core network, a team member coupled to an isolated segment may still maintain physical link with its segment and would not necessarily trigger a traditional failover.
When a failure occurs as described above, clients on secondary network paths (i.e. paths coupled to members of the team that are designated as secondary and not to the NIC designated as primary) will no longer be able to communicate with the network if they become isolated from the primary network path (the path of the original network to which the primary team member is coupled). This is because NFT and TLB teams receive data for the entire team only through the member designated as primary (for the NFT team, the primary transmits data for the entire team as well). Because there is typically only one primary member per team, only a path still contiguous with the primary path (the path coupled to the primary team member) will still have communication with the team and therefore the server employing the team. If the failure occurs in an uplink coupling the core to the primary path, the server employing the team becomes isolated from the core network as well as those network devices coupled to the secondary paths. Under these circumstances, if a router is provided as part of the core network by which the server communicates with the Internet (e.g. a gateway), the team (and therefore the system that employs it) becomes unable to communicate with the Internet as well.
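The effect of an uplink failure can be illustrated with a small reachability check over a toy topology. The topology below (two redundant switches uplinked to one core switch, one client per switch, a router on the core) and all node names are assumptions made for illustration only. The two NICs are modeled as separate nodes with no link between them, since a team does not forward traffic from one member to another.

```python
from collections import deque


def reachable(links, start):
    """Return every node reachable from start over the given set of links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def has_link(links, nic):
    """True if the NIC still has a physical link to some device."""
    return any(nic in link for link in links)


# Hypothetical topology: nic0 is the primary, nic1 a secondary.
LINKS = {
    ("nic0", "switch_a"),
    ("nic1", "switch_b"),
    ("switch_a", "core"),
    ("switch_b", "core"),
    ("client_1", "switch_a"),
    ("client_2", "switch_b"),
    ("router", "core"),
}

# Case 1: the uplink on the secondary path fails.
secondary_uplink_down = LINKS - {("switch_b", "core")}
# Case 2: the uplink on the primary path fails.
primary_uplink_down = LINKS - {("switch_a", "core")}
```

In the first case, `reachable(secondary_uplink_down, "nic0")` no longer contains `client_2`, yet `has_link(secondary_uplink_down, "nic1")` is still true, so a failover keyed only to loss of link is never triggered. In the second case, the primary can reach only `switch_a` and `client_1`, and the core, the router and `client_2` are all cut off from the team.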