1. Field of the Invention
The present invention relates to the field of data communications networks. More specifically, the present invention relates to a method and system for providing high reliability to management of a cluster of network devices, and a method and system for managing a cluster of network devices.
2. The Background Art
Data communications networks known to those skilled in the art include Local Area Networks (LANs), Metropolitan Area Networks (MANs), and Wide Area Networks (WANs). Network devices are used to transmit information across networks, which may include various combinations of LANs, MANs, and WANs. Such network devices may include switches, bridges and routers.
Network management includes configuring network devices, monitoring the active network in order to diagnose problems, and gathering statistics and information for administration. The Simple Network Management Protocol (SNMP) is one currently popular example of a network management tool, and almost all network management software supports the SNMP. The SNMP is a simple request/response protocol that communicates management information between two types of SNMP entities: SNMP applications (also called SNMP managers) and SNMP agents. The SNMP applications are typically executed in a network management station, and the SNMP agents reside in external network devices (or “network elements”). The SNMP applications issue queries to gather information about the status, configuration, and performance of the network elements. The SNMP agents, which are hardware and/or software processes, report activity in each network element to the workstation console used to oversee the network. The agents return information contained in a management information base (MIB). A MIB is a data structure that defines what is obtainable from the network element and what can be controlled (turned off, on, etc.). The CiscoWorks™ software package, available from Cisco Systems, Inc. of San Jose, Calif., is an example of a network management product supporting the SNMP, and a LAN switch is an example of a network element that can be managed using the SNMP.
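The request/response exchange described above can be illustrated with a minimal in-memory sketch. It does not use a real SNMP library or the SNMP wire format. The class names `Agent` and `Manager`, the MIB object names, the values, and the error labels are all hypothetical, chosen only to show a manager querying and controlling a network element through its MIB.

```python
class Agent:
    """Hypothetical stand-in for an SNMP agent residing in a network element."""

    def __init__(self, mib, writable):
        self.mib = dict(mib)            # object name -> current value
        self.writable = set(writable)   # objects the manager is allowed to control

    def get(self, name):
        # Respond to a query; an unknown object yields an error response.
        if name not in self.mib:
            return ("error", "noSuchObject")
        return ("ok", self.mib[name])

    def set(self, name, value):
        # Only objects marked as controllable can be changed.
        if name not in self.writable:
            return ("error", "notWritable")
        self.mib[name] = value
        return ("ok", value)


class Manager:
    """Hypothetical stand-in for an SNMP application on a management station."""

    def __init__(self, agents):
        self.agents = agents            # element name -> Agent

    def query(self, element, name):
        return self.agents[element].get(name)

    def control(self, element, name, value):
        return self.agents[element].set(name, value)


# Illustrative MIB contents (assumed, not taken from any real device).
switch_agent = Agent(
    mib={"sysName": "lan-switch-2", "port1AdminStatus": "up"},
    writable={"port1AdminStatus"},
)
manager = Manager({"lan-switch-2": switch_agent})

status = manager.query("lan-switch-2", "port1AdminStatus")
turned_off = manager.control("lan-switch-2", "port1AdminStatus", "down")
rejected = manager.control("lan-switch-2", "sysName", "renamed")
```

In this sketch the MIB plays both roles named in the text: the dictionary defines what is obtainable, and the `writable` set defines what can be controlled.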
A LAN switch is a network device that cross-connects stations or LAN segments. It operates at Layer 2 (the Data Link Layer) of the OSI Reference Model and forwards data traffic based on Media Access Control (MAC) layer addresses. LAN switches are available for Ethernet, Fast Ethernet, Token Ring, Fiber Distributed Data Interface (FDDI), and other similar LANs.
An Ethernet LAN switch improves bandwidth by separating collision domains and selectively forwarding traffic to the appropriate LAN segments. FIG. 1A illustrates a typical LAN switch 2 for a switched LAN. The LAN switch 2 contains a high-speed backplane and room for typically 4-32 plug-in line cards, for example, cards 3a-3d. Each card contains one to eight ports (connectors), for example, ports 4a-4p. Most often, each port is connected to a single host computer.
When a host 5a needs to transmit data, it outputs a standard frame to the LAN switch 2. The card 3a receiving the frame checks whether the frame is destined for one of the other hosts connected to the same card 3a. If so, the frame is copied there and sent to the appropriate host on the same card, for example, the host 5b. If not, the frame is sent over the high-speed backplane to the destination's card, for example, to the card 3c. The card 3c sends the frame to the destination host, for example, the host 5k. In this kind of plug-in card, typically, only one transmission per card is possible at any instant. However, all the cards can be transmitting in parallel. With this design, each card forms its own collision domain, independent of the others.
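The forwarding decision described above can be sketched as follows. This is a simplified illustration, not the switch's actual logic. The function name `forward_frame` and the layout (four cards 3a-3d with four ports each, one host 5a-5p per port 4a-4p) are assumptions inferred from the reference numerals.

```python
import string


def forward_frame(src_port, dst_host, port_to_card, host_to_port):
    """Return the path a frame takes through the switch (illustrative only)."""
    dst_port = host_to_port[dst_host]
    src_card = port_to_card[src_port]
    dst_card = port_to_card[dst_port]
    if src_card == dst_card:
        # Destination host is on the same card: deliver locally.
        return [src_card, dst_port]
    # Otherwise the frame crosses the high-speed backplane to the destination's card.
    return [src_card, "backplane", dst_card, dst_port]


# Assumed layout: ports 4a-4d on card 3a, 4e-4h on 3b, 4i-4l on 3c, 4m-4p on 3d.
letters = string.ascii_lowercase[:16]  # 'a' through 'p'
cards = ["3a", "3b", "3c", "3d"]
port_to_card = {f"4{c}": cards[i // 4] for i, c in enumerate(letters)}
host_to_port = {f"5{c}": f"4{c}" for c in letters}

local_path = forward_frame("4a", "5b", port_to_card, host_to_port)
remote_path = forward_frame("4a", "5k", port_to_card, host_to_port)
```

Under this assumed layout, a frame from the host 5a to the host 5b stays on the card 3a, while a frame from the host 5a to the host 5k crosses the backplane to the card 3c, matching the examples in the text.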
Performance improves in LANs in which switches are installed because the LAN switches create isolated collision domains. Thus, by spreading users over several collision domains, collisions are reduced and performance improves. In addition, one or more ports of the LAN switch 2 (for example, a port 4p) may be used to connect another LAN switch 6 or LAN segment, rather than a single host.
As a LAN grows, either due to additional users or network devices, additional switches must often be added to the LAN and connected together to provide more ports and new network segments. FIG. 1B schematically illustrates two LAN switches 2 connected in a cascaded configuration. On each of the LAN switches, four ports 4a-4d are dedicated to interswitch communication. The other ports on each LAN switch 2 are connected to hosts. For example, if each of the four interswitch connections is capable of supporting a 100 Mbps Fast Ethernet channel, the aggregate interswitch communication rate of the switches is 400 Mbps. However, the total number of ports available for connecting to hosts or other network devices on each LAN switch is diminished due to the dedicated interswitch connections that are necessary to implement the cascaded configuration.
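The trade-off in the cascaded configuration can be worked through numerically. The helper name `cascade_budget` and the total of 16 ports per switch are assumptions made for illustration. Only the four interswitch links and the 100 Mbps rate come from the example above.

```python
def cascade_budget(total_ports, interswitch_links, link_rate_mbps):
    """Return (aggregate interswitch rate in Mbps, ports left for hosts) for one switch."""
    if interswitch_links > total_ports:
        raise ValueError("more interswitch links than ports")
    aggregate_mbps = interswitch_links * link_rate_mbps
    host_ports = total_ports - interswitch_links
    return aggregate_mbps, host_ports


# total_ports=16 is a hypothetical figure; the text does not state a port count here.
aggregate, remaining = cascade_budget(
    total_ports=16, interswitch_links=4, link_rate_mbps=100
)
```

With these inputs the aggregate interswitch rate is 400 Mbps, and a quarter of the assumed 16 ports are no longer available for hosts.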
As a computer network grows, network devices or switches are typically added to the network and interconnected according to the needs of the particular network to which they belong. Installing a network device traditionally includes inserting the device into the network and assigning it an Internet Protocol (IP) address. The IP address is a unique address that specifies the logical location of a host or client (i.e., the network device) on the Internet. In general, each network device must have its own IP address to be configured and managed, and each IP address must be registered with a domain name service (DNS). Once a network device has been assigned an IP address, a network administrator can access the network device by entering its IP address from a network management station. The network device can be configured from anywhere in the Internet using a protocol such as the SNMP.
However, assigning an IP address to each and every network device is undesirable, because registering IP addresses with a DNS is both costly and cumbersome, and the number of available IP addresses is limited. Furthermore, configuring each one of the network devices in a network requires considerable time and labor of a network administrator.
Clustering technology alleviates these problems by enabling a network administrator to configure and manage a group of switches using a single IP address. Such a group of switches is called a cluster and is regarded as a single network entity. A cluster includes one commander switch and one or more member switches. A single IP address is assigned to the commander switch, and all of the switches in the cluster are then configured and managed through the commander switch using this single IP address.
FIGS. 2A-2C schematically illustrate examples of a cluster of switches, which includes one commander switch 7 and the member switches 9a-9h. FIG. 2A illustrates a cluster 11 in a star configuration, where all the member switches 9a-9h are directly connected to the commander switch 7. FIG. 2B illustrates a cluster 13 in a daisy-chain configuration, where only one member switch 9a is directly connected to the commander switch 7, and the other member switches 9b-9g are each connected to an “upstream” switch (fewer “hops” away from the commander switch 7). FIG. 2C illustrates a cluster 15 in a hybrid (or tree) configuration, in which the star and daisy-chain configurations are combined. As shown in FIG. 2C, member switches 9a and 9e are directly connected to the commander switch 7, and other member switches are connected to either one of the member switches 9a and 9b via either star (parallel) or daisy chain (serial) configuration.
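The difference between the star and daisy-chain configurations can be expressed as each member's hop count from the commander switch. The sketch below computes hop counts with a breadth-first search. The function name `hops_from_commander` and the link lists are illustrative assumptions built from the reference numerals, not a reproduction of the figures.

```python
from collections import deque


def hops_from_commander(links, commander):
    """Breadth-first search giving each switch's hop count from the commander."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    hops = {commander: 0}
    queue = deque([commander])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


members = [f"9{c}" for c in "abcdefgh"]

# Star: every member is linked directly to the commander switch 7.
star = [("7", m) for m in members]

# Daisy chain: 7 - 9a - 9b - ... - 9g, each member linked to its upstream switch.
chain_members = members[:7]
chain = list(zip(["7"] + chain_members[:-1], chain_members))

star_hops = hops_from_commander(star, "7")
chain_hops = hops_from_commander(chain, "7")
```

In the star case every member is one hop from the commander, whereas in the chain the hop count grows by one with each downstream switch. A hybrid (tree) configuration would yield a mix of the two.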
Typically, each switch in the cluster is capable of supporting a network management protocol, such as the SNMP discussed above, and contains its own management information base (MIB). Each switch in the cluster may be identified by a MAC address and/or a unique identifier such as a unique community string in an SNMP implementation. However, only the commander switch is required to have an IP address. In an SNMP implementation, the cluster is configured and managed using the single IP address, a single password, and a single set of SNMP strings. The commander switch is the single point of contact for the entire cluster, and all management requests are first sent to the commander switch and then forwarded to member switches.
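The single-point-of-contact arrangement can be sketched as a commander that receives every management request and forwards it to the addressed member. This is a minimal illustration under stated assumptions. The class names, the member identifiers, the MAC addresses, the documentation-range IP address, and the MIB contents are all hypothetical, and no real SNMP messages are exchanged.

```python
class MemberSwitch:
    """A cluster member: has a MAC address and its own MIB, but no IP address."""

    def __init__(self, mac, mib):
        self.mac = mac
        self.mib = dict(mib)

    def handle(self, name):
        # Answer a management query from the local MIB.
        return self.mib.get(name)


class CommanderSwitch(MemberSwitch):
    """The commander: the only switch in the cluster that needs an IP address."""

    def __init__(self, ip, mac, mib):
        super().__init__(mac, mib)
        self.ip = ip
        self.members = {}  # hypothetical unique identifier -> MemberSwitch

    def add_member(self, identifier, switch):
        self.members[identifier] = switch

    def handle_request(self, target, name):
        # A request with no target is answered by the commander itself.
        if target is None:
            return self.handle(name)
        member = self.members.get(target)
        if member is None:
            raise KeyError(f"unknown cluster member: {target}")
        # Forward the request to the addressed member and relay its answer.
        return member.handle(name)


commander = CommanderSwitch("192.0.2.1", "00:00:5e:00:53:01", {"sysName": "commander-7"})
commander.add_member("member-9a", MemberSwitch("00:00:5e:00:53:0a", {"sysName": "member-9a"}))
commander.add_member("member-9b", MemberSwitch("00:00:5e:00:53:0b", {"sysName": "member-9b"}))

own_name = commander.handle_request(None, "sysName")
member_name = commander.handle_request("member-9b", "sysName")
```

Because every request passes through `handle_request`, the management station only ever needs the commander's address. The same property is what makes the commander a single point of failure.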
The member switches of a cluster can be in the same location, or they can be distributed across a contiguous Layer 2 network. That is, a management network to which the cluster belongs may be a virtual LAN (VLAN). A VLAN is a switched network that is logically segmented by function, project team, or application, without regard to the physical locations of the user stations or physical LAN segments. Any switch port can belong to a VLAN. Since a VLAN is considered a separate logical network, packets destined for stations that do not belong to the VLAN are forwarded through a router or bridge, but not through a LAN switch. Thus, in general, the commander switch and all member switches of the cluster must belong to the same management VLAN.
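The management VLAN requirement amounts to a simple membership check. The sketch below is illustrative only. The function name `outside_management_vlan`, the VLAN numbers, and the switch labels are assumptions.

```python
def outside_management_vlan(commander_vlan, member_vlans):
    """Return the members whose management VLAN differs from the commander's."""
    return sorted(
        name for name, vlan in member_vlans.items() if vlan != commander_vlan
    )


# Hypothetical assignment: 9c sits in a different VLAN from the commander.
member_vlans = {"9a": 1, "9b": 1, "9c": 20}
unreachable = outside_management_vlan(1, member_vlans)
```

Any switch reported by this check could not, in general, be managed as part of the cluster until it is placed in the commander's management VLAN.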
Although the clustering technology realizes efficient management of network switches, a single point of contact can be a single point of failure. The commander switch's failure may cause the entire cluster to break. It would be desirable to provide commander switch redundancy (a standby group) to the cluster. That is, if the commander switch (active commander) fails, another network switch (standby commander) should be able to take over and become the current active commander for the cluster. It would also be desirable to provide a self-recovery mechanism for the cluster information in the case where the active commander and the standby commander fail at the same time.