This invention relates generally to network clusters of distributed processing hosts, and more particularly to the automatic detection of the topology of dynamically changing network clusters.
Large data centers, such as those of enterprises, may comprise networks of hundreds or more physically interconnected processing nodes, each processing node having a host computer connected to one or more Layer 2 network switches. The processing nodes (hosts) may be organized into different logical clusters for use by different users and groups and for different purposes. For instance, one logical cluster may be a large distributed database for warehousing enterprise data.
Large clusters are very dynamic, as hosts and switches are constantly being added, removed or going down. For these and other reasons, network technicians and administrators need current network topology information that maps hosts to switches for maintenance and administrative tasks. For example, if a switch shows warnings that it may fail and need to be replaced in the near future, the administrator will need mapping information to move hosts off of the failing switch. The mapping information should be current and should correspond to the physical layout of the network wiring so that affected hosts can be identified and migrated safely and so that the workloads running on those hosts can be shut down in a safe way. If errors have occurred on a particular switch port, it may be desirable to correlate them with the particular host connected to that switch port so that the host can be checked for potential performance issues associated with the switch port errors. Still another reason for network topology information may be the need to know the Ethernet MAC address of a host that is powered off. The MAC addresses of hosts connected to switch ports can be determined from the switch, but if the administrator does not know which host is connected to which switch port, the administrator will not know which MAC address corresponds to which host. The MAC address could be useful in a DHCP configuration file where hosts that are powered off will be reinstalled with a new operating system (OS) using DHCP and PXE-boot-based installation of the OS. The DHCP configuration file needs the MAC address of each host in order to correlate it with the target IP address and OS installation personality.
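To illustrate how a host-to-port mapping recovers the MAC address of a powered-off host, the following minimal sketch assumes two hypothetical tables: `host_to_port`, mapping a hostname to a (switch, port) location, and `port_to_mac`, a previously recorded switch table mapping each (switch, port) to the MAC address seen there. The helper names `mac_for_host` and `dhcpd_host_entry` are illustrative only. The output uses the ISC dhcpd `host` declaration syntax; the boot file name `pxelinux.0` is an assumed placeholder, and the OS installation personality is not modeled.

```python
from typing import Dict, Tuple

# Hypothetical location type: (switch name, port number)
Location = Tuple[str, int]


def mac_for_host(host: str,
                 host_to_port: Dict[str, Location],
                 port_to_mac: Dict[Location, str]) -> str:
    """Look up a host's MAC via its switch port (works even if the host is off)."""
    return port_to_mac[host_to_port[host]]


def dhcpd_host_entry(host: str, mac: str, ip: str,
                     boot_file: str = "pxelinux.0") -> str:
    """Render an ISC dhcpd host declaration tying the MAC to a fixed IP.

    boot_file is an assumed PXE boot loader name, not taken from the source.
    """
    return (
        f"host {host} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  filename \"{boot_file}\";\n"
        f"}}\n"
    )


if __name__ == "__main__":
    host_to_port = {"node07": ("sw1", 12)}
    port_to_mac = {("sw1", 12): "52:54:00:12:34:56"}
    mac = mac_for_host("node07", host_to_port, port_to_mac)
    print(dhcpd_host_entry("node07", mac, "10.0.0.17"))
```

Without the `host_to_port` mapping, the switch table alone yields MAC addresses that cannot be attributed to any particular hostname.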
Since hosts may fail or be powered down temporarily at any time, the mapping process must tolerate these occurrences and account for them intelligently. In this environment a “logical cluster” comprises a set of hosts considered to be part of a cluster or group in an administrative domain that may or may not have some shared application logic or uses. Current cluster mapping information is necessary for administration and maintenance of the cluster. Since other hosts that are not part of the logical cluster may be connected to the same switches as hosts of the logical cluster, there is a need to map accurately the hostnames of the hosts in the logical cluster to the cluster switches and to cluster switch port locations. Moreover, there is a need for the mapping to be updated frequently so that it remains current and readily available.
One approach to generating topology mapping data has been to connect to all switches that are part of the logical cluster and obtain the MAC address and port information for each port location on the switch, and then connect to all hosts in the logical cluster and obtain the MAC address and hostname of each. Mapping information may then be obtained by joining these data on the common MAC address field in order to map each hostname to a switch name and port location. There are, however, several problems with this approach. If a host is down or otherwise not available, that host would be excluded from the mapping data or the mapping generation process would fail. Likewise, if a switch is down during generation of the mapping data, all hosts connected to that switch would also be excluded from the mapping data or the generation of the mapping data would fail. Thus, this approach can produce inaccurate or incomplete mapping information and lacks resiliency.
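The join-based approach and its failure mode can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular tool: `switch_tables` is a hypothetical dict from switch name to a list of (port, MAC) rows, with `None` standing for an unreachable switch, and `host_macs` is a hypothetical dict from hostname to MAC, with `None` standing for an unreachable host. The function `join_on_mac` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple

# switch name -> [(port, mac), ...], or None if the switch could not be reached
SwitchTables = Dict[str, Optional[List[Tuple[int, str]]]]
# hostname -> mac, or None if the host could not be reached
HostMacs = Dict[str, Optional[str]]


def join_on_mac(switch_tables: SwitchTables,
                host_macs: HostMacs) -> Dict[str, Tuple[str, int]]:
    """Inner-join host and switch data on MAC address (the fragile approach)."""
    # Index MAC -> (switch, port) using only switches that responded.
    mac_to_location: Dict[str, Tuple[str, int]] = {}
    for switch, rows in switch_tables.items():
        if rows is None:
            continue  # switch down: every host cabled to it drops out
        for port, mac in rows:
            mac_to_location[mac.lower()] = (switch, port)

    mapping: Dict[str, Tuple[str, int]] = {}
    for host, mac in host_macs.items():
        if mac is None:
            continue  # host down: no MAC to join on, so it drops out
        location = mac_to_location.get(mac.lower())
        if location is not None:
            mapping[host] = location
    return mapping


if __name__ == "__main__":
    switch_tables = {
        "sw1": [(1, "AA:BB:CC:00:00:01"), (2, "aa:bb:cc:00:00:02")],
        "sw2": None,  # unreachable switch
    }
    host_macs = {
        "h1": "aa:bb:cc:00:00:01",
        "h2": None,                 # powered-off host on sw1 port 2
        "h3": "aa:bb:cc:00:00:03",  # host cabled to unreachable sw2
    }
    print(join_on_mac(switch_tables, host_macs))
```

Only `h1` appears in the result: `h2` is lost because the host is down, and `h3` is lost because its switch is down, which is the incompleteness described above.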
There is a need for an approach for timely, robust and resilient topology detection and mapping of a dynamic cluster that addresses the foregoing and other problems with known approaches, and it is to these ends that the invention is directed.