A data communications network generally includes a group of devices, or objects, such as computers, repeaters, bridges, routers, etc., situated at network nodes and a collection of communication channels or interfaces for interconnecting the various nodes. Hardware and software associated with the network and the object devices on the network permit the devices to exchange data electronically via the communication channels.
The size of a data communications network can vary greatly. A local area network, or LAN, is a network of devices in close proximity, typically less than a mile, that are usually connected by a single cable, such as a coaxial cable. A wide area network (WAN) is a network of devices separated by longer distances and often connected by telephone lines or satellite links, for example. Some WANs span the United States, as well as the world. Furthermore, many of these networks are widely available for use by the public, including universities and commercial industries.
A very popular industry standard protocol for data communication in networks is the Internet Protocol (IP). This protocol was originally developed by the U.S. Department of Defense, and has been dedicated to public use by the U.S. government. In time, the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) were developed for use with the IP. The TCP/IP protocol implements certain check functionality, such as checksums and acknowledgments, and thus provides reliable transfer of data. The UDP/IP protocol does not guarantee transfer of data, but it offers the advantage of requiring much less overhead than does the TCP/IP protocol. Moreover, in order to keep track of and manage the various devices situated on a network, the Simple Network Management Protocol (SNMP) was eventually developed for use with the UDP/IP platform. The use of these protocols has become extensive in the industry, and numerous vendors now manufacture many types of network devices capable of operating with these protocols.
Network Management Systems, such as OpenView Network Node Manager (NNM) by Hewlett-Packard Company of Palo Alto, Calif., are designed to discover network topology (i.e., a list of all network devices or objects in a domain, their type, and their connections), monitor the health of each network object, and report problems to the network administrator (NA). NNM contains a monitor program called netmon that monitors the network; NNM is capable of supporting a single netmon program in the case of a non-distributed network management environment and multiple netmon programs in the case of a distributed network management environment. In the distributed network management environment, a plurality of netmon processes run on various Collection Station hosts, each of which communicates topology and status information to a centralized control unit, called a Management Station, that presents information to the NA. The management station is configured to discover the network topology and, from that, construct a network management map comprising various submaps typically arranged in a hierarchical fashion. Each submap provides a different view of the network and can be viewed on a display device.
The monitoring function of a Network Management System is usually performed by a computer program that periodically polls each network object and gathers data that is indicative of the object's health. Thus, each collection station is responsible for polling the objects assigned to it, while the management station polls the objects assigned to it. Based upon the results of the poll, a status value is determined for each object. For example, a system that fails to respond would be marked as “critical.” In NNM, the netmon program performs the status polling function.
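The status polling described above can be illustrated with a minimal sketch. It is not netmon's implementation; the names `poll_once`, `probe`, and the status strings are hypothetical, and the probe is a stand-in for whatever reachability check a real system would use.

```python
from typing import Callable, Dict, Iterable

# Hypothetical status values; the text names only "normal" and "critical".
NORMAL = "normal"
CRITICAL = "critical"


def poll_once(objects: Iterable[str], probe: Callable[[str], bool]) -> Dict[str, str]:
    """Poll each assigned object once and derive a status value from the result."""
    statuses: Dict[str, str] = {}
    for obj in objects:
        try:
            responded = probe(obj)
        except OSError:
            # A probe that raises a network error is treated as no response.
            responded = False
        statuses[obj] = NORMAL if responded else CRITICAL
    return statuses


# Usage: a fake probe in which only two of three objects respond.
reachable = {"router-1", "bridge-2"}
result = poll_once(["router-1", "bridge-2", "host-3"], lambda name: name in reachable)
print(result)
```

In a running system this function would be invoked periodically by each station for the objects assigned to that station.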
It is important to the proper operation of the network that the failure of any network object be known as soon as possible. The failure of a single network object can result in thousands of nodes and interfaces suddenly becoming inaccessible. Such a failure must be detected and remedied as soon as possible. Since collection stations are responsible for detecting the failure of their network objects through status polling, when a collection station itself goes down, alternate arrangements must be made to ensure that status polling of the objects assigned to the failed collection station is maintained.
When a collection station has been downgraded from a normal status to a critical status due to an inability to communicate with it, the objects normally polled by the critical collection station must continue to be polled. One way to ensure that a collection station's objects are properly polled on a periodic basis is to build redundancy into the network management system. A set of objects is thus polled by the management station as well as by the collection station. This practice of redundancy, however, while operating to ensure polling of objects, has the disadvantage of increasing the overhead costs of the network. Having a set of objects polled by both its collection station and the management station is, of course, inefficient for the vast majority of the time, during which such redundant polling is not necessary. There is therefore an unmet need in the art to be able to ensure that objects of a collection station will be status polled in a non-redundant manner in the event that the collection station is downgraded from a normal to a critical status.
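The overhead of redundant polling can be made concrete with a small back-of-the-envelope sketch. The function `polls_per_cycle` and its parameters are hypothetical, and the non-redundant case simply assumes that the management station polls a collection station's objects only while that station is down; it illustrates the stated need, not any particular method of meeting it.

```python
def polls_per_cycle(num_objects: int, station_up: bool, redundant: bool) -> int:
    """Count the status polls issued in one polling cycle for one collection station's objects."""
    # The collection station polls its objects only while it is up.
    collection_polls = num_objects if station_up else 0
    if redundant:
        # Redundant scheme: the management station always polls the same objects.
        management_polls = num_objects
    else:
        # Assumed non-redundant scheme: the management station polls only when
        # the collection station is down.
        management_polls = 0 if station_up else num_objects
    return collection_polls + management_polls


# Usage: 1000 objects assigned to one collection station.
print(polls_per_cycle(1000, station_up=True, redundant=True))    # 2000
print(polls_per_cycle(1000, station_up=True, redundant=False))   # 1000
print(polls_per_cycle(1000, station_up=False, redundant=False))  # 1000
```

Under these assumptions, the redundant scheme doubles the polling traffic for as long as the collection station is healthy, which is the vast majority of the time.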