1. Field of the Invention
The present invention relates to a method, system, and program for establishing communication with a network device and, in particular, for reestablishing communication after communication with the device has been lost.
2. Description of the Related Art
Personal computers and workstations have become standard work tools in most office environments. To further improve the usefulness of the computer systems, most office computer systems have been linked together into an office Local Area Network (LAN). The Local Area Network allows the computer users of different computer systems to easily share information with each other. The network also allows the computer users to share computer hardware such as printers and modems. Many local area networks consist of a centralized network hub that is coupled to all the end computer systems.
This proliferation of network devices has resulted in very large and difficult-to-manage computer networks. For example, a computer network manager may be responsible for installing and maintaining numerous network hubs, network printers, network bridges, routers, gateways, file servers, and remote access servers. To simplify the task of managing all these network devices, network management systems have been devised.
Generally, to support the communication network, network management personnel want to know what nodes are connected to the network, what each node is (e.g., a computer, router, or printer), the status of each node, potential problems with the network, and, if possible, any corrective measures that can be taken when abnormal status, malfunction, or other notifiable events are detected.
To assist network management personnel in maintaining the operation of the network, a network management framework was developed to define rules describing management information, a set of managed objects and a management protocol. One such protocol is the simple network management protocol (SNMP).
Network management systems need to interact with existing hardware while minimizing the host processor time needed to perform network management tasks. In network management, the host processor or network management station is known as the network manager. A network manager is typically an end-system, such as a mainframe or workstation, assigned to perform the network managing tasks. More than one end-system may be used as a network manager. The network manager is responsible for monitoring the operation of a number of network devices, which are known as managed nodes. The network manager, the corresponding managed nodes and the data links therebetween are known as a subnet.
Many different tasks are performed by the network manager. One such task is to initially discover the different nodes, e.g., end-systems, printers, routers and media devices, connected to the network. After discovery, the network manager continuously determines how the network organization has changed. For example, the network manager determines what new nodes are connected to the network. Another task performed after discovery is to determine which nodes on the network are operational. In other words, the network manager determines which nodes have failed.
Once the nodes on the network are discovered and their status ascertained, the information is stored in a database and can be displayed, along with the status of the different nodes on the network, to the network management personnel. Topology maps assist the personnel in troubleshooting network problems and in routing communications along the networks, especially if nodes have failed.
Through the discovery process, the network manager ascertains its Internet protocol (IP) address, the range of IP addresses for the subnet components (i.e., the subnet mask), a routing table for a default router and address resolution protocol (ARP) cache tables from known and previously unknown nodes with SNMP agents. To ascertain the existence of network nodes, the discovery process performs configuration polls of known nodes and retrieves the ARP cache tables from the known nodes, and the routing tables. The network manager then verifies the existence of those nodes listed in these tables that it has not previously recorded in its database.
Network manager systems can discover nodes and verify the existence and status of nodes by sending to each node an Internet control message protocol (ICMP) poll and waiting for a response. The ICMP poll is also known as a ping. If no response is received after a specified period of time, the node is determined to be nonoperational or to have failed. Instances may occur when the ping is not received by the node, or the node is busy performing another task when the ping is sent. Thus, to verify that a node has actually failed, the network manager sends out a sequence of pings. Each successive ping is transmitted if a corresponding acknowledgment is not received during an associated scheduled timeout interval.
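By way of illustration only, the ping sequence described above can be sketched as follows. The function name node_has_failed, the callable send_ping, and the particular timeout values are hypothetical and are not taken from any specific network management system; send_ping stands in for whatever mechanism actually transmits an ICMP poll and waits for the acknowledgment.

```python
from typing import Callable, Sequence


def node_has_failed(
    send_ping: Callable[[float], bool],
    timeouts: Sequence[float] = (1.0, 2.0, 4.0),
) -> bool:
    """Return True only if every ping in the sequence goes unacknowledged.

    send_ping(timeout) is a hypothetical callable that transmits one ping
    and returns True if an acknowledgment arrives within `timeout` seconds.
    The timeout values are illustrative assumptions.
    """
    for timeout in timeouts:
        # Each successive ping is sent only because the previous one
        # was not acknowledged within its scheduled timeout interval.
        if send_ping(timeout):
            return False  # acknowledged: the node is operational
    return True  # no ping in the sequence was acknowledged
```

A node that misses a single ping because it was busy is therefore not declared failed unless the remaining pings in the sequence also go unanswered.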
One specific application of these network management functions can be found in the network printer art. Printer port monitor applications exist on any number of computers on a network and serve a network manager role with respect to the network printers. The printers are representative of the nodes which are monitored and managed by the port monitors.
Among the many functions of a typical printer port monitor is the periodic status polling of the printers to determine whether they still respond to queries. When a printer stops responding to queries, whether due to loss of power or some other condition, port monitors act in different ways, depending upon the type of monitor.
Some monitors continue their status polling at, for example, 5 second intervals as if nothing had happened. However, this is wasteful of both port monitor processor cycles and network bandwidth, since continuing to poll a non-responsive printer serves no purpose. The present invention therefore provides an improved technique for verifying the status of network nodes while reducing the amount of polling on the network.
To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, and program for establishing communication with multiple network devices. A detection is made of at least one network device that is not available for communication. A routine is executed at predetermined intervals that sends a message to each unavailable network device to establish communication with the unavailable network device. A determination is then made as to whether the message sent to each unavailable network device established communication with that network device. Indication is then made that each previously unavailable network device for which the message established communication is available on the network.
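The routine summarized above may be sketched, under stated assumptions, as follows. The names recovery_pass and run_recovery, the callable probe, and the 30 second default interval are hypothetical and chosen only for illustration; probe stands in for sending the message to one device and reporting whether communication was established.

```python
import time
from typing import Callable, Set


def recovery_pass(unavailable: Set[str], probe: Callable[[str], bool]) -> Set[str]:
    """Send one message to each unavailable device.

    Devices that answer are removed from `unavailable`, which indicates
    that they are available again, and are returned to the caller.
    """
    recovered = {device for device in unavailable if probe(device)}
    unavailable.difference_update(recovered)
    return recovered


def run_recovery(
    unavailable: Set[str],
    probe: Callable[[str], bool],
    interval_s: float = 30.0,               # assumed predetermined interval
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeat the recovery pass at a fixed interval until no device is unavailable."""
    while unavailable:
        recovery_pass(unavailable, probe)
        if unavailable:
            sleep(interval_s)  # wait for the next predetermined interval
```

In this sketch only the devices still marked unavailable receive a message on each pass, so the traffic shrinks as devices come back.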
In further embodiments, the routine that sends a message to each unavailable network device is executed by a thread that initiates a separate thread for each message to transmit to the unavailable network devices.
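One possible arrangement of a thread that initiates a separate thread for each message is sketched below. The name probe_in_parallel and the callable probe are hypothetical; the sketch assumes that probe sends a single message to one device and returns True if communication was established.

```python
import threading
from typing import Callable, Iterable, Set


def probe_in_parallel(devices: Iterable[str], probe: Callable[[str], bool]) -> Set[str]:
    """Start one thread per unavailable device and collect those that respond."""
    recovered: Set[str] = set()
    lock = threading.Lock()

    def worker(device: str) -> None:
        # Each worker thread carries exactly one message to one device.
        if probe(device):
            with lock:  # guard the shared result set
                recovered.add(device)

    threads = [threading.Thread(target=worker, args=(device,)) for device in devices]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every message has been answered or timed out
    return recovered
```

With this arrangement, a device that is slow to time out does not delay the messages to the other unavailable devices.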
In still further embodiments, the message sent to each unavailable network device comprises the minimum form of message which is sufficient to establish communications with the unavailable network device.
In further embodiments, the sending of all other messages to each unavailable network device is discontinued.
In still further embodiments, a plurality of codes are provided. In such case, the message to each unavailable network device includes one code. The network devices only respond to messages using a code that the network device recognizes. The routine sends subsequent messages to each network device, each including one code, until one of the following events occurs: communication is established with the network device, or all of the codes have been included in messages to the network device without establishing communication.
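The code-by-code messaging described above can be illustrated with the following minimal sketch. The name find_working_code and the callable send are hypothetical; send is assumed to transmit one message containing one code and to return True only if the device responds.

```python
from typing import Callable, Optional, Sequence


def find_working_code(
    device: str,
    codes: Sequence[str],
    send: Callable[[str, str], bool],
) -> Optional[str]:
    """Try each code in turn until the device responds or the codes run out.

    Returns the code that established communication, or None if every
    code was tried without success.
    """
    for code in codes:
        if send(device, code):
            return code  # communication established with this code
    return None  # all codes were included in messages without success
```

Both terminating events appear in the sketch: the early return when a code is recognized, and the final return once the list of codes is exhausted.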
Preferred embodiments efficiently use processing and network resources to reestablish communication with a network device after communication with the device has been lost, such as when the device no longer responds to queries from the network. For all such devices, preferred embodiments use a single thread at predetermined intervals to send a message in an attempt to reestablish communications. This conserves network bandwidth by limiting the messages sent to the network devices.