The present invention relates to polling schemes for network management systems. More specifically, the invention relates to time division polling schemes for network management systems.
With the advent of faster and cheaper network communication devices, communication infrastructure has expanded considerably in recent years. Network devices, including specialized computer systems dedicated to processing communication traffic, have grown in number, and the ability to monitor the statuses of those devices and the health of the network as a whole has become almost a basic requirement for any network management system.
Large communication networks typically include heterogeneous network devices that vary widely in size and power. These devices usually communicate using a common protocol, such as Internet Protocol or SONET. The network devices that use Internet Protocol are interlinked through routers, bridges, multiplexors, and hubs, which provide the essential support to transport the communication payload from a source to a destination in the network.
To ensure smooth operation of the network, a network management system may be deployed in the network. The network management system, which is a software system running on a network device in the network, gathers information about the topology of the network, the operational statuses of network devices and the interconnections among them, performance statistics of the different segments of the network, and potential trouble spots in the network, if any. It may also provide a mechanism to configure the network.
The network management system utilizes a network management framework consisting of a management protocol and a set of standardized managed objects with supporting schemata. As an example, the Simple Network Management Protocol (SNMP) is a network management framework that is quite common in the field.
Initially, the network management system discovers the different network devices (or objects) connected to the network and stores the information it gathers about them, such as Internet Protocol (IP) addresses, in a local database. Then, the network management system periodically queries or polls these devices for their operational statuses and presents the current status of the network in a graphical form to network personnel. As new network devices are added to the network, they are added to the database alongside the existing ones and their statuses are monitored as well.
If a network device does not respond to a poll within a specific time limit, the poll is retransmitted. The network management system again waits for a response for a specific time limit and, if none arrives, retransmits the poll once more. Retransmission is typically repeated a specific number of times; if there is still no response after the last poll, the network device is declared to have an inactive operational status and the database is updated accordingly. While retransmissions are in progress, the device is usually said to be in an unresponsive state.
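The retransmission behavior described above can be sketched as a small loop. This is only an illustrative sketch: the function names, the default timeout, and the default retransmission count are assumptions made for the example, and `send_poll` is a hypothetical callable standing in for whatever actually transmits a poll and waits for the reply.

```python
from typing import Callable


def poll_with_retransmission(
    send_poll: Callable[[float], bool],
    timeout_s: float = 2.0,
    max_retransmissions: int = 3,
) -> str:
    """Poll one device; return "active" or "inactive".

    send_poll(timeout_s) is a hypothetical callable that sends one poll and
    returns True if a response arrived within timeout_s seconds.
    """
    # One initial poll plus up to max_retransmissions retransmissions.
    for _attempt in range(1 + max_retransmissions):
        if send_poll(timeout_s):
            return "active"
        # No response in time: the device counts as "unresponsive" while
        # the loop goes on to retransmit.
    return "inactive"


def make_fake_device(responds_on_attempt: int):
    """Test stand-in: answers only the Nth poll (0 means it never answers)."""
    calls = {"count": 0}

    def send_poll(timeout_s: float) -> bool:
        calls["count"] += 1
        return calls["count"] == responds_on_attempt

    return send_poll, calls
```

With the assumed defaults, a device that never answers costs four polls (one original and three retransmissions) before it is marked inactive, which illustrates how unanswered polls multiply management traffic.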
If a network device receives a large number of polls in a short time frame while it is busy dealing with network payload traffic, the network device may send delayed responses or even discard these polls without processing them. Delayed responses and discarded polls lead to retransmissions from the network management system, which result in more traffic in the network and more processing load for the network device. The transient rise in polling requests at the input end of the network device may adversely affect the device's ability to process the payload in a timely way. This is particularly true when the devices are optimized for a particular application and hence may not have enough resources to deal effectively with the surge in workload. Accordingly, network management systems should avoid overloading the network and/or devices with polling requests.
One way to avoid overloading the network with polling requests is to restrict the number of polls that are dispatched so that there cannot be more than a fixed number of network management system poll requests in the network at any one time. For example, the Hewlett-Packard OpenView (HPOV) network management system accomplishes this by restricting the number of unresponsive network devices to three. That is, when there are three network devices in an unresponsive state in the database at any time, the network management system stops sending polls to other devices until at least one of them changes state to an active (response received) or fail (no response) state. This can ensure that the network does not get overloaded with a possible flood of poll requests and responses in a short time frame. However, because polling of all other devices stalls whenever three devices are unresponsive, this technique may be slow to discover new devices and potential problems in very large networks.
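The cap on unresponsive devices can be illustrated with a simplified model. This sketch is not the HPOV implementation; the class name, method names, and the `first_poll` callable are hypothetical, and timeouts are replaced by explicit `resolve` calls so that the stalling behavior is easy to see.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


class CappedPoller:
    """Simplified model: stop polling while too many devices are unresponsive."""

    def __init__(
        self,
        devices: Iterable[str],
        first_poll: Callable[[str], bool],
        max_unresponsive: int = 3,
    ) -> None:
        self.pending = deque(devices)          # devices not yet polled
        self.first_poll = first_poll           # hypothetical: True if answered
        self.max_unresponsive = max_unresponsive
        self.unresponsive: Set[str] = set()    # devices awaiting retransmission
        self.status: Dict[str, str] = {}

    def dispatch(self) -> List[str]:
        """Poll pending devices until the cap is reached; return those polled."""
        sent: List[str] = []
        while self.pending and len(self.unresponsive) < self.max_unresponsive:
            device = self.pending.popleft()
            sent.append(device)
            if self.first_poll(device):
                self.status[device] = "active"
            else:
                self.unresponsive.add(device)
        return sent

    def resolve(self, device: str, responded: bool) -> None:
        """Move an unresponsive device to the active or fail state."""
        self.unresponsive.remove(device)
        self.status[device] = "active" if responded else "fail"
```

If the first three devices in the queue all fail to answer, `dispatch` returns nothing further until `resolve` is called for one of them, which is the source of the slow discovery noted above.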
Another proposed way to avoid overloading the network with polling requests is to send the polls at a fixed rate using a rate controlling mechanism (see, e.g., U.S. Pat. No. 5,710,885, issued Jan. 20, 1998 to Bondi). Devices to be polled are stored in a queue and poll requests are sent at a rate determined by the rate controlling mechanism. Although this technique can allow for a variable number of unresponsive devices, implementation can be difficult and the results may not be satisfactory in many situations.
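Fixed-rate polling from a queue can be sketched generically as follows. This is not the mechanism of the cited patent, whose details are not given here; it is a minimal pacing loop under assumed names (`poll_at_fixed_rate`, `send_poll`, `polls_per_second`), with the sleep function injectable so the pacing can be checked without real delays.

```python
import time
from collections import deque
from typing import Callable, Iterable


def poll_at_fixed_rate(
    devices: Iterable[str],
    send_poll: Callable[[str], None],
    polls_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one poll per queued device, spaced 1/polls_per_second apart.

    send_poll is a hypothetical callable that transmits a poll without
    waiting for the response. Returns the number of polls sent.
    """
    interval_s = 1.0 / polls_per_second
    queue = deque(devices)
    sent = 0
    while queue:
        send_poll(queue.popleft())
        sent += 1
        if queue:
            # Pace the next poll; responses are not awaited here, so the
            # number of unresponsive devices is not bounded by this loop.
            sleep(interval_s)
    return sent
```

Because the loop never blocks on responses, the number of unresponsive devices can vary freely, while the offered polling load stays at or below the configured rate.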