Network management systems are programmatic tools, services, applications or devices that implement management functions, policies and controls on a network. In many cases, management systems are employed to assist human managers who supervise the operation of the elements in the network. For example, networks may employ performance management to enhance network performance, configuration management to monitor and manage network configurations for interoperability between network elements, accounting management to manage availability of resources, fault management to detect and manage network problems, and security management to implement firewalls, authentication and authorization.
The actions carried out by such network management systems may be governed by one or more network management policies, which are abstract expressions of how a particular network is to be managed. Specific examples of commercial products that support creation and management of management policies for a network include CiscoAssure Policy Networking and Cisco Quality-of-Service Policy Manager, from Cisco Systems, Inc.
In policy management systems, management policies are usually implemented through use of one or more workstations. The workstations are often operated separately from other network elements. One function performed on these workstations when implementing a management policy is to detect when elements of the network are down. In theory, elements that are down have lost connectivity with the network for some reason. The management systems may detect the unreachable elements, and implement management policies that account for or compensate for the unreachable elements. In this context, network elements may include routers, switches, gateways, hubs, bridges, switch controllers, etc.
Typically, management systems identify unreachable elements by repeatedly polling each pertinent element of the network. If no response is received from the polled element, the element is assumed to be down or failed. The polling is usually done using protocols such as Simple Network Management Protocol (SNMP), User Datagram Protocol (UDP), or Telnet. Internet Control Message Protocol (ICMP) may also be used to poll elements (i.e., to “ping” them).
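The polling approach described above can be illustrated with a minimal Python sketch. It is not taken from any particular management product: the function name `poll_elements`, the `attempts` parameter, the element names, and the stand-in `fake_probe` are all hypothetical. In practice the probe would be an SNMP request, a UDP exchange, a Telnet connection attempt, or an ICMP echo.

```python
from typing import Callable, Dict, Iterable


def poll_elements(elements: Iterable[str],
                  probe: Callable[[str], bool],
                  attempts: int = 3) -> Dict[str, str]:
    """Poll each element and classify it as 'up' or 'down'.

    An element is assumed to be down when none of the probe
    attempts returns a response, which mirrors the assumption
    made by polling-based management systems.
    """
    status: Dict[str, str] = {}
    for element in elements:
        # any() stops at the first probe that gets a response.
        responded = any(probe(element) for _ in range(attempts))
        status[element] = "up" if responded else "down"
    return status


# Hypothetical probe standing in for a real SNMP, UDP, Telnet or ICMP
# round trip; it simply reports whether the element is in a known set.
reachable = {"router-1", "switch-1"}


def fake_probe(element: str) -> bool:
    return element in reachable


result = poll_elements(["router-1", "switch-1", "hub-1"], fake_probe)
print(result)
```

Note that the sketch cannot distinguish an element that is actually down from one whose probes or replies were simply lost in transit; it reports both as down.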
Polling requires resources for sending round-trip communications to the polled devices. Furthermore, protocols such as SNMP and UDP are unreliable on congested networks. Communications sent using these protocols may be lost when traffic is heavy, and little feedback is provided to notify the elements exchanging the communication that the communication failed.
The result is that using polling to query managed devices often generates inaccurate results regarding the health of a network. Elements of the network may be indicated as failed when in fact the network was merely too congested for the element to be polled using a protocol such as SNMP or UDP. In addition, the polling performed on the managed devices adds to network congestion, and consumes network resources. The managed devices that are polled to detect the health of the network also add overhead and cost to operation of the network.
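To make the effect of congestion concrete, the following Python sketch estimates how often a healthy element would be wrongly reported as failed. The model is an illustrative assumption, not something stated in the text: each datagram is lost independently with the same probability in each direction, a poll succeeds only if both the request and the reply arrive, and the element is declared down after a fixed number of unanswered polls. The function name `false_down_probability` is hypothetical.

```python
def false_down_probability(loss: float, attempts: int) -> float:
    """Probability that a healthy element is reported as down.

    Assumes independent datagram loss with probability `loss` in
    each direction, and that the element is declared down only if
    all `attempts` polls go unanswered.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # A poll is answered only if the request and the reply both arrive.
    round_trip_ok = (1.0 - loss) ** 2
    # Every attempt must fail for the element to be marked down.
    return (1.0 - round_trip_ok) ** attempts


p_idle = false_down_probability(0.0, 3)       # no loss at all
p_congested = false_down_probability(0.5, 1)  # half of all datagrams lost
print(p_idle, p_congested)
```

Under these assumptions, adding retries lowers the false-failure rate, but each retry is itself additional traffic on an already congested network.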
Based on the foregoing, there is a clear need for an efficient manner to determine the health of a network.
There is a specific need to accurately detect whether network elements are down or unavailable for the purpose of implementing a management policy.