1. Field of the Invention
The present invention relates to network management systems, and more specifically to a method and apparatus for monitoring and reporting the status of a desired group of resource elements.
2. Related Art
Network environments contain several resource elements. A resource element generally refers to an entity that is used in computation or communication. Examples of resource elements include memories, processors, processes, etc.
There is a general need for administrators, users, and other personnel to monitor the health of various resource elements. For example, it may be desirable to know whether a machine has lost connectivity to an underlying network, whether an interface has dropped packets while forwarding messages, or whether the number of “open cursors” in a database has exceeded normal limits (for each open cursor, the database software may need to maintain state information representing the access(es) by a prior query).
Network management systems are used to monitor various attributes of these resource elements. An attribute generally refers to an entity, the status or other statistic of which can be determined. Examples of attributes include, but are not limited to, operational status of resource elements, utilization of disk space/processor/interface, count of the packets dropped by an interface, the number of open cursors for a database, etc., as is well known in the relevant arts.
In one prior system, thresholds are associated with corresponding attributes of resource elements, and an alarm is generated if the measured value for an attribute exceeds (or falls below) the corresponding threshold. In some systems, the thresholds themselves are computed dynamically from historical data. In general, the thresholds are intended to represent normal or desirable behavior in relation to corresponding attributes of the resource element.
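For illustration only, the threshold-based alarm approach described above may be sketched as follows. The names Threshold, dynamic_threshold, and check are hypothetical, and deriving dynamic bounds as the historical mean plus or minus k standard deviations is merely one assumed rule; prior systems may compute thresholds differently.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Threshold:
    """Hypothetical bounds for one attribute; None means no bound on that side."""
    lower: Optional[float] = None
    upper: Optional[float] = None

    def violated_by(self, value: float) -> bool:
        # A value is abnormal if it exceeds the upper bound or falls below the lower bound.
        if self.upper is not None and value > self.upper:
            return True
        if self.lower is not None and value < self.lower:
            return True
        return False


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> Threshold:
    """Derive bounds from historical samples (assumed rule: mean +/- k * stdev).

    Requires at least two samples, since the sample standard deviation is used.
    """
    mu = mean(history)
    sigma = stdev(history)
    return Threshold(lower=mu - k * sigma, upper=mu + k * sigma)


def check(attribute: str, value: float, threshold: Threshold) -> Optional[str]:
    """Return an alarm message if the measured value violates the threshold, else None."""
    if threshold.violated_by(value):
        return f"ALARM: {attribute}={value} is outside the configured threshold"
    return None
```

As an example under these assumptions, check("disk_utilization", 95.0, Threshold(upper=90.0)) would return an alarm message, while a measured value of 50.0 would return None.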
One drawback with such a prior system is the need to analyze several alarms to determine the extent of abnormalities in a network environment containing one or more resource elements. Further, alarms are usually lagging indicators that typically reflect the state of a system after a problem has occurred. It may instead be beneficial to use information about all deviations from normal or desirable behavior (i.e., all abnormal events, whether or not they result in alarms) to obtain a leading indicator of impending problems in the system. Various aspects of the present invention overcome one or more of such disadvantages (or provide one or more of the desired benefits), as described in sections below.
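Merely to illustrate the distinction between individual alarms and an aggregate view of deviations, the following sketch computes the fraction of attribute samples that fall outside their normal range. The function name abnormality_ratio, the Sample layout, and the use of a simple ratio are all assumptions made for this illustration and do not represent the approach described in the sections below.

```python
from typing import Iterable, Tuple

# Hypothetical layout of one sample: (measured value, lower bound, upper bound).
Sample = Tuple[float, float, float]


def abnormality_ratio(samples: Iterable[Sample]) -> float:
    """Return the fraction of samples whose value lies outside [lower, upper].

    Every deviation is counted, whether or not it would have raised an alarm.
    """
    samples = list(samples)
    if not samples:
        return 0.0
    abnormal = sum(1 for value, lower, upper in samples if value < lower or value > upper)
    return abnormal / len(samples)
```

A ratio that rises over successive measurement intervals could, under these assumptions, hint at an impending problem before any single alarm is examined.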
In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.