The present invention is generally related to management of a computer network. More particularly, the present invention is related to improving the accuracy in the analysis of data collected from remote devices connected to the network.
Network communications have become a fundamental part of today’s computing. As networks grow larger, increasingly complex, and interface with a variety of diverse networks, the task of monitoring and maintaining a network also becomes increasingly complex.
To assist a network manager, network management software (“NMS”) may be used in the management of a network. A conventional NMS may typically be executed on a management device or node of the network. From the management node, the conventional NMS may be configured to determine a network topology, detect malfunctioning remote network devices or communication links, monitor network traffic, and the like.
As part of the monitoring duties, the network manager may configure the NMS to occasionally query or poll remote network devices for information. The information may include status data, port information, address, and the like. The information required may be crucial for the network manager to assess the overall status of the network.
FIG. 5 illustrates a block diagram of a conventional management node or device 500 implementing conventional data collection from a remote node. In particular, the management node 500 includes an NMS 510 and a network interface 520. The NMS 510 may be configured to provide the functionality for a network manager to manage a network 515 through the network interface 520.
The NMS 510 may include a data collector module 530 configured to retrieve user specified information at a scheduled time from remote devices 525a . . . 525n over the network 515, i.e., a data collection event. The data collector module 530 may retrieve the selected information from at least one of the remote devices 525a . . . 525n and store the selected information in an associated output file in the management node 500. The associated output file may be analyzed by additional network tools of the NMS 510 to assist in the assessment of the status and maintenance of the network 515.
The results of an analysis of the associated output file may be skewed. Typically, network systems experience regular patterns of network traffic (i.e., data/command packets traversing a network). A typical pattern may be a high volume of network traffic during the morning hours of a work week (resulting from, for example, users checking their electronic mail in the morning), followed by a steady volume of network traffic for the rest of the day. The network traffic volume may subsequently drop during the evening hours as users end their respective work days.
The network traffic pattern on working days of a week may be markedly different from that on a weekend. Weekend activity may include occasional network administration traffic (e.g., back-up, maintenance commands, etc.) along with an occasional weekend user. The weekend network traffic pattern may also be markedly different from the overnight traffic pattern during the working days of a week. This overnight activity may consist entirely of network administration traffic and/or time-intensive computations.
If the results of the analysis of the associated output file are used to determine a performance threshold for incoming data, the performance threshold computation may be skewed. For a typical performance threshold computation, most conventional network management systems use all the relevant collected data value points to calculate a given performance threshold. As a result, the given performance threshold may not take into account the varying network traffic patterns that may occur during a week or during a given time period. Accordingly, a weekend data point, which may not be an aberration when compared with comparable weekend data points, may be an aberration when compared with the combined data points.
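The skew described above can be illustrated with a minimal Python sketch. The rule used here (a normal range of the mean plus or minus two population standard deviations) and all sample values are assumptions chosen for illustration; the conventional systems discussed are not stated to use this particular computation.

```python
from statistics import mean, pstdev


def normal_range(values, k=2.0):
    """Hypothetical rule: normal range is mean +/- k population std deviations."""
    m, s = mean(values), pstdev(values)
    return m - k * s, m + k * s


def is_aberration(value, bounds):
    """True when the value falls outside the (low, high) bounds."""
    low, high = bounds
    return value < low or value > high


# Illustrative link-utilization samples (percent); invented, not measured.
weekday = [58, 60, 62, 60, 59, 61, 60, 60, 58, 62]
weekend = [5, 6]

combined_range = normal_range(weekday + weekend)  # all data points pooled
weekend_range = normal_range(weekend)             # weekend data points only

sample = 5  # an ordinary weekend reading
flagged_vs_combined = is_aberration(sample, combined_range)
flagged_vs_weekend = is_aberration(sample, weekend_range)
```

With these invented numbers, the ordinary weekend reading falls outside the range computed from the combined data points, yet inside the range computed from weekend data points alone.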
The aberrations may generate unnecessary alarms (or alerts) to a network manager, and the unnecessary alarms may present an erroneous picture of the state of a network. As a result, a network manager may unnecessarily adjust performance parameters of the network to accommodate the unnecessary alarms, which may lead to an inefficient allocation of network resources. Additionally, the generation of unnecessary alarms may lead a network manager to assume that all alarms from the NMS are trivial. Thus, the network manager may ignore meaningful alarms that arrive from the NMS.
One solution to the generation of unnecessary alarms is a proposal where a sliding window of time is utilized to create the appropriate thresholds. The technique is fully described by U.S. Pat. No. 6,182,022 to Mayle et al., the subject matter of which is herein incorporated by reference.
In the Mayle technique, a subset of data points collected during a sliding window of time T1 (e.g., a week or a month) is used by a statistical analyzer to calculate a baseline for a monitored performance parameter or attribute. The subset of data points may be collected during a period of time within time T1, such as during normal business hours of a week or a month. The baseline is calculated based on the subset of data points and represents a normal operating range for the monitored performance parameter during the sliding window of time. The baseline is subsequently utilized to generate a new performance threshold. However, although the sliding window of time may take into account the varying amount of network traffic over an extended period of time, the technique does not account for performance parameters that may vary from a calculated threshold in a limited period of time. Furthermore, conventional techniques, including the Mayle technique, do not disclose a method for resetting an alarm.
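A sliding-window baseline of this general kind might be sketched in Python as follows. This is not the implementation of the referenced patent: the business-hours definition (Monday through Friday, 09:00 to 17:00), the one-week window, the mean plus or minus two standard deviations rule, and every name below are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def in_business_hours(ts):
    """Assumed definition: Monday through Friday, 09:00 up to 17:00."""
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def baseline(samples, now, window=timedelta(days=7), k=2.0):
    """Return a (low, high) normal operating range, or None if data is too sparse.

    samples is a list of (timestamp, value) pairs. Only values that fall
    inside the sliding window ending at `now` and inside business hours
    are used.
    """
    start = now - window
    subset = [v for ts, v in samples
              if start <= ts <= now and in_business_hours(ts)]
    if len(subset) < 2:
        return None
    m, s = mean(subset), pstdev(subset)
    return m - k * s, m + k * s


# Invented samples; 2024-01-12 is a Friday.
now = datetime(2024, 1, 12, 18, 0)
samples = [
    (datetime(2024, 1, 2, 10, 0), 1000),  # weekday, but older than the window
    (datetime(2024, 1, 6, 10, 0), 5),     # Saturday, excluded from the subset
    (datetime(2024, 1, 8, 10, 0), 50),    # Monday, business hours
    (datetime(2024, 1, 9, 10, 0), 70),    # Tuesday, business hours
]
low, high = baseline(samples, now)
```

As the window slides forward, older samples drop out of the subset and the baseline follows longer-term changes in traffic. The sketch has no mechanism for resetting an alarm, which is the limitation noted above.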
The invention facilitates improved computer network monitoring. In one respect, an exemplary embodiment of the present invention includes a method of monitoring network attributes. The method includes: receiving a first plurality of data values for a network attribute; generating an alarm in response to at least one data value of the first plurality of data values exceeding a first threshold for the network attribute; and resetting the alarm in response to a data value not exceeding a second threshold for the network attribute. The second threshold is within the first threshold (e.g., below the first threshold or within a range of the first threshold), and the data value is a data value that was measured subsequent to the at least one data value that caused the alarm to be generated. The method further includes steps of calculating the first threshold based on the first plurality of data values, wherein the first plurality of data values are measured during a first predetermined period of time; and comparing the first plurality of data values to the first threshold. The method further includes steps of recalculating the first threshold based on a second plurality of received data values measured during a second predetermined period of time, subsequent to the first period of time; comparing the second plurality of data values to the recalculated first threshold; and generating a second alarm after the alarm is reset in response to at least one of the second plurality of data values exceeding the recalculated first threshold. The first predetermined period of time and the second predetermined period of time may each be approximately one hour in length. Also, the first and second thresholds and subsequently recalculated thresholds may include a single value or a range of values.
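The alarm and reset behavior summarized above may be sketched in Python as a small two-state machine. The class name, method names, and numeric thresholds are hypothetical, and single-valued thresholds are assumed (the summary also allows ranges); this is one possible reading and not a definitive implementation.

```python
class AlarmWithReset:
    """Raise an alarm above a first threshold; reset at or below a second one."""

    def __init__(self, first_threshold, second_threshold):
        # The second threshold is assumed to lie below the first.
        if second_threshold >= first_threshold:
            raise ValueError("second threshold must be below the first")
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.active = False

    def update(self, value):
        """Process one data value; return 'alarm', 'reset', or None."""
        if not self.active and value > self.first_threshold:
            self.active = True
            return "alarm"
        if self.active and value <= self.second_threshold:
            self.active = False
            return "reset"
        return None


# Invented data values for a single network attribute.
monitor = AlarmWithReset(first_threshold=80, second_threshold=70)
events = [monitor.update(v) for v in [50, 85, 90, 75, 65, 85]]
```

In this run the value 75 does not reset the alarm, because it still exceeds the second threshold. The alarm resets at 65, and a second alarm is generated when 85 arrives afterward.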
Exemplary methods of the present invention include steps that may be performed by computer-executable instructions stored on a computer-readable medium.
In another respect, an exemplary embodiment of the present invention includes an apparatus operable to monitor a plurality of network attributes. The apparatus includes a data collector configured to receive a first plurality of data values for a network attribute from a plurality of network devices via a network; a threshold calculator configured to calculate a first threshold for the network attribute and a second threshold for the network attribute from the first plurality of data values, the second threshold being within the first threshold; and a threshold comparator configured to compare the first plurality of data values to the first threshold and generate an alarm signal in response to at least one of the first plurality of data values exceeding the first threshold. The threshold comparator is further configured to generate a reset signal after the alarm signal is generated in response to a data value not exceeding a second threshold for the network attribute.
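The cooperation of the data collector, threshold calculator, and threshold comparator may be sketched in Python as follows. The collector is a stand-in callable rather than real network polling, and the threshold formulas (mean plus two standard deviations for the first threshold, mean plus one for the second) are assumptions chosen only so that the second threshold falls within the first; all names are hypothetical.

```python
from statistics import mean, pstdev


def calculate_thresholds(values, alarm_k=2.0, reset_k=1.0):
    """Threshold calculator (assumed formulas): return (first, second)."""
    m, s = mean(values), pstdev(values)
    return m + alarm_k * s, m + reset_k * s


class ThresholdComparator:
    """Compare data values to thresholds and emit alarm and reset signals."""

    def __init__(self):
        self.alarm_active = False

    def compare(self, values, first, second):
        signals = []
        for v in values:
            if not self.alarm_active and v > first:
                self.alarm_active = True
                signals.append(("alarm", v))
            elif self.alarm_active and v <= second:
                self.alarm_active = False
                signals.append(("reset", v))
        return signals


def monitor_period(collect, comparator):
    """Process one collection period: collect, calculate thresholds, compare."""
    values = list(collect())
    first, second = calculate_thresholds(values)
    return comparator.compare(values, first, second)


def fake_collector():
    """Stand-in for the data collector; returns invented values."""
    return [10, 10, 10, 10, 10, 10, 10, 10, 50, 10]


comparator = ThresholdComparator()
signals = monitor_period(fake_collector, comparator)
```

Calling `monitor_period` once per collection period (for example, hourly) recalculates both thresholds from that period's data values, while the comparator carries the alarm state from one period to the next.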