1. Field of the Invention
The present invention is directed to systems, methods, apparatus and related software for managing computer networks. More particularly, the invention relates to the generation of trouble-tickets (bundled error messages) indicative of fault conditions of one or more managed networks. Accordingly, the general objects of the invention are to provide novel systems, methods, apparatus and software of such character.
2. Description of the Related Art
With the ascendancy of the modern computer in the last few decades have come new ways to harness its potential. For example, as smaller computers became more powerful, they could be linked together as a network in order to share memory space, software and information, and to communicate with each other more easily. As opposed to a mainframe computer, distributed computer networks allow individual computers to form “electronic co-ops” structured in a hierarchical fashion. Whether using direct wiring (e.g., a Local Area Network (LAN)) or indirect communicative coupling (such as telephone lines or T1, T2 or T3 lines), contemporary networks can reach extraordinary proportions in terms of geographic scope, technical complexity, management costs, and processing capabilities. Since such networks may yield tremendous computing power that is dispersed over a wide geographical area, recent decades have seen a concomitant reliance placed on such computer networks.
The enormous benefits obtained from the use of such computer networks are, however, tempered by the fact that computer hardware, firmware and software do malfunction in many ways for a wide variety of reasons. In fact, the more complicated the computer system, the more likely that problems will occur and the more difficult it is to diagnose and solve each problem. Accordingly, techniques, hardware, software, etc. have been developed for the sole purpose of managing computer networks so that network outages will be minimized. Network management currently remains, however, a labor-intensive, costly, stressful and complicated task.
One widely used and helpful system for detecting fault conditions existing on computer networks is a network management software package manufactured by Hewlett-Packard and entitled “OpenView.” This software includes a graphical user interface (GUI) that is capable of graphically displaying the architecture of managed networks as well as displaying limited information regarding the status of various components of each network. The displayed network components are identified through a network discovery process during a set-up phase and the various components of the network are color-coded to thereby indicate the status of the various components. OpenView also includes a browser for displaying textual information regarding the status of the various components of the network. With such a system, each network management operator may monitor several networks simultaneously in an effort to detect and solve fault conditions from a remote network operations center (NOC).
Upon detection of a fault condition of a network, OpenView is capable of presenting fault related information to an operator. Such information may include network IP addresses for the various components of the monitored network where the fault was detected. However, OpenView is not capable of providing the operator with contact information (name, telephone number, address, etc.) for the personnel resident at the managed network. Nor can it convey any special procedures that should or must be followed to fulfill the trouble-shooting preferences and/or requirements of various customers. This deficiency forces network management operators to manually look up such information, which is recorded in a conventional paper format and must be obtained before any action is taken to solve a reported fault condition.
The converse deficiency of OpenView is that it may overwhelm a network management operator by flooding the operator with duplicative, repetitive, irrelevant and/or unnecessary information. Of particular concern is the possibility that critical network management information will go unnoticed amid a mass of other data. This may occur, for example, where a single point-source network outage affects a large number of monitored network devices connected to a faulty component. When this occurs, each one of the monitored network devices may report the same problem to a network management operator by issuing a fault message (called a “trap” in SNMP terminology) describing a related problem occurring at a different location. In large networks, this may yield hundreds of essentially duplicative error messages being reported to a network management operator even though, as a practical matter, only a single problem exists and needs to be solved.
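By way of illustration only, the flood of duplicative traps described above can be sketched in a few lines of Python. The `Trap` record, its `upstream` field and the device names are hypothetical and are not taken from OpenView or from SNMP; the sketch merely assumes that each trap can somehow be associated with the component through which the reporting device connects, and it shows how hundreds of messages may trace back to only a handful of distinct problems.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    """Hypothetical fault message; field names are assumptions for illustration."""
    device: str    # device that issued the trap
    upstream: str  # component the device connects through (assumed to be known)
    message: str   # textual description of the fault


def group_by_upstream(traps):
    """Collect traps that share the same upstream component."""
    groups = defaultdict(list)
    for trap in traps:
        groups[trap.upstream].append(trap)
    return dict(groups)


# 200 devices behind one faulty component, plus one unrelated fault.
traps = [Trap(f"host-{i}", "router-A", "link down") for i in range(200)]
traps.append(Trap("host-x", "router-B", "link down"))

groups = group_by_upstream(traps)
for upstream, members in groups.items():
    print(f"{upstream}: {len(members)} trap(s)")
```

In this contrived case, 201 traps reduce to two groups, only one of which reflects the point-source outage.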
A related problem is that of “network bouncing.” Network bouncing refers to network fault conditions that only temporarily exist and then resolve themselves. Poor quality lines, overloaded lines, solar flares, maintenance operations of a third-party line provider, non-destructive power surges resulting from thunderstorms, etc., may all cause such network bouncing. Other examples are widely known in the art.
Considering lightning strikes as an illustrative example, a temporary and localized power surge resulting from a lightning strike may briefly interfere with normal operations of the computer network. Provided the power surge is nondestructive, however, the fault condition will cease to exist in a short time without any intervention whatsoever. In this situation, the monitored network device will issue fault data indicating an outage while the power surge exists and, in a short time, issue another message indicating that the outage has resolved itself. Thus, a single monitored network device may issue two messages within moments of each other even though the initial fault condition may have resolved itself before a network management operator has time to take any corrective action whatsoever. This is somewhat akin to receiving a false-positive test result during a medical diagnosis. Where hundreds of devices are influenced by a single lightning strike and dozens of lightning strikes occur in a single hour, thousands of essentially useless messages can be delivered to a network management operator. Amid this mass of data, more important error messages indicative of more serious fault conditions can easily go unnoticed by a network management operator.
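The bouncing behavior described above may be made concrete with a minimal Python sketch. It is offered only as an illustration under stated assumptions and not as the method of any particular product: the event tuples, the function name `reportable_outages` and the `hold_down` parameter are all hypothetical. The sketch treats an outage as reportable only if it has lasted at least `hold_down` seconds, so that a fault message followed moments later by a recovery message from the same device is discarded.

```python
def reportable_outages(events, hold_down, now):
    """Return (device, start_time) for outages lasting at least hold_down seconds.

    events: iterable of (timestamp_seconds, device, status) tuples in time
            order, where status is "down" or "up" (assumed format).
    now:    current time in seconds, used for outages that are still open.
    """
    down_since = {}   # device -> time its current outage began
    reportable = []
    for ts, device, status in events:
        if status == "down":
            # Keep the earliest "down" time if the device repeats the message.
            down_since.setdefault(device, ts)
        elif status == "up" and device in down_since:
            start = down_since.pop(device)
            if ts - start >= hold_down:
                reportable.append((device, start))
    # Outages that have not yet cleared count once they exceed the window.
    for device, start in down_since.items():
        if now - start >= hold_down:
            reportable.append((device, start))
    return sorted(reportable)


events = [
    (0, "r1", "down"), (2, "r1", "up"),      # two-second bounce
    (5, "r2", "down"),                       # still down
    (10, "r3", "down"), (100, "r3", "up"),   # ninety-second outage
]

print(reportable_outages(events, hold_down=30, now=120))
print(reportable_outages(events, hold_down=1, now=120))
```

With a 30-second window the two-second bounce on `r1` is suppressed; shrinking the window to one second reports it again, which illustrates how the sensitivity to such “false-positive” messages could in principle be adjusted.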
When a database is used in combination with a network monitoring tool in the NOC, conventional network monitoring systems require operators to access the database, open a record and manually enter content for each new fault condition being addressed. One popular database for use in such a system is CLARIFY, produced by Amdocs, Inc., which is headquartered in Chesterfield, Mo., USA. Data entry of this type is a time-consuming process that can lengthen the time necessary for the network management operator to take corrective action and thereby solve reported fault conditions. Automatic generation and storage of comprehensive trouble-tickets into one or more databases would, accordingly, greatly improve the ability of operators to properly diagnose and correct network fault conditions.
There is, accordingly, a need in the art for novel methods, systems and apparatus for automatically generating bundled error messages that provide network management operators with more complete information relating to managed customer networks to thereby permit more efficient network management. Such methods and apparatus should provide operators with information indicative of the fault detected as well as network-specific profile information describing the network devices on which the detected fault condition occurred.
There is an additional need in the art for novel methods, systems and apparatus for automatically generating bundled error messages that provide network management operators with fewer duplicative error messages while not permitting important fault condition information to be lost.
There is another need in the art for novel methods, systems and apparatus for automatically generating bundled error messages that automatically reduce the number of “false-positive” error messages provided to network management operators.
There is another need in the art for novel methods, systems and apparatus for automatically generating bundled error messages in which the sensitivity to “false-positive” error messages may be adjusted.