1. Field of the Invention
The present invention relates to a system for analyzing faults on computer and like message networks.
2. Description of the Prior Art
A message network is a network (which may or may not include nodes performing switching functions) interconnecting a plurality of data processing devices. Such networks are often used to interconnect a number of computers, but can also be used for other data communication purposes, such as telephone-type networks. In such a network, information is generally transmitted in the form of discrete packets, and the routing of the packets is at least partially controlled or determined by the various nodes in the network. In some cases, the route taken by the packets is set up for the message and all packets follow that route; in others, the various packets of a message may follow different routes through the network. Usually, a number of packets of different messages will be interleaved on any particular link between two adjacent nodes in the network.
Such networks are liable to suffer from faults. In some cases the cause of a fault, its effect, or both may be immediately evident. For example, the physical linkage between two nodes may be interrupted, or no packets may be received from a particular node. However, network faults are often subtle in both their causes and effects, and it may not be clear whether there is a fault at all. For example, a poor response time of the network may be due to a fault, or it may be due to an unusual and extreme workload imposed on the network.
A variety of instruments are available for network fault diagnosis, where the term "diagnosis" is used in a broad sense. At the lowest level, there are voltage level testers, continuity testers and the like. At a slightly higher level, there are signal presence testers such as LED instruments. However, many network faults occur at a high level, and their diagnosis requires inspection of the network at a correspondingly high level, involving the observation of packets and packet types. This can be achieved by means of protocol analyzers. However, the use of protocol analyzers has two difficulties. One is that the setting up of the analyzer is a skilled task, requiring a long training and learning period before it can be used effectively. The other is that the output from the analyzer is generally in a form which is not directly intelligible and requires considerable further analysis before its implications for the health of the network can be understood.
At the highest level, network management systems take a wider view of a network and provide a number of network management services such as fault management, configuration management, accounting, performance analysis, security and resource management. These systems generally comprise distributed data gatherers located at various key points around the network and one or more centralized management stations for receiving and analyzing data on network operation from the data gatherers. One such system is the ANM (Automated Network Management) system described in the article "ANM: Automated Network Management System" by M. Feridun, M. Leib, M. Nodine and J. Ong, IEEE Network, Vol. 2, No. 2, March 1988. In the ANM system, network entities, such as gateways, provide data to a backbone of Distributed Management Modules (DMMs) which service "Clients" that provide the network management services referred to above. Clients request and receive raw data collected from network entities by the DMM backbone and can also request the DMM backbone to execute specific actions. Specialized Clients can be provided, such as a fault management Client that detects, diagnoses and recovers from network faults.
Because of the complexity of the network management task, the ANM system uses artificial intelligence techniques to represent and organize its network expertise, invoke relevant network analyses and annotate its reasoning to support later explanations to the network operator. More particularly, a Client called the Intelligent Network Manager is provided with a collection of expert systems (Experts) organized as a top-level Expert which forwards triggering data received from the network entities to other Experts that each understand a specific kind of network problem. These Experts, in turn, may suggest possible hypotheses that might explain the triggering data. If necessary, each Expert may request additional data from network entities. When Experts suggest, confirm or reject hypotheses, the network operator is informed. However, to add expertise about a new type of network problem to the Intelligent Network Manager of ANM, a new Expert must be added to the system, and to change the way the system reasons about problems, all Experts conducting such reasoning must be changed. It will be appreciated that network management systems such as the ANM system are conceived on a much larger scale and require considerably greater investment than instruments such as protocol analyzers intended for localized use. These two approaches to network fault analysis are thus largely complementary rather than competitive.
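By way of illustration only, the Expert arrangement described above for the ANM system may be pictured with the following minimal Python sketch. It is not taken from the ANM article. The class names (Expert, TopLevelExpert), the rule functions, the field names and the 500 ms threshold are all hypothetical and are chosen solely to show a top-level Expert forwarding triggering data to specialist Experts, each of which may suggest hypotheses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Trigger = Dict[str, float]  # hypothetical shape of the triggering data


@dataclass
class Expert:
    """Hypothetical specialist that understands one kind of network problem."""
    name: str
    # Returns hypotheses that might explain the triggering data (may be empty).
    suggest: Callable[[Trigger], List[str]]


@dataclass
class TopLevelExpert:
    """Hypothetical dispatcher that forwards triggering data to every Expert."""
    experts: List[Expert] = field(default_factory=list)

    def handle(self, trigger: Trigger) -> Dict[str, List[str]]:
        report: Dict[str, List[str]] = {}
        for expert in self.experts:
            hypotheses = expert.suggest(trigger)
            if hypotheses:  # only report Experts that have something to say
                report[expert.name] = hypotheses
        return report


def congestion_rules(trigger: Trigger) -> List[str]:
    # Assumed, purely illustrative threshold of 500 ms.
    if trigger.get("response_time_ms", 0.0) > 500.0:
        return ["unusually heavy workload", "faulty node retransmitting"]
    return []


def link_rules(trigger: Trigger) -> List[str]:
    # Treat a zero packet count as a possible interrupted link.
    if trigger.get("packets_received", 1.0) == 0.0:
        return ["physical link interrupted"]
    return []


manager = TopLevelExpert([
    Expert("congestion", congestion_rules),
    Expert("link", link_rules),
])
report = manager.handle({"response_time_ms": 900.0, "packets_received": 12.0})
print(report)
```

In this sketch, as in the arrangement described above, covering a new type of network problem means writing a further rule function and adding a further Expert, and any change to the way hypotheses are reasoned about must be repeated in every rule function.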
Accordingly, it is an object of the present invention to provide an improved network analysis system which eases the problems of interpretation of collected data and which can be implemented as a portable instrument, at least in its less complex embodiments.