1. Field of the Invention
This invention relates generally to the field of artificial intelligence, and more particularly to an expert system interfaced to, or forming a part of, a data communications network management system which automates network alarm handling and assists the network operator in isolating network problems.
2. Background of the Invention
Traditionally, data communications network management systems have concentrated on providing a set of fault isolation and test functions that allow an operator to locate, diagnose and isolate network problems.
Network problems are often expressed by the target network devices or objects (e.g. modems, multiplexers, etc. in the data communication environment) in the form of alarms or other error messages. Alarms can generally be considered events reported by target network devices when abnormal conditions exist. In some networks, alarms are generated autonomously while in others the alarms are actually responses to queries (polls). Although perhaps the former is more appropriately referred to as an alarm, both will be referred to as alarms for purposes of this document. Upon receiving the alarms from the network, the network management system displays the alarms on the operator's console. One of the network operator's responsibilities is to interpret the alarm and then isolate and resolve the problem associated with the alarm in the shortest time span. The operator then uses a series of test procedures to determine the exact cause of the problem. Once found, he may take remedial actions (such as calling for repair or switching in redundant equipment) and then move on to the next alarm.
Sometimes the operator may have difficulty in keeping up with the alarms, since a single problem may result in many alarms from affected target network devices (network objects). In such cases, the operator often either ignores the alarms or simply waits until a complaint call arrives. Furthermore, because network operators differ in their experience in dealing with network faults, wrong decisions made while attempting to diagnose a problem can complicate it further, and more time than necessary may be taken to solve the original problem. Such delays can be costly in large networks which are heavily relied upon to quickly move vast amounts of data in short periods of time to carry out the normal course of business. For example, large financial institutions rely upon such systems to move large sums of money electronically. Loss of that ability even for a relatively short period of time may be very costly to the institution. Similarly, airlines rely upon such systems to track passenger reservations, and loss of that ability can result in flight delays or cancellations and loss of customers.
In a typical network management environment, a heterogeneous array of switching and transmission equipment may produce hundreds of alarms each day. Moreover, alarms are sometimes spurious, transient, redundant, time correlated, or too numerous to be handled at the same time. This makes network fault diagnosis a complex task in which considerable experience is required to interpret alarms and isolate network faults.
Some experienced (expert) network operators acquire or develop strategies and "rules of thumb" in diagnosing networks. It is desirable to encode such knowledge into a knowledge base and make the best expert assistant available at all times, and at all locations. Ultimately, the benefits of routine use of such a system (called an expert system) include reduced operational cost, less down time, increased network performance, more effective fault management in the network, and the ability to build and effectively manage bigger networks.
A major difficulty with typical expert systems is the bottleneck encountered in acquiring knowledge from the expert. The job of a knowledge engineer is to act as an agent, or go-between, to help a domain expert build a knowledge-based system. This task usually involves time consuming interviews, lengthy documentation and refinement, and transformation of the acquired knowledge into Artificial Intelligence (AI) based languages or representations. Often, the knowledge engineer and domain expert must work together to debug, extend, and refine the system iteratively. This is usually attributable to the fact that the knowledge engineer has far less domain knowledge than the expert, and the expert has far less knowledge about artificial intelligence than the knowledge engineer. Such communication gaps constantly impede the process of transferring domain expertise into a knowledge-based system. Ultimately, this may lead either to a long development cycle or to a failing system. To further complicate the matter, providing expert information is a continuing need in data communications networks, since the networks tend to expand and become larger and more complex while adding new and different equipment as time goes on. With this evolution of the network comes an evolution of the products connected to the network (e.g. analog modems to digital devices) and with it a change in the knowledge required to diagnose the network.
A second problem with typical expert systems is that as the complexity of the application domain increases, the classical rule-based system is not adequate. Knowledge management (knowledge acquisition, validation, and maintenance) also becomes a serious problem when the rule-based system grows beyond a certain size. It has been claimed (see Buchanan and Shortliffe, 1984, Rule-Based Expert Systems, Addison-Wesley Publishing Company; or Hayes-Roth, Frederick, 1985, "Rule-Based Systems", Communications of the ACM) that the benefit of the rule approach is the ease of modification and extension of the system, because rules can be added independently at any time. However, more recent articles (see Brug, A., Bachant, J., McDermott, J., Fall 1986, "The Taming of R1", IEEE Expert; or Jackson, P., 1986, Introduction to Expert Systems, International Computer Science Series; or Rauch-Hindin, W., 1987, Artificial Intelligence in Business, Science, and Industry, Vol. 1 & 2, Prentice-Hall) have shown that in many cases this is not true for medium to large systems such as large data communication networks.
For medium to large diagnostic systems, the rule-based approach has suffered from at least the following problems:
--lack of methodology;
--need for knowledge engineers to transfer knowledge into rules;
--difficulty in controlling program behavior;
--limited generic processing;
--unanticipated rule interactions during rule updates; and
--systems with a large number of rules are difficult to manage, validate and maintain.
One alternative to the problems with traditional rule-based expert systems is flow-chart-based knowledge representation. In the flow-chart knowledge representation scheme, the domain knowledge base is simply represented as decision-trees (or flow-charts), similar to the way that many repair manuals are designed. Each decision node in the flow-chart is represented by an object-schema (data structure plus its associated procedures with inheritance). Node objects represent tests, and arcs represent the outcomes of tests leading to the next node object. A separate Inference Engine is constructed to reason through and traverse among flow-chart nodes. This flow-chart approach is particularly attractive in its knowledge acquisition capability. The domain expert can enter his domain knowledge directly into the system by simply manipulating the flow-chart objects by filling in predefined schematic forms.
The following merits are experienced by using the flow-chart knowledge representation in capturing the domain knowledge:
--domain knowledge is transparent and explicit;
--knowledge acquisition is simplified;
--flow-chart browsing can be used to examine the relations among objects in a more systematic manner;
--flow-chart Inference Engine is completely separated from the flow-chart knowledge bases;
--inference processing is quick and effective due to the deterministic nature of the flow-chart representation;
--facilitates fast incremental knowledge acquisition and verification cycle; and
--reduced risk in knowledge maintenance.
However, with the pure flow-chart-based knowledge representation scheme, there are still some deficiencies that have been realized in the course of capturing domain knowledge, such as:
--lack of formal methodology and knowledge structuring;
--lack of goal (hypothesis) directed reasoning capability;
--lack of top-down problem decomposition methodology;
--state of the world is often not adequately represented;
--incomplete and unreliable heuristic knowledge cannot be fully captured and expressed; and
--monotonic reasoning is inadequate for large diagnostics systems.
The present invention ameliorates these difficulties in an expert system with advantages such as an enhanced User Interface, Inference Engine and knowledge representation as described below.
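By way of illustration only, the pure flow-chart knowledge representation discussed above (node objects representing tests, arcs representing test outcomes, and a separate Inference Engine that traverses the nodes) might be sketched in Python as follows. This is a minimal sketch and not the implementation of any particular system: the names TestNode, Conclusion and run_flow_chart, the modem tests, and the observation keys are all hypothetical, and the inheritance among node schemas mentioned above is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Conclusion:
    """Leaf of the flow-chart: a diagnosis reached by the traversal."""
    diagnosis: str


@dataclass
class TestNode:
    """Decision node: one test plus arcs keyed by the test's outcome."""
    name: str
    test: Callable[[dict], str]  # returns an outcome label such as "yes"
    arcs: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[TestNode, Conclusion]


def run_flow_chart(root: Node, observations: dict) -> Tuple[str, List[Tuple[str, str]]]:
    """Inference engine, kept separate from the knowledge base.

    Follows one arc per test outcome until a Conclusion is reached and
    returns the diagnosis together with the path that was taken.
    """
    trace: List[Tuple[str, str]] = []
    node = root
    while isinstance(node, TestNode):
        outcome = node.test(observations)
        trace.append((node.name, outcome))
        if outcome not in node.arcs:
            raise KeyError(f"no arc for outcome {outcome!r} at node {node.name!r}")
        node = node.arcs[outcome]
    return node.diagnosis, trace


# Hypothetical knowledge base for a modem alarm (assumed tests and wording).
loopback = TestNode(
    name="loopback test",
    test=lambda obs: "pass" if obs["loopback_ok"] else "fail",
    arcs={
        "pass": Conclusion("suspect remote device"),
        "fail": Conclusion("suspect local modem"),
    },
)
root = TestNode(
    name="carrier check",
    test=lambda obs: "yes" if obs["carrier_detected"] else "no",
    arcs={"yes": loopback, "no": Conclusion("suspect line")},
)

diagnosis, trace = run_flow_chart(
    root, {"carrier_detected": True, "loopback_ok": False}
)
```

Because each test outcome selects exactly one arc, the traversal is deterministic, and the knowledge base (the node objects) can be edited without touching the traversal function.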