The present invention relates to methods of processing data from communications networks, systems for processing data from communications networks, methods of diagnosing causes of events in complex systems, methods of acquiring knowledge for knowledge based reasoning capacity for the above methods, methods of extending compilers for such knowledge based reasoning capacity, and methods and systems for using such extended compilers.
In complex systems such as communication networks, events which can affect the performance of the network need to be monitored. Such events may involve faults occurring in the hardware or software of the system, or excessive demand causing the quality of service to drop. For the example of communication networks, management centres are provided to monitor events in the network. As such networks increase in complexity, automated event handling systems have become necessary. Existing communication networks can produce 25,000 alarms a day, and at any time there may be hundreds of thousands of alarms which have not been resolved.
With complex communication systems, there are too many devices for them to be individually monitored by any central monitoring system. Accordingly, the monitoring system, or operator, normally only receives a stream of relatively high level events. Furthermore, it is not possible to provide diagnostic equipment at every level, to enable the cause of each event to be determined locally.
Accordingly, alarm correlator systems are known, as shown in FIG. 1, which receive a stream of events from a network and deduce the cause of each event, so that the operator sees a stream of problems, that is, the originating causes of the events output by the network.
The alarm correlator shown in FIG. 1 uses network data in the form of a virtual network model to deduce the causes of the events output by the network. Before the operation of known alarm correlator systems is discussed, some details of how alarms are handled within the network will be given, with reference to FIG. 2. Several layers of alarm filtering or masking can occur between a device raising an event and news of this event reaching a central system manager. At the hardware element (HE) level, the system would be overwhelmed, and its performance destroyed, if every signal raised by hardware elements were forwarded unaltered to higher layers. Masking is used to reduce this flood of data. Some signals are always suppressed; others are delayed for a time to see whether a signal of higher criticality arises, and are suppressed if such a signal has already been sent.
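The masking behaviour described above can be illustrated with a minimal Python sketch. It assumes a numeric criticality scale, a fixed hold window, and that a more critical signal from the same source supersedes a less critical one; the names `Signal`, `MaskingFilter`, `receive` and `tick` are hypothetical and do not come from any real network element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str        # hardware element raising the signal
    kind: str
    criticality: int   # higher value = more critical (assumed scale)
    time: float


class MaskingFilter:
    """Hypothetical hardware-element-level masking filter."""

    def __init__(self, always_suppress, hold_window):
        self.always_suppress = set(always_suppress)
        self.hold_window = hold_window
        self.held = []   # signals delayed in case a more critical one follows
        self.sent = []   # signals already forwarded to the higher layer

    def receive(self, sig):
        if sig.kind in self.always_suppress:
            return
        if any(s.source == sig.source and s.criticality > sig.criticality for s in self.sent):
            return  # a more critical signal from this source has already been sent
        # A more critical signal supersedes less critical ones still on hold.
        self.held = [h for h in self.held
                     if not (h.source == sig.source and h.criticality < sig.criticality)]
        self.held.append(sig)

    def tick(self, now):
        """Forward every held signal whose delay has expired."""
        ready = [h for h in self.held if now - h.time >= self.hold_window]
        self.held = [h for h in self.held if now - h.time < self.hold_window]
        self.sent.extend(ready)
        return ready
```

Calling `tick` periodically forwards whatever has survived the hold window; a real element would also age out the `sent` history, which this sketch keeps indefinitely.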
Some control functions may be too time critical to be handled by standard management processes. Accordingly, either at the hardware element level or at a higher level, some real time control may be provided to respond to alarms. Such real time control (RTC) has the side effect of performing alarm filtering. For example, a group of alarms indicating card failure may cause the real time controller to switch from a main card to a spare card, triggering further state changes at the hardware element level. All this information may be signalled to higher levels in a single message from the RTC indicating that a failure and a handover have occurred. Such information can reach the operator in a form indicating that the main card needs to be replaced, an operation which normally requires input from maintenance staff.
A node system manager may be provided as shown in FIG. 2, to give some alarm filtering and alarm correlation functions. Advanced correlation and restoration functions may be located here, or at the network system management level.
In one known alarm correlation system, shown in U.S. Pat. No. 5,309,448 (Bouloutas et al), the problem of many alarms being generated from the same underlying fault is described. This arises because many devices rely on other devices for their operation, and because alarm messages usually describe the symptom of a fault rather than whether the fault lies within a device or results from an interface with another device.
FIG. 3 shows how this known system addresses this problem. A fault location is assigned relative to a device, for each alarm. A set of possible fault locations for each alarm is identified, with reference to a stored network topology.
Then the different sets of possible fault locations are correlated with each other to create a minimum number of possible incidents consistent with the alarms. Each incident is individually managed, to keep it updated, and the results are presented to an operator.
Each relative fault location is internal, upstream, downstream, or external. The method does not go beyond identifying the minimum number of faults that could account for the alarms, so its effectiveness falls away if multiple faults arise within the selected set, which is more likely to happen in more complex systems.
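The correlation step of this known approach can be sketched as a minimum hitting-set search: each alarm yields a set of candidate fault locations, and the smallest set of faults touching every candidate set is reported. The linear topology, the mapping from relative locations to candidate devices, and the brute-force search are illustrative assumptions, not the method of the cited patent. Brute force is only practical for small sets, and the single minimal answer it returns shows why additional simultaneous faults can go unreported.

```python
from itertools import combinations

# Hypothetical linear topology: device -> (upstream neighbour, downstream neighbour).
TOPOLOGY = {"A": (None, "B"), "B": ("A", "C"), "C": ("B", "D"), "D": ("C", None)}


def possible_fault_locations(device, relative):
    """Assumed mapping from a relative fault location to candidate faulty devices."""
    up, down = TOPOLOGY[device]
    if relative == "internal":
        candidates = {device}
    elif relative == "upstream":
        candidates = {device, up}      # alarm may not distinguish device from interface
    elif relative == "downstream":
        candidates = {device, down}
    elif relative == "external":
        candidates = {up, down}
    else:
        raise ValueError(f"unknown relative location: {relative}")
    candidates.discard(None)           # edge devices have no neighbour on one side
    return candidates


def minimum_incidents(alarms):
    """Smallest set of faulty devices that explains every alarm (brute-force hitting set)."""
    sets = [possible_fault_locations(d, r) for d, r in alarms]
    universe = sorted(set().union(*sets))
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            if all(set(combo) & s for s in sets):
                return set(combo)
    return set()
```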
Another expert system is shown in U.S. Pat. No. 5,159,685 (Kung). This will be described with reference to FIG. 4. Alarms from a network manager 41 are received and queued by an event manager 42. After filtering by an alarm filter 43, alarms which are ready for processing are posted to a queue referred to as a bulletin board 44, and the alarms are referred to as goals. A controller 45 determines which of the goals has the highest priority. An inference engine 46 uses information from an expert knowledge base 47 to solve the goal and find the cause of the alarm by a process of instantiation. This involves instantiating a goal tree for each goal by following rules in the form of hypothesis trees stored in the expert knowledge base. Reference may also be made to network structure knowledge in a network structure knowledge base 48, which contains information about the interconnection of the network components.
The inference process will be described with reference to FIG. 5. First a knowledge source is selected according to alarm type. The knowledge source is the particular hypothesis tree. Hypothesis trees, otherwise known as goal trees, are stored for each type of alarm.
At step 51 the goal tree for the alarm is instantiated, by replacing variables with facts, and by executing procedures/rules in the goal tree as shown in step 52. If the problem diagnosis is confirmed, the operator is informed. Otherwise other branches of the goal tree may be tried, further events awaited, and the operator kept informed as shown in steps 53 to 56.
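The goal-tree inference can be pictured with the following Python sketch, in which a knowledge source is selected by alarm type and the tree is walked depth-first, trying further branches when one fails. The tree shape, the fact names and the `diagnose` function are hypothetical; returning `None` stands in for the path on which further events are awaited.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    name: str
    holds: Callable[[dict], bool]                 # rule/procedure evaluated on the bound facts
    children: List["Hypothesis"] = field(default_factory=list)


def instantiate(node: Hypothesis, facts: dict) -> Optional[List[str]]:
    """Depth-first walk: return the confirmed path to a leaf, or None if no branch is confirmed."""
    if not node.holds(facts):
        return None
    if not node.children:
        return [node.name]
    for child in node.children:                   # try each branch in turn
        path = instantiate(child, facts)
        if path is not None:
            return [node.name] + path
    return None


# One goal tree (knowledge source) per alarm type; contents are illustrative only.
KNOWLEDGE_SOURCES = {
    "LOS": Hypothesis("signal lost", lambda f: True, [
        Hypothesis("fibre cut", lambda f: f.get("far_end_alarm", False)),
        Hypothesis("card fault", lambda f: f.get("card_self_test_failed", False)),
    ]),
}


def diagnose(alarm_type: str, facts: dict) -> Optional[List[str]]:
    return instantiate(KNOWLEDGE_SOURCES[alarm_type], facts)
```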
This inference process relies on specific knowledge having been accumulated in the expert knowledge base. The document describes a knowledge acquisition mode of operation, but knowledge acquisition can be extremely labour-intensive, and keeping a large expert knowledge base up to date may be very difficult.
A further known system will be described with reference to FIG. 6. U.S. Pat. No. 5,261,044 (Dev et al) and two related patents by the same inventor, U.S. Pat. No. 5,295,244, and U.S. Pat. No. 5,504,921, show a network management system which contains a model of the real network. This model, or virtual network includes models of devices, higher level entities such as rooms, and relationships between such entities.
As shown in FIG. 6, a room model 61 may include attribute objects 62 and inference handler objects 63. Device models 64, 65 may also include attribute objects 66, 67 and inference handler objects 68, 69. Objects representing relationships between entities are also illustrated. The device models are linked by an “is connected to” relationship object 70, and the device models are linked to the room model by “contains” relationship objects 71, 72.
The network management system regularly polls all its devices to obtain their device-determined state. The resulting data arrives at the device object in the virtual model, which passes the event to an inference handler attached to it. An inference handler may change an attribute of the device object, which can raise an event which fires another inference handler in the same or an adjacent model.
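The event propagation between models can be sketched as follows. Each `Model` holds attributes, inference handlers and relationships; changing an attribute fires the handlers, which may change attributes of related models in turn. The class names, attribute names and handler signatures are assumptions made for illustration and are not taken from the cited patents.

```python
class Model:
    """Hypothetical virtual-network model: attributes, inference handlers, relationships."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.handlers = []    # callables handler(model, attribute, value)
        self.relations = []   # (relationship name, other model)

    def relate(self, relationship, other):
        self.relations.append((relationship, other))

    def related(self, relationship):
        return [m for r, m in self.relations if r == relationship]

    def set_attribute(self, attribute, value):
        if self.attributes.get(attribute) == value:
            return            # no change, so no event is raised
        self.attributes[attribute] = value
        for handler in list(self.handlers):
            handler(self, attribute, value)


def propagate_failure(model, attribute, value):
    # When a device goes down, notify every device it is connected to.
    if attribute == "condition" and value == "down":
        for other in model.related("is connected to"):
            other.set_attribute("upstream_ok", False)


def mark_suspect(model, attribute, value):
    # A device that loses its upstream neighbour becomes suspect.
    if attribute == "upstream_ok" and value is False:
        model.set_attribute("condition", "suspect")
```

Here `router.set_attribute("condition", "down")` stands in for the polled device state arriving at the device object.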
The use of object orientated techniques enables new device models to be added, and new relationships to be incorporated, and therefore eases the burden of developing and maintaining the system.
However, to develop alarm correlation rules for each device, it is necessary to know both what other devices are linked to the first device, and also how the other devices work. Accordingly, developing and maintaining the virtual network model can become a complex task, as further new devices, new connections, or new alarm correlation rules are added.
The invention addresses the above problems.
According to a first aspect of the invention there is provided a method of operating a communications network comprising a plurality of network entities having predetermined states of operation, the method comprising the step of creating an object associated with a given state of one of the entities, the object comprising knowledge based reasoning capability for determining whether the entity is in the given state, and the method further comprising the steps of:
passing data about the network to the object; and
inferring whether the entity is in the given state using the reasoning capability.
By creating an object associated with a given state of one of the entities, a number of advantages arise. Firstly, the object oriented feature of encapsulation limits the amount of communication to that which is relevant, which can increase the speed of correlation. Furthermore, separation of problem modelling allows for improved reuse of code across different devices. A problem object can undertake relatively complex tasks such as launching tests, verifying complex conditions, and controlling recovery behaviour which would be difficult to do by combining rules without the problem oriented structure.
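A minimal Python sketch of such a problem object follows. It assumes, purely as a placeholder for a real rule base, that the state is inferred once a given set of event types has been seen for the entity; `ProblemObject`, `receive` and `in_state` are hypothetical names. The `receive` method shows encapsulation at work: events about other entities never affect the object.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemObject:
    """Hypothetical object for one candidate state (e.g. 'failed') of one network entity."""
    entity: str
    state: str
    required_evidence: frozenset            # placeholder reasoning: event types that confirm the state
    seen: set = field(default_factory=set)

    def receive(self, event: dict) -> None:
        # Encapsulation: data about other entities is ignored.
        if event["entity"] == self.entity:
            self.seen.add(event["type"])

    def in_state(self) -> bool:
        # Infer the state once every required event type has been observed.
        return self.required_evidence <= self.seen
```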
Advantageously the given state is a fault state. The data about the network may comprise alarms or other events relating to abnormal or undesired operation of the network. The example of alarm correlation is particularly valuable in communication networks where alarms are unlikely to be sufficiently detailed to isolate the problem which originally caused the alarm.
Advantageously a plurality of objects are created associated with different states, and messages are passed between the objects as part of the inference process. Message based reasoning makes distribution of processing easier, which facilitates scaling to handle a wide range of network sizes, topologies, and real time requirements.
Advantageously the object creation step is triggered by an event notified by the network, and the given state is a possible cause of the event, or a possible consequence of the event.
Advantageously the reasoning capability comprises rules grouped according to the class of messages they can process. This structuring of knowledge ensures fast alarm correlation. Groups of rules may be defined for both problem classes and problem instances.
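One way to picture rules grouped by message class is sketched below in Python. A class-level rule group creates a candidate problem instance when a relevant alarm arrives (the event-triggered creation described above), and an instance-level group handles later messages such as test results. The message types, alarm codes and the `CardFailure` class are hypothetical, and dispatch on the exact message class is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    entity: str
    code: str


@dataclass
class TestResult:
    entity: str
    passed: bool


class CardFailure:
    """Hypothetical problem class for the state 'card has failed'."""
    class_rules = {}      # message class -> rule applied by the problem class
    instance_rules = {}   # message class -> rule applied to an existing instance
    instances = {}        # entity name -> problem instance

    def __init__(self, entity):
        self.entity = entity
        self.confirmed = None   # unknown until a test result arrives

    @classmethod
    def receive(cls, msg):
        class_rule = cls.class_rules.get(type(msg))
        if class_rule:
            class_rule(cls, msg)
        inst = cls.instances.get(msg.entity)
        instance_rule = cls.instance_rules.get(type(msg))
        if inst and instance_rule:
            instance_rule(inst, msg)


def create_on_alarm(cls, alarm):
    # Class-level rule: alarms a card failure could explain create a candidate instance.
    if alarm.code in {"LOS", "EQPT"} and alarm.entity not in cls.instances:
        cls.instances[alarm.entity] = cls(alarm.entity)


def confirm_on_test(inst, result):
    # Instance-level rule: a failed self-test confirms the problem, a pass refutes it.
    inst.confirmed = not result.passed


CardFailure.class_rules[Alarm] = create_on_alarm
CardFailure.instance_rules[TestResult] = confirm_on_test
```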
Advantageously the reasoning capability comprises rules for translating events notified by the network into a degradation of a service received or offered by the associated entity from or to other entities. This enables particularly efficient reasoning, since service information expresses precisely how the operations of the entities are interdependent, which enables causes and consequences to be determined and propagated quickly.
Advantageously such service degradation information is passed to other objects associated with the same or the other entities.
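Translation of events into service degradation, and its propagation to dependent entities, might look like this minimal Python sketch. The event-to-service table, the `depends_on` mapping and the entity names are illustrative assumptions; each object only needs to know which services it offers, which it consumes, and which objects consume its own services.

```python
from dataclasses import dataclass, field


@dataclass
class EntityObject:
    name: str
    event_to_service: dict = field(default_factory=dict)   # notified event -> own service it degrades
    depends_on: dict = field(default_factory=dict)         # consumed service -> own service it degrades
    consumers: dict = field(default_factory=dict)          # own service -> objects consuming it
    degraded_inputs: set = field(default_factory=set)      # (provider, service) pairs reported degraded

    def on_event(self, event_type):
        service = self.event_to_service.get(event_type)
        if service:
            self.offer_degraded(service)

    def offer_degraded(self, service):
        # Pass the degradation to every object that consumes this service.
        for consumer in self.consumers.get(service, []):
            consumer.on_service_degraded(self.name, service)

    def on_service_degraded(self, provider, service):
        if (provider, service) in self.degraded_inputs:
            return   # already known; stops loops in cyclic dependencies
        self.degraded_inputs.add((provider, service))
        own = self.depends_on.get(service)
        if own:
            self.offer_degraded(own)
```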
Advantageously two or more of said objects are created and the inference steps for each are carried out in parallel in threads sharing a common knowledge base. This may be done using separate processors, and enables the processing to be distributed to suit performance requirements.
Advantageously knowledge bases are built up for separate parts of a network, and the method of claim 1 is carried out in parallel on the separate parts. The inference step may be carried out using respective ones of the knowledge bases, and messages are passed transparently from one object in one knowledge base to a connected object in another. This is another way of distributing the processing, to scale the solution as required.
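Transparent delivery between partitioned knowledge bases can be sketched with a shared directory that records which knowledge base holds each object, so that a sender uses the same call for local and remote targets. This is an in-process illustration only: `Directory` and `KnowledgeBase` are hypothetical, and a real deployment would replace the direct `deliver` call with inter-process messaging.

```python
class Directory:
    """Hypothetical registry: object id -> knowledge base that holds the object."""

    def __init__(self):
        self.home = {}


class KnowledgeBase:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.inboxes = {}    # object id -> messages delivered to that object

    def add(self, obj_id):
        self.inboxes[obj_id] = []
        self.directory.home[obj_id] = self

    def send(self, target, message):
        # Same call whether the target lives here or in another knowledge base.
        self.directory.home[target].deliver(target, message)

    def deliver(self, target, message):
        self.inboxes[target].append(message)
```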
Advantageously, a plurality of objects are created in one of the knowledge bases and the inference steps for each of the objects are carried out in parallel, in threads, wherein messages passed from these objects contain a reference to the thread in which they were processed. This enables the messages to be returned to the correct thread.
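The following sketch runs several problem evaluations in worker threads that share one read-only knowledge base, and tags each outgoing message with the name of the thread that processed it so that replies can be routed back. It uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the knowledge base contents, the message fields and the `route_replies` helper are hypothetical.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Shared, read-only knowledge base (hypothetical contents).
KNOWLEDGE_BASE = {"LOS": "check fibre", "EQPT": "replace card"}


def process(problem_id, alarm_code):
    action = KNOWLEDGE_BASE.get(alarm_code, "investigate")
    # Each outgoing message records the thread that processed it.
    return {"problem": problem_id, "action": action,
            "thread": threading.current_thread().name}


def run(problems, workers=3):
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="kb-worker") as pool:
        futures = [pool.submit(process, pid, code) for pid, code in problems]
        return [f.result() for f in futures]


def route_replies(messages):
    """Group messages by originating thread so replies can be returned to it."""
    by_thread = defaultdict(list)
    for msg in messages:
        by_thread[msg["thread"]].append(msg["problem"])
    return dict(by_thread)
```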
According to another aspect of the invention, there is provided a system arranged to operate a communications network as set out above.
According to another aspect of the invention, there is provided a method of acquiring knowledge for the knowledge based reasoning capacity for the method of claim 1, comprising the step of creating rules for translating events notified by the network relating to the associated entity, into a degradation of a service offered by the associated entity to other entities.
According to another aspect of the invention, there is provided a method of processing data from a communications network, comprising the steps of:
implementing classes corresponding to given states of network entities, wherein each class comprises a static and a dynamic part, the dynamic part connecting instances of each class to rules which provide their reasoning capacity, whereby the dynamic part held by the static part can be changed while a system using these classes for its operation is running, thus changing the behaviour of future instances.
This facilitates updating and maintaining the rules.
Advantageously the rule implementation referenced by the dynamic part can be changed. This enables the behaviour of existing instances to be changed. Advantageously the rules referenced by the dynamic part are compiled rules with their source code, rather than rule source which requires interpreting. This speeds up operation considerably.
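The split between a static part and a replaceable dynamic part can be sketched in Python. Assumptions: rules are plain callables standing in for compiled rules, an instance binds whichever dynamic part is current when it is created, and all names (`RuleSet`, `ProblemClass`, `ProblemInstance`) are hypothetical. Replacing the dynamic part held by the class affects only future instances, whereas changing a rule inside a referenced rule set also changes existing instances.

```python
class RuleSet:
    """Dynamic part: a replaceable collection of rules (plain callables here)."""

    def __init__(self, **rules):
        self.rules = dict(rules)


class ProblemClass:
    """Static part: holds a reference to the current dynamic part."""

    def __init__(self, name, ruleset):
        self.name = name
        self.ruleset = ruleset

    def new_instance(self, entity):
        # The new instance binds whichever dynamic part is current now.
        return ProblemInstance(entity, self.ruleset)


class ProblemInstance:
    def __init__(self, entity, ruleset):
        self.entity = entity
        self.ruleset = ruleset

    def apply(self, rule_name, event):
        return self.ruleset.rules[rule_name](self.entity, event)
```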
Advantageously the method further comprises the step of compiling the rules using an extended compiler for an object oriented language, extended to compile rule constructs, wherein all the standard constructs of the language can be embedded in the rule constructs, and wherein the rule constructs comprise sets of arrangements of conditions and sets of sequences of actions that have an arbitrarily complex logical dependency on the sets of conditions. The encoding of rules directly in the OO language of implementation avoids the “impedance mismatch” problem. (Impedance mismatch is a classical problem arising from the clash between the data modelling styles of two paradigms, in this case OO and KBS. This clash imposes a high cost of translation, both in performance when running the system, and in code maintenance when coding the translation between modelling styles.)
Advantageously the data comprises notification of an event in the network and the rules are for determining the cause of the event.
According to another aspect of the invention there is provided a method of processing data from a communications network, comprising the step of:
applying a knowledge based reasoning capability to interpret the data, wherein the reasoning capability comprises a hierarchy of rulebases, the hierarchy being arranged to have inheritance properties, such that the method further comprises the steps of:
determining whether a named rule is in one of the rulebases, and, where it is not present, making available the same named rule from a rule base higher in the hierarchy; and
applying the named rule to the data.
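Named-rule lookup through an inheritance hierarchy of rulebases can be sketched as a walk up the parent chain. The hierarchy levels (generic, technology, product, customer) and the rule names are hypothetical examples of the arrangement described here.

```python
class RuleBase:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # next rulebase up the hierarchy, or None at the top
        self.rules = {}

    def define(self, rule_name, rule):
        self.rules[rule_name] = rule

    def lookup(self, rule_name):
        # Use the rule from this rulebase if present, else inherit it from higher up.
        rb = self
        while rb is not None:
            if rule_name in rb.rules:
                return rb.rules[rule_name]
            rb = rb.parent
        raise KeyError(f"rule {rule_name!r} not found in hierarchy")


def apply_rule(rulebase, rule_name, data):
    return rulebase.lookup(rule_name)(data)
```

A supplier update to a higher rulebase changes what `product` sees, but leaves a customer override in `customer` intact.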
An inheritance hierarchy means that technology-specific and product-specific rule bases can be provided. Supplier-provided rule bases can therefore be updated without overwriting customer-specific rules at a lower level of the hierarchy.
According to another aspect of the invention there is provided a method of processing data from a communications network comprising the step of:
applying a knowledge based reasoning capability to interpret the data wherein the reasoning capability comprises one or more rulebases, comprising rules encoded directly in an object orientated language, by specialising selected classes of an object oriented compiler so extending its functionality that it compiles rules and standard code.
This enables a class library and other object oriented applications to be available not merely within the rules, but also when writing, compiling and testing them. Specialising a limited or minimum number of selected classes means that large parts of the compiler remain identical in their implementation. Thus many ancillary tools will continue to interwork with the new compiler.
Advantageously the compiler is a Smalltalk compiler. Advantageously the method comprises the step of applying the reasoning to determine the cause of events notified by the network.
According to another aspect of the invention there is provided a method of extending a compiler for an object oriented language, to compile rule constructs, wherein all the standard constructs of the language can be embedded in the rule constructs, and wherein the rule constructs comprise sets of arrangements of conditions and sets of sequences of actions that have an arbitrarily complex logical dependency on the sets of conditions.
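The shape of such a rule construct, a set of named conditions plus action sequences guarded by arbitrary logical combinations of those conditions, can be modelled in Python as below. This is only a data-structure sketch under stated assumptions: it does not extend any compiler, Python closures stand in for embedded language constructs, and `Rule`, `fire` and the alarm names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Condition = Callable[[dict], bool]            # evaluated against the rule context
Guard = Callable[[Dict[str, bool]], bool]     # any logical combination of condition results
Action = Callable[[dict], None]


@dataclass
class Rule:
    name: str
    conditions: Dict[str, Condition]
    branches: List[Tuple[Guard, List[Action]]] = field(default_factory=list)

    def fire(self, ctx: dict) -> int:
        """Evaluate every condition once, then run each action sequence whose guard holds."""
        results = {label: cond(ctx) for label, cond in self.conditions.items()}
        fired = 0
        for guard, actions in self.branches:
            if guard(results):
                for action in actions:
                    action(ctx)
                fired += 1
        return fired
```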
Advantageously, the rule constructs may have any other data and behaviour defined in the language. This enables names and references to the context of the rule, or variables, to be included in the rules. This can further simplify the rules, and ease maintenance.
According to another aspect of the invention there is provided a system comprising a processor arranged to use a compiler extended according to the above method of extending a compiler.
Preferred features may be combined with each other and with any of the aspects of the invention, as appropriate, as would be apparent to a skilled person.