Large-scale and small-scale systems alike, such as manufacturing units, processing units, nuclear plants and computer networks, are generally provided with a failure monitoring unit. The failure monitoring unit ensures the safe operation of the monitored system by generating an alarm for each failure it detects. The failure monitoring unit is coupled to an alarm system, which displays the generated alarms, thereby informing a user about the occurrence of a failure in the monitored system.
The failure monitoring unit generates alarms corresponding to failures in the monitored system by using pre-defined criteria, such as a process variable exceeding its threshold value. Process variables are the parameters upon which the safe working of the monitored system depends. For example, in a steel manufacturing unit, the process variables include the temperature and pressure inside a reactor; in a computer network, the variables may include the time-out parameter applicable to the network. Very often, alarms are generated due to a slight deviation of a process variable from a desired value and do not correspond to a critical system failure. These alarms are termed nuisance alarms. Most conventional alarm systems display even the nuisance alarms.
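The threshold criterion described above can be sketched as follows. This is only an illustrative sketch: the `Alarm` record, the `generate_alarms` function, the variable names and the limit values are all assumptions made for the example and are not taken from any particular monitoring unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    variable: str   # name of the process variable (hypothetical)
    value: float    # reading that triggered the alarm
    limit: float    # threshold that was exceeded


def generate_alarms(readings, limits):
    """Return one Alarm for every process variable whose reading exceeds its limit."""
    alarms = []
    for name, value in readings.items():
        limit = limits.get(name)
        if limit is not None and value > limit:
            alarms.append(Alarm(name, value, limit))
    return alarms


# Hypothetical thresholds for a reactor; the numbers are illustrative only.
LIMITS = {"temperature_c": 1500.0, "pressure_kpa": 300.0}

# The temperature is only 0.5 degrees over its limit, yet an alarm is raised.
readings = {"temperature_c": 1500.5, "pressure_kpa": 250.0}
alarms = generate_alarms(readings, LIMITS)
```

Because the check is a plain comparison, a reading that is marginally over its limit raises the same alarm as a severe excursion, which is how nuisance alarms of the kind described above can arise.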
In other situations, a critical failure that affects many parts of the monitored system leads to the generation of a very large number of alarms. Most of these alarms are generated due to the same root failure and correspond to the various degrees of the failure and the process variables involved in it. Hence, the alarms generated by the failure monitoring unit are generally related: a particular alarm is typically either a cause or an effect of a previously generated alarm, which makes most of the generated alarms redundant.
For example, consider that at one instant a first alarm is generated corresponding to the failure of a node in a LAN, and at the next instant a second alarm is generated corresponding to the failure of a peripheral device attached to the node. In this case, the second alarm corresponds to an effect of the previously generated first alarm, because the failure of the peripheral device is a direct consequence of the failure of the node. Hence, the second alarm is a redundant alarm. In other words, the first alarm is a cause of the second alarm, which makes the two alarms related to each other.
Hence, the failure monitoring unit generates a large number of alarms corresponding to a sequence of failures in the monitored system. If the alarm system displays all the generated alarms, it becomes difficult for a user managing the monitored system to judge and locate the important alarms, which correspond to a root failure, among the large number of alarms being displayed at a particular instant. Hence, the user is unable to react to the root failure that is causing the displayed alarms, and the alarm system is rendered useless.
Conventionally, various alarm systems have been designed with the objective of suppressing redundant alarms, thereby preventing them from being displayed.
U.S. Pat. No. 5,581,242 titled ‘Automatic alarm display processing system in plant’ relates to a method for automatically selecting important alarms from the alarms generated in a plant operation monitoring system. The method involves suppression of an alarm if the conditions for alarm suppression are met. The conditions for alarm suppression are stored in a causal table.
U.S. Pat. No. 6,594,236 titled ‘Alarm suppressing method for optical transmission apparatus’ relates to suppressing alarms in an optical transmission apparatus. The method involves an analysis of the relation between a root alarm and the alarms generated subsequent to the root alarm, which are referred to as propagation alarms.
There are certain limitations associated with the prior art alarm systems and methods. Some of these alarm systems use hard-coded alarm suppression rules to decide which of the related alarms generated by the failure monitoring units are redundant. The alarm systems then suppress the redundant alarms and display only the non-redundant ones. These alarm suppression rules are generally based on an alarm source containment hierarchy, which is a previously made list of all the possible failures that can occur in the monitored system, along with the causes and effects of the failures. In accordance with the alarm suppression rules, the alarm systems suppress the alarms corresponding to the effects of a failure. Hence, these alarm systems succeed in preventing a large number of related alarms from being displayed. However, the hard-coded alarm suppression rules may hide the concurrent occurrence of multiple failures in the monitored system, so that no alarm is displayed for some of the concurrently occurring, unrelated failures. In addition, the operation of conventional alarm systems is not flexible, and setting them up and maintaining them is cumbersome.
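A hard-coded rule of the kind described above can be sketched as follows, under the assumption that the containment hierarchy is stored as a simple child-to-parent mapping. The `PARENT` table, the source names and the `suppress_redundant` function are hypothetical and serve only to illustrate the limitation; they do not reproduce any of the cited systems.

```python
# Hypothetical containment hierarchy: each alarm source maps to its container.
PARENT = {
    "printer-1": "node-1",
    "scanner-1": "node-1",
    "node-1": "lan",
    "node-2": "lan",
}


def ancestors(source):
    """Yield every container of `source`, nearest first."""
    while source in PARENT:
        source = PARENT[source]
        yield source


def suppress_redundant(active_sources):
    """Keep only the alarming sources that have no alarming ancestor."""
    active = set(active_sources)
    return sorted(
        s for s in active if not any(a in active for a in ancestors(s))
    )


# node-1 and its printer both raise alarms; node-2 fails independently.
displayed = suppress_redundant(["node-1", "printer-1", "node-2"])
```

In this sketch the alarm from `printer-1` is always hidden while `node-1` is alarming, whether the printer failed as a consequence of the node failure or because of an unrelated fault of its own. The fixed rule cannot distinguish the two cases, which is how a concurrent, unrelated failure can go undisplayed.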
Another method used to suppress redundant alarms involves the suppression of alarms on the basis of the ‘time stamps’ associated with them. All the alarms occurring at a later time than a base alarm are suppressed. However, alarm systems employing this method require a very high time resolution and highly synchronised system clocks. In many situations, this method may also hide the concurrent occurrence of multiple failures in the monitored system. In general, these alarm systems are prone to synchronisation inaccuracies and delays. In addition, it is difficult to conclude causality from the time sequence of alarms generated in the monitored system.
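The sensitivity of time-stamp-based suppression to clock synchronisation can be illustrated with a small sketch. The `suppress_by_timestamp` function, the source names, the time stamps and the 10 ms clock offset are all assumptions chosen for the example.

```python
def suppress_by_timestamp(alarms):
    """Treat the earliest alarm as the base alarm and suppress all later ones.

    `alarms` is a list of (source, timestamp_in_seconds) tuples.
    Returns (base_alarm, suppressed_alarms).
    """
    if not alarms:
        return None, []
    ordered = sorted(alarms, key=lambda alarm: alarm[1])
    return ordered[0], ordered[1:]


# Assumed true order of events: the node fails 5 ms before its printer.
true_alarms = [("node-1", 10.000), ("printer-1", 10.005)]

# Same events, but the printer's clock is assumed to run 10 ms slow.
skewed_alarms = [("node-1", 10.000), ("printer-1", 10.005 - 0.010)]

base_true, hidden_true = suppress_by_timestamp(true_alarms)
base_skewed, hidden_skewed = suppress_by_timestamp(skewed_alarms)
```

With synchronised clocks the node alarm is chosen as the base alarm. With the assumed 10 ms offset the printer alarm appears to be earlier, so the node alarm, which corresponds to the root failure, is the one suppressed. The sketch also shows that any unrelated failure reported after the base alarm would be hidden regardless of its cause.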
Hence, the existing alarm systems are not capable of dynamically suppressing redundant alarms concurrently with their generation in the monitored system. These alarm systems are also not capable of displaying non-redundant alarms without a significant time lag between the generation of an alarm and its presentation. In addition, the existing alarm systems are not capable of preventing the suppression of an alarm corresponding to a critical failure when multiple failures occur concurrently in the monitored system.