Condition monitoring is used to detect anomalies in machinery or industrial processes in order to avoid economic losses, such as those due to machine failures, which can also cause accidents, injuries and/or environmental consequences. Condition monitoring is known for generating alerts in the form of alarms (e.g., blinking lights) responsive to the detection of a fault or disturbance in an industrial processing facility (IPF), sometimes referred to as a plant, to alert an operator, generally working in a control room, that something urgent or abnormal is currently happening before a critical event (e.g., a machine failure) occurs. An abnormal situation is any unexpected event or situation that confronts the operator during the course of his/her duties and that causes the plant operation to be upset or disturbed to a point of concern. A conventional plant control system generally cannot address such a disturbance, or it may fail to do so, in which case operator intervention to take corrective action is needed. Alerts should be set to provide sufficient time for an operator to take the corrective action, and the number of alarms should be neither too low, referred to as being ‘silent’, nor too high, referred to as being ‘chattering’ (or ‘fleeting’).
Condition monitoring is usually implemented via an algorithm applied in real-time that compares one or more variables comprising real-time sensor values (generally an actual process variable) to low and/or high threshold warning limits. Most condition monitoring technologies and products, whether based on heuristic rules (e.g., rules derived from engineering insight or by trial and error) or on data-driven rules, require some user tuning of the alarm rules, being at least one of upper and lower threshold warning limits, that when crossed for a minimum period of time (to prevent chattering) generate automatic alerts.
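The comparison of a sensor value to low and/or high warning limits can be sketched as follows. This is a minimal illustration only; the function name `check_limits` and the return labels "HIGH" and "LOW" are assumptions made for the example and are not taken from any particular product. The minimum-time requirement is deliberately left out of this sketch.

```python
from typing import Optional


def check_limits(
    value: float,
    low: Optional[float] = None,
    high: Optional[float] = None,
) -> Optional[str]:
    """Compare one sensor value to optional low/high warning limits.

    Returns "HIGH" or "LOW" (illustrative labels) when a limit is crossed,
    or None when the value is within limits. Either limit may be omitted.
    """
    if high is not None and value > high:
        return "HIGH"
    if low is not None and value < low:
        return "LOW"
    return None


# Example: a reading of 510 against a high limit of 500.
result = check_limits(510.0, high=500.0)
```

A raw comparison of this kind changes state on every crossing, which is why a minimum time period is normally added on top of it.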
For example, if a furnace process temperature T1 is above 500° C. (a high threshold process limit) for a duration of more than 4 minutes, then an alarm may be raised to an operator. Accordingly, there are at least 2 alarm tuning parameters that need to be set, comprising a threshold limit(s) and at least one time delay (also called an alarm delay). The time delay generally includes an “ON-delay”, which waits for the threshold limit to be exceeded for an ON-delay time before switching the alarm state to “ON”, and an “OFF-delay”, which waits for the threshold limit to be not exceeded for an OFF-delay time before switching the alarm state from “ON” back to “OFF”. Generally both of these delays are present. Thus the ON-delay time governs the change of the alarm state from OFF to ON, and the OFF-delay time governs the change from ON to OFF.
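The ON-delay and OFF-delay behavior described above can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the implementation of any particular control system: the class name `DelayedHighAlarm`, the use of seconds as the time unit, the 60-second OFF-delay, and the choice to switch once the elapsed time is greater than or equal to the delay are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedHighAlarm:
    """High-limit alarm with separate ON-delay and OFF-delay (hypothetical)."""

    threshold: float
    on_delay_s: float
    off_delay_s: float
    active: bool = False
    _pending_since: Optional[float] = None  # when the pending change began

    def update(self, t_s: float, value: float) -> bool:
        """Feed one timestamped sample and return the alarm state."""
        exceeded = value > self.threshold
        # A state change is pending only while the raw condition
        # disagrees with the current alarm state.
        if exceeded == self.active:
            self._pending_since = None
            return self.active
        if self._pending_since is None:
            self._pending_since = t_s
        # OFF -> ON uses the ON-delay; ON -> OFF uses the OFF-delay.
        delay = self.off_delay_s if self.active else self.on_delay_s
        if t_s - self._pending_since >= delay:
            self.active = exceeded
            self._pending_since = None
        return self.active


# Furnace example: 500 degree limit, 4 minute ON-delay, assumed 1 minute OFF-delay.
alarm = DelayedHighAlarm(threshold=500.0, on_delay_s=240.0, off_delay_s=60.0)
samples = [(0, 510), (60, 510), (120, 510), (180, 510), (240, 510),
           (300, 490), (330, 490), (360, 490)]
states = [alarm.update(t, v) for t, v in samples]
```

In this sketch, a brief excursion above the limit that ends before the ON-delay elapses resets the timer and never raises the alarm, which is the mechanism that suppresses chattering.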
Conventional alarm tuning techniques usually involve taking a single data ‘silo’, that is, a single isolated set of historical alarm data, and analyzing it. Alarm data is a set of text messages generated by the distributed control system (DCS) and stored in an alarm log. When a process value (abbreviated PV) exceeds one of its predetermined thresholds, an alarm message is generated. Usually an alarm message contains several fields of information: a time stamp, namely the time instant when the message is generated; a tag name; a tag identifier, e.g., ‘PVHI’, ‘PVLO’, ‘OFFNORM’; and some other information such as the priority, the value of the process variable, the trip point and so on. The tag name plus tag identifier reflects what type of alarm occurs, and the time stamp reflects the time when the alarm occurs. Two silos of data refer to two data sets that are typically not saved together or otherwise integrated with each other.
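The fields of an alarm message listed above can be represented as a simple record. The sketch below is hypothetical: actual DCS alarm log formats vary by vendor, and the comma-separated layout, the field order, the tag name `TI101`, and the names `AlarmMessage` and `parse_alarm_line` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmMessage:
    """One alarm log entry (field set follows the description above)."""

    timestamp: datetime   # time instant when the message was generated
    tag_name: str         # e.g. a hypothetical instrument tag "TI101"
    tag_identifier: str   # e.g. "PVHI", "PVLO", "OFFNORM"
    priority: str
    pv: float             # value of the process variable
    trip_point: float

    @property
    def alarm_type(self) -> str:
        # Tag name plus tag identifier together indicate the alarm type.
        return f"{self.tag_name}.{self.tag_identifier}"


def parse_alarm_line(line: str) -> AlarmMessage:
    """Parse one line of an assumed comma-separated alarm log."""
    ts, tag, ident, prio, pv, trip = [part.strip() for part in line.split(",")]
    return AlarmMessage(datetime.fromisoformat(ts), tag, ident, prio,
                        float(pv), float(trip))


msg = parse_alarm_line("2023-01-05T08:15:00, TI101, PVHI, HIGH, 512.3, 500.0")
```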
Some more advanced alarm tuning techniques involve creating a histogram of a function of this data (e.g., the duration of the alerts, or the time between alerts), and then increasing the alarm delay time setting (either the ON-delay time, the OFF-delay time, or both) in order to reduce the number of chattering alarms. It is understood that this advanced alarm tuning method comes at the cost of slowing the operator alarm response time for real alarm-worthy events. For example, if the analysis covers a year's worth of data for a process variable with a chattering alarm, there can easily be several thousand alarm events in that database.
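The histogram-based analysis described above can be sketched as follows. This is a simplified illustration, not a definitive tuning procedure: the function names, the representation of each alarm event as an (ON time, OFF time) pair in seconds, and the sample data are assumptions made for the example. The estimate of suppressed alarms is a rough one, treating any alarm shorter than a candidate ON-delay as one that would not have been raised.

```python
from collections import Counter


def durations_and_gaps(events):
    """events: list of (t_on_s, t_off_s) pairs, sorted by t_on_s."""
    durations = [t_off - t_on for t_on, t_off in events]
    # Gap = time from one alarm clearing to the next alarm being raised.
    gaps = [events[i + 1][0] - events[i][1] for i in range(len(events) - 1)]
    return durations, gaps


def histogram(values, bin_width_s):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = Counter(int(v // bin_width_s) * bin_width_s for v in values)
    return dict(sorted(counts.items()))


def fraction_suppressed(durations, on_delay_s):
    """Rough share of alarms shorter than a candidate ON-delay."""
    if not durations:
        return 0.0
    return sum(d < on_delay_s for d in durations) / len(durations)


# Made-up example data: four short (chattering) alarms and one long one.
events = [(0, 5), (20, 28), (40, 43), (100, 400), (1000, 1012)]
durations, gaps = durations_and_gaps(events)
duration_hist = histogram(durations, bin_width_s=10)
share = fraction_suppressed(durations, on_delay_s=30)
```

With this sample data, a 30-second ON-delay would remove the four short alarms, but it would also postpone the one long alarm by 30 seconds, which is the response-time cost noted above.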