The present invention relates to methods and systems for problem-alert aggregation and identifying sub-optimal behavior.
Predictive maintenance and failure detection are critical in many industries in which unforeseen problems may be costly, involving a host of adverse results including monetary loss, operational downtime, equipment loss, property damage, penalties, compensation, and sometimes even human fatality. To prevent such damage, many industrial plants install sensors to help monitor factory production and its processes. Machine-learning algorithms process the readings of such sensors and alert the maintenance or security team when suspicious events occur. Such alert-producing machine-learning or data-mining algorithms vary and include, for example, simple single-sensor threshold-crossing alerts, problem-specific alert scripts, problem-specific pattern-detection or likelihood-learning algorithms, trend-detection algorithms, prediction-deviation algorithms, deep-learning algorithms, and other anomaly-detection algorithms.
Because complex systems have a large number of possible normal system states, and because each alert can involve a large number of sensors, such algorithms typically produce false alarms and misdetections at rates that can be tuned through threshold settings, albeit with trade-offs. If the thresholds are set too high, fewer alerts are produced, but a crisis may go unalerted. If the thresholds are set too low, an excessive number of alerts is produced; the maintenance or security team that tends to the alerts then commonly fails to investigate all of them and may ignore some or most of them. As a result, most problems go undetected.
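The threshold trade-off described above can be illustrated with a minimal sketch of a single-sensor threshold-crossing alert. The function name threshold_alerts, the sensor readings, and the position of the genuine fault are all hypothetical and chosen only for illustration; the sketch merely shows how moving a single threshold shifts the balance between false alarms and missed problems.

```python
from typing import List, Sequence


def threshold_alerts(readings: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of readings that exceed the threshold.

    This is a hypothetical single-sensor threshold-crossing alert rule.
    """
    return [i for i, value in enumerate(readings) if value > threshold]


# Hypothetical temperature readings; index 7 stands in for a genuine fault,
# while the other elevated readings are normal fluctuations.
readings = [70.2, 71.5, 73.9, 70.8, 74.1, 72.0, 73.5, 78.0, 71.1, 74.4]
fault_index = 7

for threshold in (72.0, 74.0, 80.0):
    alerts = threshold_alerts(readings, threshold)
    caught = fault_index in alerts
    false_alarms = sum(1 for i in alerts if i != fault_index)
    print(f"threshold={threshold}: alerts={len(alerts)}, "
          f"fault caught={caught}, false alarms={false_alarms}")
```

With these hypothetical values, a threshold of 72.0 catches the fault but also raises four false alarms, a threshold of 74.0 catches it with two false alarms, and a threshold of 80.0 raises no alerts at all and misses the fault.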
Once an alert is produced, it is passed to the maintenance or security team to investigate the cause. Since the alerts are the product of machine-learning algorithms, they are expressed in terms of the machine-learning inputs, which are usually based on the column headers in the monitoring system's database or on the names of the sensors that produced the alerts.
In order to investigate an alert, a maintenance team needs to identify the purpose and indications of the sensor or sensors that triggered the alert, understand the underlying relations between those sensors, and determine the real cause of the alert, which can often be far removed from the identified sensors due to the interconnectivity of such equipment and components. In complex industrial plants, the facility is often too big, and encompasses too many interlaced sub-systems, for the maintenance team to memorize the purpose of each sensor and its relations to the other sensors in the alert. Understanding the nature and severity of the alert therefore requires additional reference information, sensor-layout diagrams, and the involvement of facility experts. Such additional resources are not always available.
Large facilities typically have robust process systems that are very reliable, meaning that failure of facility equipment (e.g., a boiler) is rare. Current alert systems, by contrast, characteristically have a high alert rate, usually hundreds of alerts per day, and a maintenance team does not have the capacity to thoroughly investigate all of them. Such facilities frequently employ intensive maintenance schedules and redundancy policies, which enable low workloads and alternative production procedures in case of machinery malfunction or failure. An alert-investigating team that is aware of the low rate of real problems may therefore develop a tendency to ignore and dismiss alerts.
In typical factories, present-day problem-detection systems detect only around 2-3% of the actual problems. Consequently, complex critical plants cannot rely on existing problem-alert systems. Instead, most complex plants employ system redundancy and schedule redundant maintenance procedures, which increase production costs. In practice, there is no satisfactory solution for component-failure identification in complex plants.
Current problem-predicting systems suffer from the following drawbacks.
1. Too many and/or unreliable alerts are triggered (e.g., often hundreds a day), and many of them are designated as false alarms; such misdetection rates render available general-purpose problem-alert systems practically useless.
2. Alerts are associated with tags and text-like descriptors attached to the columns of the relevant database, which are not indicative of the problem to an investigating team. As a result, each alert requires a complicated manual investigation to identify the problem, its location, and the cause of the alert, and such investigations often fail to reach a conclusion.
In facilities using present-day problem-detection systems, most problems are not prevented because of such poor detection rates and the mislabeling of real alerts as false alarms. Factory personnel frequently do not trust their own detection systems for early problem detection, and describe such systems as impractical because of the large number of alerts produced. This forces facility managers to institute redundant maintenance procedures in production facilities.
Modeling languages enable a user to model a machine, plant, factory, or system. Modeling with current modeling languages requires an expert in the modeling language. Modeling languages typically have no stopping criteria (i.e., no way to identify when the model has been completely represented), which further increases modeling time and complexity. Including a sensor as part of the model of the facility is possible in existing modeling languages, but requires a very detailed, time-consuming level of modeling. Moreover, sensors measure an attribute (i.e., a property) of a part or process. Examples of such attributes include weight, importance, temperature, and pressure. Modeling sensors or data columns as parts of a component (e.g., an engine) does not capture their true function.
A possible solution is to associate metadata with the sensors in order to describe the attribute measured by each sensor. Such a solution is problematic because the attributes are then not an integral part of the model. Other shortcomings of existing modeling languages include the expertise needed to perform the modeling, the modeling complexity, the lack of stopping criteria, and the inability to automate model queries.
It would be desirable to have methods and systems for problem-alert aggregation and identifying sub-optimal behavior in assets. Such methods and systems would, inter alia, overcome the various limitations mentioned above.