The recent availability of very inexpensive sensors has resulted in an explosion of real-time, operational machine data. The analysis of data from such sensor sources can be important in a variety of contexts. For instance, it is desirable to analyze sensor data to look for anomalies in how medicinal tablets are sorted to help reduce the likelihood of cross-contamination between different types of medicines, credit card purchases to identify potential fraudulent activities, temperature and/or humidity readings to ensure that foodstuffs being shipped are not likely to spoil during transit, etc.
Similarly, in what is sometimes called the Internet-of-Things (IoT), machine data oftentimes is used for monitoring the health and condition of the machinery in order to allow for faster, more efficient maintenance. The IoT concept is based on the idea of “everything” being connected, especially when it comes to uniquely identifiable embedded computing-like devices within the existing Internet infrastructure. Just as mobile devices are connected, the IoT industry posits that (otherwise) ordinary, everyday consumer products and infrastructure, such as cars, refrigerators, homes, roads, human health sensors, etc., soon will be interconnected. In brief, the IoT is expected to offer advanced connectivity of devices, systems, and services that goes beyond machine-to-machine communications, while covering a variety of protocols, domains, and applications.
It will be appreciated that there is a vast number of potential data producers, and that the data produced may be generated quickly and in large amounts, and may change frequently. As a result, in the IoT and other contexts, it would be desirable to be able to evaluate streaming data as the sensors send it, e.g., so that deteriorating equipment can be identified and problems addressed before catastrophic or other failures occur. Indeed, streaming data typically includes a payload from a sensor or the like along with a timestamp. Because of its high volume and rate of transmission, such data typically cannot be stored and thus must be analyzed on-the-fly. In other words, it will be appreciated that it would be advantageous to detect anomalies in streaming sensor data.
A variety of anomaly detection approaches have been developed over time. For example, some early approaches used simple thresholds for individual sensors and raised an alarm if any of those thresholds were exceeded. This is typically called univariate analysis.
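By way of illustration only, the per-sensor threshold check described above might be sketched as follows. The sensor names and limits are hypothetical and are not taken from any particular system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Allowed range for a single sensor (limits are illustrative)."""
    low: float
    high: float


# Hypothetical sensor names and limits, chosen only for this example.
THRESHOLDS = {
    "temperature_c": Threshold(low=2.0, high=8.0),
    "humidity_pct": Threshold(low=30.0, high=60.0),
}


def univariate_alarms(reading: dict[str, float]) -> list[str]:
    """Return the sensors whose value lies outside that sensor's own range.

    Each sensor is checked in isolation; relationships between sensors
    are ignored, which is what makes the analysis univariate.
    """
    alarms = []
    for name, value in reading.items():
        limit = THRESHOLDS.get(name)
        if limit is not None and not (limit.low <= value <= limit.high):
            alarms.append(name)
    return alarms


alarms = univariate_alarms({"temperature_c": 9.5, "humidity_pct": 45.0})
```

In this sketch only the temperature reading raises an alarm, because the humidity value lies inside its own range.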
Such techniques can be improved by incorporating different models for different operational states (e.g., accelerating or decelerating), but the detection approach basically still relies upon the deviation of a key parameter. Although this approach works adequately for some failure modes, a degradation scenario oftentimes is much more complex and is difficult to discover without a simultaneously performed analysis of data from multiple sensors, e.g., in accordance with what oftentimes is called multivariate analysis.
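As a rough illustration of why a joint analysis can reveal what individual thresholds miss, the following sketch scores a two-sensor reading by its Mahalanobis distance from previously observed normal readings. The choice of Mahalanobis distance, the two-sensor restriction, and the sample values are all assumptions made for this example.

```python
from __future__ import annotations

import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Cov = Tuple[float, float, float]  # (var_x, cov_xy, var_y)


def mean_and_cov_2d(points: Sequence[Point]) -> Tuple[Point, Cov]:
    """Sample mean and sample covariance of two-dimensional readings."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point: Point, mean: Point, cov: Cov) -> float:
    """Distance of a reading from the mean, scaled by the covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not invertible")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Uses the closed-form inverse of a symmetric 2x2 matrix.
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Synthetic normal readings: the second sensor rises with the first.
normal = [(10.0, 20.0), (20.0, 31.0), (30.0, 39.0), (40.0, 51.0), (50.0, 60.0)]
mean, cov = mean_and_cov_2d(normal)

# Both values of each reading lie inside the ranges seen above (10-50, 20-60).
consistent = mahalanobis_2d((40.0, 50.0), mean, cov)
inconsistent = mahalanobis_2d((50.0, 20.0), mean, cov)
```

Here the reading (50.0, 20.0) would pass both single-sensor range checks, yet it scores far higher than (40.0, 50.0) because it breaks the relationship between the two sensors.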
Current multivariate analysis approaches may be thought of as falling into one of two groups, categorized by how they use machine learning to create the anomaly detection model, namely, supervised and unsupervised approaches. Supervised learning approaches generally require a knowledgebase of existing, known failures and the sensor data readings surrounding the time of the failure. With this technique, one can use a data set of labeled/classified (i.e., normal or anomalous) sensor readings and train a predictive model to recognize the difference between the two. The resulting model can then be used to predict the classification of current instances of sensor readings.
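The train-then-predict pattern described above can be sketched with a nearest-centroid classifier, which is only one of many possible supervised models and is used here purely as an assumption for illustration. The labeled readings are synthetic.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def train_centroids(labeled: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Learn one mean vector per label from labeled sensor readings."""
    groups = defaultdict(list)
    for vector, label in labeled:
        groups[label].append(vector)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in groups.items()
    }


def predict(centroids: Dict[str, Vector], vector: Vector) -> str:
    """Classify a reading by the label of the closest learned centroid."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Synthetic labeled training data (normal or anomalous).
training = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.9), "normal"),
    ((0.8, 1.1), "normal"),
    ((5.0, 4.8), "anomalous"),
    ((5.2, 5.1), "anomalous"),
]
model = train_centroids(training)
first = predict(model, (1.1, 1.0))
second = predict(model, (4.9, 5.0))
```

Note that the model can only be trained once labeled examples of both classes exist, which is the dependency on a knowledgebase of known failures noted above.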
An example supervised learning approach may, for instance, use various multivariate clustering algorithms that are trained on learning data sets in order to classify later observations as normal or anomalous. Another example may, for instance, use case-based reasoning to determine and explain what type of fault exists, with the anomaly detection algorithm itself relying on previously prepared sets of learning data. In such cases, multiple learning data sets may be created to cover the operational scenarios of old equipment, operating conditions, etc. Thus, it will be appreciated that a machine learning algorithm uses supervised learning if the creation of the model requires the use of training data including example inputs and labeled outputs.
By contrast, unsupervised learning approaches do not require a knowledgebase of known problems and thus may be able to detect problems never seen before. The predictive models still need to be trained, but the learning data can merely include sensor data collected under normal operating conditions. Thus, it will be appreciated that a machine learning algorithm uses unsupervised learning if it is able to learn something about the structure of the training data without labeled outputs.
Unsupervised learning and supervised learning approaches have been combined into what is sometimes called semi-supervised learning. However, in general, an assumption is made that the system primarily is performing (a) supervised learning, with the use of additional unlabeled data to increase performance, or (b) unsupervised learning, with the use of labeled data to impose additional constraints. Thus, generally speaking, these approaches assume a preexisting, labeled training data set (along with unlabeled data).
The k-means algorithm is a popular unsupervised learning approach that can also be adapted for supervised learning. Its chief weakness, however, is that the number of clusters k must be known upfront. In the academic world, there has been some research into the creation of a streaming version of the algorithm, i.e., a version that can learn from data as it is continuously received. Streaming algorithms are difficult to develop, however, as they typically need to be able to deal with the practical limits of how much data can be stored in memory and the fact that, once the data is released, there is no practical way to get it back.
One example streaming version of k-means estimates the clusters using samples of the data, although the number of clusters k must be known in advance. Another example approach uses a two-step, mini-batch method where the results of the first step must be stored until it is time to run the second step as a batch. In this case, the size of the batch must be determined by the user and, again, the number of clusters k must be known in advance.
It will be appreciated that both supervised and unsupervised learning approaches can suffer from the problem of concept drift, where the normal operating parameters change over time. Concepts often are not stable in the real world. For instance, weather prediction rules and customers' preferences oftentimes change over time. The underlying data distribution may change, as well. Thus, through naturally occurring changes, a model built on old data may become inconsistent with the new data, and/or old concepts may become inconsistent with new concepts. Updating of the model thus may be necessary. The problem of concept drift therefore complicates the task of learning.
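The effect of concept drift on a fixed model can be illustrated with a small sketch that compares a baseline computed once from old data against a baseline that is updated over a sliding window. The window size and the synthetic readings are assumptions made for this example.

```python
from collections import deque


class WindowedMean:
    """Mean over the most recent `window` values (window size is an assumption)."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self._values = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add a value, dropping the oldest if full, and return the new mean."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


# Synthetic readings: the normal level drifts from 70.0 to 80.0.
readings = [70.0] * 50 + [80.0] * 50

# Baseline learned once from the first 50 readings and never updated.
static_baseline = sum(readings[:50]) / 50

# Baseline that keeps adapting to the most recent 20 readings.
tracker = WindowedMean(window=20)
adaptive_baseline = 0.0
for value in readings:
    adaptive_baseline = tracker.update(value)

# Deviation of the latest reading from each baseline.
static_error = abs(readings[-1] - static_baseline)
adaptive_error = abs(readings[-1] - adaptive_baseline)
```

In this sketch the never-updated baseline ends up 10.0 away from the current normal level, whereas the windowed baseline has caught up with it.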
As noted above, univariate analysis works adequately for some failures, but often the degradation scenario is much more complex and requires the simultaneous analysis of data from multiple sensors. Also, multivariate analysis typically can detect emerging problems earlier than single-sensor thresholds, since a single-sensor threshold often is not exceeded until a component failure has already occurred.
Multivariate supervised learning approaches typically require a knowledgebase of existing, known failures and the sensor data readings surrounding the time of the failure. Such a knowledgebase can be expensive and time-consuming to create, and these approaches typically only capture problems that have been seen before.
Multivariate unsupervised learning approaches do not require such a knowledgebase and can detect new problems, but they oftentimes suffer from false alarms being erroneously generated (e.g., as a result of detecting a rare, but not necessarily problematic, event). They also generally cannot provide any prescriptive aid to maintenance operators. For instance, it is oftentimes difficult or impossible to provide information, such as likely causes of the event, best course of action to remediate the problem, etc.
A disadvantage of existing supervised, unsupervised, and semi-supervised learning approaches, especially when it comes to IoT and/or similar anomaly detection, is that they require a data set of training examples (whether labeled or not) to be collected upfront before the model can be trained to start looking for anomalies. In the case of unsupervised learning, the training data is not labeled, but it needs to contain only normal examples of sensor data. Current approaches require offline, batch model training and evaluation by a machine learning expert before the system can start monitoring for IoT and/or similar anomalies.
As explained above, one weakness of the k-means algorithm is that the number of clusters k must be known upfront. For a streaming application with machine sensor data, the number of clusters of normal and anomalous behavior is unknown, making this approach impractical. Thus, existing streaming k-means algorithms are not well suited for use with machine sensor data. Also, existing streaming k-means algorithms make different tradeoffs in their handling of the volume and velocity of streaming data, generally choosing either to sample it, or to process it in mini-batches. Neither approach takes full, continuous advantage of all the data available.
Current machine sensor anomaly detection approaches typically do not address the problem of concept drift. The failure to account for concept drift can eventually lead to false or missed alarms, e.g., unless the predictive model is updated. However, knowing when to update the model often requires specific domain expertise for the machinery in question. As a result, these approaches often degrade in accuracy and require a significant amount of maintenance.
Certain example embodiments address the above and/or other concerns.
One aspect of certain example embodiments relates to systems and/or methods for detecting novel and/or anomalous events across multiple sensors that automatically start with live data, learn and adapt as they go, facilitate the input of human operators to guide machine learning, and coordinate necessary maintenance and/or other responses as appropriate. Certain example embodiments implement both unsupervised and supervised machine learning techniques to create a shared anomaly detection model and include dynamic updating features to handle the issue of concept drift (e.g., where the normal operating parameters of a machine change naturally over time).
Another aspect of certain example embodiments relates to techniques applicable across a wide variety of machinery that do not necessarily require a priori knowledge of the machine's sensor types, failure modes, operating environment, etc.
Another aspect of certain example embodiments relates to dynamic anomaly detection, e.g., in connection with the IoT and/or other similar technology areas (including, for instance, those that involve small, inexpensive sensors that are ubiquitously found in all areas of manufacturing and controlling). Although information that streams in to dedicated servers can be assessed and classified, certain example embodiments make it possible to quickly identify error situations, failing machines, etc., as well as to automatically identify a fault situation, specifically dealing with situations where (a) there initially is an empty knowledgebase (e.g., there is no comparable data available that would aid in classifying the input data), and (b) there might be concept drift (e.g., where certain readings could over time turn from an “error” to a “normal” classification, or vice versa, because certain parameters have changed).
Another aspect of certain example embodiments relates to an improved k-means algorithm, which enables the combination of the supervised and unsupervised learning techniques. An overall process description includes not only the “incremental training of the shared model,” but also adds other components such as a knowledgebase, a workflow management component, a visualization component, etc.
The following example will help clarify the above-described and other related issues. Consider, for example, that power generation engines are expensive and complicated pieces of machinery. Because component failure would have potentially disastrous consequences (e.g., leaving many without power for a potentially prolonged period of time), maintenance schedules for such engines generally are very conservative. For instance, maintenance schedules oftentimes are based upon known failure rates for the engine type and frequently call for maintenance well in advance of when it might actually be required for a particular engine. This time-based maintenance approach can waste operations time by having upkeep operations performed more frequently than is necessary, and costs are increased when items that are still serviceable are nonetheless replaced. Anomaly detection approaches may be implemented to enable condition-based maintenance as opposed to a strictly time-based maintenance approach.
A supervised learning approach would involve building a knowledgebase of known failures, causes, remediation plans, etc. But building such a knowledgebase could take considerable time and expense. Once the knowledgebase is built, training data would be captured for all the known engine failures, further increasing development costs. FIG. 1 shows the typical flow of this approach.
As shown in FIG. 1, a knowledgebase is built (step S102). The training data set is created (step S104). The supervised model is trained (step S106). Supervised anomalies are predicted (step S108), e.g., as sensor data is read and transformed (step S110). An operator is alerted in response to a predicted anomaly (step S112).
A multivariate unsupervised learning approach would bypass the need for a knowledgebase, but would still need the creation of a training data set gathered under normal operating conditions. Once implemented, such a system would identify significant deviations from normal engine behavior, but it likely would need to be tweaked and retrained in order to reduce the incidence of false alarms (false positives) and missed alerts (false negatives), both of which can be very expensive. As seen in FIG. 2, this approach would detect engine anomalies but likely would require constant interpretation by skilled domain experts as to what caused the engine problem and what should be done next in order to address it.
Similar to FIG. 1, FIG. 2 involves creating a training data set (step S202) and training the unsupervised model (step S204). Unsupervised anomalies are predicted (step S206), e.g., as sensor data is read and transformed (step S208). An operator is alerted in response to a predicted anomaly (step S210).
The approaches outlined in FIGS. 1-2 would eventually suffer from degraded detection accuracy as the engine ages, because normal operating parameters for a brand new engine are different from those for one with ten thousand hours on it.
Certain example embodiments allow for a faster response, because they do not require the building of training data sets or knowledgebases as prerequisites. This is because certain example embodiments implement a guided learning method for training the shared anomaly detection model. Certain example embodiments also assume that the data source (e.g., sensor data) is always live and, thus, it is assumed that there is never an offline period for performing traditional batch machine learning. Therefore, certain example embodiments begin with unlabeled data only and learn the labels as they go, with the incremental help of human experts. As seen in FIG. 3, certain example embodiments begin reading live engine sensor data right away (step S302) and train a shared model incrementally (step S304), thus avoiding the delayed response typical of prior and current systems. The resulting model is able to detect and recognize repeat problems (step S306), while still discovering new problems and routing them to domain experts for review (step S308) and knowledge capture (step S310). And the model may adapt to changing operating conditions automatically as the engine ages. Over time, the initially empty knowledgebase may grow to cover additional (and potentially all) possible engine issues, and the need for a domain expert, as required by unsupervised learning approaches of prior and current systems, accordingly may fade away.
The guided learning approach of certain example embodiments uses human expert input for dynamic, incremental labeling of training data. FIG. 4 is a flowchart of the guided learning approach that may be used in connection with certain example embodiments. As will be explained in greater detail below, this approach may use business process management (BPM) technology to help coordinate the actions of the components and the human operators.
The shared model of certain example embodiments is trained incrementally using two different techniques, and predictions are made via that model.
In certain example embodiments, a system for detecting anomalies in data dynamically received from a plurality of sensors associated with one or more machines is provided. The system comprises a knowledgebase, a model store, and one or more interfaces configured to receive data from the plurality of sensors. Processing resources include at least one processor and a memory, the processing resources being configured, for each instance of data received via the one or more interfaces, to at least: classify, using a model retrieved from the model store, the respective instance as being one of a normal instance type and an anomalous instance type, the retrieved model being selected from the model store as being appropriate for the machine that produced the data in the respective instance if such a model exists in the model store; in response to a classification of the respective instance being a normal instance type, use the data in the respective instance to train the retrieved model; in response to a classification of the respective instance being an anomalous instance type that is not new, determine from the knowledgebase an action to be taken and take the determined action; and in response to a classification of the respective instance being an anomalous instance type that is new, seek confirmation from an authorized user as to whether the respective instance should be designated as a confirmed new anomalous instance type. Responsive to confirmation from the authorized user that the respective instance is a new anomalous instance type, the knowledgebase is updated with information about the respective instance and/or an action to be taken should the new anomalous instance type be detected again. The data in the respective instance is used to train the retrieved model. 
Each model in the model store is implemented using a k-means cluster algorithm modified so as to (a) be continually trainable as a result of the dynamic reception of data over an unknown and potentially indefinite time period, and (b) build clusters incrementally and in connection with an updatable distance threshold that indicates when a new cluster is to be created. Each said model has a respective total number of clusters that is dynamic and learned over time.
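The per-instance handling described above (train on normal instances, look up an action for known anomalies, and seek confirmation for new anomalies) might be sketched as follows. The `Model` interface, the `ToyModel` stand-in, the label strings, and the `confirm` callback are all hypothetical names introduced for this illustration, and the behavior when the user does not confirm is an assumption because it is not specified above.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional, Protocol, Sequence, Tuple

Instance = Sequence[float]
NORMAL = "normal"
NEW_ANOMALY = "new-anomaly"


class Model(Protocol):
    """Hypothetical interface for a model retrieved from the model store."""
    def classify(self, instance: Instance) -> str: ...
    def train(self, instance: Instance, label: str) -> None: ...


def handle_instance(
    instance: Instance,
    model: Model,
    knowledgebase: Dict[str, str],
    confirm: Callable[[Instance], Optional[Tuple[str, str]]],
) -> str:
    """Route one instance and return a description of what was done."""
    label = model.classify(instance)
    if label == NORMAL:
        model.train(instance, NORMAL)
        return "trained"
    if label in knowledgebase:
        return knowledgebase[label]  # action for an anomaly seen before
    decision = confirm(instance)  # ask an authorized user about a new anomaly
    if decision is None:
        return "unconfirmed"  # assumption: take no further action
    anomaly_type, action = decision
    knowledgebase[anomaly_type] = action
    model.train(instance, anomaly_type)
    return "knowledgebase updated"


class ToyModel:
    """Stand-in model: values below `limit` are normal (illustration only)."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.known: Dict[int, str] = {}

    def classify(self, instance: Instance) -> str:
        if instance[0] < self.limit:
            return NORMAL
        return self.known.get(round(instance[0]), NEW_ANOMALY)

    def train(self, instance: Instance, label: str) -> None:
        if label != NORMAL:
            self.known[round(instance[0])] = label


kb: Dict[str, str] = {}
model = ToyModel(limit=10.0)
expert = lambda inst: ("overheat", "shut down and inspect")
results = [handle_instance(x, model, kb, expert) for x in ([3.0], [42.0], [42.0])]
```

In this sketch the second reading is routed to the user and recorded, so the third, identical reading is resolved directly from the knowledgebase.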
In certain example embodiments, there is provided a system for detecting anomalies in data dynamically received from a plurality of sensors, with each said sensor being associated with one or more machines. The system includes a model store, with each said machine having an associated model stored therein. One or more interfaces is/are configured to receive data from the plurality of sensors. Processing resources include at least one processor and a memory, with the processing resources being configured to train each said model using a modified k-means cluster algorithm in which there are defined a cluster initialization window p, a distance threshold t, an instance-weighting window w, a number of clusters k, clusters c1 . . . ck, sample covariance matrices S1 . . . Sk for respective clusters, and μ1 . . . μk as centroids of respective clusters. Each said cluster has an associated class, with the class being one of an anomalous type class and a non-anomalous type class. For each given data stream X from a given one of the machines that includes data instances x1 . . . xn with a number of variables d, the modified k-means algorithm is programmed to: initialize centroid μ1 of cluster c1 as the mean of instances x1 . . . xp, and matrix S1 as the covariance of instances x1 . . . xp, cluster c1 and instances x1 . . . xp being predicted as normal instance types; and for each instance xi from xp+1 . . . x∞ in the given data stream X: (a) temporarily assign instance xi to the cluster with the nearest centroid μ1 . . . μk, (b) if the distance of xi to that centroid is greater than the distance threshold t, obtain a cluster assignment for xi from an authorized user, and (c) if the cluster assignment is for a confirmed new anomalous instance type, (i) create a new cluster cj+1, and set centroid μj+1=xi and covariance matrix Sj+1 as the mean of existing covariance matrices S1 . . . Sj, and (ii) predict the class of cj+1 for xi; and (d) otherwise: update the centroid μj as the w window-weighted mean of the instances xi that have been assigned to the cluster; if the number of instances xi that have been assigned to the cluster is greater than the cluster initialization window p, update the matrix Sj as the w window-weighted covariance of the instances xi that have been assigned to the cluster; and predict the class of cj for xi.
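A heavily simplified sketch of the incremental clustering loop described above is given below. It departs from the description in several ways that should be read as assumptions: it uses Euclidean distance and omits the covariance matrices entirely, it interprets the w window-weighted mean as the plain mean of the last w instances assigned to a cluster, and it models the authorized user as a hypothetical `review` callback that either returns a class name for a new cluster or returns None to keep the nearest cluster.

```python
from __future__ import annotations

import itertools
import math
from collections import deque
from typing import Callable, Iterable, Iterator, List, Optional, Sequence

Vector = Sequence[float]


def mean(vectors: Iterable[Vector]) -> tuple[float, ...]:
    """Component-wise mean of a non-empty collection of vectors."""
    rows = list(vectors)
    return tuple(sum(column) / len(rows) for column in zip(*rows))


class Cluster:
    def __init__(self, label: str, members: Iterable[Vector], w: int) -> None:
        rows = list(members)
        self.label = label
        self.centroid = mean(rows)          # initial centroid from all members
        self.recent = deque(rows, maxlen=w)  # only the last w are kept

    def add(self, x: Vector) -> None:
        """Assign x and recompute the centroid over the last w instances."""
        self.recent.append(x)
        self.centroid = mean(self.recent)


def stream_clusters(
    stream: Iterable[Vector],
    p: int,
    t: float,
    w: int,
    review: Callable[[Vector, str], Optional[str]],
) -> Iterator[str]:
    """Yield a predicted class for each instance as it arrives."""
    source = iter(stream)
    first = list(itertools.islice(source, p))
    if len(first) < p:
        raise ValueError("stream ended before the initialization window filled")
    clusters: List[Cluster] = [Cluster("normal", first, w)]
    for _ in first:
        yield "normal"  # the first p instances are predicted as normal
    for x in source:
        nearest = min(clusters, key=lambda c: math.dist(x, c.centroid))
        new_label = None
        if math.dist(x, nearest.centroid) > t:
            new_label = review(x, nearest.label)  # consult the user
        if new_label is not None:
            clusters.append(Cluster(new_label, [x], w))  # centroid set to x
            yield new_label
        else:
            nearest.add(x)
            yield nearest.label


data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.1, 0.9),
        (9.0, 9.0), (9.1, 8.9), (1.0, 1.1)]
labels = list(stream_clusters(data, p=3, t=3.0, w=5,
                              review=lambda x, nearest: "overheat"))
```

In this sketch the number of clusters is never supplied in advance; it grows from one to two only when a distant instance is confirmed as a new type, after which the next similar instance is recognized without consulting the user.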
Corresponding methods and non-transitory computer readable storage mediums tangibly storing instructions for performing such methods also are provided by certain example embodiments, as are corresponding computer programs.
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.