As industrial machinery has become more complex, machine condition monitoring has received increased attention and evolved into one of the most effective tools for maximizing the economic life-span of industrial machinery in various fields of application. Advanced machine learning techniques are among the key components for sophisticated monitoring systems and provide a means to automatically learn fault diagnosis models from sensor data (e.g., annotated historical data). One of the particular advantages of machine learning in condition monitoring is that the underlying diagnosis models can be adapted both to different application fields and time-shifting monitoring environments.
One of the most elementary scenarios in machine condition monitoring is to consider only two mutually exclusive states, namely, the alert state (e.g., indicating that the system requires specific attention to prevent possible failure or damage) and the non-alert state. More sophisticated systems model the machine as being associated with exactly one state from a finite, and typically small, set of alternatives. Such systems support a more fine-grained monitoring process, such as a green, orange, and red alert scale, where the system states are assumed to be mutually exclusive. Adding even more flexibility, the machine condition might be characterized by a set of states (e.g., failure, alert, etc.) such that more than one state can be applicable at a particular point in time. Most prior models of a multi-alert system considered multiple binary monitoring systems, where each binary system indicated whether a particular subsystem was in a critical (e.g., relevant, active, and/or alert) state. Some models offered ranking functionality and could learn a cut-off between active and non-active fault states (e.g., relevant and non-relevant faults), but were computationally costly.
Multilabel ranking (MLR) is a recent combination of two supervised learning tasks: multilabel classification (MLC) and label ranking (LR). MLC studies the problem of learning a model that associates with an instance x a bipartition of a predefined set of class labels into relevant (e.g., positive) and irrelevant (e.g., negative) labels, while LR considers the problem of predicting rankings of all class labels. MLR is a consistent combination of these two types of prediction. Thus, it can be viewed either as an extended ranking (e.g., containing additional information about a kind of “zero point”), or as an extended MLC (e.g., containing additional information about the order of labels in both parts of the bipartition). For example, in a document classification context, the intended meaning of an MLR is that, for the instance (=document) x, the classes (=topics) politics and economics are relevant, the former even more so than the latter, whereas education and sports are irrelevant, the former less so than the latter.
Considering MLC, the additional order information is not only useful by itself but also facilitates the post-processing of predictions (e.g., considering only the most relevant labels). MLR is not more demanding than MLC with respect to the training information (e.g., a multilabel ranker can well be trained on multilabel classification data). Also, inducing such a ranker can be useful even if the only interest is in an MLC. Basically, an MLR model consists of two components: a classifier and a ranker. The interdependencies between the labels which are learned by the ranker can be helpful in discovering and perhaps compensating for errors of the classifier. For example, the classifier may estimate one label to be relevant and a second label to be not relevant. The additional (e.g., conflicting) information that the latter is typically ranked above the former might call this estimation into question and thus repair the misclassification.
Existing approaches operating in ranking scenarios are typically model-based extensions of binary classification techniques which induce a global prediction model for the entire instance space from the training data. These approaches suffer substantially from the increased complexity of the target space in multilabel ranking in comparison to binary classification, and therefore incur a high computational cost even for a moderate number of class labels.
A common model-based approach to MLC is binary relevance learning (BR). BR trains a separate binary model Mi for each label λi, using all examples x with λi ∈ Px as positive examples and all examples with λi ∈ Nx as negative examples. To classify a new instance x, x is submitted to all models, and Px is defined as the set of all λi for which Mi predicts relevance.
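The BR scheme described above can be illustrated with a minimal Python sketch. It is not taken from any particular system: the nearest-centroid base learner, the function names, the fault labels, and the toy sensor readings are all hypothetical and were chosen only to keep the example self-contained. Any binary learner could stand in for the models Mi.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Instance = Tuple[float, ...]
BinaryModel = Callable[[Instance], bool]
Example = Tuple[Instance, Set[str]]  # (x, Px): an instance and its relevant labels


def train_centroid_model(positives: List[Instance],
                         negatives: List[Instance]) -> BinaryModel:
    """Hypothetical base learner: predicts relevance when x lies closer
    to the centroid of the positive examples than to the negative one."""
    def centroid(points: List[Instance]) -> Instance:
        n = len(points)
        return tuple(sum(p[d] for p in points) / n
                     for d in range(len(points[0])))

    def sq_dist(a: Instance, b: Instance) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    if not positives:          # label never relevant in the training data
        return lambda x: False
    if not negatives:          # label always relevant in the training data
        return lambda x: True
    c_pos, c_neg = centroid(positives), centroid(negatives)
    return lambda x: sq_dist(x, c_pos) < sq_dist(x, c_neg)


def train_binary_relevance(data: Sequence[Example],
                           labels: Sequence[str]) -> Dict[str, BinaryModel]:
    """Train one binary model per label, as in binary relevance learning."""
    models: Dict[str, BinaryModel] = {}
    for label in labels:
        positives = [x for x, relevant in data if label in relevant]
        negatives = [x for x, relevant in data if label not in relevant]
        models[label] = train_centroid_model(positives, negatives)
    return models


def predict_relevant(models: Dict[str, BinaryModel], x: Instance) -> Set[str]:
    """Submit x to every model; Px is the set of labels predicted relevant."""
    return {label for label, model in models.items() if model(x)}


# Toy data (hypothetical): two sensor readings and two fault labels.
LABELS = ["overheat", "vibration"]
DATA: List[Example] = [
    ((0.9, 0.1), {"overheat"}),
    ((0.8, 0.9), {"overheat", "vibration"}),
    ((0.1, 0.8), {"vibration"}),
    ((0.1, 0.1), set()),
]
models = train_binary_relevance(DATA, LABELS)
```

Calling `predict_relevant(models, (0.9, 0.9))` on this toy data returns both labels, while `predict_relevant(models, (0.0, 0.0))` returns the empty set. Note that each label is learned in isolation, so dependencies between labels are not modeled.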
BR can be extended to the MLR problem in a straightforward way if the binary models provide real-valued confidence scores as outputs. A ranking is then simply obtained by ordering the labels according to these scores. This approach is ad hoc, however, and has disadvantages. For example, good estimations of calibrated scores (e.g., probabilities) are often hard to obtain. Further, this approach cannot be extended to more general types of preference relations such as partial orders.
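A minimal Python sketch of this score-based extension follows. The function name, the score values, and the fixed relevance threshold of 0.5 are assumptions made for illustration; the text above does not prescribe how the relevant/irrelevant cut is chosen, and a fixed threshold is only sensible when the scores are reasonably calibrated.

```python
from typing import Dict, List, Set, Tuple


def rank_by_scores(scores: Dict[str, float],
                   threshold: float = 0.5) -> Tuple[List[str], Set[str]]:
    """Order labels by descending confidence score and mark as relevant
    every label whose score reaches the (assumed) threshold."""
    # Ties are broken alphabetically so the result is deterministic.
    ranking = sorted(scores, key=lambda label: (-scores[label], label))
    relevant = {label for label, score in scores.items() if score >= threshold}
    return ranking, relevant


# Hypothetical per-label scores, as might be produced by the BR models.
example_scores = {"sports": 0.1, "politics": 0.9,
                  "education": 0.3, "economics": 0.7}
example_ranking, example_relevant = rank_by_scores(example_scores)
```

With these scores the ranking is politics, economics, education, sports, and the relevant set is politics and economics. The result is always a total order, which reflects the limitation noted above regarding partial orders.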
Some model-based methods use a unified approach to calibrated label ranking which subsumes MLR as a special case. The framework enables general label ranking techniques, such as the model-based ranking by pairwise comparison (RPC) and constraint classification (CC), to incorporate and exploit partition-related information and to generalize to settings where predicting a separation between relevant and irrelevant labels is required. This approach does not assume that the underlying binary classifiers provide confidence scores. Instead, it adds a virtual label λ0 as a split point between relevant and non-relevant labels, so that a calibrated ranking is simply a ranking of the extended label set L ∪ {λ0}. Such a ranking induces both a ranking among the labels L and a bipartition (Px, Nx) in a straightforward way: Px is given by those labels that are ranked higher than λ0, and Nx by those labels that are ranked lower. Thus, every label λi known to be relevant is preferred to the virtual label (λi ≻ λ0). Likewise, λ0 is preferred to all non-relevant labels. Adding these preference constraints to the preferences that can be extracted for the regular labels, a calibrated ranking model can be learned by solving a conventional ranking problem with c+1 labels, c being the number of labels in L. However, these model-based approaches may become computationally costly.
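The role of the virtual label can be illustrated with a short Python sketch. The function names and the string `"lambda_0"` standing in for λ0 are hypothetical. The sketch also assumes that the preferences extracted for the regular labels are "every relevant label is preferred to every irrelevant label," which is what multilabel classification data can supply. The learning step itself (e.g., RPC or CC) is not shown.

```python
from typing import List, Set, Tuple

VIRTUAL = "lambda_0"  # hypothetical name for the virtual label λ0

Preference = Tuple[str, str]  # (a, b) means "a is preferred to b"


def calibrated_preferences(relevant: Set[str],
                           irrelevant: Set[str]) -> List[Preference]:
    """Build the pairwise preference constraints for one training example."""
    rel, irr = sorted(relevant), sorted(irrelevant)
    prefs: List[Preference] = []
    # Assumed regular-label preferences: relevant over irrelevant.
    prefs += [(a, b) for a in rel for b in irr]
    # Every relevant label is preferred to the virtual label.
    prefs += [(a, VIRTUAL) for a in rel]
    # The virtual label is preferred to every irrelevant label.
    prefs += [(VIRTUAL, b) for b in irr]
    return prefs


def split_calibrated_ranking(ranking: List[str]) -> Tuple[List[str], List[str]]:
    """Recover (Px, Nx) from a ranking of the extended label set:
    labels above the virtual label are relevant, labels below are not."""
    cut = ranking.index(VIRTUAL)
    return ranking[:cut], ranking[cut + 1:]


example_prefs = calibrated_preferences({"politics", "economics"},
                                       {"education", "sports"})
example_px, example_nx = split_calibrated_ranking(
    ["politics", "economics", VIRTUAL, "education", "sports"])
```

For two relevant and two irrelevant labels this yields eight constraints: four between regular labels and four involving the virtual label. Splitting the example ranking at the virtual label gives Px = [politics, economics] and Nx = [education, sports], each still in ranked order.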
With increasingly complex industrial machinery, the need to detect and/or remedy faults (e.g., alerts, failures, etc.) early has become critical. However, prior approaches to multilabel classification and ranking relied on computationally intensive modeling. Therefore, there exists a need for a less computationally intensive approach.