Many manufacturing and service equipment installations today include, in addition to systems for controlling machines and processes, systems for machine condition monitoring. Machine condition monitoring systems include an array of sensors installed on the equipment, a communications network linking those sensors, and a processor connected to the network for receiving signals from the sensors and making determinations on machine conditions from those signals.
The purpose of machine condition monitoring is to detect faults as early as possible to avoid further damage to machines. Traditionally, physical models were employed to describe the relationship between sensors that measure performance of a machine. Violation of those physical relationships could indicate faults. However, accurate physical models are often difficult to acquire.
An alternative to the use of physical models is the use of statistical models based on machine learning techniques. That approach has gained increased interest in recent decades. In contrast to a physical model, which assumes known sensor relationships, a statistical model learns the relationships among sensors from historical data. That characteristic is a significant advantage, because the same generic model can be applied to different machines; the learned models differ only in their parameters.
There are two basic types of statistical models used in machine condition monitoring: a regression-based model and a classification-based model. In a regression-based model, a set of sensors is used to predict (or estimate) the value of another sensor. Since a regression-based model produces a continuous estimate, the deviation of the actual value from the estimate can be used directly for fault diagnosis. For example, a simple rule can be stated as “the larger the deviation, the greater the chance of a fault.”
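The deviation rule above can be illustrated with a minimal sketch, assuming a single predictor sensor and an ordinary least-squares line. The sensor names (`load`, `temp`), the sample values, and the helper functions are hypothetical and chosen only for illustration; an actual monitoring system would typically use several predictor sensors and a richer model.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def deviation(model, x, y_actual):
    """Absolute difference between the measured value and the estimate."""
    slope, intercept = model
    return abs(y_actual - (slope * x + intercept))

# Hypothetical fault-free history: one sensor (load) predicts another (temp).
load = [1.0, 2.0, 3.0, 4.0, 5.0]
temp = [12.0, 14.0, 16.0, 18.0, 20.0]
model = fit_line(load, temp)

# The larger the deviation, the greater the chance of a fault.
small = deviation(model, 3.0, 16.5)   # close to the estimate of 16.0
large = deviation(model, 3.0, 25.0)   # far from the estimate
print(small, large)
```

In this sketch the deviation is left as a continuous score; how it is mapped to a fault decision (for example, by a threshold) is a separate design choice not specified here.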
In a classification-based model, the output is discrete. One application of a classification-based model is an out-of-range detector, wherein a one-class classifier is often employed. A one-class classifier output indicates whether there is an out-of-range condition or not.
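As a rough illustration of a discrete, out-of-range style output, the sketch below learns per-sensor minimum and maximum bounds from data assumed to be normal and reports whether a new reading falls outside them. This is only one very simple form of one-class classifier; the function names, the `margin` parameter, and the sample values are assumptions made for the example.

```python
def fit_range(normal_points, margin=0.0):
    """Learn per-sensor [low, high] bounds from normal-only training data."""
    dims = range(len(normal_points[0]))
    lows = [min(p[d] for p in normal_points) - margin for d in dims]
    highs = [max(p[d] for p in normal_points) + margin for d in dims]
    return lows, highs

def is_out_of_range(bounds, point):
    """Discrete output: True if any sensor value lies outside its bounds."""
    lows, highs = bounds
    return any(v < lo or v > hi for v, lo, hi in zip(point, lows, highs))

# Hypothetical two-sensor readings recorded during normal operation.
normal = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0)]
bounds = fit_range(normal)

print(is_out_of_range(bounds, (2.5, 11.5)))  # inside the learned range
print(is_out_of_range(bounds, (2.5, 13.0)))  # second sensor exceeds its bound
```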
To be able to use statistical models for machine condition monitoring, it is necessary to train the model based on labeled historical data. In a classification-based model, a data point label may be either “normal” (representing good data) or “abnormal” (representing data indicating a fault).
One approach to training is to include all available data in the training set. The advantage of an all-inclusive approach is that the trained statistical model is expected to generalize well, because the training data covers most variations that may occur in the future. Two issues, however, exist in that approach. First, there may be too much training data, making the training process time-consuming or even intractable. Second, much of the data may be very similar, and it is unnecessary to train on many near-identical samples. Similar data may furthermore cause over-training if, during the selected training period, the machine happens to be working in the same mode most of the time. Simple sub-sampling can solve the first of the above issues, but not the second. Sub-sampling may also cause loss of useful data points. A human operator can manually select training instances; however, such a process is tedious, and becomes intractable if multiple sensors are present in a model.
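The limits of simple sub-sampling can be seen in a small, purely hypothetical example: a history in which the machine runs in one mode almost all of the time and briefly enters a second mode. The values and the sub-sampling step are assumptions chosen for illustration.

```python
# Hypothetical single-sensor history: 95 readings in one operating mode,
# followed by 5 readings in a rarely visited second mode.
readings = [1.0] * 95 + [5.0] * 5

# Simple sub-sampling: keep every 10th reading.
subsample = readings[::10]

print(len(subsample))    # the data volume is reduced
print(set(subsample))    # but every retained sample is the same value
```

Here the sub-sample is smaller, which addresses training time, yet it is still entirely redundant, and the few readings from the second mode have been dropped.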
There is therefore a need for an improved method for selecting training data. Such an improved method would find representative training instances and at the same time reduce data redundancy.
One approach might be to use standard clustering techniques to cluster the training data, and then use each cluster center as a selected instance. The two most frequently used clustering algorithms are the k-means algorithm and the ISODATA clustering algorithm. Both of those algorithms are iterative procedures. In the k-means algorithm, k cluster centers are initially selected at random. Each training sample is then assigned to the closest cluster, based on the distance from the sample to each cluster center. All cluster centers are then updated based on the new assignments. The process is repeated until it converges.
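The k-means procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: initial centers are drawn at random from the samples themselves, squared Euclidean distance is used, and convergence is taken to mean that the centers stop changing. The function names, the `seed` and `max_iter` parameters, and the sample data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

def k_means(samples, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(samples, k)          # random initial centers
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its closest center.
        clusters = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda i: sq_dist(s, centers[i]))
            clusters[nearest].append(s)
        # Update step: move each center to the mean of its members.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:            # converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical samples forming two well-separated groups.
samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
           (100.0, 100.0), (101.0, 101.0), (102.0, 102.0)]
centers, clusters = k_means(samples, k=2)
print(sorted(centers))
```

The result of k-means generally depends on the random initialization; the sample data here is chosen so that the two groups are recovered regardless of which samples are drawn as initial centers.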
The ISODATA algorithm is more advanced in that it is able to split and merge clusters. A cluster is merged with another cluster if the cluster is too small or very close to another cluster. A cluster is split if it is too big or its standard deviation exceeds a predefined value.
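The merge and split rules described above can be sketched as a per-cluster decision function. This covers only the decision logic, not the full iterative ISODATA procedure, and the threshold names (`min_size`, `max_size`, `max_std`, `min_center_dist`) and their values are assumptions introduced for the example.

```python
from math import dist
from statistics import pstdev

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

def isodata_action(index, clusters, min_size, max_size, max_std,
                   min_center_dist):
    """Return 'merge', 'split', or 'keep' for the cluster at `index`."""
    cluster = clusters[index]
    center = centroid(cluster)
    others = [centroid(c) for i, c in enumerate(clusters) if i != index]
    too_close = any(dist(center, o) < min_center_dist for o in others)
    # Merge: the cluster is too small or very close to another cluster.
    if len(cluster) < min_size or too_close:
        return "merge"
    # Split: the cluster is too big or its spread exceeds the limit.
    spread = max(pstdev(axis) for axis in zip(*cluster))
    if len(cluster) > max_size or spread > max_std:
        return "split"
    return "keep"

# Hypothetical clusters: one compact, one tiny, one widely spread.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    [(50.0, 50.0)],
    [(100.0, 0.0), (100.0, 40.0), (140.0, 0.0), (140.0, 40.0)],
]
actions = [isodata_action(i, clusters, min_size=2, max_size=10,
                          max_std=5.0, min_center_dist=3.0)
           for i in range(len(clusters))]
print(actions)
```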
Neither algorithm, however, is appropriate for use in selecting training data in the present application, for at least two reasons. First, both the k-means and ISODATA algorithms represent each cluster by a virtual data point (the cluster center, which generally does not coincide with any actual sample), while the present application requires selecting a real data point. Second, both clustering methods lack precise control over the geometric size of each cluster. For example, the technique may yield a number of large clusters. The center of a large cluster is not representative of all its members, because the distance between the members is too large.
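Both objections can be seen in a small hypothetical example: the center of a cluster, computed as the mean of its members, need not be one of the recorded samples, and in a geometrically large cluster it can lie far from every member. The sample values below are assumptions chosen for illustration.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

# Hypothetical large cluster: four samples at the corners of a square.
cluster = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]

center = centroid(cluster)
is_real_sample = center in cluster                  # center is a virtual point
radius = max(dist(center, p) for p in cluster)      # distance to farthest member

print(center, is_real_sample, round(radius, 2))
```

In this sketch the center (5.0, 5.0) is not among the samples, and every member lies roughly 7.07 units away from it, so the center describes none of them closely.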
There is therefore presently a need for a method for selecting training data from a large data set. That method should limit the number of training samples, while assuring that the selected samples are representative of the data.