Many manufacturing and service equipment installations today include, in addition to systems for controlling machines and processes, systems for machine condition monitoring. Machine condition monitoring systems include an array of sensors installed on the equipment, a communications network linking those sensors, and a processor connected to the network for receiving signals from the sensors and making determinations on machine conditions from those signals.
The purpose of machine condition monitoring is to detect faults as early as possible to avoid further damage to machines. Traditionally, physical models were employed to describe the relationship between sensors that measure performance of a machine. Violation of those physical relationships could indicate faults. However, accurate physical models are often difficult to acquire.
An alternative to the use of physical models is the use of statistical models based on machine learning techniques. That approach has gained increased interest in recent decades. In contrast to a physical model, which assumes known sensor relationships, a statistical model learns the relationships among sensors from historical data. That characteristic of the statistical models is a big advantage in that the same generic model can be applied to different machines. The learned models differ only in their parameters.
To ensure the success of a statistical model, the sensors to be included in the model must be selected carefully. For example, for a regression model, which uses a set of input sensors to predict the value of an output sensor, the output sensor should be correlated with the input sensors. Large systems such as a power plant can contain over a thousand sensors. A systematic technique for exploring the relationship between sensors is therefore needed.
In the statistics field, correlation analysis has been extensively used to find the dependence between random variables. If the signal from each sensor is viewed as a random variable and its value at a certain time is viewed as an independent observation, it is possible to similarly apply statistical correlation analysis to sensors to find out their relationship. A well-known method is to calculate the correlation coefficient between two random variables x and y as:
$$\rho_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where [x_i, y_i] is the i-th observation (or sample) of x and y, x̄ and ȳ are the observation means of x and y, and n is the number of samples. For simplicity, ρ_xy is also abbreviated as ρ.
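As an illustration of the correlation coefficient defined above, the following is a minimal Python sketch that treats two sensor signals as equal-length lists of samples. The function name `pearson` and the error handling for constant signals are assumptions made for this example and do not come from the source.

```python
import math


def pearson(x, y):
    """Correlation coefficient rho_xy of two equal-length sample sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the observation means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations for each variable.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        # Assumption for this sketch: a constant signal has no defined rho.
        raise ValueError("rho is undefined for a constant signal")
    return cov / math.sqrt(ss_x * ss_y)
```

Two signals that are exact linear functions of each other yield a value of 1 or -1, depending on the sign of the slope.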
The correlation coefficient defined above is sensitive to outliers. A single outlier can significantly lower the ρ score between two random variables even if they are, in fact, highly correlated. To tackle that problem, researchers have proposed rank-based measures such as the Spearman and Kendall correlation coefficients, which are known to be among the top performers in the presence of outliers.
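The outlier sensitivity described above can be illustrated with a small Python sketch. It compares the correlation coefficient with the Spearman coefficient, computed here as the same coefficient applied to the ranks of the samples. The synthetic data, the helper names `pearson`, `ranks` and `spearman`, and the outlier value are all assumptions chosen for this example.

```python
import math


def pearson(x, y):
    """Correlation coefficient rho_xy of two equal-length sample sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: the correlation coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic, perfectly correlated signals (hypothetical data).
x = [float(i) for i in range(1, 51)]
y = [2.0 * v for v in x]

# Corrupt a single sample to act as the outlier.
y_outlier = list(y)
y_outlier[-1] = -1000.0

rho_clean = pearson(x, y)
rho_outlier = pearson(x, y_outlier)
rho_rank_outlier = spearman(x, y_outlier)
print(rho_clean, rho_outlier, rho_rank_outlier)
```

In this constructed case the single corrupted sample drives the ordinary coefficient below zero, while the rank-based coefficient remains high, because the outlier can shift a rank by at most n - 1 positions regardless of its magnitude.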
After calculating ρ for each pair of sensors, a cluster analysis may be performed in order to group the sensors according to their similarity. The well-known k-means clustering method requires the number of clusters to be specified in advance, and that value has little or no physical meaning. In addition, the classical k-means clustering technique requires the calculation of the mean of each cluster, which is not directly possible when similarity is measured by a correlation coefficient.
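The pairwise calculation that precedes the cluster analysis can be sketched in Python as follows. The sensor names, the sample values, and the function `correlation_matrix` are hypothetical and serve only to illustrate the step; the source does not prescribe a data layout.

```python
import math


def pearson(x, y):
    """Correlation coefficient rho_xy of two equal-length sample sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(signals):
    """Return sorted sensor names and the matrix of rho for every pair."""
    names = sorted(signals)
    matrix = [[pearson(signals[a], signals[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical sensors, each sampled at the same five time instants.
signals = {
    "temp_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temp_b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "pressure": [5.0, 1.0, 4.0, 2.0, 3.0],
}

names, rho = correlation_matrix(signals)
for name, row in zip(names, rho):
    print(name, [round(value, 3) for value in row])
```

The result is a symmetric similarity matrix rather than a set of coordinates per sensor, which is one way to see why a cluster mean is not directly available from this measure.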
There is therefore presently a need to provide a method and system for establishing relationships among sensors in a machine condition monitoring system using statistical models based on machine learning techniques. The technique should be capable of dealing with data containing outliers, and should have physically meaningful criteria for setting cluster parameters.