Detecting and analyzing anomalies in dynamic systems is an important technical challenge in many areas of the manufacturing industry. For example, anomaly detection in a production line has been of particular importance, and many statistical techniques have been developed for quality control purposes. However, most traditional statistical quality control techniques rest on the strong assumption that the data follow a multivariate normal distribution. Unless the system of interest is relatively static and stationary, the distribution of the data is generally far from normal. This is especially the case in the analysis of automobiles, where the system is highly dynamic and the definition of the normal state is not apparent. As a result, the utility of such traditional approaches is quite limited in many cases.
The following points may be considered regarding anomaly detection and analysis of automobiles:
1. From each component of an automobile, hundreds of time series data are observed.
2. The types of observed time series data vary; for example, the values may be discrete for one variable and continuous for another.
3. The intervals between observations (the sampling intervals) also vary depending on the types of observed values.
4. The knowledge of individual engineers may be incomplete; they may not always make a valid decision based on experimental data.
Heretofore, a typical approach to anomaly detection and analysis has been limit-check or a variant thereof, in which an observed value is compared to a threshold (or reference) value that has been predetermined using some algorithm. Based on limit-check, a rule-based system is often implemented, which enables, at least in principle, a decision on a detected fault based on a rule such as “if a certain kind of observed value is larger than a predetermined reference value, a user is informed of an occurrence of anomaly”. However, in a highly dynamic system such as an automobile, the trend of a variable can change greatly over time. Thus it is difficult to determine a reference value of a variable for detecting anomalies. While experienced engineers may be able to make a decision on the state of the system based on such complicated numerical data, it is unrealistic to assume that enough experienced engineers are available in every phase and place of anomaly detection. In addition, the knowledge of experienced engineers is often hard to translate into the specific mathematical rules used in the limit-check routine. To summarize, the applicability of limit-check in combination with partial human knowledge is seriously limited in general. Accordingly, if there were an anomaly detection method that works more effectively than limit-check, or that functions as a complement to limit-check, the time and effort required for an anomaly diagnosis would be greatly reduced.
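The difficulty of choosing a fixed reference value can be illustrated with a minimal sketch of limit-check. The function name `limit_check`, the signal values, and the reference values below are hypothetical and chosen only for illustration; they are not taken from any actual diagnostic system.

```python
def limit_check(values, reference):
    """Return the indices of observations that exceed the reference value."""
    return [i for i, v in enumerate(values) if v > reference]


# Hypothetical signal: normal operation drifts upward over time
# (10.0, 10.5, ..., 14.5), and one genuine anomaly is injected at index 3.
normal_drift = [10.0 + 0.5 * t for t in range(10)]
observed = list(normal_drift)
observed[3] += 2.0  # 11.5 becomes 13.5

# A reference low enough to catch the anomaly also flags the normal drift.
flagged_low = limit_check(observed, reference=13.0)

# A reference high enough to tolerate the drift misses the anomaly entirely.
flagged_high = limit_check(observed, reference=15.0)

print(flagged_low)   # [3, 7, 8, 9]
print(flagged_high)  # []
```

With the lower reference value, the injected anomaly at index 3 is reported together with three false alarms caused by the normal trend; with the higher reference value, nothing is reported at all. No single fixed reference value separates the anomaly from normal behavior in this example.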
Generally, test experiments are performed in rounds. For example, in the case of an automobile, one experimental round can be one lap of a test course. Such an experimental round is referred to as a run. When an automobile goes around the test course n times, observed values of n runs, that is, n time series data sets for each kind of observed value, are obtained. In general, it is difficult to make the test conditions exactly the same in all the runs, because the system is too complex for the conditions to be completely controlled. The time series data sets of the individual runs may therefore differ from one another to some degree. Conventional techniques have difficulty handling such fluctuations in experimental conditions, so that the substantial status of a diagnosis target cannot be appropriately characterized in many cases.
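One possible way to organize the data of n runs is sketched below. The variable names (`engine_rpm`, `gear`), the numerical values, and the helper `per_run_means` are hypothetical; the sketch only shows that the same variable yields a somewhat different time series in each run, and that discrete and continuous variables coexist.

```python
from statistics import mean

# runs[r][name] holds the time series of variable `name` observed in run r.
# "engine_rpm" is continuous and "gear" is discrete (hypothetical values).
runs = [
    {"engine_rpm": [800.0, 1500.0, 2200.0, 1800.0], "gear": [1, 2, 3, 3]},
    {"engine_rpm": [820.0, 1450.0, 2300.0, 1700.0], "gear": [1, 2, 3, 2]},
    {"engine_rpm": [790.0, 1600.0, 2100.0, 1900.0], "gear": [1, 2, 2, 3]},
]


def per_run_means(runs, name):
    """Return the mean of variable `name` in each run, in run order."""
    return [mean(run[name]) for run in runs]


rpm_means = per_run_means(runs, "engine_rpm")
spread = max(rpm_means) - min(rpm_means)

print(rpm_means)  # [1575.0, 1567.5, 1597.5]
print(spread)     # 30.0
```

Even in this small example the per-run means differ although every run represents nominally the same lap, which is the kind of run-to-run fluctuation described above.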
In addition, the tendency of variation in observed values differs greatly among the types of observed values. Moreover, since the number of variables in the system is very large, considering all combinations of the variables is computationally prohibitive.
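The growth in the number of combinations can be made concrete with a short calculation. The figure of 300 variables is an assumption used only for illustration, consistent with "hundreds" of observed time series.

```python
from math import comb

NUM_VARIABLES = 300  # hypothetical count of observed variables

# Number of distinct subsets of a given size that would have to be examined.
pairs = comb(NUM_VARIABLES, 2)
triples = comb(NUM_VARIABLES, 3)

# Number of non-empty subsets of any size.
all_subsets = 2 ** NUM_VARIABLES - 1

print(pairs)                  # 44850
print(triples)                # 4455100
print(len(str(all_subsets)))  # number of decimal digits in the subset count
```

Pairs alone already number in the tens of thousands, triples in the millions, and the count of all possible subsets is far beyond what can be enumerated in practice.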