1. Field of the Invention
The present invention relates to an outlier determination rule generation device and an outlier detection device for conducting statistical outlier detection, unfairness detection and fraud detection, and to an outlier determination rule generation method and an outlier detection method therefor.
2. Description of the Related Art
Unfairness detection (abnormality detection, fraud detection) using machine learning techniques can be roughly classified into two types of systems: those based on supervised learning and those based on unsupervised learning.
Well-known systems based on supervised learning include the system proposed by T. Fawcett and F. Provost (“Combining Data Mining and Machine Learning for Effective Fraud Detection”, Proceedings of AI Approaches to Fraud Detection and Risk Management, pp. 14-19, 1997) and the system proposed by J. Ryan, M. Lin and R. Miikkulainen (“Intrusion Detection with Neural Networks”, Proceedings of AI Approaches to Fraud Detection and Risk Management, pp. 72-77, 1997).
Supervised learning requires “labeled data”, that is, data to which a label (supervisor information) indicating whether the data is abnormal (unfair) has been attached in advance. In supervised learning, the features of abnormal data are learned from such past data and then used to detect abnormal data. Unfairness detection, for example, is conducted by collating data on past unfairness with the data to be examined, and it therefore cannot detect unfairness that has new features.
Unsupervised learning, on the other hand, detects abnormal (unfair) data without requiring such labeled data. Systems based on unsupervised learning make use of the idea of statistical outlier detection; well-known examples include the system proposed by P. Burge and J. Shawe-Taylor (“Detecting Cellular Fraud Using Adaptive Prototypes”, Proceedings of AI Approaches to Fraud Detection and Risk Management, pp. 9-13, 1997) and the system proposed by K. Yamanishi et al. (“On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms”, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, pp. 320-324, 2000).
Statistical outlier detection here denotes a technique that regards data in a data set which fall outside the probability distribution followed by the majority of the data (that is, data that are unlikely to be generated) as “statistical outliers” and identifies them as abnormal (unfair). The above-referenced system, in particular, is characterized in that it calculates, for each data item in the data set, a score indicating how far that item falls outside the distribution.
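The scoring idea described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the majority of the data follow a single one-dimensional Gaussian distribution, and it uses the negative log-likelihood of each data item under that distribution as the item's outlier score (a less likely item receives a higher score). The cited systems use richer models, for example the finite mixtures with discounting learning of Yamanishi et al.; the names `outlier_scores` and `rank_outliers` and the sample values are hypothetical.

```python
import math
import statistics


def outlier_scores(data):
    """Return, for each value, its negative log-likelihood under a single
    Gaussian fitted to the whole data set (an illustrative assumption).
    A higher score means the value is less likely, i.e. more outlying."""
    mu = statistics.fmean(data)
    var = statistics.pvariance(data, mu)
    if var == 0:
        raise ValueError("all values are identical; the Gaussian is degenerate")
    log_norm = 0.5 * math.log(2 * math.pi * var)
    return [log_norm + (x - mu) ** 2 / (2 * var) for x in data]


def rank_outliers(data):
    """Return the indices of the data ordered from most to least outlying."""
    scores = outlier_scores(data)
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)


# Hypothetical readings: seven values near 10 and one value far outside them.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
print(rank_outliers(readings)[0])  # prints 7, the index of 25.0
```

Note that such a score says only how unlikely an item is; it does not say why, which is the limitation of unsupervised systems discussed below.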
The above-described conventional techniques, however, have the following problems.
First, although conventional systems based on supervised learning have the advantage that features of unfairness and abnormality can be presented, in practice it is difficult to prepare sufficient labeled data in advance, as described above. As a result, highly precise learning cannot be conducted, and the efficiency of unfairness detection deteriorates.
Second, although conventional systems based on unsupervised learning have the advantage of being able to cope with unknown abnormality and unfairness, they do not indicate why the detected data were determined to be abnormal.
Moreover, when a structural abnormality occurs or when unfairness is committed in an organized manner, abnormal data are generated in bulk. In such a case, the abnormal data no longer fall outside the distribution followed by the majority of the data, and conventional unsupervised-learning-based systems fail to detect the abnormality.
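Why bulk abnormalities escape statistical outlier detection can be seen in a small Python sketch. As a simplifying assumption, it measures outlyingness by the z-score (distance from the mean in units of the standard deviation) computed over the whole data set, rather than the models of the cited systems; the function name `z_scores` and the data values are hypothetical. With a single abnormal value of 25.0 among twenty values near 10, that value lies roughly 4.5 standard deviations from the mean; when twenty such values are present, the fitted mean and spread shift toward them, and their z-score falls to about 1, no larger than that of the normal values.

```python
import statistics


def z_scores(data):
    """Return |x - mean| / std for each value, using the statistics of the
    whole data set (an illustrative stand-in for an outlier score)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [abs(x - mu) / sigma for x in data]


normal = [10.0, 10.2, 9.8, 10.1, 9.9] * 4  # 20 typical observations (hypothetical)
isolated = normal + [25.0]                 # one abnormal observation
bulk = normal + [25.0] * 20                # abnormal observations generated in bulk

z_isolated = z_scores(isolated)
z_bulk = z_scores(bulk)
print(f"isolated: abnormal z = {z_isolated[-1]:.2f}, max normal z = {max(z_isolated[:20]):.2f}")
print(f"bulk:     abnormal z = {z_bulk[-1]:.2f}, max normal z = {max(z_bulk[:20]):.2f}")
```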
Even when abnormal data are generated in bulk, if the features of the underlying abnormality can be captured implicitly and automatically converted into rules, and those rules are then used to detect abnormal data, the efficiency of abnormal data detection can be drastically improved.