Assessing insurance claims at insurance companies, and conducting credit examinations for loans and credit cards at financial companies, have long been essential tasks, and experienced experts at such companies have been in charge of them. In recent years, however, the number of cases to be processed has increased to the point that they can no longer be handled by manual procedures performed by experts.
Accordingly, to reduce the load on experts, a method has recently been employed in which insurance claims are assessed and credit card applications are examined by a computer using a machine learning technique.
The information that applicants submit for such assessments and credit examinations includes yes/no answers to questions, numerical values such as age and annual income, and other descriptive text. When this information is submitted on paper, designated operators convert it into electronic form by entering it with a computer keyboard or by means of OCR. When applicants instead send the information to a server through a web browser, no such conversion is necessary.
When electronic applications have been collected in these ways, the experts first check the information in each application, determine acceptance or rejection for it, and record the resulting label electronically. A supervised (training) data set representing the determinations made in advance by the experts consists of pairs, each made up of a feature vector $x_i$ ($i = 1, \ldots, n$) and a determination result (class label) $y_i$ ($i = 1, \ldots, n$) for one piece of application information, and is defined as follows.
$$D_{training} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$$
Here, $y_i \in C$, where $C$ represents a set of class labels. For example, $C = \{0, 1\}$, where 1 represents acceptance and 0 represents rejection.
An example of such a training data set is illustrated in FIG. 1. That is, the supervised data includes accepted (label 1) data 102, 104, 106, and 108, and rejected (label 0) data 110, 112, and 114. Each piece of data corresponds to an individual application.
A supervised machine learning system configures a classifier by using this training data. The classifier corresponds to a function $h$ such that
$$h: x \to y$$
where $x$ represents the feature vector for an application and $y$ represents the label for the application.
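As a minimal sketch of how a classifier h could be configured from labeled pairs (x_i, y_i), the following uses a nearest-centroid rule. The source does not specify a learning scheme, so the nearest-centroid rule, the function name `train_nearest_centroid`, and the two-dimensional feature values are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]  # (feature vector x_i, class label y_i)


def train_nearest_centroid(d_training: List[Example]) -> Callable[[Vector], int]:
    """Configure a classifier h: x -> y from the training pairs.

    Assumption: a nearest-centroid rule stands in for whatever learning
    scheme is actually used; it is chosen only because it is short.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in d_training:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1

    # One centroid (mean feature vector) per class label.
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def h(x: Vector) -> int:
        def squared_distance(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        # Return the label whose centroid is closest to x.
        return min(centroids, key=lambda y: squared_distance(centroids[y]))

    return h


# Hypothetical, pre-scaled features: (age, annual income).
# Label 1 = acceptance, label 0 = rejection.
D_TRAINING: List[Example] = [
    ((0.30, 0.60), 1),
    ((0.45, 0.80), 1),
    ((0.50, 0.70), 1),
    ((0.25, 0.10), 0),
    ((0.60, 0.15), 0),
    ((0.35, 0.05), 0),
]

h = train_nearest_centroid(D_TRAINING)
```

Calling `h` on a new feature vector returns 1 or 0, that is, an element of the label set C.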
FIG. 2 illustrates how applications serving as test data are classified by the classifier configured as described above. That is, data 202, 204, 206, and 208 are classified as accepted data, whereas data 210, 212, 214, and 216 are classified as rejected data. Consider the data 208 and 210. The data 208 should properly have been classified as rejected data; however, it has been classified as accepted data by the classifier, and is called falsely accepted data (FP = false positive). The data 210 should properly have been classified as accepted data; however, it has been classified as rejected data by the classifier, and is called falsely rejected data (FN = false negative).
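The false-accept and false-reject cases described for FIG. 2 can be tallied as in the sketch below. The function name `confusion_counts` is hypothetical, and the true labels assume that every item other than data 208 and 210 was classified correctly, which the text implies but does not state.

```python
from typing import Dict, Sequence


def confusion_counts(true_labels: Sequence[int],
                     predicted_labels: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives (1 = accept, 0 = reject)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y_true, y_pred in zip(true_labels, predicted_labels):
        if y_pred == 1:
            # Accepted by the classifier: correct only if truly acceptable.
            counts["TP" if y_true == 1 else "FP"] += 1
        else:
            # Rejected by the classifier: correct only if truly rejectable.
            counts["TN" if y_true == 0 else "FN"] += 1
    return counts


# Order: data 202, 204, 206, 208, 210, 212, 214, 216.
# Assumed ground truth: 208 should be rejected, 210 should be accepted.
TRUE_LABELS = [1, 1, 1, 0, 1, 0, 0, 0]
PREDICTED_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

COUNTS = confusion_counts(TRUE_LABELS, PREDICTED_LABELS)
```

Under these assumptions, the single FP corresponds to the data 208 and the single FN to the data 210.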
The classifier is configured on the basis of probability. Accordingly, whichever machine learning scheme is employed, it is difficult to eliminate falsely accepted data and falsely rejected data completely.
Suppose that the classifier classifies a sample of test data and that, as illustrated in FIG. 3, data 302, 304, 306, 308, 310, and 312 are classified as accepted data, whereas data 314, 316, 318, 320, and 322 are classified as rejected data. Suppose further that a malicious person happens to discover the data 312, which is falsely accepted. By analyzing the content of the data 312, the malicious person may learn which items to rewrite, and how to rewrite them, so that data that should be rejected is instead accepted, and may compile this knowledge into a manual, for example one titled “how to make an insurance claim, which is far from being accepted, be easily accepted”. The malicious person could sell this manual, and persons who have read it could create and send a series of cases that may become falsely accepted data, as denoted by reference numeral 324 in FIG. 3.
Known technologies for detecting such a malicious attack are described in the following documents.
The document Shohei Hido, Yuta Tsuboi, Hisashi Kashima, Masashi Sugiyama, Takafumi Kanamori, “Inlier-based Outlier Detection via Direct Density Ratio Estimation”, ICDM 2008 http://sugiyama-www.cs.titech.ac.jp/~sugi/2008/ICDM2008.pdf, discloses a technique in which an anomaly is detected by obtaining a density ratio between training data and test data.
In the document, Daniel Lowd, Christopher Meek, “Adversarial Learning”, KDD 2005 http://portal.acm.org/citation.cfm?id=1081950, an algorithm in the field of spam filtering is disclosed which aims to continuously address a situation in which a single attacker carries out an attack using various techniques. The algorithm defines a distance from an ideal sample which the attacker wants to pass as an adversarial cost, and detects a sample having the minimum adversarial cost (the first sample that the attacker wants to pass among samples that can pass) and a sample having an adversarial cost that is at most k times the minimum adversarial cost, from a polynomial number of attacks.
The document Adam J. Oliner, Ashutosh V. Kulkarni, Alex Aiken, “Community Epidemic Detection using Time-Correlated Anomalies”, RAID 2010 http://dx.doi.org/10.1007/978-3-642-15512-3_19, describes a technique for detecting a malicious attack on computers in which multiple clients operating under the same condition are grouped and, for each client, the difference between its behavior and that of the surrounding clients is calculated as a degree of anomaly. The degree of anomaly for a single client may temporarily increase even in a normal case, whereas a simultaneous increase in the degrees of anomaly for a certain number of clients indicates the occurrence of an attack. This is called a time-correlated anomaly, and a monitoring method for detecting it is proposed.
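The idea of a time-correlated anomaly can be illustrated with the simplified sketch below, which flags the time steps at which enough clients are anomalous at once. It is not the method of the cited document: the fixed `threshold`, the `min_clients` count, the function name, and the score values are all hypothetical.

```python
from typing import Dict, List


def time_correlated_steps(scores: Dict[str, List[float]],
                          threshold: float,
                          min_clients: int) -> List[int]:
    """Return the time steps at which at least `min_clients` clients
    have a degree of anomaly above `threshold` simultaneously."""
    if not scores:
        return []
    # Only compare time steps that every client has reported.
    n_steps = min(len(series) for series in scores.values())
    flagged = []
    for t in range(n_steps):
        n_anomalous = sum(1 for series in scores.values()
                          if series[t] > threshold)
        if n_anomalous >= min_clients:
            flagged.append(t)
    return flagged


# Hypothetical degrees of anomaly per client and time step.
SCORES = {
    "client_a": [0.1, 0.9, 0.2, 0.8, 0.1],
    "client_b": [0.2, 0.1, 0.1, 0.9, 0.2],
    "client_c": [0.1, 0.2, 0.3, 0.7, 0.1],
    "client_d": [0.3, 0.1, 0.2, 0.2, 0.1],
}

FLAGGED = time_correlated_steps(SCORES, threshold=0.5, min_clients=3)
```

In this data, the isolated spike of `client_a` at step 1 is not flagged, whereas step 3, at which three clients spike together, is.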
The document Masashi Sugiyama, “Kyouhenryoushifutokadeno kyoushitsuki gakushu” (“Supervised Learning under Covariate Shift”), Nihon Shinkei Kairo Gakkaishi (The Brain & Neural Networks), vol. 13, no. 3, 2006, discusses how a predictive model should be corrected in supervised learning when the training data and the test data follow different probability distributions. In particular, it describes a technique in which greater importance is assigned to training data samples located in regions where test data frequently appears, so that the test data is classified successfully.
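Importance weighting of this kind is commonly written as a weighted empirical risk. The formulation below is the standard one from the covariate-shift literature and is given only as a sketch; the symbols $p_{\mathrm{train}}$, $p_{\mathrm{test}}$, $\ell$, and $\theta$ are notation assumed here and do not appear in the text above.

```latex
% w(x): importance of a training sample x, large where test data is dense
% \ell: a loss function; h_\theta: a classifier with parameters \theta
w(x) = \frac{p_{\mathrm{test}}(x)}{p_{\mathrm{train}}(x)},
\qquad
\hat{\theta} = \operatorname*{arg\,min}_{\theta}\;
  \frac{1}{n} \sum_{i=1}^{n} w(x_i)\, \ell\bigl(h_{\theta}(x_i),\, y_i\bigr)
```

Training samples $(x_i, y_i)$ that lie where test data frequently appears receive a larger weight $w(x_i)$ and therefore contribute more to the fitted model.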
According to the related art described above, a malicious attack can be detected in certain situations. However, the related art is limited in that it assumes properties specific to the data, such as data homogeneity and degrees of anomaly for individual pieces of data. A further problem is that, although a degree of vulnerability can be assessed, the related art cannot detect that a saturation attack using data intended to be falsely accepted is being carried out.