1. Field of the Invention
The present invention relates to a data merging program which merges numerical data recorded in cells arranged in a matrix; a data merging method; and a scoring system which utilizes the data merging program and is adapted to calculate a score representing the probability of fraudulent use in response to a credit inquiry of a credit card or the like.
2. Description of the Related Art
Customarily, when a credit card is used, the store or the like where the card is being used checks with the credit card company to ascertain the credit card balance and to conduct a credit inquiry concerning fraudulent use, in order to prevent fraudulent transactions by a third party who has found the credit card and pretends to be the owner. In a system for such credit inquiry, speed and accuracy of determination are important.
At present, credit card companies use a system which automatically determines a score for the possibility of fraudulent use on the basis of authorization data (data which is sent from the store or the like concerning the owner of the credit card, the monetary value of the transaction which is requested, etc.). In such systems, typically a score is determined by use of a scoring system which utilizes a neural network using neural theory (see Nonpatent Document 1).
A neural network is a leading-edge technology which models the structure and information processing function of nerve cells of the human brain. Constructing such a system requires special know-how and a large monetary investment. Accordingly, many credit card companies do not construct a basic system for score determination themselves, but instead typically introduce a general-purpose external system for the portions relating to a neural network.
Nonpatent Document 1
Asano Yoichiro, Suda Yoshinobu, “Introduction of a Fraudulent Use Detection System and Its Results”, Gekkan Syohishashinyo, Kinzai Institute for Financial Affairs Research Group, May 2000, pages 16–19.
However, a scoring system using a neural network has a problem in that the logic for making a determination is a black box, so that the basis of determination is unclear to the credit card company or the like which utilizes it. In addition, because the user, such as the credit card company, does not itself create the neural network, it is difficult to reflect trends in that company's authorization data. A conceivable measure for coping with such problems is to construct, in place of a neural network, a scoring system using a Bayesian network based on Bayesian theory, which has recently come into use in fields such as artificial intelligence.
A Bayesian network classifies objective events into patterns according to individual factors and statistically obtains the probability of occurrence of an event from past record values in the respective patterns. For example, when a Bayesian network is used for determination of fraudulent use of a credit card, factors such as the time, the monetary value, and the purchased article contained in authorization data are extracted, and, for example, data are collected for an individual pattern such as “use during the time period 15:00–18:00 to purchase an article having a monetary value of up to 10,000 yen” or “purchase of electric appliances having a monetary value of 50,000 yen–100,000 yen.” From the ratio between the total number of samples for each pattern and the number of frauds for that pattern, the probability of occurrence of fraudulent use is calculated as a score.
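The ratio calculation described above can be illustrated with a minimal Python sketch. The record layout, the band labels, and the function name `pattern_scores` are assumptions made for illustration only, and the sample records are invented; none of them are taken from an actual scoring system.

```python
from collections import defaultdict

# Hypothetical past records: (time band, amount band, was the use fraudulent?)
records = [
    ("15:00-18:00", "up to 10,000 yen", False),
    ("15:00-18:00", "up to 10,000 yen", False),
    ("15:00-18:00", "up to 10,000 yen", True),
    ("15:00-18:00", "up to 10,000 yen", False),
    ("18:00-24:00", "50,000-100,000 yen", False),
]

def pattern_scores(records):
    """Return, for each pattern, the ratio of frauds to total samples."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [samples, frauds]
    for time_band, amount_band, was_fraud in records:
        cell = counts[(time_band, amount_band)]
        cell[0] += 1          # one more sample for this pattern
        if was_fraud:
            cell[1] += 1      # one more fraud for this pattern
    return {pattern: frauds / samples
            for pattern, (samples, frauds) in counts.items()}

scores = pattern_scores(records)
```

In this invented data set, the first pattern has four samples and one fraud, so its score is 0.25.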
Specifically, in scoring according to a Bayesian network, as illustrated in FIG. 14, a matrix whose coordinates (columns and rows) are factors contained in authorization data is created, and for each pattern, the number of samples and the number of frauds are plotted in a corresponding cell of this matrix. For example, upon receipt of an inquiry for determination on use of a credit card at 15:00 for a purchase having a monetary value of 20,000 yen, a score is calculated from a piece of past record data indicating that a single occurrence of fraud appeared among 120 samples, by referring to the cell of 12:00–18:00 and 10,000 yen–50,000 yen. If such a method is utilized, the basis of determination is clear to the credit card company, and the credit card company can construct a scoring system matching the trend of users of that company, while reflecting, in such data, the authorization data concerning the use of credit cards by the users of that company.
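The cell lookup in this example can be sketched in Python as follows. The band edges other than 12:00–18:00 and 10,000 yen–50,000 yen, the treatment of values lying exactly on a band boundary, and all names (`TIME_EDGES`, `AMOUNT_EDGES`, `band_index`, `score`) are assumptions for illustration. Only the cell holding 120 samples and 1 fraud comes from the text; all other cells are left empty.

```python
from bisect import bisect_right

# Lower edge of each band (assumed layout).
TIME_EDGES = [0, 6, 12, 18]                   # hours: 0-6, 6-12, 12-18, 18-24
AMOUNT_EDGES = [0, 10_000, 50_000, 100_000]   # yen

# matrix[time_band][amount_band] = (samples, frauds)
matrix = [[(0, 0)] * len(AMOUNT_EDGES) for _ in TIME_EDGES]
matrix[2][1] = (120, 1)   # 12:00-18:00 and 10,000-50,000 yen, as in the text

def band_index(edges, value):
    """Index of the band whose lower edge is the largest edge <= value."""
    return bisect_right(edges, value) - 1

def score(hour, amount):
    """Fraud ratio of the matching cell, or None if the cell has no samples."""
    samples, frauds = matrix[band_index(TIME_EDGES, hour)][band_index(AMOUNT_EDGES, amount)]
    return frauds / samples if samples else None

inquiry_score = score(15, 20_000)   # use at 15:00 for 20,000 yen
```

Here `inquiry_score` is 1/120, the ratio of frauds to samples recorded in the referenced cell.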
If the factors contained in authorization data are of two different kinds such as “time” and “monetary value,” a two-dimensional matrix like the above-mentioned example of FIG. 14 is used. Alternatively, if the factors contained in authorization data are of three kinds including an additional factor, e.g., “article,” a three-dimensional matrix such as that shown in FIG. 15, which includes an additional coordinate, is used. Further, if other kinds of factors such as “store” and “attribute of user” are added, a multi-dimensional matrix such as a four- or five-dimensional matrix is constructed.
The problem with this method using a multi-dimensional matrix is that as the number of dimensions of the matrix increases, the number of cells contained in the matrix becomes enormous, which increases the processing load imposed on the system for scoring and makes speedy determination difficult. Increasing the number of factors is preferable from the viewpoint of more accurate determination; however, an increase in the number of factors leads to a decrease in processing speed. In scoring of credit card use, speedy determination is required, because the store is reluctant to keep a customer waiting for a long time for a credit inquiry.
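The growth in the number of cells can be made concrete with a short Python sketch. The number of cells equals the product of the number of bands along each coordinate; the band counts used below are hypothetical figures chosen only to show the trend.

```python
from math import prod

def cell_count(bands_per_factor):
    """Number of cells in a matrix with the given number of bands per factor."""
    return prod(bands_per_factor)

# Hypothetical band counts: time, monetary value, article, store, attribute of user
two_dim = cell_count([4, 4])                  # 16
three_dim = cell_count([4, 4, 10])            # 160
five_dim = cell_count([4, 4, 10, 100, 5])     # 80000
```

Under these assumed band counts, adding three factors multiplies the number of cells by 5,000.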
Further, when the number of cells increases with the number of factors, the number of samples of past record data contained in a single cell decreases. When the number of samples contained in a cell is too small, the calculated scores are likely to vary widely. For example, if a single fraudulent use happens to occur with a certain combination of factors and no other sample exists for that combination, the score returned to the store indicates that the probability of occurrence of fraud is 100%. An essential measure for preventing such a phenomenon is to ensure that the number of samples contained in a single cell is equal to or larger than a fixed value that prevents such variation in the calculated result.
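The small-sample problem can be illustrated with the following Python sketch. The threshold value `MIN_SAMPLES = 30` and the function name `cell_score` are assumptions for illustration; the text does not specify the fixed value. The sketch only detects cells that fall below the threshold and does not show how such cells would be brought up to it.

```python
MIN_SAMPLES = 30  # hypothetical fixed value; not specified in the text

def cell_score(samples, frauds, min_samples=MIN_SAMPLES):
    """Fraud ratio of a cell, or None when the cell has too few samples."""
    if samples < min_samples:
        return None           # too few samples for a stable ratio
    return frauds / samples

naive = 1 / 1                       # one sample, one fraud: 100% fraud probability
guarded = cell_score(1, 1)          # None: the cell is rejected as unreliable
reliable = cell_score(120, 1)       # 1/120: enough samples to report a score
```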