With the increase in communications and electronic transactions, incidents of fraud surrounding these activities have increased. For example, “cloning” a cellular telephone is a type of telecommunications fraud in which an identifier, such as a serial number, for a cellular telephone is snooped, or read, as calls are transmitted, then captured and used to identify calls transmitted by other cellular telephones. When the other cellular telephones transmit calls, the calls may be fraudulently charged to the account holder for the original cellular telephone.
Another fraudulent activity is stealing credit card numbers. Some workers carry small readers for reading the vital information from a credit card. A person may get a job as a waiter or cashier in a restaurant and, when a customer provides a credit card, swipe the card once as part of the payment and again using the small reader. The credit information is captured, and the person misappropriating it will then use the information to make unauthorized purchases, or sell the information to others who will make unauthorized purchases. There are other schemes in which a group of bad actors sets up bogus ATMs. In one instance, a convenience store owner was given $100 to allow a bogus machine to be placed in the store. The machine included only a reader, so prospective customers would use the machine and then complain that it did not dispense money. The bad actor would pick up the machine after several days, take it for “repair,” and never return. The misappropriated credit card numbers would then be either sold or used to make various purchases.
In short, various fraudulent schemes result in large losses to various institutions. Generally, the losses amount to billions of dollars per year. Therefore, there is a large demand for systems and methods to detect fraudulent transactions. Some current systems and methods attempt to detect fraudulent transactions by constructing a model based on historical observations or transactions. By observing a large number of transactions, characteristics of fraud may be derived from the data. These characteristics can then be used to determine whether a particular transaction is likely to be fraudulent.
For example, characteristics of 100,000 transactions, such as phone calls or points of sale, can be captured and later characterized as fraudulent or legitimate. The fraudulent calls among 100,000 calls may share similar characteristics and transaction patterns. Similarly, the fraudulent credit card transactions among 100,000 transactions may share a different set of similar characteristics and transaction patterns. The similar characteristics, in either case, are used to build a static model that indicates the probability of fraud for incoming transactions, such as transactions associated with phone calls, point of sale transactions, internet sales transactions, and the like. In certain systems, these static, historical models can be used in a production, or real-time, environment to evaluate a probability of fraud for incoming transactions. However, the historical model may be difficult to create and deploy.
The models formed for production often include an indication of fraudulent or non-fraudulent activity and also associate a risk with the assessment, where the value of fraud is 1.0 and the value of non-fraud is 0.0. The risk will generally have a value between 0.0 and 1.0 and will, therefore, serve as a probability of whether there is an actual fraud. Thus, the decision to label a particular transaction as fraudulent will carry an indication of the probability that the fraud determination is correct.
A risk table is used to assign risk to a particular group of variables from a transaction. Typically, methods use one-dimensional (1-D) risk tables, which essentially represent a Naive Bayes method of assigning risk to the values of categorical variables. The interaction among raw categorical variables is not taken into account when a 1-D risk table is used. A two-dimensional (2-D) risk table or multi-dimensional risk table would capture at least some of the interactions among various categorical variables. However, multi-dimensional risk tables are generally more limited, primarily because of the sparseness of exemplars on which to estimate risk for the permutations of the various variable values. For example, using a typical 2-D risk table to associate two variables normally yields a risk table having many blank values for cells where the co-occurrences of variable values are not well represented or are non-existent in the training data.
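The contrast between 1-D and 2-D risk tables can be sketched as follows. This is a minimal illustration only: the variable names (merchant type, country), the six training transactions, and the helper `risk_table` are hypothetical, and risk is taken here to be the plain empirical fraud rate per cell, which is an assumption rather than a method stated above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical labeled training data: (merchant_type, country, is_fraud)
TRANSACTIONS = [
    ("grocery", "US", 0),
    ("grocery", "US", 0),
    ("grocery", "CA", 0),
    ("electronics", "US", 1),
    ("electronics", "US", 0),
    ("jewelry", "CA", 1),
]

def risk_table(rows, key):
    """Map each observed key to its empirical fraud rate (assumed risk measure)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [fraud count, total count]
    for row in rows:
        k = key(row)
        counts[k][0] += row[2]
        counts[k][1] += 1
    return {k: fraud / total for k, (fraud, total) in counts.items()}

# 1-D tables: each categorical variable is scored on its own.
risk_by_merchant = risk_table(TRANSACTIONS, key=lambda r: r[0])
risk_by_country = risk_table(TRANSACTIONS, key=lambda r: r[1])

# 2-D table: risk is estimated per (merchant_type, country) pair.
risk_2d = risk_table(TRANSACTIONS, key=lambda r: (r[0], r[1]))

# Cells of the full 2-D grid that never occur in the training data are blank.
all_cells = set(product(risk_by_merchant, risk_by_country))
blank_cells = sorted(all_cells - set(risk_2d))

print("1-D merchant risk:", risk_by_merchant)
print("1-D country risk:", risk_by_country)
print("2-D risk:", risk_2d)
print("blank 2-D cells:", blank_cells)
```

Even with only three merchant types and two countries, two of the six 2-D cells have no training exemplars, while every 1-D entry is populated. With realistic cardinalities the fraction of blank cells would typically be far larger.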
Blank values are undesirable. Populating the blank values with a default value is often unsatisfactory as well, since the default value may overstate or understate the risk of fraud associated with the variables.
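A small sketch of why a single default can err in either direction. All numbers are hypothetical, and the choice of the overall training fraud rate as the default is an assumption made for illustration, not something the text prescribes.

```python
def fraud_rate(labels):
    """Fraction of labels marked as fraud (1 = fraud, 0 = legitimate)."""
    return sum(labels) / len(labels)

# Hypothetical training labels; the overall rate is used as the default risk.
train_labels = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
default_risk = fraud_rate(train_labels)

# Hypothetical later observations for two cells that were blank in training.
later_observations = {
    ("jewelry", "US"): [1, 1, 0, 1],
    ("grocery", "MX"): [0, 0, 0, 0],
}

# Positive error: the default overstates risk. Negative: it understates risk.
errors = {
    cell: default_risk - fraud_rate(labels)
    for cell, labels in later_observations.items()
}

for cell, err in errors.items():
    direction = "overstates" if err > 0 else "understates"
    print(cell, f"default {direction} risk by {abs(err):.2f}")
```

In this toy data the same default understates the risk of one cell and overstates the risk of the other, so no single fill value suits both.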