A paramount concern for a modern enterprise is understanding the quality of its business. For example, it is often difficult to determine where a particular business entity stands along the dimension of fraud as it relates to its business transactions. Identifying fraudulent transactions often involves analyzing data to uncover hidden insights or patterns. In computer science, data mining algorithms have traditionally been used across a variety of industries to uncover such hidden insights in data.
By way of example, U.S. Pat. No. 6,836,773 teaches an enterprise-wide web data mining system that generates a plurality of data mining models for producing a prediction or recommendation from data collected from the Internet. U.S. Pat. No. 6,629,095 describes an integrated data mining and relational database management system that makes patterns uncovered during data mining available in virtual relational database tables that can be queried. Similarly, U.S. Pat. No. 6,708,163 teaches a collective data mining approach for finding patterns across a network of databases, each with a distinct feature space. The approach is useful for distributed fault detection in an electrical power distribution network. By way of further example, U.S. Pat. No. 6,480,844 teaches a method for mining information from large volumes of data regarding the transactions of a multitude of parties.
Data mining algorithms generally fall into two broad categories of learning techniques: supervised learning and unsupervised learning. Unsupervised learning techniques can be used to discover associations and clusters in data, independent of a particular business objective. By contrast, supervised learning techniques can construct predictive models for particular dimensions of a business problem, such as whether a transaction is fraudulent or not.
A good predictive model must be able to uncover patterns that are not obvious or intuitive. However, generating good predictive models for particular dimensions of a business problem using supervised learning typically requires large data sets. One business problem that has received much attention is the detection of fraudulent transactions. A large database of fraudulent transactions is usually needed to train the model to differentiate between the two categories (i.e., fraudulent or legitimate) based on the hidden trends inherent to fraud. It is also important that the data set be balanced in a way that produces a good model. For example, if the training data contain only a few fraudulent transactions and the rest are non-fraudulent, the resulting predictive model might not be able to accurately differentiate between the two categories. Individual organizations often lack a large enough sample of fraudulent transactions, as well as data of different types, to generate an adequate predictive model.
A number of different approaches to the problem of fraud detection have been proposed. For example, a technique for automatically designing a fraud detection system using a series of machine learning methods is described in U.S. Pat. No. 5,790,645. U.S. Patent Publication 2005/0182712 teaches a data-driven model for detecting fraudulent behavior where statistically significant data elements are not known in advance.
A primary drawback of existing fraud detection systems and methods is that many enterprises suffer from an inadequate volume or number of business transactions (e.g., fraudulent transactions) needed to generate an accurate predictive model. For instance, a single organization in good financial standing typically lacks a large enough sample of fraudulent transactions with which to generate an adequate model. This constitutes a fundamental barrier to learning the inherent structure of corporate fraud.
What is needed, therefore, is a method and/or system that overcomes the problems inherent in the prior art approaches, and which permits the construction of more accurate predictive models for business problems such as fraud detection.