In the financial industry, for example, understanding the spending patterns of each of a very large number of customers is critical for rapidly detecting fraudulent transactions and thereby mitigating monetary losses. These patterns may relate to, and be established from, attributes such as the time of spending, merchant location, transaction amount, and merchant category code (MCC). One of the most prominent fraud models in the industry, the Falcon® model developed by FICO, Inc., has been successfully built on historical transaction data. That model is data-driven: it is trained on a historical transaction dataset that includes transactions and their associated tags, each of which indicates whether a transaction is fraudulent or legitimate. The quality of the tags is therefore essential to the model's success in real-world operation.
Models developed with a complete, tagged dataset are commonly called “supervised” models. However, collecting tags for transactions has many limitations. The quality of the tags might not be well defined, and in some cases the tags might not be available immediately, for example in a real-time scenario. Thus, the need for access to high-quality historical data poses an obstacle to developing supervised models. To circumvent this obstacle, various methods may be used to approximate the tags by grouping customers according to the similarity of their spending patterns, but the approximated tags result in poor performance of the supervised models.
In the absence of transaction tags in the dataset, a model might be built using an algorithm that groups entities without scoring them. A model developed in this way is referred to as an “unsupervised” model, since the target classes are neither known nor used. In such an approach, similar transactions are grouped together while dissimilar transactions are separated into different groups. For example, transactions that are similar in date, time, amount, location, and other attributes may be grouped together and may share similar characteristics, depending on the grouping scheme used.
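The grouping described above can be illustrated with a minimal sketch. It is not the method of any particular product or of this disclosure. It assumes a simple threshold-based ("leader") grouping scheme, in which each transaction joins the nearest existing group if it lies within a distance threshold and otherwise starts a new group. The `Transaction` fields, the `SCALES` values, and the `threshold` default are hypothetical choices made only for illustration, and the hour of day is treated as a linear value for simplicity even though it is circular in practice.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass(frozen=True)
class Transaction:
    """Hypothetical untagged transaction record (no fraud label)."""
    hour: float    # hour of day, 0-24
    amount: float  # transaction amount in currency units
    lat: float     # merchant latitude in degrees
    lon: float     # merchant longitude in degrees


# Assumed per-feature scales so that no single attribute dominates the distance.
SCALES = (6.0, 50.0, 1.0, 1.0)


def features(t: Transaction) -> Tuple[float, ...]:
    """Map a transaction to a scaled feature vector."""
    raw = (t.hour, t.amount, t.lat, t.lon)
    return tuple(v / s for v, s in zip(raw, SCALES))


def distance(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_transactions(
    transactions: List[Transaction], threshold: float = 1.0
) -> List[List[Transaction]]:
    """Group similar transactions without using any tags.

    Each transaction joins the nearest group whose centroid is within
    `threshold`; otherwise it starts a new group. No score is produced.
    """
    groups: List[List[Transaction]] = []
    centroids: List[Tuple[float, ...]] = []
    for t in transactions:
        f = features(t)
        best, best_d = None, threshold
        for i, c in enumerate(centroids):
            d = distance(f, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            groups.append([t])
            centroids.append(f)
        else:
            groups[best].append(t)
            n = len(groups[best])
            # Running mean keeps the centroid equal to the group average.
            centroids[best] = tuple(
                c + (x - c) / n for c, x in zip(centroids[best], f)
            )
    return groups


# Usage: three similar morning purchases and one dissimilar transaction.
sample = [
    Transaction(9.0, 20.0, 40.70, -74.00),
    Transaction(10.0, 25.0, 40.70, -74.00),
    Transaction(9.5, 22.0, 40.71, -74.01),
    Transaction(3.0, 900.0, 34.00, -118.20),
]
sample_groups = group_transactions(sample)
```

In this sketch the outcome depends on the order in which transactions arrive and on the chosen scales and threshold, which is one way in which the resulting groups depend on the grouping scheme used.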
What is needed is a method and model for efficiently detecting anomalous behavior in transactions, built on adequate processing and understanding of the characteristics of the entities involved.