This disclosure relates generally to fraud analytics, and more particularly to adaptive analytics systems and methods with automatic variable creation.
Predictive models are typically trained on labeled (tagged) historical data, often using supervised learning algorithms and a fixed set of pre-defined variables, and the trained models are then used to evaluate unlabeled future data. The distributions of fraud and non-fraud transactions change over time, and these changes cause model performance to degrade. Recent advances in fraud detection involve online, real-time, or adaptive learning models that are capable of updating their parameters over time as changing distributions of fraud and non-fraud data are encountered in production. This is generally done by periodically retraining the adaptive model through an automated process that updates model parameters based on current fraud and non-fraud data in production. Such retraining helps prevent degradation of model performance and allows the fraud model to adapt its weights to new fraud behaviors in production, in contrast to a base model whose weights remain static after training on a fixed set of historical data.
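The contrast between a static base model and a periodically retrained adaptive model can be illustrated with a minimal sketch. It is not the method of this disclosure: it assumes a two-weight logistic regression updated by stochastic gradient descent, synthetic data with a single feature, and a deliberately exaggerated drift in which the fraud pattern reverses. All names (`sgd_update`, `make_batch`, `fraud_sign`, and so on) are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    # Estimated probability that feature vector x is fraud.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def sgd_update(weights, batch, lr=0.1):
    # One stochastic-gradient pass over a labeled batch; returns new weights.
    w = list(weights)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            w[i] -= lr * err * xi
    return w


def accuracy(weights, batch):
    hits = sum((predict(weights, x) >= 0.5) == bool(y) for x, y in batch)
    return hits / len(batch)


def make_batch(rng, n, fraud_sign):
    # Synthetic data (assumption): a transaction is fraud when
    # fraud_sign * x1 > 0. The leading 1.0 is a bias input.
    batch = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        batch.append(([1.0, x1], 1 if fraud_sign * x1 > 0 else 0))
    return batch


rng = random.Random(0)

# Base model: trained once on a fixed historical dataset, then frozen.
historical = make_batch(rng, 500, fraud_sign=+1)
base_weights = [0.0, 0.0]
for _ in range(20):
    base_weights = sgd_update(base_weights, historical)

# Adaptive model: starts from the base weights and is retrained
# periodically on recent production data whose distribution has drifted.
adaptive_weights = list(base_weights)
for _ in range(10):
    recent = make_batch(rng, 200, fraud_sign=-1)
    adaptive_weights = sgd_update(adaptive_weights, recent)

# Evaluate both models on fresh data drawn from the drifted distribution.
production = make_batch(rng, 500, fraud_sign=-1)
static_acc = accuracy(base_weights, production)
adaptive_acc = accuracy(adaptive_weights, production)
print(f"static: {static_acc:.2f}  adaptive: {adaptive_acc:.2f}")
```

In this toy setting the frozen base weights keep encoding the historical fraud pattern, while the periodically updated weights follow the drifted data. Real drift is usually more gradual, and a production system would also have to account for delayed or missing labels.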
The performance of any model depends on the quality of its labels (tags) and on the feature detector variables derived from the data used during training. During training, labels allow the model to learn to which class (fraud or non-fraud) a particular transaction or account state belongs. Feature detector variables are the inputs to the model, and effective ones enable the model to separate the two classes. Creating meaningful feature detector variables is fundamentally important, because without them the model cannot separate fraud from non-fraud well. Model variables are typically created manually, relying on the domain knowledge of experts, and validated against historical fraud and non-fraud data. This approach is limited by what the expert knows, which prevents exploration of all possible dimensions of variables that might best classify fraud and non-fraud. A data-driven method for defining variables has long been needed in the development of predictive models, as has an automated method that can run in production environments and be coupled with an adaptive analytics model.