Customer behavior modeling is the creation of a mathematical model that represents the common behaviors observed among particular groups of customers in order to predict how similar customers will behave under similar circumstances. Models are typically based on data mining of customer data, and each model can be designed to answer one or more questions at one or more particular points in time. For example, a customer model can be used to predict how a particular group of customers will respond to a particular marketing action. If the model is sound and the marketer follows the recommendations it generates, then the marketer should observe that a majority of the customers in the group respond as the model predicted.
While behavior modeling is a beneficial tool, access to data can present a significant hurdle in training the model. In particular, models need large datasets in order to be properly trained, and a model can only be applied after it has been properly trained. Previously, models were trained on datasets that include information regarding actual people. These datasets, generally referred to as original datasets, include real information about real people, including biographical, demographic, and even financial information. Much of this information can be sensitive, and even though the data in an original dataset can be anonymized, the use of original datasets has significant privacy implications.
An example of an original dataset can include customer information for a bank or other financial institution. The dataset can include bank account information, types of transactions, asset portfolios, income, etc. This information is extremely sensitive and can be compartmentalized within an institution (e.g., only certain people or groups have access to the dataset) or subject to other usage restrictions. To overcome the privacy concerns associated with original datasets, synthetic datasets can be used. Synthetic datasets can include computer-generated customer information, which can then be used to train a model. However, generating a dataset that can successfully be used to train a model is difficult, and many synthetic datasets are not suitable for model training.
Thus, it may be beneficial to provide a system, method, and computer-accessible medium for evaluating a synthetic dataset which can overcome at least some of the deficiencies described herein above.