Businesses are increasingly using testing to learn about an initiative's effectiveness. A business will try an idea on a subset of its customers (i.e., a test group) and compare the performance of the customers within the test group with the performance of the remaining customers who did not receive the treatment idea (i.e., a potential control dataset). In one conventional example, the business is a retailer, and the test group is a set of customers who received a promotional service. The business in this example may compare the behavior of the customers within the test group with that of customers who did not receive the promotional service. Conventionally, this analysis has been accomplished using a “brute force” method of comparing the test group customers to every customer who did not receive the treatment. However, the number of customers within the potential control dataset may be very large, which may make the analysis burdensome and tedious. Furthermore, studying the behavior of customers in the potential control dataset may be of limited value if those customers do not resemble the customers within the test dataset.
As the processing power of computers allows for greater computer functionality and the Internet technology era allows for interconnectivity between computing systems, many institutions use computers to generate control group datasets. However, since these more sophisticated online tools were implemented, several shortcomings have been identified in these technologies, creating a new set of challenges. Existing and conventional methods fail to provide fast and efficient analysis because of the high volume of customer information stored across different networks and computing infrastructures. Managing such information on different platforms is difficult due to the number, size, content, or relationships of the data associated with the customers. For example, even a potential control dataset of only 100,000 customers yields an extremely high number of possible combinations of control customers, each of which may or may not resemble the customers within the test dataset. Conventional methods may take hours or even days to complete the analysis because there is often not enough processing power and time to search the entire potential control dataset or to generate an optimized control dataset (e.g., a subset of the potential control dataset that resembles the test dataset). As a result, existing and conventional methods also produce incomplete and inaccurate results.