Retailers and financial institutions typically collect large amounts of data and use business intelligence and analytics to make sense of it. As the volume of data being generated and analyzed has increased, so has the need to analyze this data in real time. This increasingly data-rich environment offers tremendous potential for improving business results through increasingly data-driven decisions. The same growth in data volume, together with the growing need to make more decisions more rapidly, is pushing retailers to look at new approaches and technology. Many retailers are adopting predictive analytics to extract more meaning from their data. These mathematical techniques process large amounts of historical data to make predictions about the future. They allow retailers and financial institutions to make probabilistic predictions, such as how likely a transaction is to be fraudulent, how loyal a customer is, and which offer will be most effective at increasing basket size for a given group of customers. These predictions, expressed as probabilities, can be used to improve decision-making throughout the organization.
Predictive analytics provides an enhanced view of customers and makes predictions about their current and future transaction behavior. This technique can be applied to a myriad of challenges, including customer segmentation, merchandising optimization, information security, marketplace trust and safety, and buyer fraud. However, within many organizations, the promise of predictive analytics is inhibited by a misalignment between technical and data science resources.
FIG. 1 is a block diagram illustrating a traditional analytic development system and a process for runtime predictive analytics based on off-line analytics. The traditional analytics system 100 includes an online transaction system 102 and an off-line system 104. Customers 106 of a business entity buy, sell, or otherwise interact with the business entity's online transaction system 102 through a website 108 or a mobile application 110. The solid arrows indicate the traditional process workflow, in which the resulting online transaction data is stored in a production database 112 and copied to the off-line system 104 for processing. The online transaction data is first extracted, transformed, and loaded (ETL) 114 into a data warehouse 116. A data scientist 118 takes the transformed data and uses different statistical tools 120 to build algorithms 122 in the form of models and scores. Examples of the statistical tools 120 that the data scientist 118 may use include R™, SAS™, MATLAB™, and the like.
Once the data scientist 118 builds the algorithms 122, the algorithms 122 are released to a business developer 124, who must map and recode the algorithms 122 into the business logic of the entity's online transaction system 102. In other words, for every piece of data that the data scientist 118 used to build the algorithms 122, the business developer 124 attempts to map that data back to the original transaction data. An application developer 126 must also convert the algorithms 122, written with one of the statistical tools 120, into a language used in the online transaction system 102. The resulting code is then released into the online transaction system 102 for use by the customers 106.
To ensure that the models and scores in the algorithms 122 were created correctly, a validation cycle, shown by the dashed arrows, is performed in which the online transaction data is collected again and the data scientist 118 verifies whether the algorithms 122 are working correctly. If any discrepancies are found, the data scientist 118 notifies the business developer 124, who then corrects the problem in the business logic, and the cycle repeats. For a model of medium complexity, this cycle may take anywhere from 3 to 6 months.
There are a number of problems with traditional analytic development systems. One problem is that the data scientist 118 builds the algorithms 122 in the off-line system 104 using transaction data that has been extracted, transformed, and loaded (ETL) from the production database 112. As a result, the transaction data on which the algorithms 122 are built often differs substantially from the transaction data in the online transaction system 102. The data scientist 118 then has to work with the business developer 124 to accurately map the data between the online and off-line systems 102 and 104, resulting in many iterations and long lead times before the algorithms 122 work in production.
Another problem is that the algorithms 122 are written in a mathematically precise statistical language, such as R, SAS, or MATLAB, and must be translated into languages such as Java, Python, C++, Ruby, and the like that are used to implement the business logic of the online transaction system 102. The algorithms 122 must therefore be reinterpreted from a mathematically precise world into the domain-specific world of the business entity, which creates a mismatch. Because of this mismatch, the data scientist 118 has to work with the business developer 124 to accurately convert the algorithms 122, which results in many iterations and delays in deploying the algorithms 122. Together, these two problems are typically referred to as the “life cycle” problem.
Yet another problem with traditional analytic development systems is that the software components implementing the system are custom built for a specific type of application or use case (domain). The application, business logic, and database schemas are custom built for each application. This makes it very difficult, if not impossible, for a business entity to share data and analytics about user behavior among its various departments, or for different business entities to share data and analytics with one another.
A related problem is that different departments within the same business entity may have separate sets of algorithms 122, which are computed and integrated differently. This is because there is no common frame of reference for determining whether the information and analysis from one set of algorithms 122, such as for a loyalty program, are applicable to another set of algorithms 122, such as for fraud or risk. The underlying infrastructure does not allow different analytical algorithms to be composed and aggregated cleanly; instead, each evolves independently because there is no common metadata model.
Accordingly, it would be desirable to provide an improved analytic development system, referred to herein as a real-time predictive analytics platform.