Predictive analytics encompasses a variety of statistical techniques, including predictive modeling, machine learning, and data mining, which analyze current and historical facts to generate predictions about future, or otherwise unknown, events. In business, predictive models detect patterns in historical and transactional data to identify risks and opportunities. Data scientists use predictive models to capture the relationships among the many factors in a data set, which enables the assessment of the risk or potential associated with a particular set of conditions and guides decision-making for candidate transactions. A predictive model can generate a predictive score (probability) for each individual (customer, employee, healthcare patient, product stock keeping unit, vehicle, component, machine, or other organizational unit) in order to determine, inform, or influence organizational processes that apply across large numbers of individuals, such as in marketing, credit risk assessment, fraud detection, manufacturing, healthcare, and government operations. A predictive model can be a system that estimates an outcome for a given set of inputs.
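As an illustration only, the notion of a predictive score for each individual can be sketched in Python as follows. The feature names, coefficients, and customer records are hypothetical and are not drawn from any particular system; a logistic function is assumed here merely as one common way to map factors to a probability.

```python
import math

# Hypothetical coefficients, as if learned from historical data.
WEIGHTS = {"late_payments": 0.8, "utilization": 1.5}
BIAS = -2.0


def predictive_score(individual):
    """Return a probability in (0, 1) for one individual's feature values."""
    z = BIAS + sum(WEIGHTS[name] * individual[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


# Hypothetical individuals (customers) keyed by identifier.
customers = {
    "c1": {"late_payments": 0, "utilization": 0.2},
    "c2": {"late_payments": 3, "utilization": 0.9},
}

# One score per individual; a downstream process could rank or threshold these.
scores = {cid: predictive_score(features) for cid, features in customers.items()}
```

In this sketch, the individual with more late payments and higher utilization receives the higher score, which a credit risk process might then use to inform a decision.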
Production engineers lack turn-key solutions for operationalizing the results of predictive analytics. Often, the results of a predictive analytics project are created in a training system and then communicated via an email or a specification document, which must then be translated manually into a production system. These methods of communicating predictive analytics results lead to slow turnaround and errors, or in many cases lead to the broader enterprise never adopting insights from a predictive analytics project. One solution to this communication problem may be to export predictive analytics results from a training environment's predictive model to a production environment's predictive model, which may be referred to as a scoring engine and may be deployed to generate predictions for production data sets, such as real-time data streams. However, in many instances, the training environment's computing platform is a completely different type of computing platform than the production environment's computing platform. A computing platform can be the environment in which a software application is executed. Therefore, a predictive model executed on one type of computing platform requires a standardized way to be exported to another type of computing platform that will support another version of the same predictive model, which may be referred to as a platform-agnostic predictive model. The most widely deployed predictive model specification language, which may be used as an interchange format for predictive models, may be the Predictive Model Markup Language (PMML). While PMML is adept at specifying common predictive models, it lacks the flexibility to specify the pre-processing and post-processing of data that are required for many predictive models.
This lack of flexibility can result in data scientists and/or production engineers manually augmenting a PMML file with their own computer-executable instructions. To address this manual coding shortcoming, the Portable Format for Analytics (PFA) has been developed as a new interchange format standard for predictive models. PFA can enable significantly more complex data processing to be described, thereby enabling the complex end-to-end data transformations of predictive models to be encapsulated in a single PFA file.
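The idea of encapsulating pre-processing, a model, and post-processing in a single interchange document can be sketched in Python as follows. The document below is written in a PFA-style JSON layout with "input", "output", and "action" fields. The tiny evaluator is a hypothetical stand-in for a scoring engine: it handles only the four functions listed in OPS, is not a conforming PFA implementation, and the coefficients are invented for illustration.

```python
import json
import math

# A PFA-style document: the whole transformation chain lives in one structure.
pfa_style_doc = {
    "input": "double",
    "output": "double",
    "action": [
        {"m.exp": [                                 # post-processing: undo the log scale
            {"+": [                                 # model: linear function in log space
                {"*": [0.5, {"m.ln": ["input"]}]},  # pre-processing: log transform
                1.0,
            ]}
        ]}
    ],
}

# Toy subset of functions this sketch understands (an assumption, not the full library).
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "m.ln": math.log,
    "m.exp": math.exp,
}


def evaluate(expr, env):
    """Evaluate a nested expression: strings are symbols, numbers are literals."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, (int, float)):
        return float(expr)
    (name, args), = expr.items()  # each call is a one-key mapping: {function: [arguments]}
    return OPS[name](*(evaluate(arg, env) for arg in args))


def score(doc, value):
    """Run every expression in "action"; the last result is the output."""
    result = None
    for expr in doc["action"]:
        result = evaluate(expr, {"input": value})
    return result


# The "export" step: serialize in the training environment, load it in production.
serialized = json.dumps(pfa_style_doc)
restored = json.loads(serialized)
```

Because the document is plain JSON, the same file can be handed to a scoring engine on a different computing platform, and no separately hand-written pre-processing or post-processing instructions are needed alongside it.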