An ever increasing amount of data and data sources are now available to researchers, analysts, organizational entities, and others. This influx of information allows for sophisticated analysis but, at the same time, presents many new challenges for sifting through the available data and data sources to locate the most relevant and useful information in predictive modeling. As the use of technology continues to increase, so, too, will the availability of new data sources and information.
Analysis of large amounts of data can provide insights into the relationship between past and future events. Predictive models, built using historical data, can be applied to current data sets in an attempt to predict future outcomes or events. To effectively predict a specific event, a model must identify specific data points or features that indicate that the target event might occur. Because of the extensive amount of available data, however, determining which specific features of the existing data are relevant poses significant challenges. Additionally, different domains can have different relevant indicators.
Moreover, a predictive model must be generic enough to effectively apply to a wide variety of future data sets and, at the same time, specific enough to provide accurate prediction. Striking the balance between high model performance and generalizability to new data is especially challenging when there are many millions or billions of features and many different types of models that need to be built.
While current predictive models can be built using analysis, research, existing publications, and discussions with domain experts, this process can be resource and time intensive. Further, while the produced model may be effective for predicting a specific event, the time and resources necessary to produce similar predictive models for many thousands of additional events is not feasible. Currently, there is a need for accurate and efficient generation of predictive data models that can apply across domains and indicate what specific features of existing data most effectively predict a future event.