Prediction models are used to forecast future events on the basis of past performance data. For example, prediction models exist for forecasting the performance of publicly traded stocks based on "indicators", i.e., particular data pertaining to past stock performance. Similarly, prediction models exist for forecasting the response of a patient to a given medical treatment based on such indicators as age, gender, and known medical history, among other things. Prediction models also exist for forecasting the outcomes of horse races based on indicators gleaned from the horses' past performance charts, and for many other applications.
As recognized herein, it is difficult to ascertain whether a successful forecast is due to luck or to the effectiveness of the prediction model used to generate it. For example, if a prediction model correctly forecasts the winner of a particular horse race, the success might reflect a "good" model, but the horse might also have won for reasons the model does not account for. The model might have used the horse's breeding as an indicator, when in fact the horse won because of factors unrelated to breeding. In such a circumstance, the model is not "good", only lucky.
Nevertheless, under such circumstances the prediction model might be judged a good model, useful for further forecasts, on the strength of one or more lucky yet successful forecasts. Such lucky forecasts arise easily when, as is common, the same set of past data is used more than once in searching for a "good" prediction model. This data re-use is referred to as "data snooping". When one engages in data snooping, there is thus a significant danger that lucky results will be mistaken for good results.
Accordingly, as also recognized herein, to avoid the adverse consequences of data snooping it is desirable to provide an indication of the statistical significance of a model's performance. As further recognized herein, one way to measure the statistical significance of a model's performance is to compare it with the performance of a benchmark model, often one that is simple and straightforward. In the horse-racing example, a benchmark against which the performance of other handicapping models might be compared is "always bet on the favorite".
As still further recognized herein, however, avoiding the adverse consequences of data snooping requires assessing the statistical significance of prediction models vis-a-vis a benchmark model in the context of more than one proposed model. That is, the present invention recognizes that it is desirable to generate plural prediction models that use differing combinations of indicators, indicator weighting factors, and so on, and then to determine the statistical significance of the best of these models relative to the benchmark. Such consideration of a plurality of models is called a "model specification search", or simply a "specification search", and is a form of data re-use. Stated differently, a statistic that represents the statistical significance of a "best" model vis-a-vis a simple benchmark can be misleading unless it reflects the complete specification search. By accounting for the specification search in the statistic, falsely favorable evaluations of the effectiveness of a prediction model can be avoided.
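To make the danger of an unaccounted-for specification search concrete, the following Python sketch is offered as an illustration only. It assumes that each model's per-period performance is recorded as its score minus the benchmark's score, tests a single model with a naive one-sided test (a normal approximation to the t-statistic, assuming independent periods), and then applies that same test to the best of many models that by construction have no skill at all. The function names `naive_p_value` and `snooped_false_positive_rate` are hypothetical and do not come from this document.

```python
import math
import random
from statistics import fmean, stdev
from typing import Sequence


def naive_p_value(diffs: Sequence[float]) -> float:
    """One-sided p-value for H0: E[d] <= 0, where d[t] is the model's score
    minus the benchmark's score in period t. Uses a normal approximation to
    the t-statistic and assumes independent periods. It ignores any search
    over models, which is exactly its weakness."""
    n = len(diffs)
    t_stat = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # P(Z > t_stat)


def snooped_false_positive_rate(num_models: int, n: int = 250,
                                trials: int = 200, alpha: float = 0.05,
                                seed: int = 0) -> float:
    """Fraction of trials in which the best of num_models skill-less models
    is declared significant at level alpha by the naive test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Every model is pure noise: its true edge over the benchmark is zero.
        models = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                  for _ in range(num_models)]
        # The specification search: keep only the model that looked best.
        best = max(models, key=fmean)
        if naive_p_value(best) < alpha:
            hits += 1
    return hits / trials
```

With a single model, `snooped_false_positive_rate(1)` should land near the nominal 0.05. With many models, e.g. `snooped_false_positive_rate(50)`, the rate should be far higher, even though no model has any real edge over the benchmark. This is the sense in which a statistic that ignores the search can be misleading.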
The present invention accordingly recognizes the need for a computer-implemented method of evaluating the statistical significance of the best of a plurality of prediction models vis-a-vis a benchmark model. The method computes an estimate of a p-value for a test of the formal null hypothesis that the best prediction model has expected performance no better than that of the benchmark, where the p-value is the probability of wrongly rejecting the null hypothesis if it is rejected on the basis of the evidence provided by the data. Another object of the present invention is to provide a method for evaluating prediction models in the context of several computer-generated models. Still another object of the present invention is to provide a method for evaluating prediction models that is easy to use and cost-effective.
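The passage above does not say how the p-value estimate is computed. One well-known way to estimate a p-value for the best of many models while accounting for the whole specification search is to bootstrap the maximum, over all models, of the average performance relative to the benchmark, in the spirit of White's "Reality Check" for data snooping. The sketch below should not be read as the specific procedure of this document. It assumes that `perf[k][t]` holds model k's performance relative to the benchmark in period t (positive meaning the model did better). It resamples periods with the stationary bootstrap of Politis and Romano, using an assumed smoothing parameter `q`. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def stationary_bootstrap_indices(n: int, q: float,
                                 rng: random.Random) -> List[int]:
    """Draw n time indices with the stationary bootstrap: blocks of
    consecutive periods whose lengths are geometric with mean 1/q,
    wrapping around the end of the sample."""
    idx = [rng.randrange(n)]
    for _ in range(n - 1):
        if rng.random() < q:
            idx.append(rng.randrange(n))   # start a new block at a random period
        else:
            idx.append((idx[-1] + 1) % n)  # extend the current block
    return idx


def best_model_p_value(perf: Sequence[Sequence[float]], num_boot: int = 500,
                       q: float = 0.1, seed: int = 0) -> float:
    """Bootstrap p-value for H0: the best model's expected performance is no
    better than the benchmark's. perf[k][t] is model k's performance relative
    to the benchmark in period t (positive = model did better)."""
    rng = random.Random(seed)
    n = len(perf[0])
    means = [sum(row) / n for row in perf]
    # Test statistic: the best model's scaled average edge over the benchmark.
    observed = max(math.sqrt(n) * m for m in means)
    exceed = 0
    for _ in range(num_boot):
        idx = stationary_bootstrap_indices(n, q, rng)
        # Recenter each resampled mean at its sample mean, so the bootstrap
        # mimics the case where every model's expected edge is zero, then take
        # the maximum over ALL models: this is how the search is accounted for.
        boot_stat = max(math.sqrt(n) * (sum(row[t] for t in idx) / n - m)
                        for row, m in zip(perf, means))
        if boot_stat > observed:
            exceed += 1
    return exceed / num_boot
```

The key design choice is that the bootstrap statistic is a maximum over every model examined, not just the winner. Adding more skill-less models to the search therefore raises the bootstrap distribution and makes a given "best" result less significant. A small p-value suggests that the best model's advantage over the benchmark is unlikely to be luck produced by the search alone.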