1. Field of the Invention
The present invention pertains in general to modeling and forecasting of data. In particular, the present invention is directed to evaluating the quality of models designed to forecast data usage and storage requirements.
2. Background of the Invention
A conventional approach to statistical modeling and forecasting is to compute a variety of statistics for evaluating model quality and then to pick the one statistic that best serves the purposes of the forecaster or data analyst. In other words, the quality and usability of the model are in the eye of the beholder.
This manual, ad hoc, and potentially biased approach ignores the possibility that some parameters may indicate that a particular model is adequate while another group of parameters indicates the opposite. Analysts using such models must be aware of possible pitfalls, yet they must also avoid spending too much of their time determining what the model quality parameters indicate. For example, a model may have found the best-fitting trend, but if it was based on a sample of data too small to support any reasonable decision regarding a trend, then the model is inadequate, even if the trend explains most of the variation in the data.
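The small-sample pitfall described above can be illustrated with a minimal sketch. The data values, the function name fit_linear_trend, the 0.9 fit cutoff, and the MIN_SAMPLE_SIZE threshold are all hypothetical and chosen only for illustration; they are not taken from any particular modeling tool.

```python
def fit_linear_trend(ys):
    """Least-squares fit of y = intercept + slope * t for t = 0..n-1.

    Returns (intercept, slope, r_squared).
    """
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # R-squared: share of the variation in ys explained by the trend.
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical threshold: fewer observations than this is treated as too
# small a sample to support a decision about a trend.
MIN_SAMPLE_SIZE = 12

# Hypothetical storage-usage history with only three observations.
usage = [100.0, 112.0, 119.0]

_, slope, r_squared = fit_linear_trend(usage)

fit_looks_good = r_squared >= 0.9                    # one statistic says "adequate"
sample_is_adequate = len(usage) >= MIN_SAMPLE_SIZE   # another says the opposite

print(f"slope={slope:.2f}, R^2={r_squared:.3f}, n={len(usage)}")
print(f"fit_looks_good={fit_looks_good}, sample_is_adequate={sample_is_adequate}")
```

In this sketch the trend explains nearly all of the variation in the three points, yet the sample-size check fails, so judging the model by the fit statistic alone would be misleading.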
In addition, not only is the conventional approach often confusing, it also may be dangerous in situations where a misleading model/forecast criterion is used. For example, analysts and end users of models and forecasts may make financial decisions based on forecasts, and a misunderstood combination of criteria may drive the decision-maker in the wrong direction.
A conventional approach requires a skilled user to interpret the varying output of a modeling program. Such an approach is not usable by someone unskilled in statistics and modeling. Also, no single parameter can be used to realistically compare various models. While some modeling and forecasting tools, such as BFS ForecastPro or Autobox, execute an automatic selection of models, these tools do not consistently provide or enforce a multiple-parameter quantitative judgment of model quality or relevance.
Some modeling tools on the market iteratively apply a variety of models, such as Exponential Smoothing, Curve-Fitting, or ARIMA (i.e., Box-Jenkins), and pick the one that fits best. However, the selection is based on only one parameter, which is not necessarily the most relevant one, so the selected model may be suboptimal. Further, the analyst has a variety of model parameters with which to tune the model and a variety of model statistics with which to judge the quality of the tuning. This creates potential problems, as the analyst must then judge each of the model parameters and decide what it means in each particular case. There is no clear best model statistic. For example, statisticians Makridakis and Wheelwright have been conducting the so-called M-competitions of forecasting models and software vendors on large numbers of scenarios since 1982. The latest M-competition (M3) used only four of the possible model quality statistics to judge models; these statistics, while important, are not the only ones that could be used. Moreover, the M3 competition treated these statistics as four separate competition categories, and the winner in one category was not necessarily the winner in the others.
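The absence of a single best statistic can be shown with a minimal sketch in which two common error measures, mean absolute error (MAE) and root mean squared error (RMSE), select different winners. The data and the names model_a and model_b are hypothetical, and the two measures are chosen only for illustration; they are not asserted to be the statistics used in the M3 competition.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error; penalizes large individual misses more."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


# Hypothetical observed values and two hypothetical candidate forecasts.
actual = [10.0, 10.0, 10.0, 10.0]
forecasts = {
    "model_a": [12.0, 12.0, 12.0, 12.0],  # consistently off by 2
    "model_b": [10.0, 10.0, 10.0, 16.0],  # exact three times, off by 6 once
}

# Select the "best" model separately under each statistic.
best_by_mae = min(forecasts, key=lambda name: mae(actual, forecasts[name]))
best_by_rmse = min(forecasts, key=lambda name: rmse(actual, forecasts[name]))

print("best by MAE: ", best_by_mae)
print("best by RMSE:", best_by_rmse)
```

Here model_b has the lower MAE (1.5 versus 2.0) while model_a has the lower RMSE (2.0 versus 3.0), so a tool that selects on one statistic alone would report a different best model depending on which statistic it happened to use.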
When the model can be recalculated with different values for the model controls, several model quality indicators may be examined in order to decide which of the controls to tune to improve the model and/or forecast quality. That, however, requires a person sufficiently skilled in statistics to make such decisions, as well as sufficient time to keep recalculating the model with different controls.
The present state of the art does not provide a higher-level criterion that allows easy evaluation of model and/or forecast quality; nor is there a statistic that would formalize the model quality criteria such that automatic evaluation is possible.
In addition, because of this absence of a reliable, simple, and easy-to-understand model quality indicator, automation of decision-making regarding the quality of models using the present state of the art is complicated and/or of questionable accuracy.