The ability to accurately model and predict events is very desirable, especially in today's business environment. Accurate modeling would help one to predict future events, resulting in better decision making in order to attain improved performance. Because reliable information concerning future trends is so valuable, many organizations spend a considerable amount of human and monetary resources attempting to forecast future trends and analyze the effects those trends may ultimately produce. One fundamental goal of forecasting is to reduce risk and uncertainty. Business decisions depend upon forecasting. Thus forecasting is an essential tool in many planning processes.
Two classes of models are commonly used for forecasting: exponential smoothing models and autoregressive integrated moving average (ARIMA) models. Exponential smoothing models describe the behavior of a series of values over time without attempting to understand why the values behave as they do. There are several different exponential smoothing models known in the art. ARIMA statistical models, by contrast, allow the modeler to specify the role that past values in a time series have in predicting future values of the time series. ARIMA models also allow the modeler to include predictors which may help to explain the behavior of the time series being forecasted.
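As an illustration of the first class, the following is a minimal sketch of simple exponential smoothing, the most basic member of that family. The function name, the choice to initialize the level with the first observation, and the sample values are assumptions made for this example; they are not taken from the text above.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last level serves as the one-step-ahead forecast.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if len(series) == 0:
        raise ValueError("series must not be empty")
    # Assumption: the level is initialized with the first observation.
    level = series[0]
    levels = [level]
    for value in series[1:]:
        # New level is a weighted average of the new value and the old level.
        level = alpha * value + (1.0 - alpha) * level
        levels.append(level)
    return levels


# Made-up illustrative values, not real data.
sales = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(sales, alpha=0.5)
forecast = levels[-1]
print(levels, forecast)
```

Note that the recursion only tracks the series; it carries no explanation of why the values move, which is the distinction drawn above between this class and ARIMA models.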
In order to effectively forecast future values in a trend or time series, an appropriate model describing the time series must be created. Creating the model which most accurately reflects past values in a time series is the most difficult aspect of the forecasting process. Eliciting a better model from past data is the key to better forecasting. Previously, the models chosen to reflect values in a time series were relatively simple and straightforward or the result of long hours and tedious mathematical analysis performed substantially entirely by the person creating the model. Thus, either the model was relatively simplistic and very often a poor indicator of future values in the time series, or extremely labor intensive and expensive with perhaps no better chance of success over a more simplistic model. Recently, the availability of improved electronic computer hardware has allowed much of the modeling aspects of forecasting to be done rapidly by computer. However, prior computer software solutions for forecasting were restricted because the number of models against which historical data were evaluated was limited and typically low ordered, although potentially there is an infinite number of models against which a time series may be compared.
Modeling is further complicated because finding the best model to fit a data series requires an iterative data analysis process. Statistical models are designed, tested and evaluated for their validity, accuracy and reliability. Based upon the conclusions reached from such evaluations, models are continually updated to reflect the results of the evaluation process. Previously, this iteration process was cumbersome, laborious, and generally ineffective due to the inherent limitations of the individuals constructing the models and the lack of flexibility of computer-based software solutions.
The model building procedure usually involves iterative cycles consisting of three stages: (1) model identification, (2) model estimation, and (3) diagnostic checking. Model identification is typically the most difficult aspect of the model building procedure. This stage involves identifying the differencing orders, the autoregression (AR) order, and the moving average (MA) order. Differencing orders are usually identified before the AR and MA orders. A widely used empirical method for deciding whether to difference is to inspect an autocorrelation function (ACF) plot: failure of the ACF to die out quickly indicates the need for differencing. Formal test methods also exist for deciding the need for differencing, the most widely used being the Dickey-Fuller test. None of the formal test methods, however, works well when multiple and seasonal differencing is needed. The method used in this invention is a regression approach based upon Tiao and Tsay (1983). The Dickey-Fuller test is a special case of this approach.
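The empirical ACF rule described above can be sketched as follows. This example only illustrates the slow decay of the sample ACF for a series that needs differencing; it is not the regression approach of Tiao and Tsay (1983) used in the invention. The function names, the simulated random walk, and the seed are assumptions made for the example.

```python
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def difference(series, lag=1):
    """Difference a series at the given lag (lag=1 is ordinary differencing)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Simulated random walk (hypothetical data): a series that needs one difference.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
walk = []
total = 0.0
for e in noise:
    total += e
    walk.append(total)

acf_walk = sample_acf(walk, 10)              # decays very slowly
acf_diff = sample_acf(difference(walk), 10)  # near zero beyond lag 0

print([round(r, 2) for r in acf_walk])
print([round(r, 2) for r in acf_diff])
```

For the undifferenced walk the autocorrelations stay close to one across the lags, while for the differenced series they fall near zero immediately, which is the visual cue the ACF plot method relies on.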
After the series is properly differenced, the next task is to find the AR and MA orders. There are two types of methods in univariate ARIMA model identification: pattern identification methods and penalty function methods. Among the various pattern identification methods, patterns of the ACF and the partial autocorrelation function (PACF) are widely used. The PACF is used to identify the AR order for a pure AR model, and the ACF is used to identify the MA order for a pure MA model. For ARIMA models in which both AR and MA components occur, the ACF and PACF identification methods fail because there are no clear-cut patterns in the ACF and PACF. Other pattern identification methods include the R and S array method (Gary et al., 1980), the corner method (Begun et al., 1980), the smallest canonical correlation method (Tsay and Tiao, 1985), and the extended autocorrelation function (EACF) method (Tsay and Tiao, 1984). These methods were proposed to identify the AR and MA orders of ARIMA models concurrently. Of the pattern identification methods, EACF is the most effective and the easiest to use.
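The use of the PACF for a pure AR model can be illustrated with a short sketch. It computes the sample PACF from the sample ACF with the Durbin-Levinson recursion, a standard technique chosen here for the example, and applies it to a simulated AR(1) series. The function names, the coefficient 0.6, the seed, and the rough 2/sqrt(n) significance band are assumptions for illustration and are not the identification procedure of the invention.

```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def sample_pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(series, max_lag)
    pacf = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_{k,k}.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Simulated AR(1) series x_t = 0.6 * x_{t-1} + e_t (hypothetical data).
random.seed(1)
x = [0.0]
for _ in range(1999):
    x.append(0.6 * x[-1] + random.gauss(0.0, 1.0))

pacf = sample_pacf(x, 8)
band = 2.0 / math.sqrt(len(x))  # rough significance band (assumption)
significant_lags = [k for k in range(1, 9) if abs(pacf[k]) > band]
print([round(p, 2) for p in pacf], significant_lags)
```

For a pure AR(p) series the PACF is expected to be large up to lag p and small afterwards, so the cut-off suggests the AR order. For a mixed ARMA series neither the ACF nor the PACF cuts off cleanly, which is the failure case noted above.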
The penalty function methods are estimation-type identification procedures. They choose the orders of an ARMA(p,q)(P,Q) model by minimizing a penalty function P(i,j,k,l) over 0≦i≦I, 0≦j≦J, 0≦k≦K, 0≦l≦L. There are a variety of penalty functions, the most popular being AIC (Akaike's information criterion) and BIC (Bayesian information criterion). The penalty function method involves fitting all (I+1)(J+1)(K+1)(L+1) possible models, calculating the penalty function for each model, and picking the model with the smallest penalty function value. The values of I, J, K and L that are chosen must be sufficiently large to cover the true p, q, P and Q. Even the modest choice I=J=3 and K=L=2 produces 144 possible models to fit. This can be a very time-consuming procedure, and there is still a chance that the chosen I, J, K and L values are too low to cover the true model orders.
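The size of the search described above can be checked with a short sketch that enumerates every candidate order combination and defines the two penalty functions in their usual textbook forms, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The helper names are hypothetical, and no models are actually fitted here; the `penalty` argument stands in for whatever fitting routine would supply the penalty value for each candidate.

```python
from itertools import product
from math import log


def candidate_orders(I, J, K, L):
    """All (i, j, k, l) with 0<=i<=I, 0<=j<=J, 0<=k<=K, 0<=l<=L."""
    return list(product(range(I + 1), range(J + 1), range(K + 1), range(L + 1)))


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * log(n_obs) - 2 * log_likelihood


def select_orders(orders, penalty):
    """Pick the candidate with the smallest penalty value.

    `penalty` is a caller-supplied function mapping an order tuple to a
    number; in practice it would fit the model and return its AIC or BIC.
    """
    return min(orders, key=penalty)


grid = candidate_orders(3, 3, 2, 2)
print(len(grid))  # 4 * 4 * 3 * 3 candidates
```

Because each candidate requires a full model fit, the cost grows with the product of the four ranges, which is why the exhaustive search becomes expensive even for small upper bounds.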
Although pattern identification methods are computationally faster than penalty function methods, they cannot identify seasonal AR and MA orders well. The method in this invention takes the pattern identification approach for identifying non-seasonal AR and MA orders by using ACF, PACF and EACF patterns. The seasonal AR and MA orders are initialized as P=Q=1 and are left to the model estimation and diagnostic checking stages to modify.
Thus, there is a need for a system and method for accurately fitting a statistical model to a data series with minimal input from an individual user. There is a further need for a more flexible and complex model builder which allows an individual user to create a better model and which can be used to improve a prior model. There is also a need for a system and method for performing sensitivity analyses on the created models.