A predictive model (forecasting model, autoregressive model) is a software-implemented model of a system, process, or phenomenon, usable to forecast a value, output, or outcome expected from the system, process, or phenomenon. The system, process, or phenomenon that is modeled is collectively and interchangeably referred to hereinafter as a “process” unless specifically distinguished where used.
A simulation is a method of computationally looking ahead into the future of the execution of the process to predict one or more events that can be expected to occur in the process at that future time. A predicted event is a value, output, or outcome of the process at the end of a look-ahead period configured in the simulation.
A variable that affects an outcome of a process is called a factor or a feature. A predicted event or an outcome of a process is dependent upon, affected by, or otherwise influenced by a set of one or more factors. A factor can be independent, to wit, independent of and not affected by other factors participating in a given model. A factor can be dependent upon a combination of one or more other independent or dependent factors.
A predictive model has to be trained, using training data, before the model can reliably predict an event in the future of the process with a specified degree of probability or confidence. Usually, but not necessarily, the training data includes past or historical outcomes of the process. The training process adjusts a set of one or more parameters of the model.
Data emitted by a data source is also called a time series. In statistics, signal processing, and many other fields, a time series is a sequence of data points, measured typically at successive times, spaced according to uniform time intervals, other periodicity, or other triggers.
Time series analysis is a method of analyzing time series, for example to understand the underlying context of the data points, such as where they came from or what generated them. As another example, time series analysis may analyze a time series to make forecasts or predictions. Time series forecasting is the use of a model to forecast future events based on known past events, to wit, to forecast future data points before they are measured. An example in econometrics is forecasting the opening price of a share of stock based on the stock's past performance.
Time series forecasting uses one or more forecasting models to regress on independent factors to produce a dependent factor. For example, if Tiger Woods has been playing golf very quickly, the speed of play is an example of an independent factor. A forecasting model regresses on historical data to predict the future play rates. The future play rate is a dependent factor.
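As a non-limiting illustration only, regressing on historical data to predict a future play rate can be sketched as a least-squares fit of a first-order autoregressive relationship. The function names `fit_ar1` and `forecast_next`, the first-order form of the model, and the sample play rates are assumptions made for this sketch and do not appear in the description above.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1] (a hypothetical AR(1) form)."""
    xs = series[:-1]   # earlier observations (regressor)
    ys = series[1:]    # following observations (response)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def forecast_next(series, a, b):
    """Forecast the next data point from the most recent observation."""
    return a + b * series[-1]


# Hypothetical historical play rates (illustrative numbers only).
history = [10.0, 12.0, 14.0, 16.0, 18.0]
a, b = fit_ar1(history)
next_rate = forecast_next(history, a, b)
print(a, b, next_rate)
```

In this sketch the historical play rates serve as the data on which the model regresses, and `next_rate` plays the role of the forecasted play rate.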
The illustrative embodiments recognize that time series data is not always uniformly distributed, and includes anomalies. For example, if the data pertains to a golfing tournament, the events that occur in the tournament are reflected in the data. The type, spacing, peaking, repetition rate, intensity, duration, and other characteristics of the events are dependent on a variety of factors, and are therefore non-uniformly distributed in the data.
The non-uniformity of the distribution of an event in time series data is referred to herein as an anomaly. For example, whether an event in the example golfing data will have a certain value depends upon a time of day when that event is occurring, the slope of the course, a weather condition at the time, a skill level of the player, and many other factors that introduce anomalies in the event's data. For example, the event may occur more regularly during midday as compared to evenings; or the event may occur more predictably if a skilled player is playing as compared to when a novice is playing; and so on.
The illustrative embodiments recognize that presently available forecasting models are good at forecasting events when they are uniformly distributed in a time series but are often inaccurate when forecasting those events in anomalous data. Anomalous data is time series data that includes one or more anomalies.
The illustrative embodiments further recognize that not only does an anomaly disturb the uniformity of a given time series, but an anomalous portion of that time series itself has variability. In other words, the illustrative embodiments recognize that to forecast an event during an anomalous portion of a future time series, one has to also know which segment of that future portion is being forecasted.
For example, suppose that training data shows that an anomaly causes an event to occur with a linearly increasing value from 8 to 32 during an anomalous period of three hours from noon until 3 PM, in a time series that is otherwise uniform for the remaining twenty-one hours of the day. If the event is to be forecasted for a day next week, then to accurately forecast the event during 2-3 PM, one has to know the anomalous behavior of the event not only between noon and 3 PM generally, but particularly during the 2-3 PM segment of that anomalous period, because the anomalous values of the event differ from one segment of the anomalous period to another.
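The arithmetic of this linear example can be worked through in a short sketch. The values 8 and 32 and the noon to 3 PM period come from the example above; the function names `anomaly_value` and `segment_range`, and the representation of clock time as hours on a 24-hour scale, are assumptions made for illustration only.

```python
ANOMALY_START = 12.0   # noon, in hours on a 24-hour clock
ANOMALY_END = 15.0     # 3 PM
START_VALUE = 8.0      # event value at the start of the anomalous period
END_VALUE = 32.0       # event value at the end of the anomalous period


def anomaly_value(hour):
    """Event value at a given hour inside the linear anomalous period."""
    if not ANOMALY_START <= hour <= ANOMALY_END:
        raise ValueError("hour lies outside the anomalous period")
    slope = (END_VALUE - START_VALUE) / (ANOMALY_END - ANOMALY_START)
    return START_VALUE + slope * (hour - ANOMALY_START)


def segment_range(start_hour, end_hour):
    """Event values at the two ends of one segment of the anomalous period."""
    return anomaly_value(start_hour), anomaly_value(end_hour)


print(segment_range(12.0, 13.0))  # noon to 1 PM segment: (8.0, 16.0)
print(segment_range(14.0, 15.0))  # 2 PM to 3 PM segment: (24.0, 32.0)
```

Under these assumptions the value rises by 8 per hour, so the 2-3 PM segment spans 24 to 32 while the noon to 1 PM segment spans 8 to 16, which is why the segment being forecasted has to be known.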
The linear anomaly is described only as a simplified example of anomalous data. The illustrative embodiments recognize that the anomaly in the data can be far more complex, such that a polynomial equation of degree n is needed to suitably model the anomaly curve. In such cases, not only is the polynomial representation of the anomaly important, but how that polynomial expression changes in different segments of the anomaly curve is important as well.
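As a hedged sketch of why segments matter for a polynomial anomaly, the following evaluates a hypothetical second-degree curve over hourly segments and reports how much the value changes within each segment. The coefficients, the hourly segment boundaries, and the function names `poly_eval` and `segment_changes` are assumptions chosen for illustration; the curve is merely constructed to rise from 8 to 32 over three hours, as in the linear example.

```python
def poly_eval(coeffs, t):
    """Evaluate a polynomial by Horner's rule; coeffs are lowest degree first."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def segment_changes(coeffs, boundaries):
    """Change in the polynomial's value across each consecutive segment."""
    values = [poly_eval(coeffs, t) for t in boundaries]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical anomaly curve p(t) = 8 + 2t + 2t^2, with t in hours since
# the start of the anomalous period.
coeffs = [8.0, 2.0, 2.0]
boundaries = [0.0, 1.0, 2.0, 3.0]   # three one-hour segments

print([poly_eval(coeffs, t) for t in boundaries])  # [8.0, 12.0, 20.0, 32.0]
print(segment_changes(coeffs, boundaries))         # [4.0, 8.0, 12.0]
```

Although this assumed curve has the same end values as the linear example, the change within each segment differs (4, 8, and 12), so a forecast for one segment cannot simply reuse the behavior observed in another.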
The illustrative embodiments have already recognized that presently available forecasting models are inaccurate when forecasting events in anomalous data. The illustrative embodiments further recognize that forecasting events in specific segments of anomalous data is even more challenging, and unavailable in presently available forecasting models.