In recent years, traffic flow prediction has played an important role in Intelligent Transportation Systems (ITS), where it provides decision support for personal route planning as well as for transportation administration.
Early studies mainly focused on traffic flow prediction based on a single time series. The prediction models fall into two categories: parametric and nonparametric. Among parametric models, seasonal ARIMA (Autoregressive Integrated Moving Average) is the most widely used method (see reference [1]); it aims to minimize the mean squared error (MSE) of the traffic flow prediction over a single time series. Among nonparametric models, the Nearest Neighbor method is regarded as an alternative to ARIMA (see reference [2]). However, its performance depends on the quality of the historical data. Overall, traffic flow prediction based on a single time series takes into account only the characteristics of that time series and neglects the interactions and relations among different time series.
Since the evolution of traffic flows is the outcome of interactions among the traffic flows at all nodes of the road network of interest, the relations across nodes should be taken into account for traffic flow prediction. Correspondingly, recent work has shifted to multi-variable prediction models based on spatio-temporal correlations among traffic data. The prevailing methods fall into three categories: (1) state space models or the Kalman filter (see reference [3]); (2) machine learning methods such as neural networks (see reference [4]); and (3) time series methods such as the Vector Autoregressive Moving Average (VARMA) model (see reference [5]). Whichever model is used, determining the spatio-temporal correlation is essential for multi-variable traffic flow prediction. In previous studies, the spatio-temporally correlated sensors are determined empirically and manually, and are largely confined to the neighborhood of the target node. Such a scheme for selecting input variables is too subjective to achieve the best prediction performance, because it gives little consideration to the actual spatio-temporal correlations among the traffic data. Moreover, variable selection based on human experience does not generalize to large-scale road networks.
As a mathematical tool, sparse representation has long been applied to signal processing tasks such as signal compression, image deblurring, and feature extraction. Certain embodiments of the present invention aim to apply it to spatio-temporal correlation mining for traffic flow prediction. The basic idea of sparse representation is as follows. A signal $y$ can be represented as a linear combination of $K$ primitives $\{d_1, d_2, \ldots, d_j, \ldots, d_K\}$ in a dictionary $D$, that is, $y = Dx$, where $y \in \mathbb{R}^n$, $d_j \in \mathbb{R}^n$, and $D \in \mathbb{R}^{n \times K}$. Approximately, it can be represented as $y \approx Dx$, where $\|y - Dx\|_2^2 \le \varepsilon_0$ and $x \in \mathbb{R}^K$ holds the coefficients used to reconstruct $y$. Sparse representation aims to reconstruct $y$ with as few primitives as possible, that is, $x$ should contain as few nonzero coefficients as possible in the linear combination. Hence, the objective to be optimized in the sense of sparse representation can be formulated as
$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Dx$$

or

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2^2 \le \varepsilon_0,$$

where $\|x\|_0$ denotes the $\ell_0$ norm of $x$, namely, the number of nonzero elements in the vector $x$. A number of optimization methods have been developed in the context of sparse representation [6] that select, in a fully automatic manner, the primitives in the dictionary corresponding to the nonzero coefficients. Certain embodiments of the present invention employ such methods to discover the spatio-temporal correlations among traffic data, so as to determine the correlated sensors in the whole road network that contribute most to the prediction task at the target sensor, and apply the data collected from those correlated sensors as the input to the predictor for traffic flow prediction.
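As an illustration of how the error-constrained $\ell_0$ objective above can be approached in practice, the following is a minimal sketch using Orthogonal Matching Pursuit (OMP), a common greedy approximation. The text does not state which solver from [6] is used, so the choice of OMP is an assumption, as are the function name `omp`, the parameter `max_atoms`, and the synthetic data. Here each column of `D` stands in for the measurement history of one candidate sensor and `y` for the history of the target sensor; the indices of the nonzero coefficients play the role of the selected correlated sensors.

```python
import numpy as np


def omp(D, y, eps0=1e-8, max_atoms=None):
    """Greedy (OMP) approximation to: min ||x||_0  s.t.  ||y - D x||_2^2 <= eps0.

    Hypothetical helper for illustration; not the solver specified by the source.
    """
    n, K = D.shape
    if max_atoms is None:
        max_atoms = min(n, K)
    x = np.zeros(K)
    support = []                      # indices of selected primitives (columns of D)
    coef = np.zeros(0)
    residual = y.astype(float).copy()
    col_norms = np.linalg.norm(D, axis=0)
    col_norms[col_norms == 0] = 1.0   # avoid division by zero for empty columns

    while residual @ residual > eps0 and len(support) < max_atoms:
        # Pick the primitive most correlated with the current residual.
        scores = np.abs(D.T @ residual) / col_norms
        if support:
            scores[support] = -np.inf  # never reselect a primitive
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef

    if support:
        x[support] = coef
    return x, sorted(support)


# Synthetic demonstration (made-up data, not real traffic measurements):
# 30 candidate sensors observed at 200 time steps; the target depends on 3 of them.
rng = np.random.default_rng(0)
D = rng.standard_normal((200, 30))
x_true = np.zeros(30)
x_true[[3, 17, 25]] = [1.5, -2.0, 0.8]
y = D @ x_true

x_hat, support = omp(D, y)
print("selected sensors:", support)
```

In this sketch the columns selected by `omp` would then serve as the inputs to whatever predictor is used at the target sensor. Raising `eps0` trades reconstruction accuracy for a sparser selection.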