Multi-variate time series data is abundant in a variety of systems. For example, such data is found in biomedical systems, from spatiotemporal signals such as the electroencephalogram (EEG) to temporal patterns of gene expression observed with gene chip arrays. Multi-variate or multi-channel data refers to measurements of multiple variables or channels (of data) across a common variable (e.g., time). Some examples of multi-variate data include: 16 channels representing 16 different locations on a patient's scalp to measure neural activity, expression levels of 40 different genes, or the brightness of pixels in the rows of an image. While many approaches exist for the analysis of multi-variate data, the most common approach involves first decomposing the data into several independent time series (e.g., using Principal Components Analysis (PCA)). The decomposed time series data is then separately analyzed using, for example, autoregressive modeling, spectral analysis, and other linear and nonlinear techniques. Once the data has been decomposed and analyzed, one can speculate about the nature of the underlying sources of the multi-variate time series data.
However, this prior art technique is constrained in that decomposition necessarily assumes that the underlying sources are independent. Accordingly, the prior art technique is not appropriate for real systems in which the underlying sources include independent sources and sources with dependent or dynamic relationships. Dynamic relationships are dependencies among sources or within individual sources that change with respect to a common variable, e.g., time.
Conventional decomposition techniques include PCA alone or in combination with Varimax Rotations (VR), independent components analysis (ICA), and/or non-negative matrix factorization (NMF). These methods ignore, or do not make use of, the causality of time series data. Causality is an attribute possessed by some systems, and indicates that one or more components of the system may be the cause of variation in another, perhaps overlapping, set of components. For example, a system that evolves in time possesses causality, in that its behavior at one time may cause variations at a later time. On the other hand, measurements from photographic snapshots of randomly chosen images typically do not possess causality. Another example of causality is that the value of a stock across time depends on (“is caused by”) what happened in the past. PCA assumes that the sources are mathematically independent. In the sense used by PCA, such mathematically independent sources are referred to as orthogonal sources. ICA requires independence of sources in an information-theoretic sense. Sources which are independent in an information-theoretic sense are those in which the value of one source does not provide any information with respect to determining the value of any other source. The Varimax rotation variation seeks a transformation that maximizes the sum of the variances in each extracted source. NMF attempts to find sources whose weights are non-negative (i.e., only adding, not subtracting, of the sources is allowed).
Although the above conventional techniques are suitable for image analysis or for separating independent time series data, they are unsuitable for multi-variate time series that originate from dynamically interrelated sources via a mixing process. In particular, these conventional methods assume that the order of data points is irrelevant and thus produce equivalent results for the original, time-reversed, or randomly shuffled data.
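The order-blindness described above can be illustrated with a minimal sketch. It relies on the fact that PCA is derived entirely from the covariance matrix of the channels, so showing that this matrix is unchanged by time reversal or shuffling shows that the PCA result is unchanged as well. The two-channel test signal, its coefficients, and the helper names (`covariance_matrix`, `lag1_autocovariance`, `max_abs_difference`) are assumptions introduced purely for illustration.

```python
import random


def covariance_matrix(samples):
    """Sample covariance between channels; `samples` holds one row per time point."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(k)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def lag1_autocovariance(samples, channel):
    """Covariance between one channel and the same channel one time step earlier."""
    n = len(samples)
    values = [row[channel] for row in samples]
    mean = sum(values) / n
    return sum((values[t] - mean) * (values[t - 1] - mean) for t in range(1, n)) / (n - 1)


def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


rng = random.Random(0)

# Hypothetical two-channel series in which each sample depends on the previous one.
series = [[0.0, 0.0]]
for _ in range(499):
    prev = series[-1]
    series.append([
        0.9 * prev[0] + rng.gauss(0.0, 1.0),
        0.5 * prev[0] + 0.3 * prev[1] + rng.gauss(0.0, 1.0),
    ])

reversed_series = series[::-1]
shuffled_series = series[:]
rng.shuffle(shuffled_series)

# The channel covariance (and hence PCA) does not depend on the order of the samples.
cov_original = covariance_matrix(series)
diff_reversed = max_abs_difference(cov_original, covariance_matrix(reversed_series))
diff_shuffled = max_abs_difference(cov_original, covariance_matrix(shuffled_series))

# The temporal dependence, in contrast, is present only in the correctly ordered data.
lag1_original = lag1_autocovariance(series, 0)
lag1_shuffled = lag1_autocovariance(shuffled_series, 0)

print("covariance change after reversal:", diff_reversed)
print("covariance change after shuffling:", diff_shuffled)
print("lag-1 autocovariance, original vs. shuffled:", lag1_original, lag1_shuffled)
```

In this sketch the covariance differences are at the level of floating-point rounding, while the lag-1 autocovariance changes substantially once the samples are shuffled. The quantity that carries the temporal structure is therefore one that a covariance-based decomposition never examines.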
Models for the analysis of multi-variate time series data using multi-variate linear autoregression (MLAR) are also known in the art. One such model is described in Granger, C. W. J., “Investigating causal relations by econometric models and cross-spectral methods”, Econometrica 37(3) (1969), pp. 424–438, herein incorporated by reference. However, although these methods make use of causality, they do not address the problem of identifying sources and can only be applied when the sources are known, i.e., under the assumption that the observed data series are the sources and that no mixing occurs.
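For illustration, a first-order, two-channel MLAR model of the form x[t] = A x[t-1] + noise can be fitted by ordinary least squares, as sketched below. This is a simplified example under stated assumptions (zero-mean data, model order 1, exactly two channels), not the procedure of the cited reference; the function name `fit_mlar1_two_channels` and the coefficient matrix `true_a` are hypothetical. Consistent with the limitation noted above, the fit treats the observed channels as the sources themselves and does not model any mixing.

```python
import random


def fit_mlar1_two_channels(series):
    """Least-squares estimate of A in x[t] = A x[t-1] + noise for two channels.

    Assumes zero-mean data. Solves A = C1 * inverse(C0), where
    C1 = sum of x[t] x[t-1]^T and C0 = sum of x[t-1] x[t-1]^T.
    """
    c0 = [[0.0, 0.0], [0.0, 0.0]]
    c1 = [[0.0, 0.0], [0.0, 0.0]]
    for prev, curr in zip(series, series[1:]):
        for i in range(2):
            for j in range(2):
                c0[i][j] += prev[i] * prev[j]
                c1[i][j] += curr[i] * prev[j]

    # Closed-form inverse of the 2x2 matrix C0.
    det = c0[0][0] * c0[1][1] - c0[0][1] * c0[1][0]
    inv = [
        [c0[1][1] / det, -c0[0][1] / det],
        [-c0[1][0] / det, c0[0][0] / det],
    ]
    return [
        [sum(c1[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


rng = random.Random(1)

# Hypothetical coefficients: channel 0 influences channel 1, but not the reverse.
true_a = [[0.8, 0.0], [0.4, 0.5]]

series = [[0.0, 0.0]]
for _ in range(5000):
    prev = series[-1]
    series.append([
        true_a[0][0] * prev[0] + true_a[0][1] * prev[1] + rng.gauss(0.0, 1.0),
        true_a[1][0] * prev[0] + true_a[1][1] * prev[1] + rng.gauss(0.0, 1.0),
    ])

estimated_a = fit_mlar1_two_channels(series)
for row in estimated_a:
    print([round(value, 3) for value in row])
```

The off-diagonal entries of the estimated matrix indicate the direction of influence between channels: an entry near zero in the first row and a clearly non-zero entry in the second row suggest that channel 0 drives channel 1 and not the reverse. If the observed channels were instead unknown mixtures of the sources, this interpretation would no longer hold.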
Accordingly, it would be beneficial to design a multi-variate data analysis system and method which accounts not only for the causal relationships of the data sources, but also for a mixing between the sources and the observed data series.