Multivariate data analysis can be simply characterized as the study of how two or more factors are related to one another. For example, wind chill is the well-known effect of the combination of low temperature and wind speed. Another good example of multivariate data analysis is the study of how “things change over time.” As time passes, children grow to be adolescents and then adults. Crops that start as seeds in the spring grow into plants that are harvested in the fall. Scientists in many fields take measurements of “things” that change over time and attempt to draw conclusions from those measurements. Measurements of things that change over time are referred to in this patent as time variant or time series data. Thus, the study of time variant data is a form of multivariate data analysis.
The way in which time-based measurements are taken depends upon what is being measured. For example, each growing season agronomists take plant size, temperature, and precipitation measurements to try to determine which factors contribute to high crop yields. Measurements can also be taken in a more automated way through the use of sensors. The information collected from the sensors can be used to make near “real-time” adjustments to the systems being monitored.
A good example of this sensor/real-time adjustment approach is the modern automobile engine. Today's car engines have a significant number of sensors and at least one computer controller that analyzes the readings (i.e., measurements) of the sensors to adjust the engine's behavior. The interplay between an engine's oxygen sensor and its computer controller is but one example. Researchers determined some time ago that a specific mixture of air and gasoline would yield the least pollution. A mixture with too much gasoline, called a rich mixture, results in fuel being left over after combustion. The excess fuel enters the environment through the exhaust pipe as hydrocarbons, which are considered a pollutant. On the other hand, a mixture with too much air, called a lean mixture, produces nitrogen-oxide pollutants. The problem, of course, is that the amount of air an engine can pull in depends upon a variety of factors that change over time (e.g., altitude, air temperature, engine temperature, barometric pressure, engine load, etc.). To solve this problem, an oxygen sensor is placed in the exhaust system to determine whether the mixture is lean or rich at any given time. The controller gathers the ongoing sensor measurements and adjusts the fuel/air mixture accordingly.
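The feedback loop described above can be illustrated with a deliberately simplified sketch. It assumes a narrowband zirconia oxygen sensor, whose output is high (roughly 0.8 to 0.9 V) when the exhaust is oxygen-poor (rich) and low (roughly 0.1 V) when it is oxygen-rich (lean), with an assumed switch point of 0.45 V. The class name FuelTrimController, the step size, and the trim limit are hypothetical values chosen for illustration; production engine controllers use considerably more elaborate strategies.

```python
from dataclasses import dataclass

# Assumed switch point of a narrowband O2 sensor, in volts (illustrative).
STOICH_VOLTAGE = 0.45


@dataclass
class FuelTrimController:
    """Hypothetical integral-style fuel trim controller (illustration only)."""

    step: float = 0.01   # trim change per sensor reading (fraction of base fuel)
    limit: float = 0.25  # clamp trim to +/- 25% of base fuel
    trim: float = 0.0    # current correction applied to base fuel

    def update(self, o2_voltage: float) -> float:
        """Nudge the trim toward stoichiometry based on one sensor reading."""
        if o2_voltage > STOICH_VOLTAGE:
            # High voltage: little oxygen in the exhaust, mixture is rich.
            self.trim -= self.step
        elif o2_voltage < STOICH_VOLTAGE:
            # Low voltage: excess oxygen in the exhaust, mixture is lean.
            self.trim += self.step
        # Keep the correction within a bounded range.
        self.trim = max(-self.limit, min(self.limit, self.trim))
        return self.trim

    def fuel_command(self, base_fuel: float) -> float:
        """Scale the base fuel quantity by the current trim."""
        return base_fuel * (1.0 + self.trim)
```

Feeding the controller a series of readings (for example 0.8, 0.8, 0.1) moves the trim down twice and then back up once, so the commanded fuel oscillates around the stoichiometric point as the sensor switches between rich and lean.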
It is easy to see how the analysis of time series data can be used to solve a significant number of problems. What is difficult, though, is identifying patterns within the time series data that permit application of well-known solutions. Said another way, a measurement pattern that points to a problem/solution may be understood, but recognizing that measurement pattern within the time series data is a difficult process. In the prior art there are numerous methods for computing the similarity of two time series data curves. These methods include probabilistic models using dynamic curve matching, deformable Markov model templates, and piecewise matching of subcurves. All of these matching methods are mathematically complex, and there is no easy way to describe the shapes of the curves using a natural language (such as English). Without a mechanism to specify known shapes of curves using natural language and to compute a similarity measure between arbitrary time series data curves and the known curve shapes, the analysis of time series data will continue to be a difficult, time-consuming, and expensive endeavor.
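For contrast with the prior-art methods listed above, the following is a minimal sketch of one of the simplest possible similarity measures between two time series curves: z-normalized Euclidean distance. It is a swapped-in baseline technique, not any of the methods named above and not the approach of this patent; the function names z_normalize and shape_distance are hypothetical.

```python
import math
from typing import List, Sequence


def z_normalize(xs: Sequence[float]) -> List[float]:
    """Rescale a curve to zero mean and unit standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0.0:
        # A flat curve carries no shape information.
        return [0.0] * n
    return [(x - mean) / std for x in xs]


def shape_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two z-normalized, equal-length curves.

    A value of 0.0 means the curves have identical shape, ignoring
    offset and scale; larger values mean less similar shapes.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("curves must be non-empty and of equal length")
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


rising = [1, 2, 3, 4, 5]
scaled = [10, 20, 30, 40, 50]
falling = [5, 4, 3, 2, 1]
print(shape_distance(rising, scaled))   # approximately 0: same shape
print(shape_distance(rising, falling))  # sqrt(20), about 4.47: opposite shapes
```

A measure this simple requires the curves to be sampled at the same points and of equal length, and it yields only a number, not a description of the shape in natural language; those limitations are part of why more elaborate matching methods exist.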