When dealing with large amounts of data, data correlation is beneficial because it facilitates the discovery of useful relationships among data associated with particular operations (for example, manufacturing processes, delivery systems, and the like). Once discovered, these relationships are often used to improve the associated operations.
Data correlation provides information that can be used for preemptive problem identification and performance optimization. For example, data correlation is often applied to business activity log data to discover correlations among business objects (e.g., how one business object affects other business objects). These correlations can be used to better understand performance issues and thus improve business performance.
One type of data that is often analyzed or correlated is enumeration data, which is data whose values can be arranged in a list. Each entry in an enumeration data field takes one of a limited number of values, so the data can easily be categorized for analysis. For example, a data field that stores customer names and contains only a few hundred unique values can easily be treated as enumeration data. A correlation analysis on such discrete data can yield results like: "When customer name is customer1 then product name is Printer with 60% probability."
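A rule of the form "when customer name is customer1, product name is Printer with 60% probability" can be read as a conditional frequency over two enumeration fields. The sketch below is illustrative only: it assumes log records are dictionaries, and the field names `customer` and `product`, the function `enumeration_rules`, and the sample records are all hypothetical, not taken from the source.

```python
from collections import Counter, defaultdict


def enumeration_rules(records, given_field, target_field):
    """Return {given_value: {target_value: probability}} for two enumeration fields.

    Each probability is the fraction of records with the given value
    that also carry the target value (a simple conditional frequency).
    """
    counts = defaultdict(Counter)
    for record in records:
        counts[record[given_field]][record[target_field]] += 1

    rules = {}
    for given_value, target_counts in counts.items():
        total = sum(target_counts.values())
        rules[given_value] = {t: n / total for t, n in target_counts.items()}
    return rules


# Hypothetical log records (illustrative only, not data from the source).
records = [
    {"customer": "customer1", "product": "Printer"},
    {"customer": "customer1", "product": "Printer"},
    {"customer": "customer1", "product": "Printer"},
    {"customer": "customer1", "product": "Scanner"},
    {"customer": "customer1", "product": "Monitor"},
    {"customer": "customer2", "product": "Monitor"},
]

rules = enumeration_rules(records, "customer", "product")
print(rules["customer1"]["Printer"])  # 0.6
```

Because each field holds only a few distinct values, the analysis reduces to counting value pairs, which is why such discrete data is easy to categorize.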
Another type of data is numeric data, which is data that can be expressed in numerical terms. Automatically discovering correlations among discrete enumeration data is relatively easy compared to automatically discovering correlations among numeric data, because the search space (the number of data points to be compared) is much smaller for discrete data. The discovery of correlations among numeric data sequences typically involves similarity queries; that is, a database is queried to identify numeric data sequences that are similar to a given query sequence.
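One common way to phrase a similarity query is as a range query: find every window of a numeric stream whose distance to the query sequence is at most some tolerance. The source does not specify a distance measure; this sketch assumes Euclidean distance, and the function names, the tolerance `epsilon`, and the sample values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_query(stream, query, epsilon):
    """Return start offsets of windows in `stream` within `epsilon` of `query`.

    Every window of length len(query) is compared, so a stream of length n
    requires n - len(query) + 1 distance computations.
    """
    m = len(query)
    matches = []
    for start in range(len(stream) - m + 1):
        window = stream[start:start + m]
        if euclidean_distance(window, query) <= epsilon:
            matches.append(start)
    return matches


# Hypothetical numeric stream and query sequence (illustrative only).
stream = [1.0, 2.0, 3.0, 2.0, 1.0, 2.1, 3.1, 2.0]
query = [1.0, 2.0, 3.0]
print(similarity_query(stream, query, epsilon=0.5))  # [0, 4]
```

Unlike the enumeration case, where two values either match or do not, each candidate here requires an arithmetic comparison over a whole window, and the number of candidates grows with the length of the stream.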
It is difficult to compare numeric data streams with discrete event occurrences using existing techniques because numeric data and discrete data are not directly comparable. What is needed is a data correlation solution that facilitates the comparison of changes in numeric data streams with discrete event occurrences.