Correlation analysis measures the relationship between two data items, for example, a security's price and an indicator, or the sales of jam and the sales of bread. The resulting value (called the “correlation coefficient”) is a measure of how a change in one data item (e.g., an indicator) is likely to be accompanied by a change in the other data item (e.g., the security's price). The correlation coefficient ranges from -1.0 to +1.0. A positive correlation means that a positive change in one of the data items will likely be accompanied by a change in the positive direction in the other data item. Conversely, a negative correlation means that a positive change in one of the data items will likely be accompanied by a change in the negative direction in the other data item. Such correlation analysis is used throughout the business and scientific communities to identify items that relate to one another. For example, it can be established that the sales of jam and bread are related to one another because they have a high correlation.
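By way of illustration only, the following minimal sketch computes a correlation coefficient for two series. It assumes the Pearson product-moment form of the coefficient, which the description above does not specify; the function name `pearson_correlation` and the sales figures are hypothetical and invented for the example.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series.

    Assumption: the Pearson form is used; the text above does not name
    a specific coefficient.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("coefficient is undefined for a constant series")
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical weekly unit sales, invented purely for illustration.
jam_sales = [10, 12, 15, 11, 18]
bread_sales = [20, 25, 29, 22, 37]

coefficient = pearson_correlation(jam_sales, bread_sales)
print(f"jam vs. bread correlation: {coefficient:.3f}")
```

A value near +1.0 indicates that the two series tend to rise and fall together, a value near -1.0 indicates that one tends to rise as the other falls, and a value near zero indicates little linear relationship.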
However, correlation analysis is limited in that it only establishes the degree to which two items relate mathematically. For example, the correlation analysis involving the items of jam and bread only indicates the degree to which such items are related mathematically. Correlation analysis does not provide any indication of how interesting the correlation between the data items is. For example, a relationship between the sales of jam and the number of tornado warnings in a geographical region may be more interesting to the user than the correlation between the sales of jam and bread, even though the correlation between the sales of jam and the number of tornado warnings is weaker than the correlation between the sales of jam and bread.
As a result, there have been approaches to evaluate the “interestingness” of the correlation of data items. However, such approaches rely on statistical metrics, such as support and confidence, to refine the correlations and discard statistically uninteresting results. These approaches fail to consider how the data items relate in the physical real world based on the data items themselves. That is, because these approaches rely solely on mathematical analysis, they fail to surface correlations between data items that are unexpected and not obvious to users.
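For illustration, the statistical metrics mentioned above can be sketched as follows. The sketch assumes the definitions of support and confidence commonly used in association-rule mining, which the description above does not spell out; the function names and the basket data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`.

    Assumption: the common association-rule definition of support.
    """
    if not transactions:
        raise ValueError("at least one transaction is required")
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"jam", "bread"},
    {"jam", "bread", "milk"},
    {"bread"},
    {"jam"},
    {"milk"},
]

rule_support = support(baskets, {"jam", "bread"})
rule_confidence = confidence(baskets, {"jam"}, {"bread"})
print(f"support(jam, bread) = {rule_support:.2f}")
print(f"confidence(jam -> bread) = {rule_confidence:.2f}")
```

Both values are computed purely from co-occurrence counts. Nothing in the calculation reflects what jam or bread are, which is the limitation noted above.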