Analyzing data for correlations is a difficult and time-consuming task. For example, a user may want to determine whether the values of one field or attribute of a set of records are correlated with the values of another field or attribute of those records. Currently, a user may have a hunch or insight into a possible correlation between a pair of attributes, and may then perform various calculations on the data set to determine whether the attributes of the pair are indeed correlated.
This approach is problematic because users must guess or speculate about possible correlations before performing the correlation calculations. Such calculations are time-consuming and processor-intensive, especially because data sets often include thousands or even millions of records. These problems are further exacerbated when attempting to determine correlations between attributes of different tables or data sets.
Standard statistical tests for discovering correlations have two potential limitations: (1) such tests usually employ a threshold value to ascertain correlation, and results below the threshold are deemed uncorrelated; however, there is no automatic way of choosing thresholds that suit all application domains; and (2) such tests may fail to discover partial correlations, in which only a subset of the records displays the correlation.
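Both limitations can be illustrated with a minimal sketch. The Pearson coefficient stands in here for a "standard statistical test", and the 0.5 cutoff, the synthetic records, and the names `pearson`, `r_of`, and `THRESHOLD` are assumptions made for illustration only; none of them come from the text above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def r_of(pairs):
    """Correlation between the two attributes of a list of (x, y) records."""
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Hypothetical cutoff: results below it are deemed uncorrelated.
THRESHOLD = 0.5

# Synthetic records (assumed data). In the first 10 records the two
# attributes are perfectly correlated; in the other 20 they are unrelated,
# because every x value is paired with both y = 0 and y = 9.
correlated = [(x, x) for x in range(10)]
unrelated = [(x, y) for x in range(10) for y in (0, 9)]
records = correlated + unrelated

r_all = r_of(records)
r_subset = r_of(correlated)

print(f"all records: r = {r_all:.3f}, deemed correlated: {r_all >= THRESHOLD}")
print(f"subset only: r = {r_subset:.3f}, deemed correlated: {r_subset >= THRESHOLD}")
```

Over all 30 records the coefficient is roughly 0.24, so the test reports no correlation at this cutoff even though a third of the records are perfectly correlated. Lowering the cutoff to 0.2 would reverse that verdict, which shows how strongly the outcome depends on a threshold that has no domain-independent value.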