In many application domains, such as business intelligence (BI) and risk analysis, analysis results depend on both the correlation observed in historic data and the correlation assumed for future data. For example, investigating how data dependencies affect business developments can help prevent inaccurate estimates of risk and opportunity in a business context. Consequently, correlation is an important factor when analyzing data, and uncertain data in particular.
In some instances, correlation assumptions can be introduced using techniques based on copula functions. Copula-based techniques enable analysts to impose arbitrary correlation structures on data with arbitrary marginal distributions and to calculate relevant measures over the resulting correlated data. To apply such techniques, correlation structures can be represented and processed approximately. In some examples, a parametric construction of approximate correlation representations (ACRs) allows purely assumed correlation patterns to be represented and applied.
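As an illustration of the general copula idea (not of the ACR construction itself), the following minimal Python sketch uses a Gaussian copula, one common copula family chosen here as an assumption, to impose a correlation structure on two samples with unrelated marginal distributions. The function names, the exponential and uniform marginals, and their parameters are hypothetical choices made only for this example.

```python
import math
import random
from statistics import NormalDist

_EPS = 1e-12  # keeps uniforms strictly inside (0, 1) for the inverse CDFs


def exponential_inv_cdf(u, rate=2.0):
    """Inverse CDF of an exponential distribution (illustrative marginal)."""
    return -math.log1p(-u) / rate


def uniform_inv_cdf(u, low=10.0, high=20.0):
    """Inverse CDF of a uniform distribution on [low, high] (illustrative marginal)."""
    return low + (high - low) * u


def gaussian_copula_sample(rho, n, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n (x, y) pairs whose dependence follows a bivariate Gaussian
    copula with parameter rho and whose marginals are given by the
    supplied inverse CDFs."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals via the Cholesky factor of [[1, rho], [rho, 1]].
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Probability integral transform: each normal becomes a uniform on (0, 1).
        u1 = min(max(std_normal.cdf(z1), _EPS), 1.0 - _EPS)
        u2 = min(max(std_normal.cdf(z2), _EPS), 1.0 - _EPS)
        # Inverse transform sampling imposes the target marginals,
        # while the rank dependence of (z1, z2) is preserved.
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


if __name__ == "__main__":
    sample = gaussian_copula_sample(0.8, 5, exponential_inv_cdf, uniform_inv_cdf)
    for x, y in sample:
        print(f"{x:.3f}\t{y:.3f}")
```

Because the copula operates on uniforms, any marginal with a computable inverse CDF can be substituted without changing the dependence structure; this separation of marginals from dependence is what makes copulas attractive for applying assumed correlation to arbitrary distributions.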
In other instances, it may be necessary to extract correlation from historic data, for example, so that the extracted correlation pattern can be applied to uncorrelated data for which a similar correlation is assumed. To extract correlation information from historic data, a correlation coefficient can be computed, which standard database systems support. Generally, however, databases support only the computation of a scalar correlation coefficient (e.g., Pearson's correlation or Spearman's rho) from a set of sample data. Consequently, correlation extraction using such methods reduces the dependence structure to a single number; Pearson's coefficient, for instance, captures only linear correlation and is a reliable summary only when the underlying marginal distributions are (close to) normal, and otherwise the computed correlation information can be misleading. A further challenge is that extracting and storing full correlation structures may incur high storage, access, and processing costs.
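The limitation of a single scalar coefficient can be illustrated with a small, hypothetical Python example (the helper names are assumptions for illustration, not a database API). For a strictly increasing but strongly nonlinear relationship, Pearson's coefficient stays well below 1, whereas Spearman's rank-based rho equals 1; in either case, one number cannot describe the full dependence structure, such as how dependence varies across the tails of the distributions.

```python
import math


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Ranks 1..n of the values; this simplified version assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 21)]
    ys = [math.exp(x) for x in xs]  # strictly increasing, strongly nonlinear
    print(f"Pearson:  {pearson(xs, ys):.3f}")
    print(f"Spearman: {spearman(xs, ys):.3f}")
```

The rank transform makes Spearman's rho insensitive to the shape of the marginals, but it still compresses the relationship into a single value, so neither coefficient alone suffices to reconstruct an arbitrary correlation structure.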