Data may be stored in data warehouses or databases that are several terabytes in size and that grow over time. Data mining is a process used to discover interesting patterns in different groups of data stored in a data warehouse. For instance, a data mining tool may aggregate information on customers and their purchases to determine patterns that may be used in making marketing decisions directed at those customers. Data mining tools may also extrapolate from aggregated data to predict trends and behaviors, providing knowledge to decision makers.
One concern with processes such as data mining is protecting confidentiality and privacy, which may be compromised when a data mining tool is able to aggregate different data elements and derive confidential or private information from the aggregation. In the data mining context, this is the “data inference” problem, where confidential information may be derived by discovering discernible patterns in data, even if the individual data elements alone are not confidential or secured. The pattern discovered from the combination or aggregation of data may reveal highly sensitive and confidential information that needs to be protected. Another concern is that there is currently no systematic method to assure that derived data (including data automatically generated along the path of an Enterprise Data Warehouse (EDW) pipeline) have the appropriate access control level. An EDW integrates data spread across transactional systems into a central repository, against which users may perform business analytics.
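The data inference problem described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the purchase records, the `INDICATOR_ITEMS` set, the `THRESHOLD` value, and the `infer_sensitive` function are assumptions, not part of any particular data mining tool. The point is that no single purchase record is confidential, yet aggregating records per customer may suggest a sensitive fact about that customer.

```python
from collections import defaultdict

# Hypothetical, made-up records: (customer id, purchased item).
# No single record is confidential on its own.
purchases = [
    ("c1", "glucose test strips"),
    ("c1", "sugar-free snacks"),
    ("c2", "notebook"),
    ("c1", "lancets"),
    ("c2", "glucose test strips"),
]

# Assumed rule for illustration: items that, taken together,
# may suggest a health condition.
INDICATOR_ITEMS = {"glucose test strips", "lancets", "sugar-free snacks"}

# Assumed number of distinct indicator items needed to flag a customer.
THRESHOLD = 3


def infer_sensitive(records, indicator_items, threshold):
    """Return sorted customer ids whose aggregated purchases match the pattern."""
    seen = defaultdict(set)
    for customer_id, item in records:
        if item in indicator_items:
            seen[customer_id].add(item)
    return sorted(cid for cid, items in seen.items() if len(items) >= threshold)


flagged = infer_sensitive(purchases, INDICATOR_ITEMS, THRESHOLD)
print(flagged)
```

In this sketch the flagged list is itself derived data. It would likely warrant a stricter access control level than any of the input records, which is the gap the second concern above points to.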