One example of a data repository is a “data lake.” A data lake is a centralized data storage system for structured and unstructured data. The data in the data lake can originate from a diverse variety of data sources. A data lake can, by way of example, facilitate agile business queries that advantageously leverage the diverse variety of data sources in order to produce business insight.
However, because the data in a data lake can come from a diverse variety of data sources, one or more of those sources may supply inaccurate data. As a result, query results generated against such data may not be trustworthy. This can have disadvantageous ripple effects, for example, for a chief data officer whose reputation (and perhaps the reputation of the company) may be tied to the correctness of data-based decisions. In addition, entities outside the business domain (e.g., regulators) can levy additional penalties for the use of incorrect data.