A data repository provides a centralized location for data, which can facilitate agile queries and analytics (business or otherwise) that draw on a diverse variety of data sources to produce business or other insight. Some common types of data repositories that a business or some other entity may maintain include, but are not limited to, data lakes, data warehouses, and data marts. A data lake is typically considered to be a centralized data storage system for structured and unstructured data. A data warehouse is typically considered to be a centralized data storage system for integrated data from one or more disparate sources. A data mart is typically considered to be a simpler data warehouse focused on a single subject.
However, data stored in any type of data repository can come from a diverse variety of data sources, and this can be a problem given that the data from one or more of the sources could be inaccurate and thus not trustworthy. As such, query results generated against such data may not be trustworthy. Furthermore, while the underlying storage infrastructure of a data repository maintained by an entity may be trusted (i.e., because the data repository is controlled by the entity), the storage infrastructure of the sources from which the data in the data repository came may or may not be trusted.
Data that is not trustworthy could have disadvantageous ripple effects, for example, for a chief data officer whose reputation (and perhaps the reputation of the company) may be tied to the correctness of data-based decisions. In addition, there are other entities outside of the business domain (e.g., regulators) that can levy additional penalties for use of incorrect data.