The present disclosure relates to a method for identifying denial constraints. Denial constraints may be used in multiple applications, including but not limited to reviewing data, sanity checking data, identifying errors and/or correcting errors in data stored in a database.
As businesses generate and consume more data than ever, it is critical to enforce and maintain the quality of their data assets. One in three business leaders does not trust the information used to make decisions, because establishing trust in data becomes more challenging as the variety and the number of sources grow. For example, in healthcare domains, inaccurate or incorrect data may threaten patient safety. Data review, validation and/or cleaning is therefore a task directed towards improving data quality, and it is estimated to account for 30%-80% of the cost of a typical data warehouse project.
Integrity constraints (ICs), originally designed to improve the quality of a database schema, have recently been repurposed towards improving the quality of data, either by checking the validity of the data at points of entry or by cleaning the data at various points during the processing pipeline. Traditional types of ICs, such as key constraints, check constraints, functional dependencies (FDs), and their extension, conditional functional dependencies (CFDs), have been proposed for data quality management. However, there is still a large space of ICs that cannot be captured using the aforementioned types.
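By way of illustration only, the following minimal Python sketch contrasts a functional dependency check with a rule that an FD cannot express. The sample rows, the attribute names (`zip`, `state`, `salary`, `tax_rate`), the example rule and the helper functions `fd_violations` and `order_rule_violations` are all hypothetical and are not taken from the present disclosure.

```python
from itertools import combinations

# Hypothetical sample rows, invented purely for illustration.
rows = [
    {"name": "A", "zip": "10001", "state": "NY", "salary": 50000, "tax_rate": 0.10},
    {"name": "B", "zip": "10001", "state": "NJ", "salary": 60000, "tax_rate": 0.08},
    {"name": "C", "zip": "07302", "state": "NJ", "salary": 40000, "tax_rate": 0.09},
]


def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on every lhs attribute but differ on rhs.

    This checks a functional dependency lhs -> rhs, which relies on equality only.
    """
    return [
        (a["name"], b["name"])
        for a, b in combinations(rows, 2)
        if all(a[k] == b[k] for k in lhs) and a[rhs] != b[rhs]
    ]


def order_rule_violations(rows):
    """Return pairs in the same state where the higher earner has the lower tax rate.

    The rule compares values with < and > across two rows, so it cannot be
    written as a functional dependency.
    """
    violations = []
    for a, b in combinations(rows, 2):
        if a["state"] != b["state"]:
            continue
        lo, hi = sorted((a, b), key=lambda r: r["salary"])
        if hi["salary"] > lo["salary"] and hi["tax_rate"] < lo["tax_rate"]:
            violations.append((lo["name"], hi["name"]))
    return violations


print(fd_violations(rows, ["zip"], "state"))
print(order_rule_violations(rows))
```

In this sketch, the FD check flags rows A and B, which share a zip code but list different states, while the order-based rule flags rows C and B, a pair that no equality-based dependency would detect.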