Relational database systems store data in tables organized by columns and rows. Generally, the tables are linked together by relationships that simplify the storage of data and make complex queries against the database more efficient. Structured Query Language (SQL) is a standardized language for creating and operating on relational database systems.
A relational database system can include an “optimizer” that plans the execution of SQL queries. For example, if a query requires accessing or “joining” more than two elements or tables, the optimizer will select the order in which the tables are joined so as to produce the requested result in the shortest period of time or to satisfy some other criterion.
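By way of illustration only, the join-order selection described above can be sketched as follows. The sketch is not taken from any particular optimizer; the table names, row counts, selectivities, and the cost measure (the sum of estimated intermediate result sizes for a left-deep plan) are all hypothetical assumptions chosen to keep the example small.

```python
from itertools import permutations

# Hypothetical statistics, assumed for illustration only.
ROW_COUNTS = {"customers": 1_000, "orders": 50_000, "items": 200_000}

# Assumed fraction of row pairs that survive each join predicate.
# Table pairs with no predicate default to 1.0 (a full cross product).
SELECTIVITY = {
    frozenset({"customers", "orders"}): 1 / 1_000,
    frozenset({"orders", "items"}): 1 / 50_000,
}


def pair_selectivity(a, b):
    """Return the assumed selectivity of joining tables a and b."""
    return SELECTIVITY.get(frozenset({a, b}), 1.0)


def plan_cost(order):
    """Estimate a left-deep plan's cost as the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = ROW_COUNTS[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combine the predicates linking the new table to tables already joined.
        sel = 1.0
        for prev in joined:
            sel *= pair_selectivity(prev, table)
        rows = rows * ROW_COUNTS[table] * sel
        cost += rows
        joined.append(table)
    return cost


def best_join_order(tables):
    """Exhaustively pick the join order with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    tables = ["customers", "orders", "items"]
    for order in permutations(tables):
        print(order, plan_cost(order))
    print("chosen:", best_join_order(tables))
```

Exhaustive enumeration is used here only because three tables yield six candidate orders. The number of orders grows factorially with the number of tables, so a practical optimizer would be expected to prune or otherwise limit the search.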
In some cases, a database administrator will define “join paths” that contain one or more frequently accessed columns from one or more tables. The optimizer can then choose between accessing the columns directly and following a defined join path, or can add further “join paths” to be followed in query execution.
Generally, an optimizer cannot identify the top-few answers, or highest quality information, within a join path framework for a very large database without spending a large amount of costly time to evaluate the data, analyze the data for possible data quality problems, such as data inconsistencies or default values, and then determine the top-few answers.
Thus, a need exists for an efficient and cost-effective system and method for locating and providing high quality information, such as the top-few answers or solutions to a problem, across multiple large databases in the presence of data quality problems. Further, a need exists for a system and method for evaluating data quality within and across a large relational database system with respect to specified integrity constraints, so that data quality problems can be identified and the high quality information or answers can be provided unaltered by those problems.
The present embodiments meet these needs.