Keys play a fundamental role in understanding both the structure and properties of data. Given a collection of entities, a key may represent one or more attributes whose values uniquely identify an entity in the collection. For example, a key for a relational table may represent a column such that no two rows have matching values in the column. The notion of keys carries over into many other settings, such as XML repositories, document collections, and object databases. Identification of keys is an important task in many areas of modern data management, including data modeling, query optimization, indexing, anomaly detection, and data integration. Knowledge of keys can be used to: (1) provide better selectivity estimates in cost-based query optimization; (2) provide a query optimizer with new access paths that can lead to substantial speedups in query processing; (3) allow the database administrator (DBA) to improve the efficiency of data access via physical design techniques such as data partitioning or the creation of indexes and materialized views; (4) provide new insights into application data; and (5) automate the data-integration process.
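The definition of a key given above can be illustrated with a minimal sketch in Python. The function name `is_key`, the representation of a table as a list of dictionaries, and the sample employee data are all hypothetical and chosen only for illustration; the sketch tests whether a given set of columns is a key and is not a method for discovering keys.

```python
from typing import Any, Dict, List, Sequence


def is_key(rows: List[Dict[str, Any]], columns: Sequence[str]) -> bool:
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        # Project the row onto the candidate key columns.
        values = tuple(row[c] for c in columns)
        if values in seen:
            # Two rows match on every candidate column, so this is not a key.
            return False
        seen.add(values)
    return True


# Hypothetical sample table (illustrative data only).
employees = [
    {"emp_id": 1, "first": "Ann", "last": "Lee", "dept": "Sales"},
    {"emp_id": 2, "first": "Bob", "last": "Lee", "dept": "Sales"},
    {"emp_id": 3, "first": "Ann", "last": "Kim", "dept": "Ops"},
]

single_column_key = is_key(employees, ["emp_id"])        # every emp_id is distinct
non_key = is_key(employees, ["last"])                    # "Lee" appears twice
composite_key = is_key(employees, ["first", "last"])     # distinct only in combination
```

In this sample, `emp_id` alone is a key, `last` alone is not, and the pair (`first`, `last`) is a key even though neither of its columns is one by itself, which corresponds to the "one or more attributes" case in the definition.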
Unfortunately, in real-world scenarios with large, complex databases, an explicit list of keys is often incomplete, if available at all.
Keys may be unknown to the database management system (DBMS) for any of the following reasons: (1) the key represents a “constraint” or “dependency” that is inherent to the data domain but unknown to both the application developer and the DBA; (2) the key arises fortuitously from the statistical properties of the data, and hence is unknown to the application developer and the DBA; (3) the key is known to and exploited by the application, but the DBA is not explicitly aware of it; or (4) the DBA knows about the key but, for reasons of cost, chooses not to explicitly identify or enforce it. Unknown keys in a database may therefore represent a loss of valuable information.
Thus, there is a need for an efficient method and system for discovering keys in a database.