1. Technical Field
Present invention embodiments relate to determining key relationships between database objects, and more specifically, to determining composite keys between database objects based on a sampled data set of the database objects.
2. Discussion of the Related Art
A composite primary-foreign key relationship between database tables S (having columns C1, C2, C3, C4, . . . Cn) and T (having columns D1, D2, D3, D4, . . . Dm) is defined by a subset of the columns of database tables S and T that satisfies the following conditions. First, the subset of columns (Ci, Cj, Ck, . . . Cr) from database table S, taken together, is a primary key for database table S. The selectivity of this subset of columns (forming the primary key) is high (ideally 100%). In other words, the combined values of the column subset on the primary database table S are distinct for every row, or most rows, of database table S.
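The selectivity condition can be illustrated with a minimal Python sketch. It assumes selectivity is measured as the number of distinct combined key values divided by the number of rows; the function name `selectivity`, the sample rows, and this exact formula are illustrative assumptions and are not taken from the description above.

```python
def selectivity(rows, key_columns):
    """Return distinct combined key values divided by row count (assumed definition)."""
    if not rows:
        return 0.0
    # Combine the candidate columns of each row into one tuple.
    key_tuples = [tuple(row[col] for col in key_columns) for row in rows]
    return len(set(key_tuples)) / len(rows)


# Hypothetical contents of database table S.
table_s = [
    {"C1": 1, "C2": "a", "C3": 10},
    {"C1": 1, "C2": "b", "C3": 10},
    {"C1": 2, "C2": "a", "C3": 20},
    {"C1": 2, "C2": "b", "C3": 20},
]

sel_single = selectivity(table_s, ["C1"])           # C1 alone repeats values
sel_composite = selectivity(table_s, ["C1", "C2"])  # C1 and C2 together are unique per row
```

In this hypothetical data, neither C1 nor C2 alone identifies a row, while the pair (C1, C2) does, which is the situation a composite primary key describes.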
Further, for every row, or most rows, in database table T, there is a row in database table S that has matching values in the paired columns (i.e., Ci=Da, Cj=Db, Ck=Dc . . . Cr=Df). In other words, the foreign hit rate of a composite key (for a primary-foreign key relationship) is high (ideally 100%).
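The foreign hit rate can be sketched in the same way. The sketch below assumes the hit rate is the fraction of rows in database table T whose paired column values appear in some row of database table S; the function name `foreign_hit_rate`, the `column_pairs` parameter, and the sample rows are hypothetical.

```python
def foreign_hit_rate(primary_rows, foreign_rows, column_pairs):
    """Return the fraction of foreign rows that match a primary row (assumed definition).

    column_pairs is a list of (primary_column, foreign_column) tuples,
    e.g. [("C1", "D1"), ("C2", "D2")] for C1=D1 and C2=D2.
    """
    if not foreign_rows:
        return 0.0
    # Collect every combined key value present in the primary table.
    primary_keys = {
        tuple(row[p_col] for p_col, _ in column_pairs) for row in primary_rows
    }
    hits = sum(
        1
        for row in foreign_rows
        if tuple(row[f_col] for _, f_col in column_pairs) in primary_keys
    )
    return hits / len(foreign_rows)


# Hypothetical contents of database tables S (primary) and T (foreign).
rows_s = [
    {"C1": 1, "C2": "a"},
    {"C1": 1, "C2": "b"},
    {"C1": 2, "C2": "a"},
]
rows_t = [
    {"D1": 1, "D2": "a"},
    {"D1": 1, "D2": "b"},
    {"D1": 2, "D2": "a"},
    {"D1": 2, "D2": "b"},  # no matching row in S
]

hit_rate = foreign_hit_rate(rows_s, rows_t, [("C1", "D1"), ("C2", "D2")])
```

Three of the four hypothetical rows in T have a counterpart in S, so this candidate relationship falls short of the ideal 100% hit rate.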
Composite key relationships are generated by repeatedly executing Structured Query Language (SQL) queries on the entire database tables and analyzing the query results. However, since this approach is time-consuming and does not scale, users can only discover these relationships between smaller database tables, or are required to manually create sample sets from larger database tables. Datasets with larger database tables (greater than one million rows) are routine, and for these datasets, it is very difficult to generate the relationships accurately and without copious amounts of manual work.
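The repeated full-table querying described above might look like the following sketch, which uses Python's standard `sqlite3` module and an in-memory table. The table contents, the helper `full_table_selectivity`, and the exhaustive enumeration of column subsets are assumptions made for illustration; the actual queries used in practice are not specified here.

```python
import sqlite3
from itertools import combinations

# Hypothetical in-memory stand-in for database table S.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE S (C1 INTEGER, C2 TEXT, C3 INTEGER);
    INSERT INTO S VALUES (1, 'a', 10), (1, 'b', 10), (2, 'a', 20), (2, 'b', 20);
    """
)


def full_table_selectivity(conn, table, columns):
    """Scan the whole table to compute distinct key combinations / row count."""
    # Identifiers are interpolated directly; acceptable only for trusted names.
    cols = ", ".join(columns)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    ).fetchone()[0]
    return distinct / total if total else 0.0


# Every non-empty column subset is a candidate key, and each one
# triggers its own queries over the entire table.
all_columns = ["C1", "C2", "C3"]
results = {}
for size in range(1, len(all_columns) + 1):
    for subset in combinations(all_columns, size):
        results[subset] = full_table_selectivity(conn, "S", subset)
```

Even this three-column table yields seven candidate subsets, each requiring scans of the full table. The number of non-empty subsets grows as 2^n - 1 for n columns, which is one way to see why the approach becomes impractical for wide tables with millions of rows.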