1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to managing relationships between database tables.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
A relational database management system (RDBMS) is a computer database management system that uses relational techniques for storing and retrieving data in/from tables in a relational database. A RDBMS can be structured to support a variety of different types of operations for a requesting entity (e.g., an application, the operating system or an end user). Such operations can be configured to retrieve, add, modify and delete information being stored and managed by the RDBMS. Standard database access methods support these operations using high-level query languages, such as the Structured Query Language (SQL). The term “query” denominates a set of commands that cause execution of operations for processing data from a stored database. Any requesting entity, including applications, operating systems and, at the highest level, users, can issue queries against data in a database to obtain required information. Queries may be predefined (i.e., hard coded as part of an application) or generated in response to input (e.g., user input). Upon execution of a query against a database, a query result is returned to the requesting entity. The requesting entity may thus analyze the query result to identify required information therefrom.
One of the most common executed SQL queries is the SELECT statement. A SELECT statement generally has the format: “SELECT<clause> FROM<clause> WHERE<clause> GROUP BY<clause> HAVING<clause> ORDER BY<clause>”. The clauses must generally follow this sequence. Only the SELECT and FROM clauses are required and all other clauses are optional. The result of a SELECT statement is, in general, a subset of data retrieved from one or more existing tables stored in a relational database. The subset of data defines the query result which is treated as a new table, termed the result table. The WHERE clause determines which rows should be returned in the result table. Generally, the WHERE clause contains one or more query conditions that must be satisfied by each row returned in the result table. The FROM clause identifies the name of the existing table(s) from which the result table is being determined. Thereby, the FROM clause may define an implicit join operation. More specifically, a given SQL query may not contain a specific join keyword or statement, but may simply be configured to select data from multiple relevant database tables.
For example, assume a user in a hospital who wants to determine all medical tests which have been performed on a given patient “Bob” and all diagnoses that have been established for this patient. To this end, the user may specify the following exemplary SQL query of Table I below:
TABLE IEXEMPLARY SQL QUERYSELECTT1.ID, T1.Name,T2.Value AS Test,T3.Value AS DiagnosisFROMDemographic T1,Test T2,Diagnosis T3WHERET1.Name = ‘Bob’ ANDT1.ID = T2.ID ANDT1.ID = T3.ID
In the given example of Table I, the FROM clause of the exemplary SQL query defines an implicit join operation with respect to the database tables “Demographic” (as T1), “Test” (as T2) and “Diagnosis” (as T3). The WHERE clause indicates the columns (i.e., “T1.ID”, “T2.ID” and “T3.ID”) through which the tables to be joined (i.e., “Demographic”, “Test” and “Diagnosis”) are linked. In other words, the tables “Demographic”, “Test” and “Diagnosis” are related to each other by predefined relationships defined by the respective “ID” columns.
More generally, the tables in any relational database are related to each other according to predefined relationships. Frequently, such relationships are defined as primary key—foreign key relationships specified by an administrator or developer of the relational database. Thus, upon receipt of a given query against a relational database, relevant tables from the relational database can be joined according to the predefined relationships in order to produce interesting data requested by the given query. However, in order to enable a corresponding requesting entity to correctly formulate the given query, the corresponding requesting entity must be aware of the relationships between the relevant tables. Therefore, the administrator may create relations describing the predefined relationships which can be used by the requesting entity for the creation of the given query.
One difficulty when defining relations between different database tables is that the process of creating such relations is often time consuming, especially if a given relational database includes a great number of database tables. For instance, assume a relational database that is organized in a star schema or a snowflake schema. A star schema is a database schema having a single root table connected to a multiplicity of linked tables. A snowflake schema corresponds to several connected star schemas. In a database that is organized in a star schema, the number of relationships between the tables is normally only one less than the number of tables. Accordingly, if the database includes thousands of tables, there are thousands of relationships and, thus, thousands of relations to create. In a database that is organized in a snowflake schema, the number of relationships between the tables can be higher than the number of tables as in the snowflake schema a given table can be linked to more than one other table. However, with so many relations to create it is quite easy for the administrator or developer to make a mistake causing relations to be missing or tables being joined in multiple ways. Furthermore, if primary key—foreign key relationships are used to create the relations, false referential integrity constraints may occur at a later stage if the primary key column is deleted. Such missing or wrong relations may then lead to incorrect query results as tables cannot be linked correctly in execution of a given query.
Therefore, there is a need for an efficient technique for identifying and describing relations between tables in a database.