1. Field of the Invention
The invention generally relates to computer database systems. More particularly, the invention relates to techniques for determining query entities for an abstract database from a physical database table.
2. Description of the Related Art
Databases are well known systems for storing, searching, and retrieving information stored in a computer. The most prevalent type of database used today is the relational database, which stores data using a set of tables that may be reorganized and accessed in a number of different ways. Users access information in relational databases using a relational database management system (DBMS).
Each table in a relational database includes a set of one or more columns. Each column typically specifies a name and a data type (e.g., integer, float, string, etc), and may be used to store a common element of data. For example, in a table storing data about patients treated at a hospital, each patient might be referenced using a patient identification number stored in a “patient ID” column. Reading across the rows of such a table would provide data about a particular patient. Tables that share at least one attribute in common are said to be “related.” Further, tables without a common attribute may be related through other tables that do share common attributes. A path between two tables is often referred to as a “join,” and columns from tables related through a join may be combined to from a new table returned as a set of query results.
A relational database query may specify which columns to retrieve data from, how to join the columns together, and conditions (predicates) that must be satisfied for a particular data item to be included in a query result table. Current relational databases require that queries be composed in complex query languages. Today, the most widely used query language is Structured Query Language (SQL). However, other query languages are also used. A SQL query is composed from one or more clauses set off by a keyword. Well-known SQL keywords include the SELECT, WHERE, FROM, HAVING, ORDER BY, and GROUP BY keywords. Composing a proper SQL query requires that a user understand both the structure and content of the relational database as well as the complex syntax of the SQL query language (or other query language). The complexity of constructing an SQL statement, however, generally makes it difficult for average users to compose queries of a relational database.
Because of this complexity, users often turn to database query applications to assist them in composing queries of a database. One technique for managing the complexity of a relational database, and the SQL query language, is to use database abstraction techniques. Commonly assigned U.S. Pat. No. 6,996,558, entitled “Application Portability and Extensibility through Database Schema and Query Abstraction,” discloses techniques for constructing a database abstraction model over an underlying physical database.
U.S. Pat. No. 6,996,558 discloses embodiments of a database abstraction model constructed from logical fields that map to data stored in the underlying physical database. Each logical field defines an access method that specifies a location (i.e., a table and column) in the underlying database from which to retrieve data. Users compose an abstract query by selecting logical fields and specifying conditions. The operators available for composing conditions in an abstract query generally include the same operators available in SQL (e.g., comparison operators such as =, >, <, >=, and, <=, and logical operators such as AND, OR, and NOT). Data is retrieved from the physical database by generating a resolved query (e.g., an SQL statement) from the abstract query. Because the database abstraction model is tied to neither the syntax nor the semantics of the physical database, additional capabilities may be provided by the database abstraction model without having to modify the underlying database. Thus, the database abstraction model provides a platform for additional enhancements that allow users to compose meaningful queries easily, without having to disturb existing database installations.
In some situations, a user may need to use only a small portion of the data stored in a table and find it difficult to work with the data in the entire table. For example, a medical researcher may desire to review data from patient blood tests stored in a table of a hospital database. However, in practice, the table storing the blood test data will typically store data related to a large variety of medical tests conducted at the hospital, with the blood test data contributing to only a small percentage of the total records of the table. In such a case, a record in the tests table for a blood test may include values only in the relevant columns of the test table, while other columns are set to null. Due to the large number of records which are not useful to the researcher, she may have difficulty in finding and interpreting the needed records and also find that queries of the results data are unacceptably slow to execute.
One solution to this problem is to provide users with narrowly-tailored access to the portion of the data that is relevant to their needs. In the case of data stored in a physical table, a database view may be used to provide this functionality. In the case of a database abstraction model, a query entity may be used. A query entity is an object created by the database abstraction model that is used for storing data, and is typically configured with a particular data structure. For example, a query entity at the logical level labeled “blood tests” may map to the physical level using an SQL statement that generates a table having only records from the test table that are related to blood tests. In turn, a logical field may have an access method that references the “blood tests” query entity. More generally, a query entity provides a data storing object that may be referenced in an abstract query or by other objects of the data abstraction model. Thus, from a user's perspective, a query entity behaves somewhat like a table in a physical database.
A query entity may be defined in terms of the structure of the physical database, such as an SQL statement used to generate the query entity as needed. Conventionally, a query entity may be defined manually by a person who has technical training as well as knowledge of a particular database structure (e.g., a system administrator with knowledge of the example “tests” table). Often, however, there are a large number of columns in a table, with some columns used only with certain groups of records. For example, in the “tests” table discussed above, a blood test record may include only data in only a few of the overall columns of the tests table. In such a case, creating a query entity requires examining each field to determine whether it should be included in the query entity. This process may have to be repeated many times over before a complete set of query entities is defined from even a single table. These problems mean that creating query entities can be a costly and time-consuming process. Further, this process may have to be repeated each time the underlying structure of the database is modified.
Accordingly, there remains a need in the art for techniques for determining query entities for an abstract database from a physical database table.