1. Field of the Invention
The present invention is related to computer databases. More specifically, the present invention is related to methods of organizing data stored in a computer database and related to query processing techniques.
2. Description of the Related Art
Databases are well known systems for information storage and retrieval. The most prevalent type of database in use today is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A relational database management system (DBMS) uses relational techniques for storing and retrieving data.
Structured Query Language (SQL) is a well known database language that that includes commands for retrieving, storing, updating, and deleting data stored in a relational database. An SQL query is constructed as a text string that must strictly conform to the grammar requirements of the SQL query language. Further, an SQL query must be semantically correct to perform as desired by the user. That is, many syntactically correct SQL statements may fail to perform as desired due to semantic errors. Because of this complexity, database query applications are often used to assist a user in composing an SQL query of a relational database.
For example, one technique for managing the complexity of a relational database and the SQL query language is to use database abstraction techniques. Commonly assigned U.S. patent application Ser. No. 10/083,075 (the '075 application) entitled “Application Portability and Extensibility through Database Schema and Query Abstraction”, discloses a framework that provides an abstract view of a physical data storage mechanism. The framework of the '075 application provides a requesting entity (i.e., an end-user or front-end application) with an abstract representation of data stored in an underlying physical storage mechanism, such as a relational database.
Using the database abstraction model, logical fields are defined that describe the data stored in an underlying database. Users compose abstract queries by selecting logical fields and specifying conditions. Each logical field includes an access method that specifies how to retrieve data from an underlying database system corresponding to the logical field. The system then retrieves data from the underlying database by executing a resolved query generated from the abstract query.
Commonly assigned, U.S. patent application entitled “Abstract Query Plan” discloses techniques for processing an abstract query that include generating an intermediate representation of an abstract query which is then used to generate the resolved query. The intermediate representation may include a combination of elements taken from the database abstraction model and from the physical representation. To generate the intermediate representation while processing an abstract query, the database abstraction model must identify which data sources (e.g., tables) from the underlying database to include in the intermediate representation. In addition, the data abstraction model must identify how many copies of each data source are required. For example, an abstract query may specify that multiple conditions should be applied to data from the same relational table. Separate copies of the table allow the system to process each condition independently.
Once created, the database abstraction model provides a powerful interface that allows users to compose an abstract query consistent with the intuitive logical relationships about data being queried, regardless of the type or complexity of the underlying database storage mechanism. Users compose queries of the abstraction, rather than of the underlying database. Ultimately, however, the abstract query must be translated/resolved into a query of the underlying database (e.g., an SQL query). Thus, the database abstraction model must be able to interpret the conditions specified by the query and create a resolved query that includes the conditions.
Further, relational databases represent data using rows and columns. Accordingly, an SQL query retrieves a row set (i.e., a table) of values that satisfy conditions specified by the query. Oftentimes, a row set is simply displayed to a user as an unformatted table. One problem with this approach is that data in different columns of the row et are treated equally. That is, a row set fails to capture the one-to-one and one-to-many relationships logically present in the data. For example, consider a row set of medical test results for a given patient. Each row of the row set duplicates data elements, such as, the name, gender, and patient ID for each row that includes a single test result. The following table illustrates this effect of row-based query results.
TABLE IRow Based Query ResultsPatient IDLast NameCurrent AgeTest Results1Smith27231Smith27281Smith27212Jones18332Jones1835
More generally, a row set represents query results as the data is stored in a relational database, and not as a set of interwoven facts and relationships that data represents to a user of the database. Such relationships are presented by the database abstraction model at the “front-end”, when an abstract query is composed from logical fields, but not at the “back-end,” when query results are presented as a row-set of data represented as a table. Users however, understand these relationships and intuitively rely on them while interacting with the database environment.
Accordingly, there is a need for improved techniques for generating a query of an underlying physical storage mechanism, such as an SQL query of a relational database, from an abstract query. Abstract query processing techniques should identify the underlying data sources (e.g., tables) that include data needed to evaluate conditions specified in the abstract query. Further, query results should be presented in a manner consistent with users' logical view of the data being queried.