1. Field of the Invention
This invention is related to computer databases. More specifically, this invention is related to methods for creating an abstraction of a physical data storage mechanism and for constructing a resolved query of the physical data storage mechanism from an abstract query.
2. Description of the Related Art
Databases are well known systems for information storage and retrieval. The most prevalent type of database in use today is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A relational database management system (DBMS) uses relational techniques for storing and retrieving data.
A database schema is used to describe the structure of a database. For example, a relational schema describes the set of tables, columns, and primary and foreign keys defining relationships between different tables in a relational database. Applications are developed that query data according to the relational schema. For example, relational databases are commonly accessed using a front-end application configured to perform data access routines, including searching, sorting, and query composition routines. At the back-end, software programs control data storage and respond to queries submitted by users interacting with the front-end.
Structured Query Language (SQL) is a widely used database language that provides a means for data manipulation, and includes commands to retrieve, store, update, and delete data. An SQL query is constructed according to the relational schema for a given relational database, and according to the explicitly defined SQL grammar. An SQL query comprises a text string that must strictly conform to the grammar requirements of the SQL language and must also be semantically correct to perform as desired by the user. That is, many syntactically correct SQL statements may fail to perform as desired due to semantic errors. Because of this complexity, database query applications are often used to assist a user in composing an SQL query of a relational database.
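By way of illustration only, the distinction between a syntactically correct and a semantically correct SQL statement can be sketched as follows. The table names, column names, and data are hypothetical and do not come from any particular database; the sketch uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3

# Hypothetical two-table schema, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tests (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(id),
    result REAL
);
INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO tests VALUES (10, 1, 5.5), (11, 2, 7.0);
""")

# Syntactically valid, but the join condition is missing, so every patient
# is paired with every test result (a cross product).
wrong = conn.execute(
    "SELECT p.name, t.result FROM patients p, tests t "
    "ORDER BY p.name, t.result"
).fetchall()

# The query the user presumably intended: each patient with their own results.
right = conn.execute(
    "SELECT p.name, t.result FROM patients p "
    "JOIN tests t ON t.patient_id = p.id ORDER BY p.name"
).fetchall()

print(len(wrong), "rows without the join condition")
print(len(right), "rows with the join condition")
```

Both statements are accepted by the database engine; only the second returns the rows the user meant to retrieve.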
One issue faced by data mining and database query applications, however, is their close relationship with a given database schema. This relationship makes it difficult to support an application as changes are made to the corresponding underlying database schema. Further, this tightly bound relationship inhibits the migration of the application to alternative data representations.
Commonly assigned U.S. patent application Ser. No. 10/083,075 (the '075 application), filed Feb. 26, 2002, entitled “Application Portability and Extensibility through Database Schema and Query Abstraction”, discloses a framework for a data abstraction model that provides an abstract view of a physical data storage mechanism. The framework of the '075 application provides a requesting entity (e.g., an end-user or front-end application) with an abstract representation of data stored in an underlying physical storage mechanism, such as a relational database. In this way, the requesting entity is decoupled from the underlying physical data when accessing the underlying DBMS. Abstract queries based on the framework can be constructed without regard for the makeup of the underlying database. Further, changes to the schema for the database do not require a corresponding change in the query application front-end; rather, the abstraction provided by the framework can be modified to reflect the changes.
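The decoupling described above can be sketched, in greatly simplified form, as a mapping from logical field names to physical locations. This is not the framework of the '075 application; the names FieldMapping and build_select, the logical fields, and both schemas are hypothetical and chosen only to show that a schema change can be absorbed by the mapping while the abstract query stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Physical location of one logical field (hypothetical structure)."""
    table: str
    column: str


def build_select(logical_fields, mapping):
    """Resolve logical field names into a single-table SELECT statement."""
    resolved = [mapping[name] for name in logical_fields]
    tables = {m.table for m in resolved}
    if len(tables) != 1:
        raise ValueError("this sketch handles single-table queries only")
    columns = ", ".join(
        f'{m.column} AS "{name}"' for name, m in zip(logical_fields, resolved)
    )
    return f"SELECT {columns} FROM {tables.pop()}"


# The abstract query names logical fields only; it never mentions tables.
abstract_query = ["Last Name", "Age"]

# Mapping for an assumed original schema.
mapping_v1 = {
    "Last Name": FieldMapping("patient", "lname"),
    "Age": FieldMapping("patient", "age"),
}

# After an assumed schema change, only the mapping is edited.
mapping_v2 = {
    "Last Name": FieldMapping("person", "surname"),
    "Age": FieldMapping("person", "age_years"),
}

sql_v1 = build_select(abstract_query, mapping_v1)
sql_v2 = build_select(abstract_query, mapping_v2)
print(sql_v1)
print(sql_v2)
```

The same abstract_query produces a valid statement against either schema, which is the portability property the paragraph above attributes to the abstraction.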
One embodiment of a data abstraction model defines a set of logical fields, corresponding to a user's substantive view of data, which are loosely coupled to the underlying physical databases storing the data. The logical fields are available for a user to compose queries that search, retrieve, add, and modify data stored in the underlying database. The abstract query is used to generate an SQL query statement processed by a relational DBMS.
Additional challenges arise when transforming an abstract query, which comprises a highly logical view of data structured in the form of objects, such as logical fields, into an SQL text string (e.g., a SELECT, INSERT, or DELETE statement). Chief among these problems is the difficulty of efficiently generating an SQL query directly from the abstract query. Different pieces of the abstract query may relate to one another in non-obvious ways, and therefore, the query builder must look forward and backward through the abstract query to correctly build the piece of the query currently being considered. The query builder, however, may not be able to inspect the SQL being generated from the abstract query to determine the information it needs. In particular, it is difficult for the query builder to determine whether the SQL is fully optimized, or to make adjustments in the query design, for two reasons. First, the SQL would need to be reparsed, even though it is in a fragmented and incomplete state during the query building process. Second, the SQL statement does not always contain all of the information from the abstract query, because some information is lost when the abstract query is converted to SQL.
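One way in which information might be lost during conversion can be sketched as follows. The scenario is an assumption made for illustration and is not taken from the source: two hypothetical logical fields are mapped to the same physical column, so the generated SQL text is identical for both and the originating logical field cannot be recovered by inspecting the SQL alone.

```python
# Hypothetical mapping in which two distinct logical fields share one column.
FIELD_TO_COLUMN = {
    "Admission Date": "visit.start_dt",
    "Visit Start": "visit.start_dt",
}


def to_sql(logical_field):
    """Render a one-field abstract query directly as SQL text."""
    qualified = FIELD_TO_COLUMN[logical_field]
    table, _column = qualified.split(".")
    return f"SELECT {qualified} FROM {table}"


sql_a = to_sql("Admission Date")
sql_b = to_sql("Visit Start")

# Working backward from the SQL text, both logical fields remain candidates.
candidates = sorted(
    field for field, column in FIELD_TO_COLUMN.items() if column in sql_a
)

print(sql_a == sql_b)
print(candidates)
```

Because the text form discards the identity of the logical field, a query builder that works only from the emitted SQL would have to guess which field the user selected.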
Accordingly, there is a need for improved techniques for efficiently generating and optimizing a query of an underlying physical storage mechanism, such as an SQL query of a relational DBMS, and for improved abstract query processing techniques generally.