1. Field of the Invention
The present invention generally relates to computer databases. More specifically, the invention relates to database query optimization techniques.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A relational database management system (DBMS) uses relational techniques for storing and retrieving data.
Regardless of the architecture, a requesting entity (e.g., an application, operating system, or end user) requests access to data stored in a DBMS by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that read, change, and add specified records in the database. Often, these requests are made using formal query languages such as Structured Query Language (SQL). Illustratively, SQL is used to construct a query that retrieves information from and updates information in a database. Commercially available databases include International Business Machines' (IBM) DB2®, MICROSOFT's® SQL SERVER®, and database products from ORACLE®, SYBASE®, and COMPUTER ASSOCIATES®. The term “query” refers to a set of commands that retrieves data from, inserts data into, or modifies data in a database. Queries take the form of a command language that lets programmers and programs select, insert, and update data, determine the location of data, and the like.
A database schema describes the structure of a database. One of the issues faced by data mining and database query applications, in general, is their close relationship with a given database schema (e.g., a relational database schema describing a set of tables and relationships among tables). This relationship makes it difficult to support an application as changes are made to the corresponding underlying database schema. Further, it inhibits the migration of the application to alternative data representations. In today's environment, the foregoing disadvantages are largely due to the reliance applications have on SQL, which presumes that a relational model is used to represent information being queried. Furthermore, a given SQL query is dependent upon a particular relational schema, because specific database tables, columns and relationships are referenced by an SQL query. As a result of these limitations, a number of difficulties arise.
One difficulty is that changes in the underlying relational data model require changes to the SQL queries upon which the corresponding application is built. Therefore, an application designer must either forgo changing the underlying data model to avoid application maintenance or must change the application to reflect changes in the underlying relational model. Another difficulty is that extending an application to work with multiple relational data models requires separate versions of the application to reflect the unique SQL requirements of each relational schema or DBMS. Yet another difficulty is evolving the application to work with alternate data representations, because SQL is specifically designed for use with relational systems. Extending the application to support alternative data representations, such as XML, requires rewriting the application's data management layer to use non-SQL data access methods.
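The first difficulty can be illustrated with a minimal sketch. It assumes a hypothetical `patient` table and a hypothetical application query; none of these names come from any particular product. The same hard-coded SQL succeeds against the schema it was written for and fails once a column is renamed.

```python
import sqlite3

# Hypothetical application query, hard-coded against one relational schema.
REPORT_SQL = "SELECT last_name FROM patient ORDER BY id"


def build_db(name_column):
    """Create an in-memory database whose name column is called `name_column`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE patient (id INTEGER PRIMARY KEY, {name_column} TEXT)"
    )
    conn.executemany(
        f"INSERT INTO patient (id, {name_column}) VALUES (?, ?)",
        [(1, "Smith"), (2, "Jones")],
    )
    return conn


def run_report(conn):
    """Run the fixed query; return the names, or None if the schema no longer matches."""
    try:
        return [row[0] for row in conn.execute(REPORT_SQL)]
    except sqlite3.OperationalError:
        # The query references a column that this schema does not define.
        return None


original = run_report(build_db("last_name"))  # schema the query was written for
renamed = run_report(build_db("surname"))     # same data, one column renamed
```

Here `original` holds the two names, while `renamed` is `None` because the query text still refers to the old column. Either the schema stays frozen or the application's SQL must be edited.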
In addition, as the size and complexity of databases continue to grow, query optimization methods continue to be a critical focus of database operations. Not surprisingly, the state of the art in query optimization is very advanced and requires extensive knowledge of many aspects of both the underlying physical database schema and the particular database engine against which a query will be executed. Such knowledge in the art of query optimization is very valuable in making a DBMS run efficiently. This is increasingly true as the complexity of database systems grows, especially in emerging fields such as life sciences (e.g., genomic and proteomic fields of study) where the volume of data is immense.
At the same time, however, the increasing complexity of database systems is driving a change in technology that adds to the challenges of query optimization. Specifically, abstraction layers may be used to reduce the complexity faced by a user interacting with a modern database application and DBMS. Some embodiments of an abstract database provide a data abstraction model, or an abstract data layer, interposed between a user interacting with a query application and an underlying representation used to store data (e.g., a relational database). One embodiment of an abstract data layer provides a set of logical fields that correspond to a user's substantive view of the data. The logical fields are available for a user to compose queries that search, retrieve, add, and modify data stored in the underlying databases. Detailed examples of a data abstraction layer are described in a commonly owned application “Application Portability and Extensibility Through Database Schema and Query Abstraction,” Ser. No. 10/083,075, filed Feb. 26, 2002, incorporated herein by reference in its entirety.
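By way of illustration only, the idea of logical fields can be sketched as a small mapping from user-facing field names to physical columns. The sketch below is an assumption made for this illustration and is not the mechanism of the referenced application; the field names, the `patient` table, and the `to_sql` helper are all hypothetical.

```python
import sqlite3

# Hypothetical data abstraction model: logical field name -> (table, column).
LOGICAL_FIELDS = {
    "Patient Name": ("patient", "surname"),
    "Age": ("patient", "age_years"),
}

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def to_sql(model, select_fields, filter_field, operator):
    """Translate an abstract query over logical fields into parameterized SQL.

    Simplifying assumption: every logical field lives in the same table.
    """
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    table = model[select_fields[0]][0]
    columns = ", ".join(model[name][1] for name in select_fields)
    filter_column = model[filter_field][1]
    return f"SELECT {columns} FROM {table} WHERE {filter_column} {operator} ?"


# Physical schema that the user composing the query never sees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (surname TEXT, age_years INTEGER)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?)", [("Smith", 72), ("Jones", 34)]
)

# Abstract query: "Patient Name" of everyone whose "Age" exceeds 65.
sql = to_sql(LOGICAL_FIELDS, ["Patient Name"], "Age", ">")
rows = conn.execute(sql, (65,)).fetchall()
```

In this sketch a change to the physical schema is absorbed by editing `LOGICAL_FIELDS` alone. The trade-off is that the SQL is generated rather than hand-written, so there is no fixed query text for an expert to tune in advance.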
One challenge for a database designer where an abstraction layer exists is optimizing how an abstract query will be executed, or optimizing the query of the underlying DBMS that is generated from an abstract query. The further the abstraction progresses, the harder it is for a database expert to tune the database, or for the database engine to optimize an individual query, for the most efficient execution. Accordingly, these changes in database design have created a need for techniques that, on one hand, provide users with the ability to compose queries that retain the simple, logical structure provided by a data abstraction model, and that, on the other hand, also provide for query optimization during query processing.