1. Field of the Invention
The present invention relates generally to information processing environments and, more particularly, to a database management system (DBMS) having a methodology for cost-based subquery optimization using a left-deep tree join enumeration algorithm.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information may be retrieved from, or updated in, such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., “An Introduction to Database Systems, Volume I and II”, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a decentralized or distributed environment. One or more PC “client” systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these “client/server” systems include Powersoft® clients connected to one or more Sybase® SQL Anywhere® Studio (Adaptive Server® Anywhere) database servers. Both Powersoft and Sybase SQL Anywhere Studio (Adaptive Server Anywhere) are available from Sybase, Inc. of Dublin, Calif.
In today's computing environment, database technology can be found on virtually any device, from traditional mainframe computers to cellular phones. Sophisticated applications, whether human resources information systems or sales force automation systems, can “push” much of their complexity into the database itself. Indeed, this represents one of the main benefits of database technology. The challenge, however, is to support these applications, and the complex queries they generate, on small computing devices. At the same time, users expect the productivity and reliability advantages of using a relational DBMS.
One purpose of a database system is to answer decision support queries. A query may be defined as a logical expression over the data and the data relationships set forth in the database, and results in the identification of a subset of the database. Consider, for instance, the execution of a request for information from a relational DBMS. In operation, this request is typically issued by a client system as one or more Structured Query Language or “SQL” queries for retrieving particular data (e.g., a list of all employees earning more than the average salary of all employees) from database tables on a server. In response to this request, the database system typically returns the names of those employees earning more than the average salary, where “employees” is a table defined to include information about employees of a particular organization. The syntax of SQL is well documented, see e.g., “Information Technology—Database languages—SQL”, published by the American National Standards Institute as American National Standard ANSI/ISO/IEC 9075: 1992, the disclosure of which is hereby incorporated by reference.
SQL queries express what results are requested but do not state how the results should be obtained. In other words, the query itself does not tell how the query should be evaluated by the DBMS (i.e., the query is declarative). Rather, a component called the optimizer determines the “plan” or the best method of accessing the data to return the result required by the SQL query. The query optimizer is responsible for transforming an SQL request into an access plan composed of specific implementations of the algebraic operator selection, projection, join, and so forth. The role of a query optimizer in a relational DBMS system is to find an adequate execution plan from a search space of many semantically equivalent alternatives.
A query optimizer transforms an SQL query into an access plan by generating different join strategies and, based on cost, choosing the best strategy. The process of generating a subspace of the space of all the join strategies is called join enumeration. Since relational databases typically only provide physical operators that can join two tables at a time, a join of a number of different tables (n-way join) must be executed as a sequence of two-way joins, and there are many possible such sequences. The optimizer must typically enumerate some or all of these sequences and choose one based on estimates of their relative execution costs.
In order to optimize a query, a query optimizer must be able to concurrently solve several different problems including: choosing the access method (e.g., sequential scan, index scan) for each base table used in the query; choosing the order in which to join the tables (i.e., the join order), and choosing the join method to be used for each join operation. A complete access plan comprises a join order for joining the relations (tables), join methods for each join operation, and an access method for each base table used in the query.
The SQL language allows the usage of queries using nested subqueries. In general, a subquery is embedded in the WHERE clause of a query to further restrict data returned by the query. A database optimizer, in general, optimizes a subquery block separately from the rest of the query block the subquery is used in. The evaluation of the subquery is then done when it is needed in the context in which it is used. For example, if the subquery appears in a filter predicate on a table, the subquery is evaluated for each row of the table while evaluating the predicate.
Subquery evaluation techniques and cost-based subquery optimization have been proposed in the relational database literature as a means to evaluate queries containing nested subqueries. Some of the most important work in this domain, the disclosure of which are hereby incorporated by reference, are:
Dayal, U., “Of Nests and Trees: A Unified Approach to Processing Queries that Contains Nested Subqueries, Aggregates, and Quantifiers”, in Proceedings of the 13th International Conference on Very Large Data Bases, Brighton, England, pp. 197-208, 1987.
Galindo-Legaria, C. A. et al, “Ortogonal Optimization of Subqueries and Aggregation”, Proceedings of ACM SIGMOD Conference of Management of Data, Santa Barbara, Calif., pp. 571-581, May 2001.
Muralikrishna, M., “Improved Unnesting Algorithms for Join Aggregate SQL Queries”, Proceedings of the 18th International Conference on Very Large Data Bases, Vancouver, Canada, pp. 91-102, August 1992.
Kim, W., “On Optimizing an SQL-like Nested Query”, in ACM Transactions on Database Systems, 7(3): pp. 443-469, September 1982.
A major challenge in database systems is to significantly reduce the amount of memory required to enable a database system to be run on small computing devices. More particularly, what is required is a solution which enables queries to be effectively optimized while requiring a minimal amount of memory for generating the search space and for storing the information required for query optimization. One of the challenges in cost-based subquery optimization in database systems is to integrate the subquery optimization into the optimizer's enumeration process without significantly increasing the amount of memory required, or significantly increasing the search space the optimizer is considering.
What is needed is a query optimization solution which integrates subquery optimization with an optimizer's general enumeration methodology. The solution should not significantly increase the search space considered during query optimization as an increase in the search space typically would require a corresponding an increase in the amount of memory required by the optimizer. The present invention provides a solution for these and other needs.