The present invention relates to database management systems and more particularly to a method for determining optimal database materializations utilizing a query optimizer.
A database management system (DBMS) comprises the combination of an appropriate computer, direct access storage devices (DASD) or disk drives, and database management software. A relational database management system is a DBMS which uses relational techniques for storing and retrieving information. The relational database management system or RDBMS comprises computerized information storage and retrieval systems in which data is stored on disk drives or DASD for semi-permanent storage. The data is stored in the form of tables which comprise rows and columns. Each row or tuple has one or more columns.
The RDBMS is designed to accept commands to store, retrieve, and delete data. One widely used and well known set of commands is based on the Structured Query Language or SQL. The term query refers to a set of commands in SQL for retrieving data from the RDBMS. The definitions of SQL provide that a RDBMS should respond to a particular query with a particular set of data given a specified database content. SQL however does not specify the actual method to find the requested information in the tables on the disk drives. There are many ways in which a query can be processed and each consumes a different amount of processor and input/output access time. The method in which the query is processed, i.e. query execution plan, affects the overall time for retrieving the data. The time taken to retrieve data can be critical to the operation of the database. tt is therefore important to select a method for finding the data requested in a query which minimizes the computer and disk access time, and therefore, optimizing the cost of doing the query.
A database system user retrieves data from the database by entering requests or queries into the database. The RDBMS interprets the user""s query and then determines how best to go about retrieving the requested data. In order to achieve this, the RDBMS has a component called the query optimizer. The RDBMS uses the query optimizer to analyze how to best conduct the user""s query of the database with optimum speed in accessing the database being the primary factor. The query optimizer takes the query and generates a query execution plan. The query execution plan comprises a translation of the user""s SQL commands in terms of the RDBMS operators. There may be several alternative query execution plans generated by the query optimizer, each specifying a set of operations to be executed by the RDBMS. The many query execution plans generated for a single query ultimately differ in their total cost of obtaining the desired data. The query optimizer then evaluates these cost estimates for each query execution plan in order to determine which plan has the lowest execution cost. In order to determine a query execution plan with the lowest execution cost, the query optimizer uses specific combinations of operations to collect and retrieve the desired data. When a query execution plan is finally selected and executed, the data requested by the user is retrieved according to that specific query execution plan however manipulated or rearranged.
In a SQL based RDBMS the query execution plan comprises a set of primitive operations or commands, e.g. JOIN; a sequence in which the retrieve operations will be executed, e.g. JOIN ORDER; a specific method for performing the operation, e.g. SORT-MERGE JOIN; or an access method to obtain records from the base relations, e.g. INDEX SCAN. In most database systems, particularly large institutional systems, a cost-based query optimizer will be utilized. A cost-based query optimizer uses estimates of I/O and CPU resource consumption in determining the most efficient query execution plan because both I/O and CPU resource consumption depend on the number of rows that need to be processed.
The performance of queries against a database may be enhanced significantly by materializing certain data that may be redundant of data already in the database. This materialized data may be organized in ways better suited to certain database operations, such as searching for specific data, for example as with indexes, or may pre-compute information likely to be asked for often, as with materialized views, for example.
Data materialization such as indexes can benefit performance of a query in one or more of the following ways. First, a materialization can be used to rapidly find data which satisfies a user-specified search criterion, for example, predicates specified in the WHERE clause of an SQL query. Second, a materialization can be used to access rows in a particular order, thereby saving sort operations to achieve that ordering for operations such as JOINS or GROUP BY or ORDER BY clauses specified by the user. Thirdly, a materialization can be used to provide a subset of a table""s columns, or tables"" in the case of join indexes, that are a superset of the columns requested in a user query, thereby saving the access of the data pages of the base table. Because the data pages in the base table are presumably much larger than the index pages in the index table, the cost per row is greater for the base table.
On the other hand, materialized data has additional costs which include the following: (1) update costs to keep the materialization consistent with other data that has been modified, requiring possible data access and computation to determine the new contents of the materialized data based upon what data was modified; (2) storage costs for the materialized data, which is usually redundant of the base data; (3) increased optimization time to consider the use of these materializations, as an alternative to, or in addition to, accessing the base table.
Currently, most database systems leave the determination of the appropriate materializations up to the user. However, this can be very difficult and/or time-consuming for the user. There are typically numerous possible materializations and in many possible combinations. Furthermore, the costs and benefits of each combination will, in general, be very difficult for the user to assess.
Accordingly the present invention provides a method for determining optimal database materializations using a query optimized.
The present invention provides a method for exploiting a database query optimizer to recommend materializations, for example indexes, of a database to enhance performance. The method takes one or more queries as inputs and using the cost-based optimizer in the database management system generates a series of virtual materializations by materializing some subsets of the database. The virtual materializations are used to consider the relative performance benefits, i.e. cost-benefits, for the queries based on the various virtual materializations. If the optimizer decides to use any of the materializations in its plan, then those materializations are recommended to the user, or created automatically for the user.
A feature of the present invention is that the method is incorporated into the query optimizer, rather than as an external tool. This arrangement provides the following advantages: (1) The existing infrastructure for iterating through alternative access paths to a table, and for estimating execution costs of these alternatives, can be exploited, and do not have to be replicated in an external tool. (2) Maintenance costs decrease because the equations used to estimate costs and benefits need be maintained in only one place, i.e. in the query optimizer. (3) There is greater accuracy in cost estimation, since there is no possibility of the query optimizer and an external tool being out of synchronization. The user is also guaranteed that the query optimizer will use the recommended materialization if it is defined, since the query optimizer recommended the materialization by using its equations to pick that materialization for use. (4) There is greater efficiency in determining the materializations to recommend, since the query optimizer need only be invoked once to determine the best materializations for a given query. An external tool, on the other hand, must iteratively recommend and create candidate materializations and invoke the query optimizer to assess that set of materializations.
In one aspect, the present invention provides a method for determining optimal materializations for a query optimizer in a database management system, wherein the query optimizer generates one or more query execution plans in response to a query input from a user for accessing data in a database schema in the database management system, the method comprises the steps of: (a) generating a plurality of temporary materializations as candidates for the query execution plans associated with the query; (b) computing estimated statistics for selected performance parameters for each of the temporary materializations; (c) utilizing the query optimizer to optimize each of the query execution plans; (d) determining if any of the temporary materializations are being utilized in any of the query execution plans; (e) if any of the temporary materializations are being utilized in any of the query execution plane, recommending the temporary materializations to the user together with the associated query execution plans.
In another aspect, the present invention provides a relational database management system for use with a computer system wherein queries are entered by a user for retrieving data from a database schema, the relational database management system includes a query optimizer for optimizing query execution plans associated with the queries entered by the user, the system comprises: (a) means for processing queries; (b) means for generating a plurality of temporary materializations as candidates for the query execution plans associated with the query; (c) means for computing estimated statistics for selected performance parameters for each of the temporary materializations; (d) the query optimizer including means for optimizing each of the query execution plans and means for selecting query execution plans on the basis of selected performance parameters; (e) means for determining if any temporary materializations are being utilized in any of the query execution plans; (f) means for recommending the temporary materializations to the user together with the associated query execution plans selected by the query optimizer if any of the temporary materializations are being utilized in one of the query execution plans.
In yet another aspect, the present invention provides a computer program product for use on a computer wherein queries are entered by a user for retrieving data from a relational database management system having a query optimizer for optimizing query execution plans associated with the queries entered by the user, the computer program product comprises: a recording medium; means recorded on the medium for instructing the computer to perform the steps of, (a) generating a plurality of temporary materializations as candidates for the query execution plans associated with the query; (b) computing estimated statistics for selected performance parameters for each of the temporary materializations; (c) utilizing the query optimizer to optimize each of the query execution plans; (d) determining if any of the temporary materializations are being utilized in any of the query execution plans; (e) if any of the temporary materializations are being utilized in any of the query execution plans, recommending the temporary materializations to the user together with the associated query execution plans.