The present invention relates to a computer system and more particularly to a method and a system for query processing, which process queries from users using one or more databases.
In recent years, there exist many database management systems (DBMS) in company""s computer systems. From a historical point of view, those DBMSs have been added on a purpose base keeping pace with the development of the companies. Deregulation among different categories of industries is now rapidly progressing, which strengthens the tendency of the companies to add new operations to the existing ones. In this case, in response to the introduction of the new operations, an additional new DBMS is often introduced into the existing system.
On the other hand, each company is trying to provide the customers with better services including a new service differentiated from others, and as a result is trying to acquire a larger number of excellent customers. In such cases, because the company is more often required to analyze the company""s activities, the customers"" trends, and the like the data of which are accumulated in the DBMS group, the number of the companies that construct a data warehouse and a data mart are significantly increasing.
To construct the data warehouse and the data mart, it is necessary to put together data accumulated in the above-mentioned large number of DBMSs into one logically integrated database. In addition, beside the construction of database that is used as a base of analysis processing such as the data warehouse and the data mart, for the purpose of rapid starting of new operations, the existing DBMS groups are logically integrated to enhance competitiveness. It is because the logically integrated database as a base enables the company to construct application programs (applications) for the new operations in a shorter period of time.
Not only two or more DBMS groups within one company but also DBMS groups across two or more companies are now increasingly integrated. For example, a company group forming a group of companies integrates their information infrastructures to increase their profits as a whole; and companies in alliance with each other integrate their information infrastructures when they form a virtual alliance of the companies. Integration of the information infrastructures over a wider range provides applications developed on this integrated information infrastructure with commonality among a large number of divisions. As a result of it, overlap of application development can be avoided.
As a method for integrating the information infrastructures, there is a method of placing a xe2x80x9cdatabase hubxe2x80x9d system, which provides access to the DBMS group in an integrated manner, between the DBMS group and an application group. In this method, the database hub receives a query (typically, a query described in Structured Query Language (SQL)) from an application, and then disassembles the query and converts it into an appropriate query to the DBMS group. After that, the database hub issues this disassembled and converted query to the DBMS group, collects data to create a result of the query from the DBMS, obtains the final result of the query received from the application, and returns the result to the application.
As a result of the integration of the information infrastructures, a variety of application queries are issued to the database hub. Those queries range over various kinds of queries such as a complex query for analysis operations, and a simple query for routine tasks. In particular, in the case of the complex query for analysis operations, the database hub and the optimization of the queries to the DBMS group concealed behind the database hub play an important role in performing the queries at high speed.
The optimization of the queries is broadly categorized into two groups: a rule-based query optimization and a cost-based query optimization. The cost-based query optimization, for example, is described in xe2x80x9cAccess Path Selection in a Relational Database Management Systemxe2x80x9d written by P. G. Selinger and others (ACM SIGMOD Conference, May 1979, pp. 23-34; hereinafter referred to as Reference 1). In addition, those two kinds of optimizations are applied in a system (hereinafter referred to as Prior Art 1) described in xe2x80x9cOracle 8 Server Release 8.0, Overview: vol. 2xe2x80x9d written by Oracle Corporation (pp. 19-7-9-11, Oracle Corporation, June 1997; hereinafter referred to as Reference 2). The cost-based query optimization provides the best query execution procedure (query execution plan) among various execution possibilities from the standpoint of execution cost.
Therefore, it is thought that the cost-based query optimization is suitable for the complex query. In fact, the cost-based query optimization is applied to many DBMS products, and enormous efforts are being expended to enhance it.
In reality, however, even if the cost-based query optimization is applied, the query is not always well optimized. As a result, the query is often processed by the execution plan that cannot be said to be the best for the user. There are two reasons for it.
Firstly, a query issued by an application is becoming complex at higher pace than that of technical enhancement of the cost-based optimization. Secondly, the cost-based optimization requires statistical information about a target database. However, there is a limit of details of the statistical information. Taking a relational database (a database which stores data in one or more tables consisting of one or more records having one or more columns) as an example, the following data are usually used as the statistical information: size of the table (number of records in the table); the minimum value, the maximum value, and an average value of values in the columns; number of different values in the columns; existence/nonexistence of NULL value in the columns; and distribution of values in the columns. Regarding the cost-based optimization, the cost of a series of processing to obtain a result of the query is calculated using that statistical information. However, because the statistical information is merely data characteristics summary that is viewed from several aspects, it is difficult to estimate the correct cost of the series of processing completely.
To address this problem, in a system (hereinafter referred to as Prior Art 2) described in xe2x80x9cOracle 8 Server Release 8.0 Tuningxe2x80x9d (pp. 8-8 to 8-27, Oracle Corporation, September 1997; hereinafter referred to as Reference 3), the application is allowed to direct a query optimization method to the query either by query unit or by user. However, because the direction can be given only by query unit or by user, it remains impossible to give an optimization direction in block to a group of queries that have been issued from different applications, and it also remains impossible to give the optimization direction in block to a group of queries that have common characteristics.
The following is an example: In the integration of the information infrastructures across two or more companies as described above, a database hub and a DBMS group under the database hub process queries which are issued from various organizations, divisions and posts, and which are related to operations with a variety of complexity and urgent necessity. For example, if two companies use a common application, and if there is a situation in which different optimization methods are required for optimization of a query from one company and for optimization of a query from another company, it remains impossible to address this case using the direction of optimization method by query unit or by user. Such differentiation of the optimization methods should not be realized by modifying the application. It is recommended that an administrator handling the database hub and the DBMS group under the database hub can control the differentiation in a batch.
In short, a first problem that is addressed by the present invention is to enable the administrator to control operation of the database hub and the DBMSs in a batch without modifying the application.
In addition, as described above, the conventional cost-based optimization including Prior Art 1 is realized using the contents of the database (the statistical information about tables and columns) and information about a system (processor performance of a machine, input/output performance, network performance, etc.). (Those two kinds of information are generically called xe2x80x9cstatistical informationxe2x80x9d.) However, an aspect of system administration that the administrator considers is not reflected in the cost-based optimization. The aspect of system administration is based on consideration of the characteristics of various system operations, and includes the reduction of load to a specific DBMS in a specific time period, and the reduction of load to a specific network. More specifically, because the DBMS that performs Online Transaction Processing (OLTP) processes operations with highly urgent necessity, the administrator sometimes wants to suppress analysis queries issued to such DBMS with the exception of some kinds. The conventional cost-based optimization for the database hub and the DBMSs, however, cannot reflect such request.
In short, a second problem that is addressed by the present invention is to control the query optimization in response to the request from the system administrational aspect.
In addition, as described above, the cost-based optimization highly depends on the existence of the detailed statistical information. If there is not enough or no statistical information, in many cases it is difficult to obtain a good execution plan by the cost-based optimization. In contrast to the conventional DBMS where the query optimization can be processed as a whole within a single DBMS, the integration of the information infrastructures is related to two or more DBMSs when a query generally issued by an application is executed. Moreover, those DBMSs generally exist in different divisions, and different companies. Therefore, because of the constraints among divisions and companies associated with management, security, contract, and the like, it may be impossible to obtain (or it is very difficult to obtain) enough statistical information to execute the cost-based optimization. In such cases, obtaining all of the statistical information is not recommended. There is a necessity to be capable of indicating statistical information, which should be used in a specific situation, to the database hub and the DBMSS.
In short, a third problem that is addressed by the present invention is to indicate the statistical information, which should be used in a specific situation, from outside the database hub and the DBMSs as necessary.
Next, according to the conventional technology, when a query is executed after query optimization, the execution plan that has been obtained by the query optimization is executed as it is. The aspect of system administration that the administrator considers, therefore, cannot be reflected in the cost-based optimization. The database hub and the DBMS group under the database hub simply process the query that is issued from various organizations, divisions and posts, and that is related to operations with a variety of complexity and urgent necessity. Therefore, it was difficult for the administrator to realize control over the query execution; for example, it was difficult to prohibit an application in a general division to access a specific DBMS of the associated company during the daytime.
In short, a forth problem that is addressed by the present invention is to control the query execution in response to the request from the system administrational aspect.
Finally, concerning the direction of query optimization including Prior Art 2, a human inspects the execution plan and manually gives direction one by one for the processing (for example, join method, join order, etc.) present in an inquiry. Because of it, the direction of query optimization is complicated and is prone to cause a mistake. In short, a fifth problem that is addressed by the present invention is to provide the human with support for creating a direction relating to the system operation for the query optimization and the query execution to ensure that the direction relating to the system operation for the query optimization and the query execution is easily performed without a mistake.
An object of the present invention is to provide a high performance and easy-to-use information system by addressing the five problems described above, and thereby realizing high speed operation as a whole system in response to the administrator""s intention, even when the information infrastructures are integrated based on the integration of a plurality of DBMSs within a single company or across a plurality of companies.
As a means for solving the first problem, the present invention introduces a query classification definition and a query operation direction into the database hub (or the DBMS). The query classification definition is a method for classifying queries issued by applications based on various classifications and attributes. The query operation direction is a method for selecting operational options for query optimization and query execution. The present invention enables an administrator to associate the query classification definition with the query operation direction and to store them in the database hub (or the DBMS). The database hub (or the DBMS) obtains the query operation direction, which is associated with said query classification definition, for a group of queries classified by said query classification definition, and thereby changes the operation of query optimization and query execution. This solves the first problem that the administrator is required to control the operation of the database hub and the DBMS in a batch without modifying the application.
As a means for solving the second problem, in the present invention, the query classification definition and the query operation direction provide control over a method for creating an execution plan for query optimization of the database hub and the DBMS (a method for selecting join method, join order, record selection processing, grouping processing, sort processing, subquery processing, etc.). In order to start the query operation direction selectively in response to the system administrational aspect, the query classification definition is allowed to have specifications including a type of DBMS (that is accessed by the database hub according to the query classification definition), and a range of time. This solves the second problem that it is necessary to control the query optimization in response to the system administrational aspect.
As a means for solving the third problem, in the present invention, the query classification definition and the query operation direction enables the administrator to set the statistical information partially to the database hub and the DBMS associating with the group of queries.
The cost-based optimization, therefore, can be performed without obtaining the statistical information exhaustively.
Accordingly, this solves the third problem that it is necessary to input the statistical information (that should be used in a specific situation) to the database hub and the DBMS from the outside as necessary.
As a means for solving the fourth problem, in the present invention, the query classification definition and the query operation direction provide control over the execution operation of the database hub and the DBMS for the group of queries. More specifically, it is possible to halt or delay the query execution and to issue a warning in response to the following information: a specific organization, division, and post; the complexity and the urgency of operations; a type of DBMS to be accessed; the amount of data to be obtained; and the like. This solves the fourth problem that it is necessary to control the query execution in response to the system administrational aspect.
As a means for solving the fifth problem, in the present invention, the following functions are added to the database hub and the DBMS: presentation of information that has been missing at the time of optimization; identification of the query operation direction that has been used; presentation of recommended values for the classification definition of a query group; and presentation of recommended values for the operation direction of the query group. This solves the fifth problem that provides the administrator with support for setting the query classification definition and the query operation direction.