The present invention relates to a database management system and, more particularly, to a database processing method suitable for parallel query processing in a relational database management system.
A database management system (hereinafter abbreviated to DBMS), particularly a relational DBMS, processes a query expressed in a non-procedural database language by deciding an internal processing procedure and then executing the query according to that procedure. The database language called SQL, standardized in Database Language SQL ISO 9075:1989, is widely used for this purpose. Conventional query processing methods fall mainly into two classes. The first decides a single internal processing procedure on the basis of predefined rules. The second generates a plurality of candidate processing procedures using various statistical information and selects the optimum one by cost evaluation. The first method places only a small load on the generation of the processing procedure. However, because its rules are applied uniformly, they may not be appropriate for every query, and the internal processing procedure it selects is therefore not necessarily optimal.
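To make the drawback of the rule-based method concrete, the following is a minimal sketch, not taken from this specification, of deciding a single procedure from fixed rules. The class `Query`, the catalog set `INDEXED`, the function `choose_by_rule`, and the plan names are all hypothetical. The point is that the rules never look at how much data a condition selects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    table: str   # table referenced by the query
    column: str  # column used in the retrieval condition
    op: str      # comparison operator, e.g. "=" or "<"

# Hypothetical catalog information: (table, column) pairs that have an index.
INDEXED = {("orders", "order_id"), ("orders", "customer_id")}

def choose_by_rule(q: Query) -> str:
    """Decide a single access procedure from fixed rules; no statistics are consulted."""
    indexed = (q.table, q.column) in INDEXED
    if indexed and q.op == "=":
        return "index_scan"        # rule 1: equality on an indexed column
    if indexed:
        return "index_range_scan"  # rule 2: any other comparison on an indexed column
    return "table_scan"            # rule 3: otherwise scan the whole table

# The same rule fires whether "customer_id < :v" selects 1 row or every row.
print(choose_by_rule(Query("orders", "customer_id", "<")))  # index_range_scan
```

Generating the procedure is cheap, a few lookups. However, rule 2 picks an index range scan even when the range covers nearly the whole table, and in that case a table scan would be faster.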
The latter method manages various statistical information, generates a plurality of candidate processing procedures, and evaluates the cost of each of those procedures in order to select an optimum processing procedure. A technique combining the above two methods is described in, for example, Satoh, K., et al., "Local and Global Optimization Mechanism for Relational Database", Proc. VLDB, 1985, pp. 405-417. In the technique of Satoh et al., the processing procedure is decided by estimating, from the query condition, the amount of data to be processed.
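The cost-based approach can be illustrated with a small sketch. It estimates the amount of data a condition selects and compares candidate procedures by cost. The cost model is a simplifying assumption and is not the model of Satoh et al. or of this invention: values are taken to be uniformly distributed between the column minimum and maximum, a table scan reads every page once, and an index scan costs one page fetch per matching row plus the index height. `ColumnStats`, `selectivity_lt`, `candidate_costs`, and `choose_by_cost` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnStats:
    rows: int     # number of rows in the table
    pages: int    # number of data pages occupied by the table
    low: float    # minimum value of the condition column
    high: float   # maximum value of the condition column

def selectivity_lt(stats: ColumnStats, value: float) -> float:
    """Estimate the fraction of rows satisfying 'col < value' (uniform distribution assumed)."""
    if value <= stats.low:
        return 0.0
    if value >= stats.high:
        return 1.0
    return (value - stats.low) / (stats.high - stats.low)

def candidate_costs(stats: ColumnStats, value: float, index_height: int = 3) -> dict:
    """Cost of each candidate procedure, in page accesses under the assumed model."""
    matched = selectivity_lt(stats, value) * stats.rows
    return {
        "table_scan": float(stats.pages),      # read every data page once
        "index_scan": index_height + matched,  # descend the index, then one fetch per matching row
    }

def choose_by_cost(stats: ColumnStats, value: float) -> str:
    """Select the candidate with the lowest estimated cost."""
    costs = candidate_costs(stats, value)
    return min(costs, key=costs.get)

stats = ColumnStats(rows=100_000, pages=1_000, low=0.0, high=1000.0)
print(choose_by_cost(stats, 5.0))    # index_scan  (about 500 matching rows)
print(choose_by_cost(stats, 500.0))  # table_scan  (about 50,000 matching rows)
```

Unlike a fixed rule, the choice changes with the amount of data the condition is expected to select. The price is that statistics must be maintained and every candidate must be costed.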
In many DBMSs, query processing is carried out in two phases: a query analysis process and a query execution process. For example, when a query is embedded in an application program written in a host language such as COBOL or PL/I, the query analysis process is performed on the embedded query before the application program is executed, and an internal processing procedure is generated in executable form. The query is then processed according to this internal processing procedure when the application program is executed. In most cases, the retrieval condition expression described in the query contains a variable of the host language. A constant is substituted for this variable when the internal processing procedure obtained from the query analysis process is executed, that is, when the query is executed. The optimum processing procedure may differ depending on the value substituted for the variable at execution time, so a processing procedure fixed in advance by the query analysis process is not always optimum. To solve this problem, a known technique generates a plurality of processing procedures during the query analysis process and selects one of them according to the value substituted for the variable when the query is executed. Such a technique is described in, for example, U.S. Pat. No. 5,091,852 or Graefe, G., et al., "Dynamic Query Evaluation Plans", Proc. ACM-SIGMOD, 1989, pp. 358-366.
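The two-phase scheme with deferred plan selection can be sketched as follows. The sketch is a simplified illustration under stated assumptions and does not reproduce U.S. Pat. No. 5,091,852 or Graefe et al. The hypothetical `analyze` function plays the role of the query analysis process for a condition `col < :v`. It keeps several candidate procedures, each paired with a cost function of the not-yet-known host-variable value. The hypothetical `execute` function plays the role of the query execution process: once the constant is known, it costs every candidate and picks the cheapest. The cost model is the same assumed uniform-distribution model as in a plain cost-based optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class PreparedQuery:
    """Output of the analysis phase: candidate procedures, with no single plan fixed yet."""
    candidates: Dict[str, Callable[[float], float]]  # plan name -> cost given the bound value

def analyze(rows: int, pages: int, low: float, high: float,
            index_height: int = 3) -> PreparedQuery:
    """Analysis phase for 'WHERE col < :v' (assumed model: uniformly distributed values)."""
    def selectivity(v: float) -> float:
        return min(max((v - low) / (high - low), 0.0), 1.0)
    return PreparedQuery(candidates={
        "table_scan": lambda v: float(pages),                      # independent of :v
        "index_scan": lambda v: index_height + selectivity(v) * rows,
    })

def execute(prepared: PreparedQuery, host_value: float) -> str:
    """Execution phase: the host variable is now a constant, so choose the cheapest plan."""
    costs = {name: cost(host_value) for name, cost in prepared.candidates.items()}
    return min(costs, key=costs.get)

prepared = analyze(rows=100_000, pages=1_000, low=0.0, high=1000.0)  # done once, ahead of time
print(execute(prepared, 5.0))    # index_scan
print(execute(prepared, 500.0))  # table_scan
```

The expensive work of enumerating candidates happens once, before the program runs. At execution time only a cheap cost comparison remains, so the same prepared query can use different procedures for different host-variable values.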
Recently, users have demanded parallel database systems that scale with growth in transaction volume and database size, growth that outpaces improvements in the CPU performance of computer systems and in the storage capacity of disk units. The performance requirements users place on database systems include support for more than tens of thousands of concurrently executing users, retrieval transactions over terabytes of data, and a guaranteed response time that does not grow in proportion to the table size. In response to these demands, and helped by the recent reduction in hardware cost, parallel database systems have attracted considerable attention. Parallel database systems are described in, for example, DeWitt, D., et al., "Parallel Database Systems: The Future of High Performance Database Systems", CACM, Vol. 35, No. 6, 1992, pp. 85-98. In a parallel database system, a plurality of processors are tightly or loosely coupled, and database processing is distributed among these processors statically or dynamically. In each node (a processor, or a pair consisting of a processor and a disk unit), database operations are executed in parallel or in a pipelined manner. Even in such a parallel processing system, the processing procedure can be selected at each node by applying the technique described above.
In general, the response performance of a parallel database system improves as the degree of parallelism increases. However, excessive parallelism can cause problems such as increased overhead and longer transaction response times, so it is important to set a moderate degree of parallelism. Conventional parallel database systems, however, define no criterion for deciding the number of nodes to be used for database operations. This makes it difficult to obtain an appropriate degree of parallelism and to achieve optimum load distribution. In addition, the data used for database operations is stored separately in each node. If the amount of data stored varies among the nodes when database operations are performed in a pipelined manner, the processing times of the nodes become unbalanced and the pipeline cannot operate smoothly.
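Both problems can be shown with a deliberately simple model. This is an illustrative assumption, not the criterion proposed by this invention. In the model, response time is the evenly divided work plus an overhead that grows linearly with the number of nodes, so adding nodes helps only up to a point. A pipelined step finishes only when its most heavily loaded node finishes, so skewed data placement stretches the step even when the total amount of data is unchanged. `response_time`, `best_parallelism`, `pipeline_stage_time`, and `skew` are hypothetical names.

```python
from typing import Sequence

def response_time(work: float, nodes: int, overhead_per_node: float) -> float:
    """Assumed model: work splits evenly across nodes; coordination overhead grows with nodes."""
    return work / nodes + overhead_per_node * nodes

def best_parallelism(work: float, overhead_per_node: float, max_nodes: int) -> int:
    """Number of nodes in 1..max_nodes that minimizes the modeled response time."""
    return min(range(1, max_nodes + 1),
               key=lambda n: response_time(work, n, overhead_per_node))

def pipeline_stage_time(rows_per_node: Sequence[int], rows_per_second: float) -> float:
    """A pipelined step completes only when its most heavily loaded node completes."""
    return max(rows_per_node) / rows_per_second

def skew(rows_per_node: Sequence[int]) -> float:
    """Largest partition divided by the mean partition (1.0 means perfectly balanced)."""
    return max(rows_per_node) / (sum(rows_per_node) / len(rows_per_node))

# More nodes help only up to a point: with work=10,000 and overhead 1 per node,
# the modeled optimum is 100 nodes, not the 1,000 available.
print(best_parallelism(10_000, 1.0, 1_000))  # 100

balanced = [250, 250, 250, 250]
skewed = [100, 100, 100, 700]  # same 1,000 rows in total
print(pipeline_stage_time(balanced, 100.0), pipeline_stage_time(skewed, 100.0))  # 2.5 7.0
```

With the same total data, the skewed placement makes the pipelined step take 2.8 times longer. That factor is exactly its skew ratio, because the stage time is governed by the largest partition.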