1. Field of the Invention
The present invention relates to a join operation processing system suitable for performing a join operation on a relational database usable in a distributed database management system.
2. Description of the Prior Art
A distributed database management system consists of general local database management systems, each called a site, interconnected by various networks. Each site has a memory that stores the databases distributed to it. To perform a join operation on relational databases distributed across multiple sites, all tuples related to the join operation must be transferred from the other sites to one site, where the join operation is then performed.
This method, however, also transfers tuples that are not involved in the relation finally yielded as the result of the join operation. Consequently, the amount of data transmitted among the sites determines both the speed and the cost of the join operation.
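The cost problem can be illustrated with a small sketch. It assumes, hypothetically, a relation R stored at one site and a relation P at another, joined on R's second attribute and P's first. The data and the cost measure (number of attribute values shipped) are illustrative assumptions, not part of the prior-art system.

```python
# Hypothetical data: R is stored at site 1, P at site 2.
R = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3)]
P = [("A", 10), ("B", 20), ("X", 30), ("Y", 40), ("Z", 50)]


def values_sent(relation):
    """Crude transfer cost: number of attribute values shipped."""
    return sum(len(t) for t in relation)


# Naive strategy: ship all of P to site 1, then join there on R[1] == P[0].
cost = values_sent(P)
result = [r + p for r in R for p in P if r[1] == p[0]]

# Tuples of P that were shipped but contribute nothing to the result.
wasted = [p for p in P if not any(r[1] == p[0] for r in R)]

print(cost, len(result), len(wasted))  # 10 2 3
```

Three of the five shipped tuples of P never reach the join result, yet their transfer is paid for in full.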
A method for solving this problem is disclosed in "Using Semi-Joins to Solve Relational Queries," Journal of the Association for Computing Machinery, Vol. 28, No. 1, January 1981, pp. 25 to 40. According to this method, when a relation must be transmitted from one site to another to perform a join operation, the amount of transferred data is reduced as follows: only the attribute values involved in the join operation are transferred; a semi-join process is performed to prepare a relation comprising only those tuples that will appear in the relation finally yielded by the join operation; and only this reduced relation is then transferred.
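A minimal sketch of the semi-join reduction, under assumed example data: only the distinct join-attribute values of R travel to P's site, P is reduced to the tuples that match, and only that reduced relation travels back. The helper names `semijoin` and `values_sent` and the tuple-position conventions are hypothetical, not taken from the cited paper.

```python
def semijoin(relation, index, join_values):
    """Semi-join: keep tuples whose attribute at `index` occurs in join_values."""
    keys = set(join_values)
    return [t for t in relation if t[index] in keys]


def values_sent(relation):
    """Crude transfer cost: number of attribute values shipped."""
    return sum(len(t) for t in relation)


# Hypothetical data: R at site 1, P at site 2, join on R[1] == P[0].
R = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3), ("d", "A", 4)]
P = [("A", 10), ("B", 20), ("X", 30), ("Y", 40)]

# 1. Site 1 ships only the distinct join values of R.
join_values = sorted({r[1] for r in R})      # ['A', 'B', 'C']
# 2. Site 2 keeps only the P tuples that can appear in the join result.
P_reduced = semijoin(P, 0, join_values)      # [('A', 10), ('B', 20)]
# 3. Only the reduced relation is shipped back for the final join.
semijoin_cost = len(join_values) + values_sent(P_reduced)
naive_cost = values_sent(P)

# The reduction loses nothing: joining with P_reduced equals joining with P.
full = [r + p for r in R for p in P if r[1] == p[0]]
reduced = [r + p for r in R for p in P_reduced if r[1] == p[0]]
assert full == reduced
print(semijoin_cost, naive_cost)  # 7 8
```

The saving grows with the fraction of P that fails to match; the final assertion shows why the reduction is safe.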
FIG. 1 illustrates a prior art join operation processing system, and FIG. 2 illustrates a relation to be processed by the join operation processing system of FIG. 1. In FIG. 2, RX is called a relation name, and A1, A2 and A3 are each called an attribute name. Values such as a, b and c are called attribute values of the attribute A1, and values such as A, B and C are called attribute values of A2. As for the data types of the attributes, the values of A1 are lowercase English letters, those of A2 uppercase English letters, and those of A3 numerals. A row such as (a, A, 1) or (b, B, 2) is called a tuple.
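The terminology of FIG. 2 maps onto a simple data structure. In this sketch the class name `Relation`, its `column` method, and the `fits_rx_types` check are hypothetical; the relation name RX, the attribute names, the data types, and the example tuples follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    name: str          # relation name, e.g. "RX"
    attributes: tuple  # attribute names, e.g. ("A1", "A2", "A3")
    rows: list         # tuples: one attribute value per attribute, in order

    def column(self, attribute):
        """Return the attribute values of one attribute, in row order."""
        i = self.attributes.index(attribute)
        return [row[i] for row in self.rows]


def fits_rx_types(row):
    """A1: lowercase letter, A2: uppercase letter, A3: numeral."""
    a1, a2, a3 = row
    return (a1.isalpha() and a1.islower()
            and a2.isalpha() and a2.isupper()
            and isinstance(a3, int))


RX = Relation("RX", ("A1", "A2", "A3"), [("a", "A", 1), ("b", "B", 2)])
print(RX.column("A2"))                         # ['A', 'B']
print(all(fits_rx_types(r) for r in RX.rows))  # True
```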
The operation of the prior art join operation processing system will be described with reference to FIG. 1. Consider, for example, a join operation in which a relation 3 (named R) managed by a site 1 and a relation 4 (named P) managed by a site 2 are joined under the condition that the values of the attributes A2 and B1 are equal, and the resultant relation is then transferred to a computer 5 connected to the site 1.
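The target of this example, the join of R and P on A2 = B1, can be stated as a small equi-join. This sketch uses a hash join, a swapped-in algorithm chosen for brevity; the schema of P beyond B1 (a hypothetical attribute B2) and all data values are assumptions. The result is what the distributed processing must ultimately deliver to the computer 5.

```python
from collections import defaultdict


def equijoin(r_attrs, r_rows, p_attrs, p_rows, r_key, p_key):
    """Hash equi-join of two relations on r_key == p_key."""
    i, j = r_attrs.index(r_key), p_attrs.index(p_key)
    buckets = defaultdict(list)          # build phase: index P on its join key
    for p in p_rows:
        buckets[p[j]].append(p)
    # Probe phase: each R tuple is paired with every P tuple in its bucket.
    rows = [r + p for r in r_rows for p in buckets.get(r[i], [])]
    return r_attrs + p_attrs, rows


R_attrs = ("A1", "A2", "A3")
R_rows = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3)]
P_attrs = ("B1", "B2")                   # hypothetical schema for P
P_rows = [("A", "x"), ("B", "y"), ("D", "z")]

attrs, rows = equijoin(R_attrs, R_rows, P_attrs, P_rows, "A2", "B1")
print(attrs)  # ('A1', 'A2', 'A3', 'B1', 'B2')
print(rows)   # [('a', 'A', 1, 'A', 'x'), ('b', 'B', 2, 'B', 'y')]
```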
First, an application program 6 of the computer 5 sends a request to perform a join operation between the relations R and P of a distributed database 18 to a distributed database management system 7 via a database management system access manager 8.
The processing request is then analyzed by a processing request analyzer 9a and passed to a process determining means 10a. The process determining means 10a determines an execution process in accordance with the analysis result and informs a database manager 11a of the determined process. The database manager 11a executes the prescribed join operation in accordance with this execution process. If processing at another site is needed, the database manager 11a also informs the database manager of that site of the necessary process through an intersite communication controller 14a.
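The division of work described here, in which the database manager 11a runs local steps and forwards remote ones through the intersite communication controller 14a, can be modeled as a plan dispatcher. The `Step` record, the operation labels, and the `dispatch` function are hypothetical illustrations, not the system's actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class Step:
    site: int        # site whose database manager must execute the step
    operation: str   # hypothetical operation label
    detail: str      # human-readable description of the step


def dispatch(plan, local_site):
    """Split a plan into steps run locally and steps forwarded to other sites."""
    local, forwarded = [], []
    for step in plan:
        if step.site == local_site:
            local.append(step.operation)                   # run by this manager
        else:
            forwarded.append((step.site, step.operation))  # sent to another site
    return local, forwarded


plan = [
    Step(1, "project", "R -> R' on A2"),
    Step(2, "semijoin", "P with R' on B1 = A2 -> P'"),
    Step(1, "join", "R with P' on A2 = B1"),
]
local, forwarded = dispatch(plan, local_site=1)
print(local)      # ['project', 'join']
print(forwarded)  # [(2, 'semijoin')]
```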
The process is now described in more detail. First, as part of the semi-join process, the database manager 11a executes a projection on the relation R, preparing as an intermediate result a relation 12 (named R') comprising only the attribute A2. The relation 12 is stored in a local database 13a managed by the site 1.
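The projection step can be sketched as follows, assuming example data and modeling the local database 13a as a plain dictionary keyed by relation name (a hypothetical simplification). Duplicate values are eliminated, since relational projection yields a set and the semi-join needs each join value only once; this is why R' can be much smaller than R.

```python
R_attrs = ("A1", "A2", "A3")
R = [("a", "A", 1), ("b", "B", 2), ("c", "A", 3), ("d", "B", 4), ("e", "C", 5)]


def project(attrs, rows, attribute):
    """Relational projection onto one attribute, with duplicates removed."""
    i = attrs.index(attribute)
    seen, out = set(), []
    for row in rows:
        if row[i] not in seen:      # keep the first occurrence only
            seen.add(row[i])
            out.append((row[i],))
    return out


database_13a = {"R": R}             # hypothetical model of site 1's local store
database_13a["R'"] = project(R_attrs, R, "A2")
print(database_13a["R'"])  # [('A',), ('B',), ('C',)]
```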
The database manager 11a then transfers the relation 12, together with the process to be performed by the database manager 11b at the site 2, to the database manager 11b via the intersite communication controllers 14a and 14b and a communication network 15.
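What travels over the communication network 15 in this step is a description of the remote process plus the relation R'. A minimal sketch, assuming a JSON encoding and a hypothetical message layout; the field names `process`, `join_attribute`, and `relation` are illustrative, not the system's actual protocol.

```python
import json

R_prime = [("A",), ("B",), ("C",)]   # R' produced at site 1 (example values)

message = {
    "process": "semijoin",          # what database manager 11b should do
    "join_attribute": "B1",         # attribute of P compared with R'.A2
    "relation": [list(t) for t in R_prime],
}
payload = json.dumps(message).encode("utf-8")   # bytes handed to the network

# At site 2 the message is decoded back into a process and a relation.
received = json.loads(payload.decode("utf-8"))
R_prime_at_site2 = [tuple(t) for t in received["relation"]]
print(received["process"], R_prime_at_site2)
print(len(payload), "bytes transferred")
```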
In accordance with the transmitted process, the database manager 11b joins the transmitted relation 12 with a relation 4 (named P) stored in a database 13b that it manages, obtains a relation 16 (named P') as an intermediate result, and stores it in the database 13b. The database manager 11b then transfers the relation 16 to the database 13a managed by the site 1. Finally, the database manager 11a joins the relations 16 and 3 and transfers the resultant relation 17 to the computer 5.
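The whole exchange can be traced end to end. This sketch assumes example data, a hypothetical schema (B1, B2) for P, and dictionaries standing in for the databases 13a and 13b. It checks that the semi-join route yields the same relation 17 as a direct join of R and P while shipping fewer attribute values.

```python
def project(rows, i):
    """Projection onto position i with duplicate elimination (order kept)."""
    return list(dict.fromkeys((row[i],) for row in rows))


def join(left, right, i, j):
    """Nested-loop equi-join on left[i] == right[j]."""
    return [l + r for l in left for r in right if l[i] == r[j]]


db_13a = {"R": [("a", "A", 1), ("b", "B", 2), ("c", "C", 3), ("d", "A", 4)]}
db_13b = {"P": [("A", "x"), ("B", "y"), ("X", "z"), ("Y", "w")]}

# Site 1: R' = projection of R on A2, stored in 13a, then sent to site 2.
db_13a["R'"] = project(db_13a["R"], 1)
db_13b["R'"] = list(db_13a["R'"])

# Site 2: join R' with P on A2 = B1, keep P's columns -> P', stored in 13b.
joined = join(db_13b["R'"], db_13b["P"], 0, 0)
db_13b["P'"] = [t[1:] for t in joined]

# Site 1: receive P' and join it with R to obtain relation 17.
db_13a["P'"] = list(db_13b["P'"])
relation_17 = join(db_13a["R"], db_13a["P'"], 1, 0)

direct = join(db_13a["R"], db_13b["P"], 1, 0)
assert relation_17 == direct
values_sent = sum(map(len, db_13a["R'"])) + sum(map(len, db_13b["P'"]))
print(relation_17)
# [('a', 'A', 1, 'A', 'x'), ('b', 'B', 2, 'B', 'y'), ('d', 'A', 4, 'A', 'x')]
print(values_sent)  # 7 (shipping all of P would send 8)
```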
This join operation processing system assumes that the transfer capacity of the communication network connecting the sites of the distributed database management system is relatively low, and it therefore aims to reduce the amount of communication among the sites.
However, the semi-join process at each site imposes a burden on the database management system in the following situations:
(1) The throughput of each site is low.
(2) Queries from many users are concentrated on a particular site.
(3) A relation to be processed is large, so the intermediate result yielded by the semi-join process also requires a large capacity.
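The trade-off behind situation (3) can be made concrete with a rough cost model. The model, which counts attribute values transferred and assumes a given fraction of P's tuples match, is a hypothetical illustration: when most tuples match, the semi-join must still build and store large intermediate relations R' and P', yet it does not reduce the transfer.

```python
def transfer_costs(r_distinct, p_tuples, p_width, match_fraction):
    """Rough transfer cost, in attribute values, of two strategies.

    r_distinct     -- distinct A2 values in R, i.e. tuples in R'
    p_tuples       -- tuples in P
    p_width        -- attribute values per tuple of P
    match_fraction -- fraction of P's tuples whose B1 matches some A2 value
    """
    naive = p_tuples * p_width                       # ship all of P
    p_prime = round(p_tuples * match_fraction) * p_width
    semijoin = r_distinct + p_prime                  # ship R', then P'
    return naive, semijoin


# Selective join: the semi-join pays off.
print(transfer_costs(100, 10_000, 2, 0.05))    # (20000, 1100)
# Unselective join on a large relation: R' and P' are large intermediate
# results that must be built and stored, and the transfer is not reduced.
print(transfer_costs(5_000, 10_000, 2, 0.95))  # (20000, 24000)
```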