The present invention generally relates to a distributed database system, and more particularly to a distributed database system for optimizing a sequence of query processes when a query including relations joined together with respect to multiple attributes thereof is input.
With the recent development of computer systems and network technology, distributed database systems, rather than centralized database systems, have come to be regarded as useful for efficiently storing, retrieving and manipulating information held in distributed databases. However, a distributed database system still has some unresolved problems. One such problem is that it is difficult to efficiently perform the query processes for the distributed databases in response to an input query. As a result, the data transfer cost of the distributed database system is relatively high.
A paper entitled "Optimization of Tree Distributed Query" by H. Li and H. Sato (IPSJ Database System SIG Report 64-6; IPSJ Programming Language SIG Report 26-3, September 1990) discloses the use of semijoins between relations in order to reduce the quantity of data of the relations and to decrease the total transmission time incurred when tree distributed query processes are performed on the databases.
A paper entitled "Discussion on Optimization of Distributed Query Processing" by H. Li and H. Sato (IPSJ Database System SIG Report 64-6, March 1988) discloses a proposed approach for optimizing the transmission time and the response time in the distributed query processes associated with the databases. In the above-mentioned paper, the use of joins between multi-attribute relations and of semijoins between multi-attribute relations is discussed.
However, in the existing distributed database system, it is impossible to efficiently perform a sequence of query processes associated with distributed databases if a query including multi-attribute relations joined together with respect to the multiple attributes thereof is input. A query is a request for extracting information from the databases.
Next, a description will be given of the method for generating the semijoin between two relations. A derived relation resulting from the semijoin generation by this method is called a reducer, and this reducer serves to reduce the size of data to be transmitted. A query process of a distributed database system is performed in three major procedures. In a first procedure, local processes including projection operations, selection operations and join operations are performed. In a second procedure, the size of data of the relations to be transmitted is reduced. In a third procedure, the remaining required operations are performed, and the resulting data is transmitted to the node of the communication network from which the query was issued. In the existing system, when the second procedure is performed, the size of data to be transmitted is decreased by generating semijoins between single-attribute relations only.
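The role of the reducer in the second procedure can be illustrated with a minimal sketch. The relations `R` and `S`, the join attribute `k`, the two-site layout, and the cost measure (a count of attribute values shipped over the network) are all assumptions made for illustration and do not appear in the cited papers.

```python
# Hypothetical data: R is stored at site 1 (where the query was issued),
# S is stored at site 2. Both share the join attribute "k".
R = [{"k": 1, "x": "p"}, {"k": 2, "x": "q"}]
S = [{"k": 2, "y": "u"}, {"k": 5, "y": "v"},
     {"k": 6, "y": "w"}, {"k": 7, "y": "z"}]


def project(rel, attr):
    """Distinct values of one attribute (what site 1 would send to site 2)."""
    return {t[attr] for t in rel}


def reduce_by(rel, attr, values):
    """Keep only tuples whose attr value is in values (the reducer)."""
    return [t for t in rel if t[attr] in values]


def cost(rel):
    """Assumed cost model: number of attribute values transmitted."""
    return sum(len(t) for t in rel)


def plan_costs(r, s, attr):
    """Return (cost of shipping s whole, cost of the semijoin-based plan)."""
    naive = cost(s)
    join_values = project(r, attr)           # shipped from site 1 to site 2
    reduced = reduce_by(s, attr, join_values)  # computed locally at site 2
    with_reducer = len(join_values) + cost(reduced)
    return naive, with_reducer


naive_cost, reducer_cost = plan_costs(R, S, "k")
print(naive_cost, reducer_cost)
```

Under this assumed cost model the semijoin-based plan pays for sending the projected join values first, so it is only worthwhile when the reduction in the size of the shipped relation outweighs that extra transfer.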
FIGS. 1A-1E show a regular join between two relations and two semijoins between the relations derived from two distributed databases. The join between the two relations A and B in FIGS. 1A and 1B is generated with respect to the attribute "a" as shown in FIG. 1C, and the resulting data includes all items of data of the two relations. A first semijoin between the two relations is generated with respect to the attribute "a" of the relation A as shown in FIG. 1D. The item "a1" of the attribute "a" is shared by the two relations A and B. By this first semijoin, a tuple including the items "a2" and "b2" in the relation A is eliminated. A second semijoin between the two relations A and B is generated with respect to the attribute "b" as shown in FIG. 1E. The item "b2" of the attribute "b" is shared by the two relations. By the second semijoin, a tuple including the items "a1" and "b3" in the relation B is eliminated. Generally, the quantity of data resulting from the semijoin generation is smaller than the quantity of data resulting from the regular join generation, as shown in FIGS. 1C-1E.
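The difference between the regular join and the two semijoins can be sketched as follows. Because the figures are not reproduced here, the contents of the relations A and B below are hypothetical values chosen only to be consistent with the description above (the tuple with "a2" and "b2" is eliminated from A, and the tuple with "a1" and "b3" is eliminated from B); the function names are likewise illustrative.

```python
# Hypothetical contents of the two relations; both have attributes "a" and "b".
A = [{"a": "a1", "b": "b1"}, {"a": "a2", "b": "b2"}]
B = [{"a": "a1", "b": "b3"}, {"a": "a3", "b": "b2"}]


def semijoin(r, s, attr):
    """Return the tuples of r whose attr value also occurs in s."""
    shared = {t[attr] for t in s}
    return [t for t in r if t[attr] in shared]


def join(r, s, attr):
    """Equijoin of r and s on attr, keeping every attribute of both sides."""
    out = []
    for t in r:
        for u in s:
            if t[attr] == u[attr]:
                row = {f"A.{k}": v for k, v in t.items()}
                row.update({f"B.{k}": v for k, v in u.items()})
                out.append(row)
    return out


joined = join(A, B, "a")        # carries the attributes of both relations
first = semijoin(A, B, "a")     # A reduced with respect to attribute "a"
second = semijoin(B, A, "b")    # B reduced with respect to attribute "b"

print(joined)
print(first)
print(second)
```

In this sketch each semijoin result keeps only the attributes of one relation, whereas the join result carries the attributes of both, which is why the semijoin results are the smaller ones to transmit.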