1. Field of the Invention
The present invention relates to a database processing apparatus, an information processing method, and a computer program product for processing structured documents in a distributed database in which a plurality of databases, each storing structured documents having hierarchical structures, are distributed.
2. Description of the Related Art
With the recent spread of structured documents such as extensible markup language (XML) documents, databases that store such structured documents (for example, XML databases) are increasingly used. XQuery and similar languages have been proposed for querying these databases. XQuery is a functional language for querying an XML database and is characterized by the FLWR syntax, which consists of a for clause, a let clause, a where clause, and a return clause. Processing performed by using XQuery is described in “A Complete and Efficient Algebraic Compiler for XQuery”, for example.
In the FLWR syntax of XQuery, the for clause binds each item in a sequence to a variable in turn, whereas the let clause binds an entire sequence to a variable. Combining the for and let clauses enables advanced queries suited to XML; restructuring or compiling of XML cannot be expressed without the let clause.
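The difference between the two bindings can be illustrated with a minimal Python sketch. The function names `for_bindings` and `let_bindings` and the sample data are assumptions made for illustration only; they are not part of XQuery or of any XML database product.

```python
# Hypothetical stand-in for an XQuery sequence of three items.
items = ["a", "b", "c"]


def for_bindings(seq):
    """Mimic `for $x in seq`: one binding per item in the sequence."""
    return [x for x in seq]


def let_bindings(seq):
    """Mimic `let $x := seq`: a single binding holding the whole sequence."""
    return [tuple(seq)]


for_result = for_bindings(items)  # three bindings, one item each
let_result = let_bindings(items)  # one binding, the entire sequence

print(len(for_result), len(let_result))
```

With this input, the for-style binding yields three separate bindings, whereas the let-style binding yields exactly one binding whose value is the whole sequence.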
Processing an XQuery yields a sequence. The let clause can be called a nested clause because a nested query can be invoked by using it. The let clause is thus syntax that forms the foundation of XQuery; however, methods for implementing it have not been studied sufficiently. In practice, when the let clause is handled in the same way as the for clause, problems such as “loss of sequence elements” and “lack of the number of results” occur, which makes the processing difficult. The article mentioned above describes a processing method only by way of FLWR samples, and a detailed processing method for the let clause remains unclear.
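One possible reading of these two problems can be sketched in Python as follows. The text above does not define the problems precisely, so this interpretation, the sample data, and the helper names `titles_of`, `with_let`, and `let_handled_like_for` are all assumptions made for illustration.

```python
# Hypothetical data: three authors, only one of whom has any books.
authors = ["A", "B", "C"]
books = [("A", "x1"), ("A", "x2")]  # (author, title) pairs


def titles_of(author):
    """Nested query: the sequence of titles for one author (may be empty)."""
    return [title for a, title in books if a == author]


def with_let():
    # for $a in authors let $t := titles_of($a) return ($a, $t)
    # One result per author, each carrying the whole title sequence.
    return [(a, titles_of(a)) for a in authors]


def let_handled_like_for():
    # for $a in authors for $t in titles_of($a) return ($a, $t)
    # One result per (author, title) pair; authors without titles vanish.
    return [(a, [t]) for a in authors for t in titles_of(a)]


expected = with_let()
actual = let_handled_like_for()
print(len(expected), len(actual))
```

Under this reading, the for-style handling returns fewer results than the let semantics require, because authors with an empty title sequence drop out, and no single result carries the complete sequence of titles for author "A".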
In existing database products, the let clause is mostly implemented according to a processing system for functional languages. When a database is queried using the nested structure mentioned above under such an implementation, the outer XQuery and the inner XQuery of the nested structure are regarded as having an input-output relation. Therefore, upon completion of the process for the outer XQuery, the obtained results are passed to the inner XQuery as variables, and the processes on those variables are then performed.
The let clause, or nested clause, is essential for performing advanced XQuery processing. There are few processing methods corresponding to the processing system for functional languages, and when these processing methods are used, the following problems occur. In particular, the amount of calculation increases for an XQuery that includes for clauses nested two or more levels deep together with nested clauses. In such an XQuery, the inner loop is processed upon completion of the processes for the outer multiple loops; that is, a nested function is invoked as many times as the multiple loops iterate. In a practical processing system that has ordering restrictions on nested functions, the problem is overcome by rewriting the XQuery, a program, or the like.
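The growth in the number of invocations can be sketched in Python as follows. The loop sizes, the counter, and the functions `nested_function` and `run_query` are hypothetical and serve only to count how often the nested function is called under a doubly nested for clause.

```python
# Counter for how many times the nested function is invoked.
stats = {"calls": 0}


def nested_function(x, y):
    """Stand-in for the nested (let) query evaluated per loop iteration."""
    stats["calls"] += 1
    return (x, y)


def run_query(outer, inner):
    """Mimic a double for clause whose body invokes a nested function."""
    results = []
    for x in outer:        # outer for clause
        for y in inner:    # inner for clause
            results.append(nested_function(x, y))
    return results


results = run_query(range(100), range(50))
print(stats["calls"])
```

With an outer loop of 100 items and an inner loop of 50 items, the nested function is invoked 100 × 50 = 5000 times, so the cost of the nested query is multiplied by the product of the loop sizes.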
Another problem occurs in a distributed XML database in which XML data are distributed over a plurality of database servers and a coordinator server is connected to these database servers. That is, a lower nested clause cannot be processed until all the XML data have been retrieved from the database servers by using an upper for clause. The coordinator server needs to receive all data below the corresponding XML elements stored in the database servers. When the XML elements stored in the databases are large in number or size, the cost of transferring the elements becomes quite high.
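The transfer cost described above can be illustrated with a simplified Python model. The server names, element sizes, and the functions `coordinator_fetch` and `nested_clause` are assumptions for illustration; the model simply adds up the bytes that the coordinator server must receive before the nested clause can start.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    subtree_bytes: int  # size of all data below this element


# Hypothetical contents of two database servers.
servers = {
    "db1": [Element("book", 4000), Element("book", 6000)],
    "db2": [Element("book", 5000), Element("article", 1000)],
}


def coordinator_fetch(servers, wanted):
    """Upper for clause: pull every matching element, with its whole subtree."""
    fetched, transferred = [], 0
    for elements in servers.values():
        for element in elements:
            if element.name == wanted:
                fetched.append(element)
                transferred += element.subtree_bytes
    return fetched, transferred


def nested_clause(fetched):
    """Lower nested clause: runs only after all elements have arrived."""
    return len(fetched)


fetched, transferred = coordinator_fetch(servers, "book")
result = nested_clause(fetched)
print(result, transferred)
```

In this model the nested clause only counts three elements, yet 15000 bytes of subtree data are transferred to the coordinator server first, and the transferred volume grows with both the number and the size of the matching elements.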
As described above, when a distributed XML database is queried using a nested structure, the processing can become complicated, increasing both the quantity of data to be transferred and the amount of calculation, and the performance of the database can therefore deteriorate significantly.