1. Field of the Invention
The present invention relates to an apparatus, a method, and a computer program product for processing a query for a database in which structured document data that is represented by tree-structured nodes is stored.
2. Description of the Related Art
There are several schemes for a structured-document management system for storing and retrieving structured-document data that is described in eXtensible Markup Language (XML) or the like.
In recent years, a new storing method has been proposed in which the structured-document data is stored in its native form. Because the XML data (structured-document data), which has a wide variety of hierarchical structures, is stored without a special mapping process, this system has the advantage that no special overhead is incurred when storing and acquiring the data. Furthermore, because a costly preliminary schema design is not necessary, the structure of the XML data can be changed freely as the business environment changes.
As a query language for retrieving the XML data, the XML query language (XQuery) has been developed. XQuery is a language for handling the XML data as if it were a database, and it therefore provides means for extracting, collecting, and analyzing a data set that satisfies a condition. Moreover, because the XML data has a hierarchical structure in which elements are combined through relations such as the parent-child relation and the sibling relation, XQuery also provides a means for tracing the hierarchical structure. A technology for retrieving structured-document data that includes a specific element and a specific structure designated by a retrieving condition, while tracing the hierarchical structure of the stored structured-document data, is already disclosed in JP-A 2000-057163 (KOKAI).
However, the process of tracing the elements that constitute the hierarchical structure of each structured-document data takes more time as the size of the structure of the structured-document data grows, as the number of the structured-document data stored in the database increases, and as the retrieving condition becomes complicated. In addition, if the size and the number of the structured-document data increase, it becomes impossible to load all of the stored structured-document data into a memory, so that a majority of the data is stored in a secondary storage device such as a hard disk.
In the system that manages the structured-document data in the native form described above, the hierarchical structure between the elements of the structured-document data is stored as it is. Consequently, the elements of the structured-document data stored in the secondary storage device must be accessed frequently in order to check whether there is an element or a structure that is designated as a retrieving condition. The situation is even more severe if the retrieving condition is complicated. Namely, in the system that manages the structured-document data in the native form, it is difficult to increase the speed of the retrieving process as the size of the structure of the structured-document data grows, as the number of the structured-document data stored in the database increases, and as the retrieving condition becomes complicated.
In recent years, a query optimization technology has been developed for increasing the speed of the query with a complicated retrieving condition.
A query optimization technology disclosed in Japanese Patent No. 3754253 stores a class of an applicable retrieving graph node, an application cost, an application condition, and a plan generation rule that indicates an action executed at the time of executing a retrieving plan. By parsing a description of a retrieving request, it generates a retrieving graph having retrieving graph nodes, including a variable node that can be incarnated by the plan generation rule. It then selects, from the retrieving graph nodes of the retrieving graph, a retrieving graph node that satisfies the application condition and requires the minimum application cost, and applies the plan generation rule to that node. By repeating the application of the plan generation rule while incarnating the variable node by executing the action, it generates a retrieving plan that indicates a retrieving process procedure for a structured-document database. With this technology, it is possible to generate the plan at a high speed because the plan generation rule can be applied linearly.
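The rule-driven plan generation described above can be pictured with the following minimal Python sketch. It is an illustration only, not the implementation of Japanese Patent No. 3754253: the names `Rule` and `generate_plan`, the dictionary layout of the graph nodes, and the sample rules and costs are all hypothetical. Each step picks the applicable (node, rule) pair with the minimum application cost, executes the rule's action, and lets the action bind a variable that later rules may depend on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical plan generation rule (all field names are assumptions)."""
    name: str
    node_class: str                          # class of graph node the rule applies to
    cost: int                                # application cost
    condition: Callable[[Dict, Dict], bool]  # (node, bindings) -> applicable?
    action: Callable[[Dict, Dict], str]      # emits a plan step, may bind a variable


def generate_plan(nodes: List[Dict], rules: List[Rule]) -> List[str]:
    """Greedily apply the cheapest applicable rule until every node is consumed."""
    bindings: Dict[str, str] = {}
    pending = list(nodes)
    plan: List[str] = []
    while pending:
        candidates = [
            (rule.cost, i, j)
            for i, node in enumerate(pending)
            for j, rule in enumerate(rules)
            if rule.node_class == node["class"] and rule.condition(node, bindings)
        ]
        if not candidates:
            raise ValueError("no applicable plan generation rule")
        _, i, j = min(candidates)            # minimum application cost wins
        node = pending.pop(i)
        plan.append(rules[j].action(node, bindings))
    return plan


def bind_scan(kind: str) -> Callable[[Dict, Dict], str]:
    """Build an action that binds the node's variable and emits a scan step."""
    def action(node: Dict, bindings: Dict) -> str:
        bindings[node["var"]] = node["path"]
        return f"{kind}({node['path']}) -> ${node['var']}"
    return action


rules = [
    Rule("index_scan", "scan", 1, lambda n, b: True, bind_scan("index_scan")),
    Rule("full_scan", "scan", 5, lambda n, b: True, bind_scan("full_scan")),
    Rule("filter", "filter", 2,
         lambda n, b: n["var"] in b,         # only once the variable is bound
         lambda n, b: f"filter(${n['var']}, {n['pred']})"),
]

# The filter node is listed first, but it cannot fire until $b is bound.
nodes = [
    {"class": "filter", "var": "b", "pred": "price > 30"},
    {"class": "scan", "var": "b", "path": "//book"},
]

plan = generate_plan(nodes, rules)
print(plan)
```

Even in this toy form, every rule, its cost, and its condition have to be written out in advance, and the outcome depends on how the costs and conditions of different rules interact.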
A query optimization technology disclosed in Japanese Patent No. 3492246 is used when retrieving a portion that satisfies a designated condition from the XML data. It seeks to optimize the query before executing the retrieval and, at the same time, to optimize the execution itself by replacing process procedures and reusing obtained process results at the time of executing the retrieval. Examples of such rewriting include a replacement of an inner loop and an outer loop in the case of running a nested loop, a replacement of a right-hand member and a left-hand member when processing a self-join, and a replacement of an execution order. With this technology, it is possible to generate the plan at a high speed because the plan generation rule can be applied linearly, although it is inferior to the system disclosed in Japanese Patent No. 3754253.
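One of the rewrites mentioned above, the replacement of an inner loop and an outer loop in a nested loop, can be illustrated with a short Python sketch. This is a generic illustration under simple assumptions, not the method of Japanese Patent No. 3492246: the function names, the sample tuples, and the cost measure (one full pass over the inner input per outer row) are hypothetical.

```python
from typing import Callable, List, Tuple


def nested_loop_join(outer: List, inner: List,
                     key_outer: Callable, key_inner: Callable) -> Tuple[List, int]:
    """Plain nested-loop join; also counts full passes over the inner input."""
    inner_scans = 0
    pairs = []
    for o in outer:
        inner_scans += 1                     # one pass over `inner` per outer row
        for i in inner:
            if key_outer(o) == key_inner(i):
                pairs.append((o, i))
    return pairs, inner_scans


def join_with_loop_swap(left: List, right: List,
                        key_left: Callable, key_right: Callable) -> Tuple[List, int]:
    """Rewrite: use the smaller input as the outer loop, keep (left, right) order."""
    if len(left) <= len(right):
        return nested_loop_join(left, right, key_left, key_right)
    pairs, scans = nested_loop_join(right, left, key_right, key_left)
    return [(l, r) for (r, l) in pairs], scans


# Hypothetical data: (book id, author id) and (author id, author name).
books = [("b1", "a1"), ("b2", "a1"), ("b3", "a2"), ("b4", "a3")]
authors = [("a1", "Ito"), ("a2", "Sato")]

naive_pairs, naive_scans = nested_loop_join(
    books, authors, lambda b: b[1], lambda a: a[0])
swapped_pairs, swapped_scans = join_with_loop_swap(
    books, authors, lambda b: b[1], lambda a: a[0])

print(naive_scans, swapped_scans)
print(sorted(naive_pairs) == sorted(swapped_pairs))
```

Both variants return the same joined pairs; only the number of passes over the inner input changes, which matters when the inner input resides in a secondary storage device.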
However, it cannot be said that the query optimization technologies disclosed in Japanese Patent Nos. 3754253 and 3492246 are perfected, and there still exist several problems that must be solved.
The first problem of the query optimization technology disclosed in Japanese Patent No. 3754253 is that it is necessary to clarify a number of plan generation rules in advance. The second problem is that considerable tuning is necessary in order to appropriately control the application order of the many rules, because the rules may interfere with one another. The third problem is that the accuracy of the plan (i.e., its cost) is not adequate, although the plan can be generated at a high speed.
The first problem of the query optimization technology disclosed in Japanese Patent No. 3492246 is that it is necessary to clarify a number of plan generation rules in advance. The second problem is that the accuracy of the plan (i.e., its cost) is not adequate.
In other words, with the query optimization technologies disclosed in Japanese Patent Nos. 3754253 and 3492246, it is necessary to exhaustively prepare background knowledge in advance, such as a number of plan generation rules and plan change rules, in order to optimize the query. However, because the XQuery language specification has the following characteristics, it is anticipated that the amount of such background knowledge will increase to the point where the optimization becomes difficult: a nesting that causes a sequence (let expression); and a path designating a hierarchical condition between the elements of the XML data (such as / and //).
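The two characteristics named above can be pictured with a small Python sketch using the standard `xml.etree.ElementTree` module, whose limited path syntax distinguishes a child step from a descendant step in a way comparable to / and // in XQuery. The sample document and variable names are hypothetical, and the dictionary at the end is only a rough analogue of a let expression that binds a whole sequence to a variable for each book.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """
<library>
  <book id="b1">
    <title>XML Basics</title>
    <chapter><title>Paths</title></chapter>
  </book>
  <book id="b2">
    <title>Queries</title>
  </book>
</library>
"""

root = ET.fromstring(DOC)
book = root.find("book")                     # first <book> child of <library>

# Child step (comparable to book/title): direct children only.
child_titles = [t.text for t in book.findall("title")]

# Descendant step (comparable to book//title): any depth below the book.
descendant_titles = [t.text for t in book.findall(".//title")]

# Rough analogue of `for $b in //book let $t := $b//title`:
# each book is bound to a whole sequence of titles, i.e. a nested result.
let_bindings = {
    b.get("id"): [t.text for t in b.findall(".//title")]
    for b in root.findall("book")
}

print(child_titles)
print(descendant_titles)
print(let_bindings)
```

A rule-based optimizer would need separate background knowledge for each combination of such path steps and nested bindings, which is one way to read the difficulty described above.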