1. Field of the Invention
The present invention relates to a technology for searching a plurality of structured documents of different document structures managed in a structured-document database having a hierarchized logical structure.
2. Description of the Related Art
In recent years, structured-document databases that store structured-document information described in extensible markup language (XML) or the like, and that search the stored information, have been realized. The mainstream query language for such a database is the XML query (XQuery), which the World Wide Web Consortium (W3C) is standardizing.
The XQuery is capable of searching information by designating a path (a structure) or a keyword (a vocabulary), with a feature of an extremely high language description capability. For instance, regarding a structure condition, the XQuery can describe a search condition in a format including a structure ambiguity using a symbol such as “/*” and “//”.
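As a rough illustration of such structure ambiguity, the following sketch uses the limited XPath subset of Python's standard xml.etree.ElementTree module rather than XQuery itself. The sample document and its element names are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = """
<book>
  <header><title>XML</title></header>
  <chapter><section><title>Query</title></section></chapter>
</book>
"""

root = ET.fromstring(DOC)

# "/*" style: match any element exactly one level below the root.
children = [child.tag for child in root.findall("./*")]

# "//" style: match <title> at any depth below the root.
titles = [t.text for t in root.findall(".//title")]
```

The first expression does not name the child elements at all, and the second does not state how deep <title> lies, which is the kind of ambiguity the symbols “/*” and “//” express.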
In the XQuery, information at a node level in a document object model (DOM), such as an element or an attribute, becomes a search target. For instance, JP-A 2001-147933 (KOKAI) proposes a technology in which a search of information at a node level of a structured document is performed by the following method.
First, at the time of storing a structured document in a database, the data structure of the target document is analyzed, and an index is created by embedding analysis information for the structure (the node) in vocabulary index information or the like. After that, at the time of searching information, a query graph is created by analyzing a search query, and a plan for executing the query is created after calculating a cost. Finally, the query is executed according to the created plan, and information of a node that satisfies a structure constraint in the query graph is obtained as a search result.
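The store-then-search flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of JP-A 2001-147933: the function names build_index and choose_plan, the plan records, and the cost figures are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(doc_id, xml_text, index):
    """Record, for each text value, the (doc_id, path) of the node holding it."""
    root = ET.fromstring(xml_text)

    def walk(node, path):
        here = f"{path}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            # Structure information (the path) is embedded in the vocabulary entry.
            index[text].append((doc_id, here))
        for child in node:
            walk(child, here)

    walk(root, "")

def choose_plan(plans):
    """Pick the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda p: p["cost"])

index = defaultdict(list)
build_index(1, "<header><title>XML</title><body>intro</body></header>", index)
build_index(2, "<header><title>SQL</title><body>XML</body></header>", index)

# Two hypothetical plans for a title search on "XML"; costs are made-up estimates.
plans = [
    {"name": "index_lookup", "cost": len(index["XML"])},
    {"name": "full_scan", "cost": 1000},
]
plan = choose_plan(plans)

# Executing the chosen plan: keep only entries that satisfy the structure constraint.
hits = [(d, p) for d, p in index["XML"] if p == "/header/title"]
```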
Various types of data are stored in such a structured-document database and managed in a unified manner; as a result, the database contains various data structures (schemas). When processing the XQuery, target candidates must be searched for (processed) at a high speed and without omission from among the data having those various schemas.
To speed up a search operation, a method of attaching a vocabulary index or a numeric index to a specific path, a method of analyzing a data structure of a storage target and extracting schema analysis information for a feature structure, and the like have been considered.
With the first method, for example, when performing a search such as “/title=“XML””, a speedup of the search can be expected because attaching a vocabulary index to the <title> tag makes a reverse lookup from the vocabulary possible.
With the second method, for example, a speedup of the search can be expected because registering information indicating that <title> is present “by necessity” and “solely” as a child element of <header> reduces the cost of the structure collating process for verifying that <title> is present under <header>.
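A minimal sketch of how such schema analysis information might be consulted is shown below. The table SOLE_REQUIRED_CHILD and the function needs_structure_check are hypothetical names introduced only for illustration.

```python
# Hypothetical schema analysis information: (parent, child) pairs for which the
# child is registered as present "by necessity" and "solely" under the parent.
SOLE_REQUIRED_CHILD = {("header", "title")}

def needs_structure_check(parent, child):
    """Return False when registered schema information already guarantees the relation."""
    return (parent, child) not in SOLE_REQUIRED_CHILD
```

Under this assumption, the structure collating process can be skipped for <title> under <header>, while it is still required for any relation that is not registered, such as <body> under <header>.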
In general, the costly portion of a searching process is the data collating process, represented by a structure collating process, a value collating process, and the like. The value collating process is a process for verifying that a designated phrase (value) is included as a search key.
A problem in the study of a search optimization process is how to create a plan with a low cost, and a representative process requiring a high cost is the data collating process described above. The reason is that a “data scan” must be performed, in which a structured document in the database is actually accessed. A data scan is generally slow compared with a process that uses only the index.
On the other hand, JP-A 2002-202973 (KOKAI) proposes a technology in which a structure collating process can be executed with an index alone, so that a data scan can be avoided as much as possible, by setting an ID to structure information of a structured document (parent-child relation and sibling relation) in advance and attaching the ID to all index information.
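The idea of attaching a structure ID to index entries can be sketched as follows. The data layout and the names STRUCTURE_IDS, PARENT_OF, and vocab_index are assumptions made for this illustration and are not taken from JP-A 2002-202973.

```python
from collections import defaultdict

# Hypothetical structure table: each distinct root-to-node path gets an ID.
STRUCTURE_IDS = {"/header": 1, "/header/title": 2, "/header/body": 3}
# Parent-child relation expressed over structure IDs (child ID -> parent ID).
PARENT_OF = {2: 1, 3: 1}

# Vocabulary index whose entries carry (doc_id, structure_id).
vocab_index = defaultdict(list)
vocab_index["XML"] += [(1, 2), (2, 3)]
vocab_index["SQL"] += [(2, 2)]

def docs_with_value_at(value, path):
    """Documents where `value` occurs at exactly `path`, using the index only."""
    sid = STRUCTURE_IDS[path]
    return sorted({doc for doc, s in vocab_index[value] if s == sid})

def docs_with_value_under(value, parent_path):
    """Documents where `value` occurs in a direct child of `parent_path`."""
    parent_id = STRUCTURE_IDS[parent_path]
    return sorted(
        {doc for doc, s in vocab_index[value] if PARENT_OF.get(s) == parent_id}
    )
```

Both functions answer a structure condition by comparing IDs held in the index, so no stored document is opened.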
However, a technology that speeds up the search operation using the index, such as the technology proposed in JP-A 2002-202973 (KOKAI), has a problem that the speed of the searching process decreases when plural index types are mixed.
For instance, consider a case in which <header> has <title> and <body> as child elements, and a vocabulary index (an index from which a location can be identified from a vocabulary) is attached to <title> but not to <body>. In this case, if a condition covering a plurality of paths, such as “/header[.//text( )=“XML”]”, is designated, a value collating process becomes necessary for <body> because no index is available for <body>, resulting in a decrease of the speed of the searching process.
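The mixed case above can be sketched as follows, assuming a hypothetical in-memory document store DOCS and a vocabulary index TITLE_INDEX that covers <title> only. The counter shows how many documents must be scanned even though part of the condition is answered from the index.

```python
import xml.etree.ElementTree as ET

# Hypothetical store: doc_id -> XML text.
DOCS = {
    1: "<header><title>XML</title><body>intro</body></header>",
    2: "<header><title>SQL</title><body>XML</body></header>",
    3: "<header><title>DOM</title><body>nodes</body></header>",
}
# A vocabulary index exists for <title> only; <body> has none.
TITLE_INDEX = {"XML": {1}, "SQL": {2}, "DOM": {3}}

def search_header_text(value):
    """Evaluate /header[.//text()=value]; return (matching doc IDs, documents scanned)."""
    hits = set(TITLE_INDEX.get(value, set()))  # <title> answered from the index
    scanned = 0
    for doc_id, xml_text in DOCS.items():
        if doc_id in hits:
            continue
        scanned += 1  # data scan: the stored document itself must be parsed
        body = ET.fromstring(xml_text).find("body")
        if body is not None and (body.text or "").strip() == value:
            hits.add(doc_id)
    return sorted(hits), scanned
```

In this sketch, every document not already matched through <title> has to be scanned for <body>, so the cost grows with the size of the store rather than with the number of hits.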
In addition, consider a case in which it is clear, from schema analysis information analyzed at the time of registration, that <title> is included under <patent> by necessity and solely. In this case, if a condition covering a plurality of paths, such as “/header[.//text( )=“XML”]”, is designated, the speed of the searching process is decreased because a structure collating process is required for <body> although it is not necessary for <title>.
In other words, even though a certain path can be processed at a high speed because an available index or the like makes a data scan unnecessary, the speed of the overall searching process may decrease because a data scan occurs for a specific path.
In general, a searching process is executed by analyzing a search condition, determining a process order for obtaining a solution, and repeatedly retaining, in that process order, the intermediate candidates that satisfy a data constraint. The above problem is caused by the fact that a constraint check is strictly performed for all candidates when obtaining the intermediate candidates.
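The general searching process described above can be sketched as a loop that applies each constraint in the determined order and keeps only the candidates that pass. The function run_search and the flat node records are hypothetical stand-ins for illustration.

```python
def run_search(candidates, constraints):
    """Apply each constraint in order, keeping only candidates that satisfy it."""
    for check in constraints:
        # Every remaining candidate is strictly checked at every step.
        candidates = [c for c in candidates if check(c)]
    return candidates

# Hypothetical candidates: flat records standing in for document nodes.
nodes = [
    {"path": "/header/title", "text": "XML"},
    {"path": "/header/body", "text": "XML"},
    {"path": "/footer/note", "text": "XML"},
    {"path": "/header/title", "text": "SQL"},
]
constraints = [
    lambda n: n["path"].startswith("/header/"),  # structure collating
    lambda n: n["text"] == "XML",                # value collating
]
result = run_search(nodes, constraints)
```

Because each check runs against all surviving candidates, a single constraint that requires a data scan slows every step in which it participates.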