1. Field of the Invention
The present invention relates to a method, a system and a program for creating an index so as to search a structured document satisfying a given condition in a structured document database in which a plurality of structured documents are stored.
2. Description of the Related Art
Structured documents having logical structures have been widely used in recent years. One representative example is a structured document described in XML (Extensible Markup Language). The structured documents are generally accumulated in a database constructed in a disk unit.
Herein, an XML document is described as a representative example of the structured documents, and its terminology is defined. XML is a description language for an XML document. XML embeds a specified character string called a “tag” (for example, <Price> and </Price>) into its original text. The tag includes a start tag (for example, <Price>) and an end tag (for example, </Price>). The tag is composed of an element name and marks bracketing the element name. The start tag and the end tag are used in pairs. For example, the start tag “<Price>” and the end tag “</Price>” make a pair, and their element name is “Price”.
An XML document may have a hierarchical structure using tags.
An XML document has an element between a start tag and an end tag (for example, in “<Price>100</Price>”, its element is “100”). Such an XML description allows an XML document by itself to provide data and to define the meaning of the provided data.
In XML, an attribute can be added to a tag. The attribute is a pair composed of an attribute name and a value. For example, in “<Price unit=“yen”>100</Price>”, the attribute name is “unit”, and the value is “yen”.
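The terminology above can be illustrated with a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser (neither is part of the original description). It reads the element name, the enclosed value, and the attribute from the “Price” example.

```python
import xml.etree.ElementTree as ET

# The example from the text, written with plain ASCII quotes.
doc = ET.fromstring('<Price unit="yen">100</Price>')

element_name = doc.tag          # name inside the start and end tags
content = doc.text              # text between the start tag and the end tag
attributes = dict(doc.attrib)   # attribute name -> value pairs

print(element_name, content, attributes)
```

Note that the parser returns the value as a character string; no data type is implied by the document itself.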
Besides the XML document, there are various structured documents having logical structures. For example, a representative structured document is an SGML (Standard Generalized Markup Language) document. SGML requires a document type definition (DTD), which is information concerning a logical structure such as an element name and a data type in the structured document. Meanwhile, XML does not always require the DTD.
Thus, a database for accumulating XML documents (hereinafter referred to as XML_DB) needs to be usable even without the DTD of a structured document to be accumulated.
Just as SQL is used as a syntax for representing a condition in searching a relational database, XPath (see http://www.w3.org/TR/xpath) or XQuery (see http://www.w3.org/TR/xquery) is used as a syntax for representing a condition for a structure-specified search in an XML_DB. The W3C (World Wide Web Consortium) is working on the standardization of such XML-related technologies.
To use a large number of structured documents accumulated in the XML_DB, various structured document search systems have been developed for searching a document, an element, an element name, an attribute name, a value, or the like, which may match a search condition specified by a user, using the above-mentioned syntaxes such as XPath and XQuery. In the structured document search system for the XML_DB, an index based on an element name is generally created so as to conduct a fast search in which the element name is specified (hereinafter referred to as structure-specified search).
An index in a database is generally created in such a way that a database administrator specifies a target for which the index is created. However, if an XML_DB does not have any structural definition for describing a logical structure of an XML document to be accumulated, the database administrator cannot know what type of logical structure the XML document to be accumulated has.
In other words, it is difficult for the database administrator to specify in advance an element and a data type thereof for creating an index, because the database administrator has in advance no list of element names and data types corresponding thereto which the XML document to be accumulated in the XML_DB may include.
To deal with the above-mentioned problem, for example, Japanese Laid-Open Patent Application, Publication No. 2006-18584 discloses a method of determining a data type of an index to be created (hereinafter referred to as a full automatic indexing method). The full automatic indexing method creates indexes for element names and attribute names included in all XML documents which are accumulated in an XML_DB (hereinafter referred to as a structure index). As the indexes include all element names and attribute names in the logical structure index, the method can determine the data type of the created index for each element or value.
U.S. Pat. No. 6,105,022 discloses another method in which a database administrator does not need to specify the data type of an index. The method creates a full text search index by identifying element names and attribute names (hereinafter referred to as structure-specified full text search index method).
The full automatic indexing method disclosed in the former-cited Japanese Laid-Open Patent Application, Publication No. 2006-18584 creates logical structure indexes for element names and attribute names included in all accumulated XML documents. Hence, when the structure-specified search is conducted, in which a given element name can be specified, an index for the specified element name can be used in the full automatic indexing method, and thus a response to the search is quick. For example, when the structure-specified search of “/Book information/Magazine/Price=100” is conducted, an index for “/Book information/Magazine/Price” can be used.
A condition for a structure in the structure-specified search does not always specify a single structure. For example, in XPath for describing a search condition in the structure-specified search in an XML_DB, a plurality of element names satisfying a given condition in all element names can be specified using a descendant axis. For example, if “//Price” is specified, not only “/Book information/Magazine/Price” but also “/Book information/Used/Magazine/Price” and “/Book information/New-secondhand/Magazine/Price” are specified collectively.
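How a single descendant-axis condition expands into several element paths can be sketched as follows. This is a minimal illustration, assuming Python's standard xml.etree.ElementTree module; the sample document and the helper names absolute_paths and descendant_matches are hypothetical. Because XML element names cannot contain spaces, the sketch writes “BookInformation” and “NewSecondhand” in place of the names used in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document mirroring the three paths named in the text.
SAMPLE = """
<BookInformation>
  <Magazine><Price>100</Price></Magazine>
  <Used><Magazine><Price>60</Price></Magazine></Used>
  <NewSecondhand><Magazine><Price>80</Price></Magazine></NewSecondhand>
</BookInformation>
"""

def absolute_paths(root):
    """Yield (path, element) for every element, in document order."""
    stack = [("/" + root.tag, root)]
    while stack:
        path, elem = stack.pop()
        yield path, elem
        # Push children in reverse so the first child is visited first.
        for child in reversed(list(elem)):
            stack.append((path + "/" + child.tag, child))

def descendant_matches(root, name):
    """Return the absolute path of every element selected by //name."""
    return [path for path, elem in absolute_paths(root) if elem.tag == name]

root = ET.fromstring(SAMPLE)
price_paths = descendant_matches(root, "Price")
for path in price_paths:
    print(path)
```

The one condition “//Price” therefore corresponds to three distinct paths in this sample, each of which would have its own index under an element name-by-element name scheme.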
When the structure-specified search specifying a plurality of element names using the descendant axis in XPath (hereinafter referred to as the plural structure-specified search) is conducted, the full automatic indexing method conducts a search using a plurality of indexes present for each element name. Thus search results created for each index have to be merged, which makes a response to the search slow.
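The merge cost described above can be sketched as follows. This is an illustrative model only, not the method of the cited publication: the dictionary path_indexes, its contents, and the function search_descendant are hypothetical, and each per-path index is assumed to map a value to a sorted list of document identifiers.

```python
import heapq

# Hypothetical per-path indexes: path -> value -> sorted document IDs.
path_indexes = {
    "/BookInformation/Magazine/Price": {"100": [1, 4]},
    "/BookInformation/Used/Magazine/Price": {"100": [2], "60": [3]},
    "/BookInformation/NewSecondhand/Magazine/Price": {"100": [4, 5]},
    "/BookInformation/Magazine/Title": {"100": [9]},
}

def search_descendant(indexes, name, value):
    """Answer //name=value by probing every matching per-path index."""
    suffix = "/" + name
    # One lookup per index whose path ends in the requested element name.
    partial = [idx.get(value, [])
               for path, idx in indexes.items() if path.endswith(suffix)]
    # The partial results must then be merged and de-duplicated.
    merged = []
    for doc_id in heapq.merge(*partial):
        if not merged or merged[-1] != doc_id:
            merged.append(doc_id)
    return len(partial), merged

lookups, hits = search_descendant(path_indexes, "Price", "100")
print(lookups, hits)
```

In this model the number of index lookups, and the size of the merge step, grows with the number of paths that the descendant axis matches.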
On the other hand, the structure-specified full text search index method disclosed in U.S. Pat. No. 6,105,022 always uses one index for the entire XML documents irrespective of a specified element name. Thus search results created for each index need not be merged. Even when a search specifying plural structures is conducted, a response to the search is substantially the same as that of the structure-specified search, in which a given element name is specified.
However, the structure-specified full text search index method always uses a single index for entire XML documents irrespective of a specified element name. This means that data other than the specified element name are also subjected to processing. Since the response to the structure-specified search which specifies a given element name is slow, a response to the plural structure-specified search may also be slow.
Namely, there is a problem that a response to the plural structure-specified search is not so quick, whether the search is conducted using an index created by the full automatic indexing method or by the structure-specified full text search index method.
To solve the above-mentioned problem in a simple manner, a search method is contemplated in which a plurality of indexes, for which merge processing of search results would otherwise be necessary, are packaged into a single index (hereinafter referred to as a packaged index) in advance, and a search is conducted using the single packaged index. To obtain the plurality of yet-to-be-packaged indexes, a methodology of creating element name-by-element name indexes for all element names is used based on a concept of the full automatic indexing method.
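The idea of packaging can be sketched as follows, under stated assumptions: the data layout (path -> value -> document identifiers), the sample contents, and the function name package are all hypothetical and serve only to illustrate merging in advance rather than at search time.

```python
# Hypothetical per-path indexes: path -> value -> sorted document IDs.
path_indexes = {
    "/BookInformation/Magazine/Price": {"100": [1, 4]},
    "/BookInformation/Used/Magazine/Price": {"100": [2], "60": [3]},
    "/BookInformation/NewSecondhand/Magazine/Price": {"100": [4, 5]},
}

def package(indexes, name):
    """Build one index for //name by merging the per-path indexes in advance."""
    suffix = "/" + name
    packaged = {}
    for path, idx in indexes.items():
        if path.endswith(suffix):
            for value, doc_ids in idx.items():
                packaged.setdefault(value, set()).update(doc_ids)
    # Store sorted lists so a search is a single lookup with no merge step.
    return {value: sorted(ids) for value, ids in packaged.items()}

packaged_price = package(path_indexes, "Price")
print(packaged_price.get("100", []))
```

With such a packaged index, a search for “//Price=100” becomes one lookup; the cost of merging is paid when the index is built or updated instead.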
However, if the element name-by-element name indexes are created for all element names, in other words, if the indexes are created for a combination of all logical structures, the number of the created indexes is enormous. This results in an enormous quantity of information to be managed as indexes, and an enormous amount of time to register structured documents and update the indexes.
It is thus an object of the present invention to provide a method for extracting a minimum necessary packaged index for use in conducting the plural structure-specified search for a structured document having a document data-structure.