1. Field of the Invention
This invention relates to a system for and method of searching structured documents stored in a database using indexes, and more particularly to a structured document search system and method suitable for a case where a value search covering the values of a plurality of nodes and a search of a related node common to the plurality of nodes are specified depending on a search condition.
2. Description of the Related Art
A document having a logical structure is termed a structured document. In a structured document, the logical structure of the document is indicated by tags written in the document. A structured document whose logical structure is represented using the tags is suitable for processing on a computer.
Extensible Markup Language (XML) is widely used as a means for describing data using tags. XML is characterized by the hierarchical organization of data using meaningful tags and by the free extensibility of its structure. A database called an XML database (XMLDB) is known as an XML-applied technology that puts these features to good use. The XML database is controlled by a database management system called an XML database management system (XMLDBMS). The XML database provides the function of storing XML documents and searching for an XML document (a structure specified in the XML document).
The XML document, which is a document written using XML, is known as a representative of structured documents. An XML document is composed of elements constituting a tree structure. Each of the elements, which is also called a node (or tag node), is composed of a tag and a content (or value). The tree structure begins with an element serving as a root (a root node). The individual elements are configured in such a manner that they have a parent-child relationship and a brother-sister relationship.
A standardized query language is frequently used in searching for nodes in an XML document. XPath and XQuery are known as typical query languages. XPath is used to do a search by specifying the positions of elements (or nodes) in the XML document.
In an XML document search system (or a structured document search system) including an XML database management system, to speed up a search, indexes are caused to correspond to nodes regarded as possible targets of a value search (refer to paragraph 0013 of Jpn. Pat. Appln. KOKAI Publication No. 2006-018584, for example). Such indexes are called value indexes.
FIG. 2 shows an example of XML documents in tree structure form. Suppose that, in a database (XML database) in which the XML documents of FIG. 2 have been stored, a book satisfying the condition that title is “TCP . . . ” is searched for. In this case, a query made by a client (a client terminal) (hereinafter referred to as a first query), when described in, for example, XPath, is as follows:
/bib/book[title=“TCP . . . ”]
To speed up a search on the basis of a first query (XPath), value indexes are caused to correspond to title nodes regarded as possible targets of a value search. The value indexes are composed of sets of values (keys), such as “TCP . . . ” and “Adv . . . ,” and node IDs. A node ID, which is a unique number allocated to each node, indicates a logical location (node position) in an XML document stored in the database.
FIGS. 22A to 22C show examples of value indexes. FIG. 22A shows an example of value indexes of nodes (title nodes) having values of title names. FIG. 22B shows an example of value indexes of nodes (last nodes) having values of last names. FIG. 22C shows an example of value indexes of nodes (first nodes) having values of first names. These value indexes are generally held in a value index table.
In a search on the basis of a query from the client to the XML document search system, an index is searched for using the value of a node (element) as a key. If the corresponding index is found, a node ID corresponding to the value can be obtained. In the example of the first query (XPath), the XML document search system can determine from the value index caused to correspond to the title node that there is a node satisfying the condition that title is “TCP . . . ” and the node ID is 3 (see FIG. 22A).
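The value-index lookup described above can be sketched as follows. This is only an illustrative sketch, not the data structure of any actual XML database management system: the dictionary `title_value_index` and the function `lookup` are hypothetical names, and the only entry included is the one stated in the text (the elided key “TCP . . . ” is kept in its elided form).

```python
# Hypothetical in-memory model of the value index for title nodes (FIG. 22A).
# Each key is a node value; each entry lists the node IDs holding that value.
# Only the entry stated in the description is included, and the key is kept
# in the elided form used there.
title_value_index = {
    "TCP . . .": [3],
}


def lookup(value_index, key):
    """Return the node IDs whose value equals `key` (empty list if none)."""
    return value_index.get(key, [])


# The first query asks for title = "TCP . . ."; the index alone answers it
# without scanning the stored documents.
hits = lookup(title_value_index, "TCP . . .")
print(hits)
```

A key that is absent from the index yields an empty list, which corresponds to determining that no node conforms to the condition of the value search.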
As described above, the XML document search system which uses an index (value index) in a search has the following advantages. First, the XML document search system can determine whether there is a node conforming to the condition of the query without searching all of the XML documents (or scrutinizing the XML documents) stored in the database. If there is such a node, the XML document search system can determine the position of the node. This enables the XML document search system to carry out a search at high speed.
To speed up a search when structural conditions are specified, a method of extracting structural information on the XML documents stored in the database and compiling an index is known. Such an index is known as a structure index. The structure index is composed of a set of a path character string indicating a structure, such as “/” or “/bib,” and the node ID of a node having the structure. If there are a plurality of nodes conforming to the same path character string (e.g., “/bib/book” in the example of FIG. 2), the plurality of node IDs correspond to the same path character string. The data structure of such a structure index is the same as that of a structure index applied to an embodiment of the invention explained later. Thus, refer to FIG. 6, if necessary.
In the first query (XPath), the XML document search system finds one node (a node whose node ID is 3) conforming to the condition of the value search on the basis of a value index. It cannot be determined from only the value index whether the node complies with the structural condition (/bib/book/title) given in XPath. Thus, using the structure index, the XML document search system checks whether the node complies with the structural condition. From the structure indexes (structure index table) of FIG. 6, it is seen that there are nodes complying with the structural condition (/bib/book/title) (i.e., three nodes having a structure represented by “/bib/book/title”) and the node IDs of the nodes are 3, 13, and 26. The node whose node ID is 3 satisfies both the structural condition and the value search condition. Therefore, it can be determined that the node whose node ID is 3 fulfills all of the search conditions.
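The combined use of the two indexes in the first query can be sketched as follows. This is a minimal sketch under stated assumptions: the names `structure_index` and `filter_by_structure` are hypothetical, and only the path and node IDs quoted above from FIG. 6 and FIG. 22A are used.

```python
# Hypothetical model of the structure index (FIG. 6): path string -> node IDs.
# Only the entry quoted in the description is included.
structure_index = {
    "/bib/book/title": [3, 13, 26],
}

# Candidate obtained from the value index for title = "TCP . . ." (FIG. 22A).
value_candidates = [3]


def filter_by_structure(index, path, node_ids):
    """Keep only the candidates that the structure index lists under `path`."""
    allowed = set(index.get(path, []))
    return [node_id for node_id in node_ids if node_id in allowed]


# A node must satisfy both the value condition and the structural condition.
matches = filter_by_structure(structure_index, "/bib/book/title", value_candidates)
print(matches)
```

Both conditions are checked against the indexes only, so the stored documents themselves are not scanned in this case.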
As described above, the XML document search system using a value index and a structure index in a search has the following advantages. First, the XML document search system can determine whether there is a node conforming to the conditions of the query including the structural condition without searching all the XML documents stored in the database. If there is such a node, the XML document search system can determine the position of the node. This makes it possible to carry out a search at high speed.
However, in the above conventional techniques, when a query in which a plurality of targets of value search have been specified is processed, this might delay the search. The reason is that the process of searching all of the XML documents (scrutinizing the XML documents) stored in the database is needed. An example of a query in which a plurality of targets of value search have been specified is a query in which a plurality of nodes (tag nodes) acting as the targets of value search are specified by the AND operator “and.” When a plurality of nodes are searched for under the condition including the AND operator “and,” this might delay the search for the above reason.
Hereinafter, such a search will be explained using a case where an author who satisfies the condition that the value (last name) of a last node is “Stevens” (last=“Stevens”) and the value (first name) of a first node is “W.” (first=“W.”) is searched for on the basis of the following second query (XPath):
/bib/book/author[last=“Stevens” and first=“W.”]
As described above, value indexes are caused to correspond to nodes regarded as possible targets of a search. The value indexes are composed of sets of a value (key), such as “Stevens” or “Buneman,” and a node ID. In the case of a second query, as shown in FIGS. 22B and 22C, assigning a value index to each of (i) the last nodes and (ii) the first nodes makes it possible to search at high speed for nodes which satisfy the condition that the last name is “Stevens” (last=“Stevens”) and nodes which satisfy the condition that the first name is “W.” (first=“W.”).
However, the search condition shown in the second query is the AND condition that “author who is [A] and [B].” Therefore, of the last nodes and first nodes found on the basis of the value indexes, nodes having the same parent node (author node), that is, nodes linked with the same node (author node), have to be selected. However, such a link cannot be determined from the value index. Accordingly, in the conventional techniques, the XML documents stored in the database have to be actually searched, starting from the last nodes and first nodes that were found, which causes a delay in the search.
In the conventional techniques, such a delay is caused as described below even in a search using a structure index. The node IDs of the last nodes searched for from the value index on the basis of the second query, that is, the node IDs of nodes satisfying the condition of the value search that the last name is “Stevens” (last=“Stevens”), are 16 and 29 (see FIG. 22B). Moreover, the node IDs of the first nodes searched for from the value index, that is, the node IDs of nodes satisfying the condition of the value search that the first name is “W.” (first=“W.”), are 8, 18, and 23 (see FIG. 22C).
When a set (candidate set) of node IDs of the last nodes and a set (candidate set) of node IDs of the first nodes have been acquired, it is determined from the structure index whether, for example, the node IDs included in the two candidate sets satisfy the structural condition (/bib/book/author/last for the last nodes and /bib/book/author/first for the first nodes). In this example, it is seen that all of the node IDs fulfill the structural condition.
Next, of all of the combinations of the last nodes and first nodes narrowed down from the index, the combinations having the same parent (author node) have to be selected under the AND condition that an author who is [A] and [B]. In this example, the nodes satisfying the AND condition are only in a combination of the one whose node ID is 16 among the nodes whose last name is “Stevens” (last=“Stevens”) and the one whose node ID is 18 among the nodes whose first name is “W.” (first=“W.”).
However, it cannot be determined from the value index and structure index whether the last node and first node have the same parent. Accordingly, in the conventional techniques, all of the XML documents stored in the database have to be actually searched, resulting in a delay in the search.
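The difficulty with the second query can be illustrated with the following sketch. The candidate node IDs are those given above for FIGS. 22B and 22C. Everything else is an assumption made for illustration: the parent node IDs in `HYPOTHETICAL_PARENT` do not appear in the description and were chosen only so that, as stated above, the pair (16, 18) is the sole combination sharing a parent, and `parent_of` stands in for the scan of the stored documents that the conventional techniques require.

```python
from itertools import product

# Candidate sets obtained from the value indexes for the second query.
last_candidates = [16, 29]       # last = "Stevens" (FIG. 22B)
first_candidates = [8, 18, 23]   # first = "W." (FIG. 22C)

# ASSUMPTION: these parent (author) node IDs are invented for illustration.
# Neither the value index nor the structure index holds this information.
HYPOTHETICAL_PARENT = {16: 15, 18: 15, 29: 28, 8: 6, 23: 21}


def parent_of(node_id):
    """Stand-in for scanning the stored XML documents to find a parent."""
    return HYPOTHETICAL_PARENT[node_id]


def pairs_with_common_parent(lasts, firsts):
    """Test every (last, first) combination for a shared parent node."""
    return [
        (last_id, first_id)
        for last_id, first_id in product(lasts, firsts)
        if parent_of(last_id) == parent_of(first_id)
    ]


pairs = pairs_with_common_parent(last_candidates, first_candidates)
checked = len(last_candidates) * len(first_candidates)
print(pairs, checked)
```

Each of the combinations requires parent information that neither index supplies, so every check falls back on the stored documents, which is the source of the delay described above.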