1. Field of the Invention
The present invention relates to a document searching system and a document searching method used for conducting a search in a structured document having a hierarchical logical structure.
2. Description of the Related Art
Conventionally, distributed relational database systems are widely used to deal with a large amount of computerized documents. When a distributed relational database system is large in scale, the data in the documents is converted so as to conform to a predetermined schema before being stored into a plurality of computing machines. In such a distributed relational database system, an interface is provided for the user as if one computing machine and one database were used. Here, the schema denotes a definition of the structures of the tables and columns used in the database system and of the order in which the pieces of data are stored in the database system.
In the distributed relational database system, when a search request has been received from the user, the plurality of computing machines that are connected to one another via a network perform an execution process in collaboration. As a result, it is possible to realize an advanced searching function to conduct a search in a large-scale group of documents, which is too large to store in one computing machine.
Among distributed database systems, a type called “shared-nothing” is also used. In a “shared-nothing type” system, the data is stored into each of the computing machines in an exclusive manner so that it is possible to store a larger amount of documents. In this type of distributed database system, a function to perform a join operation is provided so that it is possible to conduct a search among the plurality of computing machines. When such a join operation is performed, the computing machines need to transfer interim result data to and from one another.
Accordingly, such a distributed database system has a problem in that the throughput and the response are degraded by the time required to transfer the interim search result data. To solve the problem, many techniques related to the join operation have been proposed, including an example disclosed in JP-A 2001-109758 (KOKAI). JP-A 2001-109758 (KOKAI) discloses a technique that uses a multiple mapping mechanism in which one view table is able to store therein mutually different criteria that are mapped in a plurality of mapping processes. The technique makes it possible to switch between databases to be accessed, without having to switch the view table to another one and without having to change the access items in the view table, and thus to respond flexibly to changes in the data being a target of the access. Also, as methods for executing the join operation, a semi-join method and a hash join method have been proposed.
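The way a semi-join reduces data transfer can be illustrated by the following minimal sketch. It is not taken from JP-A 2001-109758 (KOKAI) or from any particular product; the function name `semi_join`, the sample tables `orders` and `customers`, and the field `cust_id` are hypothetical, and the two "sites" are simulated by in-memory lists. Only the join-key values travel from the first site to the second, and only the matching rows travel back, after which a hash table is used for the final join.

```python
def semi_join(site_a_rows, site_b_rows, key):
    """Join rows held at two simulated sites, shipping only matching rows from site B."""
    # Step 1: site A projects its join-key values; only these are sent to site B.
    keys_sent = {row[key] for row in site_a_rows}

    # Step 2: site B keeps only the rows whose key occurs at site A and
    # sends this reduced set back instead of its whole table.
    rows_returned = [row for row in site_b_rows if row[key] in keys_sent]

    # Step 3: site A builds a hash table on the returned rows (hash join)
    # and combines each local row with its matching remote rows.
    index = {}
    for row in rows_returned:
        index.setdefault(row[key], []).append(row)
    joined = [{**a, **b} for a in site_a_rows for b in index.get(a[key], [])]

    return joined, len(keys_sent), len(rows_returned)


# Hypothetical sample data (assumption, for illustration only).
orders = [
    {"cust_id": 1, "item": "pen"},
    {"cust_id": 2, "item": "ink"},
    {"cust_id": 1, "item": "pad"},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 3, "name": "Bob"},
    {"cust_id": 4, "name": "Cy"},
]

joined, keys_sent, rows_returned = semi_join(orders, customers, "cust_id")
```

In this sketch two key values are sent and one row is returned, whereas a naive join would ship all three customer rows. The saving depends on the key values being small relative to the rows.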
As explained above, in the distributed database system, by using the various methods for executing the join operation that have conventionally been proposed, it is possible to reduce the amount of data transfer.
In recent years, structured documents have rapidly grown popular. As a result, structured document databases that specialize in structured documents are also becoming popular, making it possible to manage a large amount of structured documents.
A structured document is a document that includes data contents and a data structure, in which the data structure expresses the logical relationships among elements (i.e., a document logical structure). Examples of meta languages used for writing a structured document include Standard Generalized Markup Language (SGML) and eXtensible Markup Language (XML), the latter of which is formulated by the World Wide Web Consortium (W3C) and has rapidly been growing popular recently.
A structured document database uses a search language for the structured documents and stores therein information that expresses the logical relationships among the elements that are stored in the structured documents. When a search is conducted in the structured documents, it is possible to realize a search having a high level of precision by specifying a logical structure of the elements with a search formula and specifying the structure of the structured documents as a search criterion. For example, JP-A 2005-190163 (KOKAI) discloses a technique for storing, as a template, information in which commonalities and regularities of the logical structures in a group of structured documents are condensed and using the template when a search is conducted.
Next, the search formula that is used in a case where the structured documents are XML documents will be explained. In the search formula, for example, XML Path Language (XPath) or XML Query Language (XQuery), both of which are formulated by the W3C, is used to specify a structure of the structured documents as a search criterion.
XPath is a language in which a method for writing path formulae used for specifying a specific element or a specific attribute within an XML document is defined. XQuery includes XPath as its subset and is a search language for XML documents in which it is possible to write complicated operations, such as a repetition, an assignment, and a comparison, that are performed on an XML element or an attribute specified using XPath or to generate XML elements and attributes. By using these search languages, it is possible to write search formulae that are more complicated than conventional search formulae.
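As an illustration only, the following sketch shows a path formula that selects specific elements by structure and attribute, followed by an XQuery-style repetition, assignment, and comparison emulated in Python. The sample document, its tag names, and the price threshold are hypothetical. The sketch uses the `xml.etree.ElementTree` module of the Python standard library, which supports only a limited subset of XPath and does not execute XQuery, so the second part approximates the idea of such a query rather than the XQuery language itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (assumption, for illustration only).
DOC = """<library>
  <book lang="en"><title>XML Basics</title><price>30</price></book>
  <book lang="ja"><title>Nyumon</title><price>25</price></book>
  <book lang="en"><title>Query Design</title><price>45</price></book>
</library>"""

root = ET.fromstring(DOC)

# Path formula: select the <title> child of every <book> whose "lang"
# attribute equals "en". Both structure and attribute are search criteria.
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# XQuery-style processing emulated in Python: iterate over the elements
# selected by a path (repetition), bind a value (assignment), test it
# (comparison), and generate a new element for each hit.
results = []
for book in root.findall("./book"):
    price = float(book.findtext("price"))
    if price > 40:
        hit = ET.Element("result")
        hit.text = book.findtext("title")
        results.append(hit)

result_titles = [r.text for r in results]
```

Here `english_titles` holds the titles of the two books marked as English, and `result_titles` holds the title of the one book priced above the threshold.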
In a distributed structured document database system in which structured documents are stored while being distributed in a plurality of computing machines, sometimes it is necessary to transfer interim result data having a logical structure when a join operation is performed by using a search formula as described above. In this situation, by applying the method for executing the join operation described above to the distributed structured document database, it is possible to reduce the cost of the transfer process.
However, the distributed structured document database system has problems that are unique to the system, as explained below. First, in a distributed relational database as described above, when a semi-join operation is performed, because data is joined by using a predetermined field as a join key, only the data in the field needs to be transferred. On the other hand, in a distributed structured document database, in some situations, the elements in a join key have a logical structure. For example, in some situations, a join key may be a sequence having a sequence structure, and the elements may have a deep tag (list) structure that includes numerical values and character strings such as XML nodes (i.e., XML sub-trees). As explained here, a join key used in a distributed structured document database system is likely to have a larger data size than a join key used in a distributed relational database.
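The difference in join-key size can be made concrete with the following rough sketch. The key values are hypothetical, and the byte counts of the UTF-8 serialized forms are used merely as a stand-in for the transfer cost; an actual system may encode keys differently.

```python
import xml.etree.ElementTree as ET

# Relational case: the join key is a single scalar field (hypothetical value).
relational_key = 1042
relational_bytes = len(str(relational_key).encode("utf-8"))

# Structured document case: the join key is an XML sub-tree with nested
# elements (hypothetical content), which must be transferred as a whole.
structured_key = ET.fromstring(
    "<author>"
    "<name><first>Ann</first><last>Lee</last></name>"
    "<affiliation><org>Example Lab</org><country>JP</country></affiliation>"
    "</author>"
)
structured_bytes = len(ET.tostring(structured_key, encoding="unicode").encode("utf-8"))

# Ratio of the two sizes for one key; a semi-join ships one such key per
# distinct join value, so the gap scales with the number of keys.
size_ratio = structured_bytes / relational_bytes
```

Under these assumptions the sub-tree key is many times larger than the scalar key, which is why shipping only the join keys does not by itself keep the transfer small.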
Consequently, a distributed structured document database has a problem where, even if the amount of data transfer is reduced by using the semi-join method or the like, the amount of data transfer increases because the data size of the join key is large, and therefore the response is degraded.