1. Field of the Invention
The present invention relates to a structured-document managing system, a structured-document retrieving method, a structured-document retrieving apparatus, and a structured-document managing apparatus for storing and managing a large quantity of structured documents by arranging the structured documents in a group of structured document databases, which has a layered logical structure, in a distributed manner.
2. Description of the Related Art
In recent years, advances in information technology (IT) have made it easy to acquire enormous amounts of information. On the other hand, when necessary information is buried under a large amount of data, it may be impossible to make full use of that information. A large amount of information is meaningless unless it can be put to good use.
Although some of this information is in a uniform format, much of it is in free formats. The Extensible Markup Language (XML) is expected to be a core technology that enables integrated management of these various types of information. The XML is a standard document description language that offers flexible extensibility and interoperability. In addition, many major vendors have committed to supporting the XML.
A structured document written in the XML has the following characteristics: (1) the structured document has a layered structure; (2) structure elements of the same path sometimes repeatedly appear in the document and sometimes not; and (3) a character string of a partial document could be large data.
On the other hand, query languages are known as a technology for extracting stored documents. In the field of the Relational Database (RDB), the Structured Query Language (SQL) is known as the query language. The XML Query Language (XQuery) has been developed for the XML.
The XQuery is a language designed to handle collections of XML data like databases. The XQuery provides means to extract collections of data that satisfy a condition concerning a value of a structure element or a condition concerning a hierarchical structure. In addition, the XQuery allows for setting of an ambiguous condition concerning a hierarchical structure. For example, it is possible to set a condition to acquire "a 'comment' tag anywhere in the descendants of a 'document' tag", using a regular path expression.
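The descendant condition described above can be illustrated with a minimal sketch. It does not use an XQuery engine; it assumes Python's standard `xml.etree.ElementTree` module, whose limited path syntax `.//comment` selects "comment" elements at any depth below the starting element, which is roughly what an XQuery path such as `//document//comment` expresses. The sample document and the function name `descendant_comments` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: 'comment' tags appear at different depths.
SAMPLE = (
    "<document>"
    "<title>t</title>"
    "<section>"
    "<comment>first</comment>"
    "<para><comment>second</comment></para>"
    "</section>"
    "</document>"
)


def descendant_comments(xml_text):
    """Return the text of every 'comment' element below the root element."""
    root = ET.fromstring(xml_text)  # root is the 'document' element
    # './/comment' matches 'comment' at any depth, in document order.
    return [elem.text for elem in root.findall(".//comment")]


if __name__ == "__main__":
    print(descendant_comments(SAMPLE))
```

The caller does not need to know how deeply a "comment" tag is nested, which is the sense in which the structural condition is ambiguous.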
In retrieval of structured documents such as XML documents, a structured document is often acquired as a retrieval result. A structured document may also be generated as an intermediate result of retrieval processing. As an example of a simple method of generating a structured document as a retrieval result or an intermediate result of such retrieval processing, there is a method of tracing layered result data in preorder to convert the result data into a character string. However, the amount of data produced by this method is large.
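The preorder conversion mentioned above can be sketched as follows. This is an illustration only, assuming the layered result data is held as an `xml.etree.ElementTree` element; attributes and character escaping are deliberately ignored, and the function name `serialize_preorder` is hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize_preorder(elem):
    """Visit elem before its children (preorder) and emit a character string.

    Simplification (assumption): attributes and escaping are not handled.
    """
    parts = [f"<{elem.tag}>"]
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(serialize_preorder(child))
        if child.tail:  # text that follows the child's closing tag
            parts.append(child.tail)
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring("<a><b>x</b><b>y</b></a>")
    print(serialize_preorder(root))
```

Every occurrence of an element writes its tag name twice, once to open and once to close, so repeated elements of the same path inflate the output. This is one reason the resulting data amount is large.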
According to one widely known manner of storing structured documents, the structured documents are stored in plural document-storing apparatuses in a distributed manner. In retrieval processing for structured documents arranged and stored in a distributed manner in this way, it is generally necessary to transfer intermediate result data or the like of retrieval among the apparatuses. Since the load of transfer processing in the retrieval processing is large, there is a demand for reducing the processing load of data transfer by, for example, reducing the data size to reduce the transfer amount.
In JP-A 2005-18672 (KOKAI) (hereinafter, "document 1"), a technology for compressing generated XML data is proposed. In the method disclosed in the document 1, a structured document is divided into a portion concerning structures and a portion concerning values with the use of a schema (a data definition) of the structured document, and tag names and attribute names are condensed as a data definition and held in the portion concerning structures to reduce the data size.
For example, since only one set of tag names of an identical path has to be held in a data definition portion, the data size is reduced. Concerning data with repetition, it is necessary to hold the number of repetitions in the value portion. However, concerning data without repetition, by holding information “no repetition” in the structure portion, it is unnecessary to hold the number of repetitions in the value portion.
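The separation of a structure portion from a value portion can be illustrated with a minimal sketch. It is not the actual format of the document 1, which is not reproduced here; the layout, the names `encode` and `decode`, and the `(name, repeats)` definition tuples are all assumptions made for illustration. Field names are held once in the definition, a repetition count is stored in the value portion only for fields marked as repeating, and non-repeating fields store the bare value.

```python
def encode(record, definition):
    """Flatten a record into a value list using a shared data definition.

    definition: list of (field_name, repeats) tuples (hypothetical layout).
    """
    values = []
    for name, repeats in definition:
        if repeats:
            items = record[name]
            values.append(len(items))  # repetition count kept with the values
            values.extend(items)
        else:
            values.append(record[name])  # "no repetition": no count needed
    return values


def decode(values, definition):
    """Rebuild a record from the value list and the same data definition."""
    record = {}
    pos = 0
    for name, repeats in definition:
        if repeats:
            count = values[pos]
            record[name] = values[pos + 1:pos + 1 + count]
            pos += 1 + count
        else:
            record[name] = values[pos]
            pos += 1
    return record


if __name__ == "__main__":
    definition = [("title", False), ("author", True)]
    record = {"title": "XML", "author": ["A", "B"]}
    packed = encode(record, definition)
    print(packed)
    print(decode(packed, definition))
```

Because the definition is shared by all records of the same shape, the field names do not travel with each record.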
In the method disclosed in the document 1, it is possible to compress the data size by devising a data representation format. However, since the method does not take into account that redundancy of data or the like occurs in a retrieval result or an intermediate result, character string generation processing may be performed uselessly.
For example, in retrieval of structured documents, depending on a retrieval condition, character strings of a plurality of structured documents or a partial structured document may be generated from a common data area such as pages in which the structured document is stored. In such a case, in the method disclosed in the document 1, a plurality of character strings are generated individually even if the character strings are completely the same. Therefore, unnecessary character string generation processing is performed and a transfer amount is increased because the character strings redundantly generated are transferred.
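The redundancy described above can be made concrete with a small sketch. It illustrates the problem only and is not the method of the document 1; the page table, the hit list, and the function name `generate_individually` are hypothetical. Each retrieval hit generates its own character string, even when several hits refer to the same stored page.

```python
def generate_individually(hits, pages):
    """Generate one character string per hit, with no sharing between hits."""
    generated = []
    for page_id in hits:
        # The same page yields the same string every time it is hit.
        generated.append(f"<doc>{pages[page_id]}</doc>")
    return generated


# Hypothetical data: two of the three hits refer to the same page.
pages = {1: "shared text", 2: "other text"}
hits = [1, 1, 2]

strings = generate_individually(hits, pages)
transferred = sum(len(s) for s in strings)    # characters actually sent
distinct = sum(len(s) for s in set(strings))  # characters that are unique

if __name__ == "__main__":
    print(len(strings), len(set(strings)), transferred, distinct)
```

The gap between `transferred` and `distinct` corresponds to the character strings that are generated and transferred redundantly.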