1. Field of the Invention
The present invention relates to a structured document management device and a method.
2. Description of the Related Art
In recent years, a wide variety of data has come to be managed in a structured document format such as XML (Extensible Markup Language). There is a growing demand to manage, in the structured document format, data that has conventionally been managed by other means, such as fixed-format numerical data managed with relational databases and text data managed with full-text search engines.
Accordingly, sophisticated queries that designate keywords and structural conditions, such as queries written in XQuery, an XML query language standardized by the W3C (World Wide Web Consortium), are being used for structured documents managed in structured document databases. With this, an increase in search speed is also demanded.
One known method for increasing the search speed divides a text, which is to be registered in a structured document database, into keywords; associates each keyword with document identification information of the structured document to be registered, structure information that indicates the structure of the structured document, and occurrence position information of the keyword in the structured document; and indexes the result in an inverted file format.
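By way of illustration only, the indexing method described above may be sketched as follows. The function name `index_document`, the use of an element path as the structure information, and the use of a per-document word counter as the occurrence position information are assumptions made for this sketch and are not taken from any disclosed implementation.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(index, doc_id, xml_text):
    """Add one structured document to an inverted index (illustrative sketch).

    Each keyword maps to a list of postings of the form
    (document id, element path, occurrence position).
    """
    root = ET.fromstring(xml_text)
    position = 0  # word offset within this document (assumed position scheme)

    def add_keywords(text, path):
        nonlocal position
        # Split the text on a keyword basis; here a keyword is any run of
        # word characters, which is a simplification.
        for keyword in re.findall(r"\w+", text or ""):
            index[keyword.lower()].append((doc_id, path, position))
            position += 1

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        add_keywords(elem.text, path)
        for child in elem:
            walk(child, path)
            # Text following a child element belongs to the enclosing element.
            add_keywords(child.tail, path)

    walk(root, "")
    return index


index = defaultdict(list)
index_document(
    index, 1,
    "<book><title>XML index</title><body>index compression</body></book>",
)
index_document(index, 2, "<book><title>Query speed</title></book>")

# Look up every posting for the keyword "index".
print(index["index"])
```

In this sketch, a search for a keyword under a structural condition can be answered by filtering the postings of that keyword on the element path, without scanning the documents themselves. Because every occurrence of every keyword carries its own path string, the sketch also shows why the structure information tends to dominate the size of such an index.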
In an index management system that uses an inverted file, the index generally becomes very large. JP-A 2006-172363 (KOKAI) discloses preparing plural compression methods in advance and, when an index is registered in the inverted file format, compressing the index using the compression method that corresponds to the time of registration.
However, because the art disclosed in JP-A 2006-172363 (KOKAI) uses time as the key for determining the compression method, the optimal compression method is not always selected.
Generally, a structured document to be registered in a structured document database contains pieces of data of similar structure that are registered sequentially. However, it is often the case that the structure of the structured document itself has no regularity or relevance. Thus, a high compression rate for the structure information of the index cannot be expected even when plural compression methods are used selectively.