1. Field of the Invention
The present invention relates to a system and a method for managing structured documents in a searchable and editable manner.
2. Description of the Related Art
Structured documents in such languages as Extensible Markup Language (XML) have been arranged into databases for searches by content or by document structure, or for partial reuse. Documents of up to several pages can be stored effectively enough by simply putting them into files suitable for word searches only; larger documents are better utilized when arranged for searches by document structure, i.e., in a manner suitable for searches through the documents by partial structure or by attribute information attached to the elements making up such structures. Documents may also be reused with their partial structures kept intact or may be edited in units of partial structures. Where a bulky document is to be edited in units of partial structures, plural workers may each work on a specific part of the document in a cooperative editing environment established for the occasion. In order to provide functions for implementing the above-described types of editing and reuse of structured documents, it is vital to arrange structured documents into databases.
One way to put a structured document into database format is to use an existing relational database in which each element making up the structure of the document is stored as a record. In that case, the document structure is represented by describing the parent-child relation of the elements in fields of each record. Various kinds of attribute information may also be stored in the fields. Since a relational database permits searches by field, specifying particular fields makes it possible to perform rapid searches by attribute information or by text. On the other hand, searching by document structure requires successively tracking the fields that represent the parent-child relation of the elements. Every time a parent or a child element is to be referenced, a new record must be acquired, and the need to obtain new records frequently generates repeated access to the database. Thus putting structured documents into a relational database makes searches by document structure time-consuming and inefficient.
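The trade-off described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the arrangement of any particular prior-art system: the table name `element`, its columns, the sample rows, and the helper functions are all hypothetical, and Python's built-in `sqlite3` module stands in for the relational database. A search by attribute is a single query on one field, whereas walking from an element up to the root needs one record fetch per level.

```python
import sqlite3


def build_db():
    """Create an in-memory database with one record per document element."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE element ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "tag TEXT, author TEXT, content TEXT)"
    )
    rows = [
        # (id, parent_id, tag, author attribute, text content) -- sample data
        (1, None, "book", None, None),
        (2, 1, "chapter", "sato", None),
        (3, 2, "section", "suzuki", None),
        (4, 3, "para", "suzuki", "structured documents"),
    ]
    conn.executemany("INSERT INTO element VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_by_author(conn, author):
    """Attribute search: a single query on one field."""
    cur = conn.execute(
        "SELECT id FROM element WHERE author = ? ORDER BY id", (author,)
    )
    return [row[0] for row in cur.fetchall()]


def path_to_root(conn, element_id):
    """Structure search: follow parent_id, fetching one record per step."""
    tags = []
    lookups = 0
    current = element_id
    while current is not None:
        tag, parent_id = conn.execute(
            "SELECT tag, parent_id FROM element WHERE id = ?", (current,)
        ).fetchone()
        lookups += 1
        tags.append(tag)
        current = parent_id
    return tags, lookups
```

In this sketch, `find_by_author` touches the database once regardless of document depth, while `path_to_root` issues as many lookups as there are levels between the element and the root, which is the repeated access the paragraph above refers to.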
Alternatively, structured documents may be regarded as a tree structure and expressed as a linked list. Data structures in, for example, a linked list format may be preserved in an object-oriented database and expanded into memory as needed for searches by document structure. In this case, it is easy to make rapid searches based on document structures. It should be noted, however, that attribute information about elements and other information such as contents attached to leaf elements need to be stored along with the parent-child relation information about the elements. In making searches by use of such information, it is necessary to keep track of the document structures while referencing the information attached to each of the elements involved. That means searches based on attribute information or on contents are very inefficient and time-consuming.
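The opposite trade-off can be sketched as follows. This is a minimal illustration under assumptions: the `Node` class, the first-child/next-sibling linking, and the sample tree are hypothetical and are not taken from any specific object-oriented database. Moving between parent and child is simple pointer following, but finding elements by attribute means visiting every node in the tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One document element, linked to its first child and next sibling."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    first_child: Optional["Node"] = None
    next_sibling: Optional["Node"] = None


def add_child(parent, child):
    """Append child to the end of parent's sibling chain and return it."""
    if parent.first_child is None:
        parent.first_child = child
    else:
        last = parent.first_child
        while last.next_sibling is not None:
            last = last.next_sibling
        last.next_sibling = child
    return child


def children(node):
    """Structure navigation: follow links, no lookup by key required."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling


def find_by_attr(root, key, value):
    """Attribute search: every node must be visited (document order)."""
    matches, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.attrs.get(key) == value:
            matches.append(node)
        stack.extend(reversed(list(children(node))))
    return matches, visited


def sample_tree():
    """Build a small hypothetical document tree with five elements."""
    book = Node("book")
    ch1 = add_child(book, Node("chapter", {"author": "sato"}))
    add_child(book, Node("chapter", {"author": "tanaka"}))
    sec = add_child(ch1, Node("section", {"author": "suzuki"}))
    add_child(sec, Node("para", {"author": "suzuki"}, "structured documents"))
    return book
```

Here `children` reaches the child elements directly, while `find_by_attr` reports a visit count equal to the size of the whole tree even when only a few elements match.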
Furthermore, partial editing of a document may change an element-to-element parent-child relation, which in turn affects the ancestor-descendant relations of the document as a whole and leads to numerous updates in any index established over the structure. As a result, in the abovementioned cooperative editing environment where plural workers work jointly on a large document, responses to editing actions tend to be slow. In particular, where document structures are preserved in a tree structure or as a linked list with a binary format index, structural changes cannot be made where desired because they would require reconstituting the entire index.
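The effect of a single structural edit on an index can be illustrated with a small sketch. It assumes, purely for illustration, that the index stores every (ancestor, descendant) pair of the document; the source does not specify the index format, and the function names and element identifiers below are hypothetical. The sketch also assumes the move is valid, i.e., that it does not create a cycle.

```python
def ancestor_pairs(parent):
    """Return the set of (ancestor, descendant) pairs for a parent map."""
    pairs = set()
    for node in parent:
        ancestor = parent[node]
        while ancestor is not None:
            pairs.add((ancestor, node))
            ancestor = parent[ancestor]
    return pairs


def move(parent, node, new_parent):
    """Re-parent one node; return the new map and the number of index
    entries (pairs) that had to be added or removed as a result."""
    before = ancestor_pairs(parent)
    updated = dict(parent)
    updated[node] = new_parent  # the only parent-child link that changes
    after = ancestor_pairs(updated)
    return updated, len(before ^ after)


# Hypothetical document: element id -> parent id (None marks the root).
SAMPLE = {
    "book": None,
    "ch1": "book",
    "ch2": "book",
    "sec": "ch1",
    "p1": "sec",
    "p2": "sec",
}
```

Moving `sec` from `ch1` to `ch2` changes one parent-child link, yet under this assumed index it invalidates the pairs linking `ch1` to `sec`, `p1`, and `p2` and requires new pairs for `ch2`; the number of affected entries grows with the size of the moved subtree and the depth of the old and new locations.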