The present invention relates to a structured document managing method and system in which electronic document data is managed by a computer, and more particularly to a technique effectively applicable to a structured document managing system in which a structured document is managed with its logical structure and its entity construction (or entity structure) associated with each other.
The progress of the information society has brought about a rapid increase in the amount of electronic document data generated with word processors, personal computers and the like. What is desired is not merely to create various kinds of documents on a computer for display or printout, but also to greatly improve working efficiency by generating electronically the documents that play an important part in the work of an organization (mission critical documents) and by computerizing the whole of the work that handles these documents.
Such a mission critical document may be subjected to various kinds of processing: not only display and printout but also extraction of data from the document, generation of derived documents, reuse with a different display style, and so forth. In many cases, therefore, a mission critical document is created as structured document data that can easily be processed mechanically, using a document markup language such as SGML (Standard Generalized Markup Language), XML (eXtensible Markup Language), HTML (HyperText Markup Language) or the like. Accordingly, a structured document managing system handling such mission critical documents needs a function of storing and managing structured documents so as to allow reference to structure information, editing/revision of document content, and management of version history.
A structured document described in such a document markup language has a dual structure consisting of an entity structure and a logical structure. The logical structure comprises the order in which the individual logical elements (chapters, sections, paragraphs and so on) of a document are arranged and the inclusion relationships among those elements. In a document markup language, the logical structure of a document is represented by placing a “tag”, a character string indicating the type of a logical element (its element type), at both the head and the tail of that element. The entity structure, on the other hand, arises from dividing the document data into units called “entities” and describing references to other entities within the content text; that is, the entity structure consists of the call relationships between entities.
When a document becomes too long to be conveniently maintained as a single continuous text, it is generally managed by, for example, dividing it into a plurality of entities and assigning each entity to the person in charge of writing/editing it. Data such as graphics is also handled as a separate entity, since it is described in a data format specific to the program that processes it and therefore cannot coexist with a text composed of content character strings and tags. When a document consists of a plurality of entities, each entity is generally stored in its own file. Tags identifying elements in a structured document, entity references describing references to entities, and character strings describing comments and the like are collectively termed marks.
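As a rough illustration of the kinds of marks named above, the following Python sketch scans a text and classifies each mark as a start tag, end tag, entity reference or comment. The syntax it recognizes is a deliberately simplified assumption (no attributes, no omitted tags, no parameter entities or marked sections), and the function name `classify_marks` is hypothetical.

```python
import re
from typing import List, Tuple

# Simplified mark syntax (assumption): tags without attributes, general
# entity references of the form &name;, and <!-- ... --> comments.
MARK_PATTERN = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<end_tag></[A-Za-z][\w.-]*>)"
    r"|(?P<start_tag><[A-Za-z][\w.-]*>)"
    r"|(?P<entity_ref>&[A-Za-z][\w.-]*;)",
    re.DOTALL,  # let comments span several lines
)


def classify_marks(text: str) -> List[Tuple[str, str]]:
    """Return (kind, mark) pairs in document order; content text is skipped."""
    return [(m.lastgroup, m.group()) for m in MARK_PATTERN.finditer(text)]


sample = "<chapter><title>Intro</title><!-- draft -->&sec1;</chapter>"
for kind, mark in classify_marks(sample):
    print(f"{kind:10} {mark}")
```

The content string "Intro" produces no entry, since only marks are reported.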
FIG. 15 is a diagram showing an example of a conventional structured document described in the document markup language SGML. The entity structure of a single SGML document is composed of one entity corresponding to the body of the document (the document entity) and zero or more external entities referenced directly or indirectly from the document entity.
External entities include text entities, described as SGML text in the same manner as the document entity, and non-SGML data entities such as graphic data. (In the present specification, document entities and external entities are collectively referred to simply as entities.) In the document shown in FIG. 15, for example, a document entity 101 references text entities 102 and 103 and a non-SGML data entity 104. The text entity 102 in turn references the text entity 103. As a result, the text entity 103, being referenced from both the document entity 101 and the text entity 102, is shared between them.
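The entity structure of FIG. 15 can be viewed as a directed call graph. The sketch below assumes the entities are keyed by their reference numerals and uses hypothetical helper names; it finds the external entities reachable from the document entity and the entities shared by more than one referencing entity.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Entity structure of FIG. 15: each entity maps to the entities it references.
# 101 = document entity, 102/103 = text entities, 104 = non-SGML data entity.
REFERENCES: Dict[int, List[int]] = {
    101: [102, 103, 104],
    102: [103],
    103: [],
    104: [],
}


def reachable(refs: Dict[int, List[int]], root: int) -> Set[int]:
    """Entities referenced directly or indirectly from `root` (root included)."""
    seen: Set[int] = set()
    stack = [root]
    while stack:
        entity = stack.pop()
        if entity not in seen:
            seen.add(entity)
            stack.extend(refs.get(entity, []))
    return seen


def shared_entities(refs: Dict[int, List[int]]) -> List[int]:
    """Entities referenced from more than one distinct entity."""
    callers: Dict[int, Set[int]] = defaultdict(set)
    for caller, callees in refs.items():
        for callee in callees:
            callers[callee].add(caller)
    return sorted(e for e, who in callers.items() if len(who) > 1)


external = reachable(REFERENCES, 101) - {101}
print(sorted(external))             # [102, 103, 104]
print(shared_entities(REFERENCES))  # [103]
```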
FIG. 16 is a diagram showing an example of a logical structure obtained by analyzing the conventional SGML document shown in FIG. 15. As shown in FIG. 16, the logical structure of an SGML document can be represented as a tree whose nodes are elements and data contents.
In FIG. 16, an elliptical node represents an element and a rectangular node represents data content (parsed character data or non-SGML data). Each area enclosed by a dotted line in FIG. 16 represents a set of nodes. A node set 201 corresponds to the document entity 101, a node set 202 to the text entity 102, node sets 203 and 204 to the text entity 103, and a node set 205 to the data entity 104. As this correspondence shows, the logical structure of an SGML document is obtained by expanding the content of each referenced entity at the position where the entity reference appears and then parsing the resulting expanded text.
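The expand-then-parse procedure described above can be sketched in Python as follows. The entity names, their contents and the `<data>` placeholder used for the non-SGML data entity are all hypothetical, and XML syntax parsed with `xml.etree.ElementTree` stands in for a full SGML parser. Because entity 103 is expanded at both of its reference positions, its paragraph appears twice in the resulting tree, mirroring node sets 203 and 204.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple

# Hypothetical entity contents loosely modelled on FIG. 15.
TEXT_ENTITIES: Dict[str, str] = {
    "doc101": "<doc><sec>&ent102;</sec><sec>&ent103;</sec>&ent104;</doc>",
    "ent102": "<p>Text of 102</p>&ent103;",
    "ent103": "<p>Shared text of 103</p>",
}
DATA_ENTITIES: Set[str] = {"ent104"}  # non-SGML data, e.g. a graphic

ENTITY_REF = re.compile(r"&([A-Za-z][\w.-]*);")


def expand(name: str, active: Tuple[str, ...] = ()) -> str:
    """Replace each entity reference with the recursively expanded entity text."""
    if name in active:
        raise ValueError(f"recursive reference to {name}")

    def substitute(m: re.Match) -> str:
        ref = m.group(1)
        if ref in DATA_ENTITIES:
            # Data content is not parsed; keep it as an opaque leaf node.
            return f'<data entity="{ref}"/>'
        return expand(ref, active + (name,))

    return ENTITY_REF.sub(substitute, TEXT_ENTITIES[name])


root = ET.fromstring(expand("doc101"))  # syntax analysis of the expanded text
print(ET.tostring(root, encoding="unicode"))
```

Expanding before parsing is what makes the shared entity appear as two separate node sets: once expanded, the tree no longer records that both paragraphs came from the same entity.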
In a typical conventional technique, document data is stored in files and managed on a file system. In the invention disclosed in JP-A-9-223054 (hereinafter referred to as prior art 1), reading files from storage means, storing files into the storage means, and version management are performed in units of a file set comprising a plurality of files. Although prior art 1 is aimed primarily at managing the groups of files that form computer programs, the disclosed file management function can also be used to manage the groups of files that form a large-scale document. However, prior art 1 handles each file as unstructured data. Therefore, when it is applied to document management, operations that take the structure of a document into account cannot be performed.
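To make this limitation concrete, the following hypothetical sketch versions a file set in the general manner described for prior art 1: each check-in stores a snapshot of the whole set, and each file is an opaque byte string. The class and method names are illustrative assumptions, not taken from JP-A-9-223054. Changes can only be reported per file; the sketch cannot tell which chapter or section inside a file was modified.

```python
from typing import Dict, List


class FileSetRepository:
    """Hypothetical store that versions a whole set of files at once.

    File contents are kept as opaque bytes: the repository knows nothing
    about tags, elements or entities inside them."""

    def __init__(self) -> None:
        self._versions: List[Dict[str, bytes]] = []

    def _snapshot(self, version: int) -> Dict[str, bytes]:
        if not 1 <= version <= len(self._versions):
            raise IndexError(f"no such version: {version}")
        return self._versions[version - 1]

    def check_in(self, files: Dict[str, bytes]) -> int:
        """Store a snapshot of the whole file set; versions are numbered from 1."""
        self._versions.append(dict(files))
        return len(self._versions)

    def check_out(self, version: int) -> Dict[str, bytes]:
        """Return a copy of the file set as it was at `version`."""
        return dict(self._snapshot(version))

    def changed_files(self, old: int, new: int) -> List[str]:
        """Files added, removed or modified between two versions.

        The comparison is byte-for-byte per file, so a one-word edit and a
        complete rewrite of a chapter file look exactly the same."""
        a, b = self._snapshot(old), self._snapshot(new)
        return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))


repo = FileSetRepository()
v1 = repo.check_in({
    "book.sgm": b"<book>&ch1;&ch2;</book>",
    "ch1.sgm": b"<ch>One</ch>",
    "ch2.sgm": b"<ch>Two</ch>",
})
v2 = repo.check_in({**repo.check_out(v1), "ch1.sgm": b"<ch>One, revised</ch>"})
print(repo.changed_files(v1, v2))  # ['ch1.sgm']
```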
On the other hand, the invention disclosed in, for example, JP-A-8-44718 (hereinafter referred to as prior art 2) is known as a prior art technique with which a structured document can be managed directly. The document processing apparatus of prior art 2 includes means for analyzing the structure of a registered document to generate tree-structured data whose nodes are logical elements such as chapters and sections, and for storing and managing that tree-structured data. The apparatus further includes means for sharing a sub-tree, which forms part of a document structure, among a plurality of documents, and means for managing the resulting version when a logical element is added to, deleted from and/or updated in a sub-tree.
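One common way to combine sub-tree sharing with versioning is path copying over immutable trees. The sketch below uses that technique purely as an illustration and does not claim to reproduce the mechanism of JP-A-8-44718; the node kinds, texts and the function `update` are hypothetical. Updating a node copies only the nodes on the path from the root, so every untouched sub-tree, including one shared with another document, is reused by the new version.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable logical element (document, chapter, section, ...)."""
    kind: str
    text: str = ""
    children: Tuple["Node", ...] = ()


def update(node: Node, path: Tuple[int, ...], new_text: str) -> Node:
    """Return a new tree in which the node reached via child indices `path`
    carries `new_text`. Only nodes on that path are copied; every other
    sub-tree is shared with the previous version."""
    if not path:
        return replace(node, text=new_text)
    index, rest = path[0], path[1:]
    children = list(node.children)
    children[index] = update(children[index], rest, new_text)
    return replace(node, children=tuple(children))


# A section shared by two documents.
shared = Node("section", "Common terms", (Node("para", "Definitions"),))
doc_a = Node("document", "A", (Node("chapter", "A-1"), shared))
doc_b = Node("document", "B", (shared,))

# Version history per document: a list of root nodes, oldest first.
history: Dict[str, List[Node]] = {"A": [doc_a], "B": [doc_b]}
history["A"].append(update(doc_a, (0,), "A-1 (revised)"))

new_a = history["A"][-1]
print(new_a.children[0].text)       # A-1 (revised)
print(new_a.children[1] is shared)  # True: untouched sub-tree reused
```

In this sketch edits are copy-on-write: if `update` were applied to a path inside the shared section of document A, only A's new version would see the change, while B would keep the old section. Whether a change to a shared sub-tree should instead propagate to every document containing it is a policy decision the sketch leaves open.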