The present invention relates to a method and a system for managing documents so as to indicate changes among a plurality of versions of a document by specifying and showing each difference between the versions.
Recently, documents to be widely distributed and reused, such as product manuals, have been written and stored in structured document formats, such as SGML (Standard Generalized Markup Language), in order to facilitate their distribution and reuse. For frequently revised documents, a new version of the document (hereinafter simply referred to as a version) is generated each time the document is revised, and each new version is generally managed as a separate file. To intuitively grasp changes between versions, it is effective to employ a “difference indication” method in which matching relationships between corresponding character strings in two files of different versions are extracted, and then portions (strings) which have no corresponding matched portions (strings) are indicated as differences.
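The “difference indication” idea described above can be sketched in a few lines. This is only an illustration, not the method of any cited publication: it assumes the two versions have already been split into words, uses Python's standard `difflib.SequenceMatcher` to find the matching portions, and reports every portion that has no match. The function name `indicate_differences` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def indicate_differences(old_words, new_words):
    """Return (kind, old_part, new_part) for each portion without a match.

    kind is one of 'replace', 'delete' or 'insert', as reported by difflib.
    """
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    differences = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":  # 'equal' spans are the matched portions
            differences.append((kind, old_words[i1:i2], new_words[j1:j2]))
    return differences


# Hypothetical sample versions of one sentence from a manual.
OLD_VERSION = "Press the red button to start".split()
NEW_VERSION = "Press the green button to start now".split()

if __name__ == "__main__":
    for kind, old_part, new_part in indicate_differences(OLD_VERSION, NEW_VERSION):
        print(kind, old_part, new_part)
```

Comparing word lists rather than raw characters is a design choice made here only to keep the reported differences readable; the same call works on plain strings.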
Japanese Laid-Open Patent Publication No. 9-319632 (1997) discloses a method for managing versions of structured documents, describing a technique for extracting differences between structured documents and indicating the extracted differences. Specifically, this version management method sets one of two versions written in a structured document format as a reference version, extracts difference information between the two versions (that is, each change and the portion to which the change has been made), and outputs an SGML document (hereinafter referred to as a “difference-embedded document”) in which the difference information is structurally described and embedded in the reference version. This difference-embedded document can be displayed with available SGML document editing software or the like to highlight the changes from the reference version. Furthermore, each version can be restored by interpreting the structural description of the changes (difference information) embedded in the difference-embedded document and converting the structure of the difference-embedded document based on that description. That is to say, a difference-embedded document is a document in which the contents of two versions are described efficiently and in such a way that they can be separated again.
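As a rough illustration of a difference-embedded document, the sketch below merges two versions into one sequence of tagged segments and shows that either version can be recovered from it. It is a simplification under stated assumptions: the versions are flat word lists rather than SGML trees, the segment kinds `same`, `del` and `ins` and the function names are hypothetical, and the matching is done with Python's `difflib` rather than with the technique of the cited publication.

```python
from difflib import SequenceMatcher


def embed_differences(reference, revised):
    """Merge two word lists into segments tagged 'same', 'del' or 'ins'."""
    segments = []
    matcher = SequenceMatcher(a=reference, b=revised, autojunk=False)
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind == "equal":
            segments.append(("same", reference[i1:i2]))
            continue
        if i2 > i1:  # words present only in the reference version
            segments.append(("del", reference[i1:i2]))
        if j2 > j1:  # words present only in the revised version
            segments.append(("ins", revised[j1:j2]))
    return segments


def restore(segments, version):
    """Recover one version: 'reference' keeps same+del, 'revised' keeps same+ins."""
    keep = {"same", "del"} if version == "reference" else {"same", "ins"}
    return [word for kind, words in segments if kind in keep for word in words]


def render(segments):
    """Show the embedded differences as simple inline markup for highlighting."""
    parts = []
    for kind, words in segments:
        text = " ".join(words)
        parts.append(text if kind == "same" else f"<{kind}>{text}</{kind}>")
    return " ".join(parts)


# Hypothetical sample versions.
REFERENCE = "Press the red button to start".split()
REVISED = "Press the green button to start now".split()

if __name__ == "__main__":
    embedded = embed_differences(REFERENCE, REVISED)
    print(render(embedded))
```

Because every word of both versions is kept exactly once and tagged with its origin, the single merged structure is enough to display the changes and to separate the two versions again.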
Further, in order to efficiently manage versions of a frequently revised document, the method for managing versions according to the above patent publication sets a certain version as a reference version and, each time a new version is created, extracts difference information between the new version and the reference version. The method then outputs a “difference document” in which only the obtained difference information is structurally described, and stores the output difference document as version management data in order to reduce the amount of data required for version management. In this case, each version can be restored by interpreting the structural description of the changes (difference information) in its difference document and converting the structure of the reference version based on that description.
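The storage scheme just described, in which only difference information against a fixed reference version is kept, might be sketched as follows. This is a minimal illustration, not the publication's format: a “difference document” is assumed here to be a list of `(start, end, replacement)` spans over a word list, and the names `make_difference_document` and `restore_version` are hypothetical.

```python
from difflib import SequenceMatcher


def make_difference_document(reference, revised):
    """Record only the changed spans as (start, end, replacement_words).

    start and end index into the reference version.
    """
    matcher = SequenceMatcher(a=reference, b=revised, autojunk=False)
    return [
        (i1, i2, revised[j1:j2])
        for kind, i1, i2, j1, j2 in matcher.get_opcodes()
        if kind != "equal"
    ]


def restore_version(reference, difference_document):
    """Rebuild a version by applying its difference document to the reference."""
    restored = list(reference)
    # Apply spans from last to first so earlier indices stay valid.
    for start, end, replacement in reversed(difference_document):
        restored[start:end] = replacement
    return restored


# Hypothetical reference version and two later revisions.
REFERENCE = "Press the red button to start".split()
VERSION_2 = "Press the green button to start now".split()
VERSION_3 = "Hold the green button to stop".split()

if __name__ == "__main__":
    # Only the reference and the small difference documents need to be stored.
    store = {
        "v2": make_difference_document(REFERENCE, VERSION_2),
        "v3": make_difference_document(REFERENCE, VERSION_3),
    }
    for name, diff in store.items():
        print(name, diff, "->", " ".join(restore_version(REFERENCE, diff)))
```

Each difference document is computed against the same reference version, so any stored version can be rebuilt independently of the others.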
Like HTML (HyperText Markup Language), XML (Extensible Markup Language) is a structured document description language intended to be used on the Internet. XML is structurally a subset of SGML and, as is the case with SGML, allows a document structure to be defined freely. XML document data can be displayed and printed out by use of an XML-aware Web browser; in addition, XML can express various data based on a document structure defined for a specific application, which makes it useful as a data exchange format on the Internet. Recently, various industries have been adopting industry-standard data exchange formats defined in XML.
A document body having a logical structure can be described in XML without using a prepared DTD (document type definition), which is not possible with SGML. In return, however, each element constituting the logical structure must be marked up explicitly by sandwiching it between a start tag and an end tag. To express an element “participant name” which is composed of elements “surname” and “first name”, for example, it is necessary to write a line such as: “<participant name><surname>Hitachi</surname><first name>Taro</first name></participant name>”. In SGML, on the other hand, if a DTD clarifies that the element “participant name” is composed of the elements “surname” and “first name”, the end tags “</surname>” and “</first name>” can be omitted by writing a line such as: “<participant name><surname>Hitachi<first name>Taro</participant name>”.
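The contrast between the two notations can be checked with an ordinary XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reads a document without any DTD. One assumption is made for the sake of a runnable example: because XML element names cannot contain spaces, the names are written here as `participant_name` and `first_name`; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Every element is closed explicitly, so no DTD is needed to recover the structure.
WELL_FORMED = (
    "<participant_name><surname>Hitachi</surname>"
    "<first_name>Taro</first_name></participant_name>"
)

# SGML-style notation with omitted end tags; not acceptable as XML.
OMITTED_END_TAGS = (
    "<participant_name><surname>Hitachi<first_name>Taro</participant_name>"
)


def child_texts(xml_text):
    """Parse without a DTD and map each child element name to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def is_well_formed(xml_text):
    """Report whether the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(child_texts(WELL_FORMED))
    print(is_well_formed(OMITTED_END_TAGS))
```

The second input is rejected because, without a DTD, the parser has no way to infer where the “surname” and “first name” elements end.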
In XML, it is possible to write a document having a logical structure without necessarily preparing a DTD by sandwiching each element in the document between a start tag and an end tag, as described above. Capitalizing on this advantage, it is possible to freely combine various tag sets used for writing industry-standard data to write a document. The XML namespace is used to avoid “collision” between element names (duplication of an element name), which may occur when a plurality of tag sets are used in a document. Consider a case in which a document including formulas and tables is written using three types of tag sets for writing the entire document body, the formulas, and the tables, respectively. Furthermore, suppose that the document body tag set includes a tag “<title>” for indicating a document title, and the table tag set also includes a tag “<title>” for indicating a table title. In such a case, the expression “<title>XXX survey results</title>” appearing in the document, for example, is ambiguous as to which tag set it belongs to. To clarify which tag set each tag belongs to, a tag indicating a table title, for example, is expressed as “<table:title>”. In this case, the prefix “table” indicates a namespace specifying the table tag set. Use of namespaces enables an application to distinguish each tag set even in an XML document including a plurality of tag sets. Furthermore, it is possible to treat only the tag set belonging to a specific namespace as a target for processing.
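The following sketch illustrates how an application can process only the tags that belong to one namespace. It uses Python's standard `xml.etree.ElementTree`; the namespace URIs, the `doc` prefix, the sample document, and the function name `titles_in_namespace` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for the document body and table tag sets.
DOC_NS = "http://example.com/ns/document"
TABLE_NS = "http://example.com/ns/table"

# Both tag sets define a "title" element; the prefixes keep them apart.
SAMPLE = f"""<doc:report xmlns:doc="{DOC_NS}" xmlns:table="{TABLE_NS}">
  <doc:title>XXX survey report</doc:title>
  <table:table>
    <table:title>XXX survey results</table:title>
  </table:table>
</doc:report>"""


def titles_in_namespace(xml_text, namespace_uri):
    """Return the text of every "title" element in the given namespace only."""
    root = ET.fromstring(xml_text)
    # ElementTree names namespaced elements as "{uri}localname".
    return [element.text for element in root.iter(f"{{{namespace_uri}}}title")]


if __name__ == "__main__":
    print(titles_in_namespace(SAMPLE, TABLE_NS))
    print(titles_in_namespace(SAMPLE, DOC_NS))
```

The parser resolves each prefix to its namespace URI, so the selection depends on the URI and not on the particular prefix string chosen in the document.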