The present invention relates to a method and apparatus for extracting a difference character string between structured documents stored as electronic files, for use in a document processor such as a word processor.
A structured document is defined as a document in which information on its logical structure is embedded, that is, information such as "this portion of the document constitutes a chapter" or "this portion makes up a title".
Difference extraction between documents is defined as detecting the most coincident combination of the elements constituting each document, such as paragraphs, lines and characters, and extracting the non-coincident elements as a difference. Suppose that the two documents for which the difference is to be detected are "ABCDEFG" and "ACDAEFHN". When the two documents are compared in terms of their elements, including A, B, C, D, E, F, G and H, the most coincident combination is detected as "correspondence of ACDEF". The difference is then detected in forms such as "B is deleted", "A is inserted after D" and "G is changed to H".
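The "most coincident combination" in the example above corresponds to what is commonly called a longest common subsequence. The following is a minimal sketch of a character-level difference extraction on that basis, assuming a standard dynamic-programming table; the function names `lcs_table` and `extract_difference` are hypothetical and are not taken from any cited method. In this sketch a change such as "G is changed to H" is reported as a deletion followed by an insertion, and the trailing "N" of the second document appears as a further insertion.

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    m, n = len(a), len(b)
    t = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def extract_difference(a, b):
    """Return a list of ("keep" | "delete" | "insert", element) operations."""
    t = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            # Dropping a[i] loses nothing from the best correspondence.
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    # Whatever remains on either side has no counterpart.
    ops.extend(("delete", x) for x in a[i:])
    ops.extend(("insert", x) for x in b[j:])
    return ops


ops = extract_difference("ABCDEFG", "ACDAEFHN")
kept = "".join(x for op, x in ops if op == "keep")
print(kept)  # ACDEF
print([(op, x) for op, x in ops if op != "keep"])
```

The elements here are single characters, but the same routine works on any sequence, for example a list of lines or paragraphs.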
A conventional method for difference extraction is disclosed in JP-A-2-255964, in which the comparison is made in terms of punctuation marks, lines, words and characters. When this method is applied to structured documents, the character strings representing the logical structure of the documents are compared in the same manner as the other character strings in the documents.
However, extracting a difference from a structured document by the same means as for an ordinary document may give a result that is inappropriate for the document editor, since the result may not coincide with the logical structure of the document.
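The problem can be illustrated with a small sketch. It assumes a hypothetical markup in which `<p>` and `</p>` delimit a paragraph, and it uses Python's standard `difflib.SequenceMatcher` merely as a stand-in for a character-level comparison; it is not the method of JP-A-2-255964. A paragraph "xy" is inserted between two existing paragraphs, yet the extracted difference string starts with a closing tag and cuts across the paragraph boundaries.

```python
import difflib

# Hypothetical structured documents: <p>...</p> marks one paragraph.
old = "<p>ab</p><p>cd</p>"
new = "<p>ab</p><p>xy</p><p>cd</p>"

# Compare character by character, treating the tags like any other text.
matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
inserted = [
    new[j1:j2]
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag == "insert"
]

print(inserted)  # ['</p><p>xy']
```

An editor would expect the difference to be the whole paragraph `<p>xy</p>`. The fragment `</p><p>xy` is an equally valid character-level difference, but it does not coincide with any logical unit of the document.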
The following Examples 1-3 were considered by the Applicants during development of the present invention, and have not been publicly known or published.