This disclosure relates to the field of computer document processing, and particularly to enhancing the ability of a computer user to compare two or more documents and quickly understand the similarities and differences between or among these documents.
There are many circumstances in today's electronic world where it is desirable to compare two or more documents with one another. Writers of all stripes frequently wish to compare one version of a document with another version of a document, to see what similarities and changes exist as between two different versions of the same document. Legal professionals often need to compare different documents to see how they differ, whether these be, for example, draft contract proposals from two or more parties, or two different patents or patent applications. Lawmakers similarly need to compare various competing proposals for legislation, and to pinpoint what is different and what is the same between or among two or more often extremely lengthy and unwieldy proposals. A copyright attorney may be in the position of comparing two written works, to see if one has “copied” the other sufficiently to constitute an infringement. And, in a myriad of other situations, the need to compare documents and quickly pinpoint their similarities and differences has grown to near ubiquity in today's electronic world. It is important to understand that this goes beyond and is independent of merely managing different versions of the same document. This encompasses the situation in which it is simply necessary to compare two documents, whatever their origins, for differences and similarities. Editing the document may also be done after or in conjunction with this comparison, by one person or by a group of people, but in many situations, the comparison may be the end result in and of itself.
Perhaps the most familiar method used to compare documents is the so-called “underline strikeout” method, such as is employed in the widely used Microsoft® Word program. In this method, a first document is compared to a second document, and those words or phrases that are deleted in going from the first document to the second document are highlighted with a “strikeout” indicator on a computerized output device, while those that are added in going from the first document to the second document are highlighted with an “underline” indicator on the output device.
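The merged, single-document output described above can be illustrated with a minimal Python sketch. It is not the algorithm used by any particular product; it assumes a word-level comparison using the standard-library `difflib.SequenceMatcher`, and the function name `underline_strikeout` and the plain-text markers `[-...-]` (strikeout) and `{+...+}` (underline) are hypothetical choices made only for this illustration.

```python
import difflib


def underline_strikeout(first: str, second: str) -> str:
    """Merge two texts into one marked-up string (illustrative sketch only).

    Words deleted going from `first` to `second` are wrapped in [-...-]
    (standing in for strikeout); words added are wrapped in {+...+}
    (standing in for underline).
    """
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i1 < i2:  # words present only in the first document
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j1 < j2:  # words present only in the second document
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(underline_strikeout("the quick brown fox", "the slow brown fox jumps"))
# prints: the [-quick-] {+slow+} brown fox {+jumps+}
```

Note that both documents end up interleaved in one output string, which is the merged presentation this method produces.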
This approach has many drawbacks, some of which will be described here. First, the two documents are not separately presented, but are merged into a single document rather than in distinct, juxtaposed windows. Thus, the user viewing the output presentation can often become confused about what was in the first document versus the second document. Also, the comparison method itself is generally serial, front-to-back, which makes it difficult for the user to identify when text has simply been moved from one place to another. Frequently, if a segment of text is moved, it will appear as a strikeout (deletion) from the first document, and an underline (addition) to the second document at an entirely different location. This gives the user no clue that this segment was actually moved, or where it was moved from and to. Further, for a structured document with multiple headings as well as a hierarchical header structure, the structural meaning of the headings is ignored, and these document headers are treated just like any other items of text. The entire document is outputted en-masse, and the user has no way to start from a header-based “table of contents” and simply “drill down” into the document sections that are of most interest for comparison. There are no suitable statistical or similar comparison summaries of similarities and differences, either for the whole document, or for various document substructures, to aid the user in navigating over to the subsections of greatest interest. Additionally, there is no context information outputted in conjunction with the document text to enable the user to immediately determine where the outputted text fits in the context of the overall document structure. 
Finally, individual subsections are not in any way “mapped” to one another before comparison, so that in deciding what to underline and what to strikeout, the computer processor's determination that something in the first document is “different” than something in the second document may be erroneous, because it is not comparing the right document subsections with one another following an appropriate subsection mapping.
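To make the subsection mapping point concrete, the following Python sketch pairs each subsection of a first document with the most similar subsection of a second document before any comparison is attempted. It is only one possible reading of "mapping," under stated assumptions: each document is represented as a dictionary from heading to body text, similarity is the word-level `difflib.SequenceMatcher.ratio()`, pairing is greedy, and the function name `map_subsections` and the `threshold` parameter are hypothetical.

```python
import difflib


def map_subsections(first: dict, second: dict, threshold: float = 0.5) -> list:
    """Greedily pair subsections by body-text similarity (illustrative sketch).

    `first` and `second` map heading -> body text. Returns a list of
    (first_heading, second_heading_or_None) pairs.
    """
    def similarity(x: str, y: str) -> float:
        return difflib.SequenceMatcher(
            None, x.split(), y.split(), autojunk=False
        ).ratio()

    remaining = dict(second)  # second-document sections not yet paired
    mapping = []
    for heading_a, body_a in first.items():
        best, best_score = None, threshold
        for heading_b, body_b in remaining.items():
            score = similarity(body_a, body_b)
            if score >= best_score:
                best, best_score = heading_b, score
        if best is not None:
            del remaining[best]
        mapping.append((heading_a, best))
    return mapping


first_doc = {
    "1. Payment": "buyer shall pay the seller within thirty days",
    "2. Delivery": "seller shall deliver the goods to the buyer's warehouse",
}
second_doc = {
    "1. Delivery": "seller shall deliver the goods to the buyer's premises",
    "2. Payment": "buyer shall pay the seller within sixty days",
}
print(map_subsections(first_doc, second_doc))
# prints: [('1. Payment', '2. Payment'), ('2. Delivery', '1. Delivery')]
```

In this example the sections were reordered between the two documents. A comparison by position would set "1. Payment" against "1. Delivery" and report nearly everything as different, whereas comparing the mapped pairs reports only the one-word change in each section.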
A moderate enhancement to traditional underline strikeout methods is achieved by Workshare® in its DeltaView® document comparison software. Most significantly, if a segment of text is moved, it is identified as such, rather than simply as a deletion from the first document and an addition to the second document in some unrelated and not-indicated location. In particular, the move is still identified as a deletion from the first document and an addition to the second document, but is given a highlighting different from highlighting given to an ordinary deletion and addition, so as to specifically identify it as a move. This is still problematic, however, since it does not link the old location of the moved material in the first document to its new location in the second document. If there are multiple items of text that are moved or if the documents are long documents and the text is moved far from its original location, the benefit of this feature is lost since the moves will simply get lost amidst one another or the user will have to scroll through a large segment of text to find where the original text was moved to. At most, this is a move-enhanced form of underline strikeout, which otherwise diverges very little from conventional underline strikeout.
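The general idea of recognizing a move as a paired deletion and addition can be sketched in Python as follows. This is not a description of how DeltaView® works internally; it assumes the documents are already split into text segments (for example, sentences), uses the standard-library `difflib.SequenceMatcher`, and treats a deleted segment and an added segment with identical text as a move. The function name `classify_changes` and the returned dictionary layout are hypothetical.

```python
import difflib


def classify_changes(first_segments: list, second_segments: list) -> dict:
    """Separate moved segments from plain deletions and additions (sketch).

    Returns indices: "moved" holds (first_index, second_index) pairs,
    "deleted" holds first-document indices, "added" holds second-document
    indices.
    """
    matcher = difflib.SequenceMatcher(
        None, first_segments, second_segments, autojunk=False
    )
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(range(i1, i2))
        if tag in ("insert", "replace"):
            added.extend(range(j1, j2))

    # Index the added segments by text so deletions can be paired with them.
    added_by_text = {}
    for j in added:
        added_by_text.setdefault(second_segments[j], []).append(j)

    moved, pure_deleted = [], []
    for i in deleted:
        candidates = added_by_text.get(first_segments[i])
        if candidates:
            moved.append((i, candidates.pop(0)))  # same text, new location
        else:
            pure_deleted.append(i)
    pure_added = sorted(j for js in added_by_text.values() for j in js)
    return {"moved": moved, "deleted": pure_deleted, "added": pure_added}


print(classify_changes(["A", "B", "C", "D"], ["B", "C", "D", "A", "E"]))
# prints: {'moved': [(0, 3)], 'deleted': [], 'added': [4]}
```

Each `(first_index, second_index)` pair in this sketch records both ends of a move, which is the kind of link between old and new locations that the preceding paragraph notes is absent when a move is shown only through distinct highlighting.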
In addition to a window outputting the above-discussed enhanced underline/strikeout information, DeltaView® also has two further windows, one showing the first document, and the other showing the second document. However, these two windows show these two documents in clean, unmarked form, and do not in any way highlight changes as between the first and second documents. All information about the changes must be gleaned from the third, enhanced underline/strikeout window. It would be preferable, and would present a much simpler and easier to use output, if the enhanced underline/strikeout window were to be omitted, and if all of the highlighting information summarizing similarities and differences between the two documents were to be presented in only two windows, one for the first document, and one for the second, rather than in the three windows required for the DeltaView® presentation.
Finally, as with Microsoft® Word and similar software, DeltaView® entirely ignores the document structure, and document headers are treated just like any other items of text. As a result, the entire document is again outputted en-masse, and the user has no way to start from a header-based “table of contents” and simply “drill down” into the sections of the documents that are of most interest for comparison, aided by suitable statistical or similar comparison summaries of similarities and differences, either for the whole document, or in association with various document substructures. Additionally, there is no context information outputted in conjunction with the document text that would enable the user to immediately determine where the outputted text fits in the context of the overall document structure. Finally, individual subsections are not in any way “mapped” to one another before comparison, so that in deciding what to underline and what to strikeout, the computer processor's determination that something in the first document is “different” than something in the second document may be erroneous, because it is not comparing the right document subsections with one another following an appropriate subsection mapping.
Microsoft® WinDiff outputs a document comparison that is almost unintelligible to a novice computer user. As with the underline/strikeout method, the two documents are not separately presented, but are merged into a single document rather than in distinct, juxtaposed windows. But the comparison is even more unwieldy than an underline/strikeout comparison, and if a single word differs, it regards the entire line as differing. WinDiff requires the use of an additional window that awkwardly shows a visual output of parallel, interconnected lines representing a map of each document on a line-by-line basis, based on a three-color highlighting scheme indicating first document text not in the second document, second document text not in the first document, and text in both documents. Connecting lines are used to connect text that appears in both documents, including moved text, as between the individual document maps. The whole approach is awkward and non-intuitive at best. WinDiff also ignores the document structure, and document headers are treated just like any other items of text. Thus, WinDiff contains all of the other deficiencies earlier noted with respect to Microsoft® Word and Workshare® DeltaView® that relate to this ignoring of the document structure.
Norton Utilities® File Compare represents something of an improvement over Microsoft® Word and Workshare® DeltaView®, because it entirely forgoes the underline strikeout methodology wherein two documents are merged into a single document for output with underlines and strikeouts, in favor of a side-by-side output of the two documents being compared. In contrast to DeltaView®, this side-by-side output does contain highlighting information summarizing the differences between the documents. That is, Norton Utilities® File Compare does omit the third window required by DeltaView®. However, Norton Utilities® File Compare still has a number of drawbacks.
First, the highlighting of similarities and differences does not occur at a word-by-word level, but appears to occur on a line-by-line or a segment-by-segment basis, and so tends to greatly overstate the degree of difference between the documents, and to greatly understate the degree of similarity between the documents. If perhaps two or three words in a ten or twelve word segment of text have been altered, Norton Utilities® File Compare will highlight the entire ten or twelve word segment as having been altered.
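The overstatement described above can be quantified with a small Python sketch. It is an illustration under stated assumptions, not a reconstruction of any product's behavior: both comparisons use the standard-library `difflib.SequenceMatcher`, and the function names `changed_fraction_by_line` and `changed_fraction_by_word` are hypothetical. Each function returns the fraction of first-document words that would be flagged as changed.

```python
import difflib


def changed_fraction_by_line(first_lines: list, second_lines: list) -> float:
    """Fraction of first-document words flagged when whole lines are the unit."""
    matcher = difflib.SequenceMatcher(
        None, first_lines, second_lines, autojunk=False
    )
    total = sum(len(line.split()) for line in first_lines)
    flagged = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            # Every word on a differing line counts as changed.
            flagged += sum(len(line.split()) for line in first_lines[i1:i2])
    return flagged / total if total else 0.0


def changed_fraction_by_word(first_lines: list, second_lines: list) -> float:
    """Fraction of first-document words flagged when single words are the unit."""
    a = " ".join(first_lines).split()
    b = " ".join(second_lines).split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    flagged = sum(
        i2 - i1
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return flagged / len(a) if a else 0.0


first = ["buyer shall pay the seller within thirty days of delivery"]
second = ["buyer shall pay the vendor within sixty days of delivery"]
print(changed_fraction_by_line(first, second))  # prints: 1.0
print(changed_fraction_by_word(first, second))  # prints: 0.2
```

With two of ten words altered, the line-level comparison reports the whole segment as changed, while the word-level comparison reports only the two altered words.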
Further, Norton Utilities® File Compare, like DeltaView®, does distinguish text moves from additions and deletions. However, it does not actually move the output of text in the second document to match the sequencing of the text in the first document, nor does it allow for any form of active highlighting wherein the user can simply designate text in one document and find out where that same text exists in the other document. Instead, in the first document output, it inserts the phrase “moved from line x” opposite the pertinent second document text, where x is the line number in the first document where that same text originates. Similarly, in the second document output, it inserts the phrase “moved to line y” opposite the pertinent first document text, where y is the line number in the second document to which that text has been moved. Actually moving the second document text to juxtapose it with the first document text, perhaps with some indication of where the moved text originated or of the fact that the text was moved, and/or with some form of active highlighting, would make the text move less confusing to understand.
Finally, Norton Utilities® File Compare, like Microsoft® Word, Microsoft® WinDiff, and Workshare® DeltaView®, also entirely ignores the document structure, and document headers are treated just like any other items of text. Thus, Norton Utilities® File Compare contains all of the other deficiencies earlier noted with respect to Microsoft® Word, Microsoft® WinDiff, and Workshare® DeltaView® that relate to this ignoring of the document structure. The entire document is again outputted en-masse, and the user has no way to start from a header-based “table of contents,” and to simply “drill down” into the sections of the documents that are of most interest for comparison, aided by suitable statistical or similar comparison summaries of similarities and differences, either for the whole document, or in association with various document substructures. There is again no context information outputted in conjunction with the document text that would enable the user to immediately determine where the outputted text fits in the context of the overall document structure. And again, individual subsections are not in any way “mapped” to one another before comparison, so that in deciding what to mark as similar and different, the computer processor's determination that something in the first document is “different” than something in the second document may be erroneous, because it is not comparing the right document subsections with one another following an appropriate subsection mapping.