1. Field of the Invention
The field of the invention relates generally to systems and methods of electronic document comparison.
2. Related Background
FIGS. 1 and 2 illustrate the relationships among documents, objects, users, and the applications that users may use to create, modify, compare and manage electronic documents.
An electronic document, or document, may be viewed as a collection of user-input data used with a certain application (such as Microsoft Word®, Corel WordPerfect® and Microsoft Excel®). In a general sense, a document is the data or information generated when using a computer application. FIG. 1 is a generalized block diagram illustrating the relationships between a document that may be used with the present invention and an application as such relationships exist in conventional document creation, management and editing programs. A user 101 interacts with an application 102 to create, view, alter, edit or manage a document 103. The application makes use of the document so the user may edit the document, view the document, or perform other actions in relation to the document.
With the advent of OLE (Object Linking and Embedding), a document may contain data from other applications as well as from the main application. The data from other applications are contained in “objects,” or “document objects.” One example of the use of OLE is when a Word document has a spreadsheet table from Excel embedded within it. In this example, the Excel spreadsheet is an object within a Word document. Embedded objects may include text, tables, pictures or drawings, or other forms of data. A document with one or more objects from other applications is typically referred to as a “Compound Document.” Unless otherwise specified, an “object” refers to a section of the document that is created from, or edited by, an application other than the application that edits or creates the primary document into which the object is embedded.
Single-format documents, that is, documents that do not include embedded objects from other applications, are often compared using the well-known algorithms called LCS (longest common subsequence) or HCS (heaviest common subsequence) to determine differences between two documents. There exist specialized adaptations or versions of LCS and HCS made specifically to compare Word documents, Excel documents, HTML documents and PDF documents. In addition to LCS and HCS, other comparison algorithms include LCSS (longest common subsequence) and MSS (matching similarity sequence). These comparison algorithms are implemented in comparison engines, some of which are integrated into document creation and editing applications (such as Word, Excel, Open Office™, StarOffice™, etc.), and some of which are implemented separately from the document creation and editing applications, as discussed below in connection with FIG. 3.
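By way of illustration only, an LCS-based comparison of two single-format documents may be sketched as follows. The sketch assumes that each document has already been reduced to a list of word tokens; the function names `lcs_table` and `lcs_diff` are hypothetical and do not correspond to any particular comparison engine.

```python
def lcs_table(a, b):
    """Build a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(original, modified):
    """Return a list of (operation, token) pairs describing the differences."""
    table = lcs_table(original, modified)
    ops = []
    i = j = 0
    while i < len(original) and j < len(modified):
        if original[i] == modified[j]:
            # Token belongs to the common subsequence.
            ops.append(("keep", original[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the original token preserves the longest match.
            ops.append(("delete", original[i]))
            i += 1
        else:
            ops.append(("insert", modified[j]))
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    ops.extend(("delete", token) for token in original[i:])
    ops.extend(("insert", token) for token in modified[j:])
    return ops


original = "the quick brown fox".split()
modified = "the slow brown fox jumps".split()
for op, token in lcs_diff(original, modified):
    print(op, token)
```

In this example the tokens "the", "brown" and "fox" form the longest common subsequence, "quick" is reported as deleted, and "slow" and "jumps" are reported as inserted.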
FIG. 2 is a generalized block diagram illustrating the relationship between the objects of an electronic document and the corresponding application used to create, edit or view them. A compound document 201, such as a Word document, may include objects such as an Excel spreadsheet, a picture, a PowerPoint slide or graphic, and a Visio drawing. The compound document 201 is created and edited by an application 202. In this example, the application creating and editing a compound Word document may be Microsoft's Word program. The applications 203 creating and editing the embedded objects are, respectively, Microsoft's Excel, Paint, PowerPoint and Visio.
Conventional document management, creation, editing and viewing applications often include the ability to compare documents and output a document that illustrates the differences between two documents. Typically, the output document including indications of the differences between the two input documents is referred to as a “redline” or “redline document.” FIG. 3 is a generalized block diagram illustrating a conventional document comparison application as may be found in the prior art. A document comparison engine 303 may compare an original document 301 (or first document) to a modified document 302 (or second document). The output of the comparison is a “redline” document or comparison output document 304. Typically, the comparison output document provides indications of what has changed between the original document and the modified document. Conventional document comparison engines and applications provide for comparison of single-format documents only.
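As a minimal sketch of the arrangement of FIG. 3, the following example takes an original document and a modified document as plain text and produces a redline as a single marked-up string. It uses the `SequenceMatcher` class from Python's standard `difflib` module, which applies a matching heuristic of its own rather than LCS or HCS. The function name `redline` and the `[-...-]` and `{+...+}` markers are assumptions chosen for illustration.

```python
import difflib


def redline(original, modified):
    """Return one string marking deletions as [-...-] and insertions as {+...+}."""
    a, b = original.split(), modified.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            # Text present only in the original document.
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            # Text present only in the modified document.
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(redline("the quick brown fox", "the slow brown fox jumps"))
# prints: the [-quick-] {+slow+} brown fox {+jumps+}
```

The sketch treats both inputs as flat text, which mirrors the single-format comparison described above; it makes no attempt to handle embedded objects.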
Accordingly, a need exists to provide a comparison system and method capable of comparing compound documents.