1. Field
The present invention relates generally to the field of document comparison. More particularly, the invention relates to a system and method for the storage, retrieval, and comparison of documents.
2. Description of the Related Art
The advent of text processing application programs has enabled the computer to become a viable tool for document creation. A user is able to develop a document by entering the text comprising the document into the computer using such an application program. Typically, the document contents are stored on the computer in what is known as a file. The user is able to subsequently make modifications to the document by recalling the stored file and making the desired changes. The user can then save the subsequent version of the document containing the modifications as the same file or a different file. Saving the document containing the modifications as a different file enables the user to create multiple versions of the document.
In the course of creating a document, it may be necessary or desirable for the user to compare two versions of the document. One method of comparing the different versions was to print the files containing the appropriate versions of the document, visually compare the two printouts word by word, and note the changes. This process was labor intensive and very time consuming, especially for large documents. The user still had to laboriously scroll through the two files and note the changes.
Commercially available word processing programs such as Word 97, from Microsoft Corporation, and WordPerfect version 8.0, from WordPerfect Corporation, include a document compare feature that enables the comparison of a document currently displayed on the screen with a document stored on the disk as a file. Text that exists in the document stored on the disk but not in the currently displayed document is copied and inserted into the appropriate position in the currently displayed document and indicated by strikeout codes such as a “redline” through the text. Text that exists in the currently displayed document but not in the document stored on the disk is indicated accordingly, for example, by underlining the text. The user of these word processing programs still has to inspect the content of the entire document for the changes.
The present invention hierarchically compares one or more documents stored on a computer. The hierarchical comparison allows a user to efficiently identify and view only those segments within the documents that are different.
In one preferred embodiment, a document server computer stores one or more documents in a document database. Additionally, the documents may advantageously be grouped into one or more document types or categories of documents. A category of document may advantageously be a logical grouping of types of documents including one or more versions of individual documents. The document server computer includes one or more web pages which are accessible by one or more users over a communication medium. The web pages enable a user to remotely request a comparison between two documents. The documents stored in the document database include one or more segments. A document segment is an identifiable portion of a document. For example, a segment may be a chapter, section, subsection, page, and the like. In one embodiment, the document database stores the document separated into its document segments.
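The storage arrangement described above can be illustrated with a minimal sketch. It assumes a simple in-memory store; the names `Segment`, `Document`, `DocumentDatabase`, `add`, and `list_documents` are hypothetical and are not taken from the described system, which may organize its document database differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """An identifiable portion of a document, e.g. a chapter or section."""
    segment_id: str  # hypothetical key such as "Section 1"
    text: str


@dataclass
class Document:
    """One version of a document, stored already separated into segments."""
    name: str
    version: int
    segments: List[Segment] = field(default_factory=list)


@dataclass
class DocumentDatabase:
    """Hypothetical in-memory store that groups documents by category."""
    categories: Dict[str, List[Document]] = field(default_factory=dict)

    def add(self, category: str, document: Document) -> None:
        # Create the category on first use, then append the document version.
        self.categories.setdefault(category, []).append(document)

    def list_documents(self, category: str) -> List[Document]:
        # An unknown category yields an empty list rather than an error.
        return list(self.categories.get(category, []))


db = DocumentDatabase()
db.add("regulations", Document("Rule A", 1, [Segment("Section 1", "Original text.")]))
db.add("regulations", Document("Rule A", 2, [Segment("Section 1", "Revised text.")]))
print([(d.name, d.version) for d in db.list_documents("regulations")])
```

Storing each version as a list of segments, rather than as one block of text, is what allows a later comparison to work segment by segment.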
A user utilizes a web browser executing on his or her user computer to connect to the document server computer. Once connected, the user can access the web pages and request a comparison between two documents. In one embodiment, the document server computer requests that the user specify a category of document to compare. The document server computer may also request a user password to ensure that the user has authorization to access the requested category of document. Having verified the password, the document server computer lists the documents contained in the requested category of documents.
In one embodiment, the most recent prior version of each of the documents may advantageously be listed in one list and the current version of the documents may advantageously be presented in another list. The user can then select a first document from one list and another document from the other list and request the document server computer to compare the selected documents. Substantially similar or like segments in the selected documents are compared, and segments containing differences or changes are identified. Upon detecting the first difference or change, the comparison is stopped for that segment, and the comparison proceeds to another segment. If a particular segment in one document does not have a substantially similar or like segment in the other document, the particular segment is identified as containing differences or changes. The segments identified as containing modifications or changes are listed in a side-by-side display on the user computer. One side of the display advantageously lists the identified segments from the first document and the other side of the display advantageously lists the identified segments from the second document. Ordinarily, the selected documents would be the current version and the immediately prior version of the same document. However, the user may select any document from the first list to be compared with any selected document from the second list.
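The document-level comparison described above might be sketched as follows. This is an illustration under stated assumptions, not the described implementation: each document is modeled as a dictionary from a segment identifier to segment text, "like" segments are matched by identical identifier, and text is compared word by word after splitting on whitespace. The function names `segment_differs` and `changed_segments` are hypothetical.

```python
from itertools import zip_longest


def segment_differs(old_text: str, new_text: str) -> bool:
    """Return True as soon as the first differing word is found."""
    for old_word, new_word in zip_longest(old_text.split(), new_text.split()):
        if old_word != new_word:
            return True  # stop comparing this segment at the first difference
    return False


def changed_segments(old_doc: dict, new_doc: dict) -> list:
    """List the ids of segments that differ or have no like segment.

    Assumption: segments are "like" when they share the same id.
    """
    changed = []
    # Visit the old document's segments first, then any segments only in the new one.
    all_ids = list(old_doc) + [seg_id for seg_id in new_doc if seg_id not in old_doc]
    for seg_id in all_ids:
        if seg_id not in old_doc or seg_id not in new_doc:
            changed.append(seg_id)  # no like segment in the other document
        elif segment_differs(old_doc[seg_id], new_doc[seg_id]):
            changed.append(seg_id)
    return changed


old = {"1": "General provisions apply.", "2": "Fees are due monthly.", "3": "Repealed."}
new = {"1": "General provisions apply.", "2": "Fees are due quarterly.", "4": "New appendix."}
print(changed_segments(old, new))  # prints ['2', '3', '4']
```

Because the check returns at the first mismatch, the cost of this stage depends on how early each segment diverges rather than on a full diff of every segment; the detailed comparison is deferred until the user selects a segment.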
The user can then select a first segment from one list and a second segment from the other list and request the document server computer to compare the selected segments. The document server computer compares the selected segments identifying the differences or changes between the two segments. Differences or changes between the selected segments may advantageously be identified at two levels. Components in the segment containing the differences or changes are appropriately indicated as well as the actual elements or subcomponents which are different. For example, a word contained in a sentence may be different between the selected segments. The document server computer can then identify the word as a unit or subcomponent containing differences or changes, and accordingly identify the sentence containing the changed word as the component containing differences or changes. The document server computer then displays the entire contents of each of the selected segments in a side-by-side display on the user computer. The identified components and subcomponents are distinguished in the side-by-side display of selected segment contents so that the user can quickly identify the changes that have been made to the segment contents.
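The two-level identification described above, in which a changed word marks its containing sentence as changed, could be sketched with Python's standard `difflib` module. The use of `difflib`, the naive sentence splitter, and the names `split_sentences` and `two_level_diff` are assumptions made for illustration; the described system does not specify a particular differencing algorithm.

```python
import difflib
import re


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def two_level_diff(old_text: str, new_text: str) -> list:
    """Return one (old_sentence, new_sentence, removed_words, added_words)
    tuple for each run of sentences that differs between the two segments."""
    old_sents, new_sents = split_sentences(old_text), split_sentences(new_text)
    results = []
    # Level 1: find the components (sentences) that differ.
    sent_matcher = difflib.SequenceMatcher(a=old_sents, b=new_sents, autojunk=False)
    for tag, i1, i2, j1, j2 in sent_matcher.get_opcodes():
        if tag == "equal":
            continue
        old_sent = " ".join(old_sents[i1:i2])
        new_sent = " ".join(new_sents[j1:j2])
        # Level 2: find the subcomponents (words) that differ within them.
        old_words, new_words = old_sent.split(), new_sent.split()
        word_matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        removed, added = [], []
        for wtag, a1, a2, b1, b2 in word_matcher.get_opcodes():
            if wtag != "equal":
                removed.extend(old_words[a1:a2])
                added.extend(new_words[b1:b2])
        results.append((old_sent, new_sent, removed, added))
    return results


diffs = two_level_diff(
    "Fees are due monthly. Late fees apply.",
    "Fees are due quarterly. Late fees apply.",
)
for old_sent, new_sent, removed, added in diffs:
    print(old_sent, "->", new_sent, "| removed:", removed, "| added:", added)
```

A display layer could then render the returned sentences with one style and the removed or added words with another in the side-by-side view.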
The document server computer advantageously performs a document comparison in manageable hierarchies or stages. In one embodiment, a request to compare a category of document presents a list of documents in the requested category of documents. A subsequent document comparison presents a list of segments in the document that contain differences or changes. A further segment comparison presents the contents of the segment with the actual differences or changes, and the neighborhood of the actual differences or changes, appropriately indicated for clarity and identification.
Thus, for example, a user can easily and efficiently observe changes made to a set of proposed government regulations including thousands of pages of regulations and comments, or to an agreement containing a large volume of exhibits with specifications, without having to view and scroll through the entire document. The proposed regulation or the agreement may contain one or more volumes, and each volume may additionally contain thousands of sections and thousands of pages. Changes to the proposed regulations or agreements are made to a copy, or another version, stored on the document server computer. The regulations and agreements may be subject to review by one or more authorized users, and the users can efficiently and conveniently identify and observe the changes made to the regulations and agreements. The regulation or the agreement may be compared in hierarchies. At each hierarchical level, each segment of the document (which has been split up in accordance with natural divisions in the document, such as sections) that has been changed is identified and presented to the user. From this list of segments which contain content changes, the user can easily and conveniently focus in on a particular portion of the regulation or agreement that has been changed. The specified portion of the regulation or agreement of interest is further compared. Ultimately, the contents of the portion of the regulation or agreement of interest are fully displayed to the user. The actual changes, as well as the neighborhood of the changes, are further distinguished in this display for ease of identification.
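Splitting a document along its natural divisions, as mentioned above, could be sketched as follows. The heading convention (a line beginning with "Section" and a number) and the function name `split_into_segments` are assumptions for illustration only; an actual regulation or agreement may mark its divisions differently.

```python
import re

# Assumed heading convention: a line starting with "Section" and a number,
# optionally followed by a title, e.g. "Section 2 Fees" or "Section 2.1 Late Fees".
HEADING = re.compile(r"^Section\s+\d+(?:\.\d+)*\b.*$", re.MULTILINE)


def split_into_segments(text: str) -> dict:
    """Map each heading line to the body text that follows it.

    Text before the first heading is ignored, and a repeated heading
    overwrites the earlier entry; both are simplifications of this sketch.
    """
    segments = {}
    matches = list(HEADING.finditer(text))
    for index, match in enumerate(matches):
        # A segment's body runs up to the next heading, or to the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        segments[match.group(0).strip()] = text[match.end():end].strip()
    return segments


sample = (
    "Section 1 Scope\n"
    "This part applies to all.\n"
    "Section 2 Fees\n"
    "Fees are due monthly.\n"
)
parts = split_into_segments(sample)
print(list(parts))
```

Segments produced this way can be keyed by heading, which gives a natural way to pair like segments across two versions of the same document.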