This invention concerns the automatic comparison of two documents.
When two documents are in the form of electronic text files, they can be compared quite readily by a standard function in conventional word processing programs such as Microsoft Word. However, when both documents are paper hard copies, or one is on paper and the other is an electronic text file, the task of comparing the documents becomes much more difficult. Particular challenges are presented, for example, in verifying that two purported copies of a paper legal document (e.g., a lengthy contract) are in fact identical in text. Conventional practice calls for one person to read aloud from one purported copy while another person follows along on the other purported copy. Even when the two individuals are highly skilled paralegals, such a process may be time-consuming, tedious, and prone to error.
Another technique used to compare paper documents entails running both through an optical character recognition (OCR) scan. The two resulting electronic text files may then be compared as if both had been generated from a word processing program. Similarly, if one document is on paper and the other is an electronic text file, the paper document may be OCR scanned to provide a second electronic text file for character-by-character comparison with the text file that was available initially. However, OCR scanning can produce artifacts and discrepancies even where the two paper documents were identical (or where a paper document was printed from the text file to which it is to be compared), so that human review or “clean up” may be required. Also, character-by-character comparison may be impractical as to non-text portions of a document, such as graphs, charts and/or diagrams.
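By way of illustration only, the character-by-character comparison described above, and the way in which OCR artifacts may produce spurious discrepancies, can be sketched as follows. The sketch uses Python's standard difflib module. The function name char_differences and the two sample strings are hypothetical and are not taken from any actual OCR scan; the strings merely imitate typical misreadings (the digit "1" read as the letter "l", and "m" read as "rn") that might arise when two identical paper copies are scanned.

```python
import difflib


def char_differences(text_a, text_b):
    """Return (tag, fragment_a, fragment_b) for each span where the texts differ.

    Hypothetical helper for illustration; tag is 'replace', 'delete' or 'insert'.
    """
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters in long inputs, so every character takes part in the match.
    matcher = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False)
    return [
        (tag, text_a[i1:i2], text_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Assumed, made-up OCR results for two paper copies that are identical in text.
scan_a = "The Lessee shall pay rent on the 1st day of each month."
scan_b = "The Lessee shall pay rent on the lst day of each rnonth."

# Each reported difference here is an OCR artifact rather than a true
# discrepancy between the documents, so a human would have to review it.
for tag, fragment_a, fragment_b in char_differences(scan_a, scan_b):
    print(tag, repr(fragment_a), repr(fragment_b))
```

In this sketch every difference reported for the two sample strings stems from the simulated scanning errors rather than from the underlying documents, which is the kind of result that calls for human review or "clean up". A comparison of this kind also operates only on text and has nothing to say about graphs, charts or diagrams.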