The present application is directed to the imaging arts and more particularly to document comparison and retrieval.
One issue in the area of document comparison is how to compare two given documents and automatically detect and highlight any changes in content or layout placement between them. Another issue is how to recognize the content of a document and use that recognition to retrieve similar or related documents from a document collection.
The solution to either of the above problems relies on the ability to identify matching document content. Existing methods attempt to access the document content directly. However, arbitrary document content can be difficult to handle: it frequently contains application-specific information or is stored in a complex proprietary format that is not readily amenable to direct identification and matching of content between documents.