Today, enterprise organizations can store extremely large numbers of documents (e.g., well into the millions or billions) on one or more servers. Some documents can exist in multiple versions, such as non-final drafts, final drafts, and/or executed versions. Different versions of a document can also be saved in different file formats (e.g., a final draft of a contract can be saved in Microsoft Word format, while the executed version of the contract can be saved in Adobe PDF format). Such differences can make it difficult to determine whether one document (e.g., a source document) matches another (e.g., a target document), either exactly or within an acceptable margin of error defined with reference to one or more pre-specified parameters.
One situation in which documents may need to be compared is a mass migration of enterprise systems to new or upgraded platforms. During such a migration, it can be important to ensure that large numbers of transferred documents match across systems. It can also be important to understand the nature and extent of any mismatches, and to share information among disparate (e.g., non-co-located) teams so that mismatches can be quickly identified and resolved.