Printed documents may remain in circulation for some time, during which the source of a document may become unclear. A document may carry an identification mark, such as an identification number and/or a bar code, that uniquely identifies it. However, the identification mark may be incorrect or missing. Thus, it can often be difficult to identify and/or locate the electronically-stored version that corresponds to a printed document.
The electronically-stored version of the document may be stored on a variety of media, including a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers), solid-state memories, optical media, magnetic media, and the like. It can also often be difficult to determine whether two electronically-stored documents are identical or substantially similar.
A number of techniques are available for analyzing documents, including electronically-stored documents and printed documents. For example, analysis algorithms and methods may be used to recognize text and/or graphics from the pixel data of images of the documents. Such algorithms and methods may identify distinctive characteristics in the image of a document, such as “keypoints.” Keypoints, and techniques for detecting them, are known in the art. Keypoints are locations of extreme values in a set of images derived from the original image, where each image in the set is the original image passed through a band-pass filter.
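The keypoint definition above can be illustrated with a minimal sketch. It assumes a difference-of-Gaussians (DoG) filter as the band-pass filter, a common choice in scale-space detectors that the text does not itself specify; the function names, the sigma values, and the threshold are hypothetical. Each DoG image is the original image blurred at one scale minus the original blurred at the next smaller scale, which passes a band of spatial frequencies. A keypoint is then a pixel whose DoG value is a strict maximum or minimum of its 3×3×3 neighborhood across position and adjacent scales.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian low-pass filter; edge padding keeps the output shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    # Convolve every row, then every column; "valid" mode trims the padding.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def dog_stack(image: np.ndarray, sigmas) -> np.ndarray:
    """Band-pass images: differences of successive Gaussian blurs (assumed filter)."""
    blurred = [blur(image, s) for s in sigmas]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(len(sigmas) - 1)])


def find_keypoints(image, sigmas=(1.0, 1.6, 2.56, 4.1, 6.55), threshold=0.01):
    """Return (scale_index, row, col) for strict extrema of the DoG stack.

    sigmas and threshold are hypothetical illustrative defaults.
    """
    dog = dog_stack(np.asarray(image, dtype=float), sigmas)
    n_scales, h, w = dog.shape
    keypoints = []
    # Skip the first/last scale and the image border so every 3x3x3 cube exists.
    for s in range(1, n_scales - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dog[s, y, x]
                if abs(v) < threshold:  # ignore weak, flat responses
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                is_extreme = v == cube.max() or v == cube.min()
                if is_extreme and np.count_nonzero(cube == v) == 1:
                    keypoints.append((s, y, x))
    return keypoints


# Usage: a synthetic bright blob (sigma = 2 px) centred at (16, 16).
yy, xx = np.mgrid[0:32, 0:32]
blob = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
print(find_keypoints(blob))
```

For the synthetic blob, the blob centre appears among the reported keypoints at the DoG scale where its response is strongest, while a blank image yields none. The exhaustive neighborhood loop keeps the logic transparent but would be slow on full-page document images; a production detector would vectorize the comparison and typically add sub-pixel refinement and edge-response rejection.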