The present application is directed to a computer operable system and method that incorporate a software program and algorithm for finding an image of a target document or document set in a large collection of documents, based on an image of a query document which is an imperfect copy of the target image. The query image may be captured by at least one of, but not limited to, a digital camera, personal digital assistant, document scanner, text reader, video camera, motion picture film camera, computer, cell phone camera or other device capable of generating digital representations. The target image may be displayed on a monitor or a computer screen and photographed directly by one of the above devices, or the target image may first be printed on a printer or a similar output device and the paper reproduction then photographed by one of the above devices. Alternatively, the query image could be reproduced from a stored electronic version of the query image. Because of the manner and devices used to capture it, the query image will often be of lower resolution, blurry, distorted by rotation and perspective, and of uneven lightness, as compared to the target image.
Thus the present application is directed to finding images of documents containing printed and/or handwritten words and other types of image content such as, but not limited to, line drawings and photographs, where the query image is typically of lower quality than the corresponding target image. The query and/or target images may be stored in and retrieved from a computer memory, a database, buffer memory, or other computer operable medium.
Koichi Kise and the Intelligent Media Processing Group at Osaka Prefecture University have proposed a method to retrieve document images which includes the following steps:
1. Identify stable keypoints in an image which are likely to be found reliably both in target images and in query images. For images of documents, good keypoints are word pixel mass centroids or end points of a presented image. A keypoint is a consistently identifiable aspect of an object in an image. The keypoints are derived from the object appearance at particular interest points using localized image features. The keypoints are invariant to image scale and rotation, and are robust to changes in illumination, noise, occlusion, and small changes in viewpoint. In addition, keypoints are highly distinctive, relatively easy to extract with low probability of mismatch, and easy to match against a (possibly large) database of local image features in close to real-time performance.
2. Form “fingerprints” that represent the two-dimensional spatial arrangements of local neighborhoods of keypoints. A fingerprint is a string of quantized integers that encode certain distortion-invariant triangle area ratios among the keypoints in each neighborhood. In a typical implementation, a fingerprint may be a series of integers quantized to the range of [0, 7]. A given target or query image may typically generate several thousand fingerprints, depending on the document content. If the keypoints are very stable, the majority of these fingerprints will be identical between the target images and query images of the same document, while few fingerprints will match between the target images and query images of different documents.
3. At a pre-processing stage, a corpus (i.e., a main body or database) of collected target images is analyzed to extract the several thousand fingerprints from each image. For a very large corpus of images, a given fingerprint may, with low probability, be found in multiple target images. The fingerprints are of high dimension: each may be composed of a sequence of about 35 quantized integers in the range of [0, 7], which can be interpreted as a point in a 35-dimensional vector space. The fingerprints are hashed into a hashtable, whose entries contain linked lists of fingerprint records. Each fingerprint record contains the identity of a particular fingerprint, its value (the fingerprint string), and the identity of the target images containing that fingerprint. The hashtable entry points to the first fingerprint record. In the event that more than one document contains the same fingerprint, the corresponding fingerprint records are linked to each other in a linked list chain, such that the entire list of records of a given hashtable entry can be followed sequentially by traversing the links.
4. At query time, fingerprints are extracted from the query image. Each fingerprint in turn is used as a key for looking up the hashtable content, to retrieve relevant candidate fingerprint records of target images. For each such fingerprint record, because of potential hashtable collisions, the query fingerprint string is compared with the target document fingerprint string. If there is an exact match with a particular target document fingerprint string, a vote count for that target image is incremented. The expectation is that many votes will accrue for correct matches between the query and candidate target image, and few votes will accrue for incorrect matches resulting from coincidental matching of a small number of fingerprints.
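The indexing and voting described in steps 3 and 4 can be illustrated with a minimal sketch. It is not the implementation used by the cited group: the names `build_index` and `vote` are hypothetical, fingerprints are assumed to be already extracted as tuples of integers in [0, 7], and a Python dictionary stands in for the hashtable with linked record chains. Because the dictionary compares full keys on lookup, the explicit string comparison that guards against hashtable collisions is implicit here.

```python
from collections import Counter, defaultdict


def build_index(corpus):
    """Map each fingerprint to the target images that contain it.

    corpus: dict of image id -> iterable of fingerprint tuples
    (hypothetical input format, assumed for this sketch).
    """
    index = defaultdict(list)
    for image_id, fingerprints in corpus.items():
        # Assumption: each image is recorded once per distinct fingerprint.
        for fp in set(fingerprints):
            index[fp].append(image_id)
    return index


def vote(index, query_fingerprints):
    """Count exact fingerprint matches per target image."""
    votes = Counter()
    for fp in query_fingerprints:
        # Exact-match lookup; a fingerprint that differs in a single
        # digit retrieves nothing and contributes no vote.
        for image_id in index.get(fp, ()):
            votes[image_id] += 1
    return votes


# Toy data with 3-digit fingerprints (real ones have about 35 digits).
corpus = {
    "doc_a": [(1, 2, 3), (4, 5, 6), (7, 0, 1)],
    "doc_b": [(4, 5, 6), (2, 2, 2)],
}
index = build_index(corpus)
# The last query fingerprint differs from doc_a's (7, 0, 1) by one digit.
votes = vote(index, [(1, 2, 3), (4, 5, 6), (7, 0, 2)])
best_match = votes.most_common(1)[0][0]
```

In this toy example the near-miss fingerprint earns no vote for its correct document, which is the behavior that makes exact-match hashing sensitive to noise and quantization errors.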
This method has been stated to be operable for databases of up to 20,000 images of document pages. However, at least the following shortcomings are considered to exist in the described method.
First, it is not clear that the method can scale from thousands to millions of images. In many cases, the fingerprints found in the query and in the correct target document are not an exact match, but differ by one or a few digits due to noise and quantization errors. Hashing methods are not well suited to finding near-neighbors in a high-dimensional space, so the number of votes for a particular document can drop significantly because of such digit mismatches. One way to address this problem is to enter many additional records for the possible near-miss variants of each fingerprint. In practice, however, this approach can only be applied to a limited number of digit changes, since the number of possible fingerprint combinations grows exponentially with the number of digit changes.
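The growth in near-miss records can be made concrete with a short calculation. The sketch below assumes the figures given above (about 35 digits per fingerprint, each quantized to one of 8 levels); the function name `near_miss_count` is hypothetical. It counts the fingerprints that differ from a given fingerprint in exactly a chosen number of digit positions.

```python
from math import comb


def near_miss_count(length, levels, changes):
    """Number of fingerprints differing in exactly `changes` positions.

    Choose which positions change, then pick one of the (levels - 1)
    other values for each changed position.
    """
    return comb(length, changes) * (levels - 1) ** changes


# Extra records needed per original fingerprint for 1, 2 and 3 changes.
counts = [near_miss_count(35, 8, k) for k in (1, 2, 3)]
```

Under these assumptions, each stored fingerprint would need 245 additional records to cover one changed digit, 29,155 more to cover two, and 2,244,935 more to cover three, for every one of the several thousand fingerprints in every target image.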
Second, the method relies on the ability to obtain the same order of keypoints in each local neighborhood in order to generate identical fingerprints between the query and target collection document. A common problem with the existing methods is that word centroids are often collinear, as words are typically aligned along text lines, thereby making it difficult to determine the exact keypoint order. Ordering a collinear set of keypoints by increasing angle, as most existing methods do, is particularly prone to noise and rounding errors, leading to fewer correct fingerprint matches between the query and target document.
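The sensitivity of angular ordering to noise can be shown with a small sketch. The angle convention used here (counterclockwise from the positive x axis about a center keypoint) and the name `order_by_angle` are assumptions made for illustration; existing methods may use a different convention. Two of the neighbor keypoints lie almost exactly on the same text line as the center, and a vertical displacement of one hundredth of a pixel is enough to change their order.

```python
import math


def order_by_angle(center, points):
    """Return point indices sorted by increasing angle about `center`."""
    cx, cy = center

    def angle(p):
        # Map atan2's (-pi, pi] output into [0, 2*pi).
        return math.atan2(p[1] - cy, p[0] - cx) % (2 * math.pi)

    return sorted(range(len(points)), key=lambda i: angle(points[i]))


center = (0.0, 0.0)
# Points 0 and 1 sit on the text line through the center; point 2 does not.
target_points = [(10.0, 0.01), (20.0, -0.01), (0.0, 15.0)]
# Same keypoints in the query image, with sub-pixel vertical noise.
query_points = [(10.0, -0.01), (20.0, 0.01), (0.0, 15.0)]

target_order = order_by_angle(center, target_points)
query_order = order_by_angle(center, query_points)
```

The two orderings differ even though the keypoints moved by a negligible amount, so a fingerprint built from the ordered sequence would differ between the query and the target.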
Still further shortcomings of the above method will be set out, and methods and systems to overcome these shortcomings will be discussed in detail, in the following pages.