In most office environments, large numbers of documents are printed daily, and the task of organising and filing all of these documents, both physical and electronic versions, is becoming more difficult due to the increasing volume of such documents. A common problem experienced by users is finding the location of an electronic version of a document when the user only has a printed copy. Occasionally this problem is solved by printing some identifier on the printed copy of the document, with the identifier containing the information specifying where the electronic version of the document is stored. However, in certain circumstances such additional information cannot be added to the printed document, either for aesthetic or other reasons.
Another common problem is experienced when the user wishes to generate more copies of a printed document. Often this is simply achieved by photocopying the document. However, a photocopy of a document is generally not as accurate or as sharp as a print from the electronic version, especially where colour continuous tone images are concerned. Furthermore, for a large document, scanning in the entire document may take a considerable amount of time, which is undesirable, and not all of the document may be readily available in a condition suitable for copying. Accordingly, a preferable method of obtaining new copies of the document is to scan a single page of the document, find the electronic version of the document from which that page was printed, and then reprint the document from the retrieved electronic version.
A prior art method exists which involves generating a database of documents that have been printed. The database is indexed in such a way that the electronic files can be found from a scan of a document. Such databases can often be massive in size. Accordingly, a method of generating an indexing key which can be searched for in the database both efficiently and accurately is an important problem that must be solved to enable the desired functionality in a practical application.
One solution to this problem of image indexing key generation existing in the art is to perform block classification of the scanned document, identifying those regions of the document that are paragraphs of text and those regions that are images and graphics, and building a comparison function based on this classification. The downside to this method, and methods similar thereto, is that such methods are sensitive to variations in the printing and scanning process, and cannot properly distinguish between documents with very similar structural layout.
Another existing method of generating an image indexing key is to use a Fourier-Mellin invariant descriptor (FMID). An FMID is largely rotation, scale and translation (RST) invariant. That is, the FMID generated from an image is similar to the FMID generated from a rotated, scaled and/or translated version of that image. FIG. 1 shows a flow diagram of this prior art method 180 of image key generation. In this method 180, a key is generated from the input image received in step 100 by first applying a Fourier transform to the input image in step 110. The complex magnitude of the result of step 110 is then calculated in step 120. The complex magnitude is log-polar transformed in step 130, and a Fourier transform is applied to the result of the log-polar transform in step 140. The complex magnitude of the result of step 140 is calculated in step 150. The image key, output in step 170, is then calculated in step 160 by taking moments of the result of step 150.
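By way of illustration only, steps 110 to 160 of the method 180 may be sketched in Python using NumPy as follows. The function names `log_polar` and `fmid_key`, the grid sizes, the nearest-neighbour resampling and the choice of low-order geometric moments are assumptions made for this sketch. The description above does not specify which moments are taken or how the log-polar resampling is performed.

```python
import numpy as np

def log_polar(mag, n_rho=32, n_theta=32):
    """Resample a centred magnitude spectrum onto a log-polar grid.

    Nearest-neighbour sampling is an assumption made to keep the sketch short.
    """
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    max_r = min(cy, cx) - 1
    # Radii spaced logarithmically from 1 to max_r; angles cover the full circle.
    rho = np.exp(np.linspace(0.0, np.log(max_r), n_rho))
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(rho, theta, indexing="ij")
    ys = np.rint(cy + rr * np.sin(tt)).astype(int)
    xs = np.rint(cx + rr * np.cos(tt)).astype(int)
    return mag[ys, xs]

def fmid_key(image, n_moments=3):
    """Hypothetical FMID-style key following steps 110 to 160 of FIG. 1."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # step 110: Fourier transform
    mag = np.abs(spectrum)                          # step 120: complex magnitude
    lp = log_polar(mag)                             # step 130: log-polar transform
    mag2 = np.abs(np.fft.fft2(lp))                  # steps 140 and 150
    # Step 160: low-order geometric moments, normalised by the total mass
    # (an assumed choice of moments).
    mag2 = mag2 / mag2.sum()
    u = np.linspace(0.0, 1.0, mag2.shape[0])[:, None]
    v = np.linspace(0.0, 1.0, mag2.shape[1])[None, :]
    return np.array([(u**p * v**q * mag2).sum()
                     for p in range(n_moments) for q in range(n_moments)])
```

In this sketch, a circularly shifted copy of an image, for example `np.roll(image, (5, 7), axis=(0, 1))`, yields the same key up to rounding, because the magnitude taken in step 120 discards the phase ramp that a translation introduces. Invariance to rotation and scale is only approximate here, since it depends on the resolution of the log-polar resampling.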
A drawback of the method 180 described with reference to FIG. 1 is that, though the FMID is formally rotation, scale and translation invariant, it is not very discriminatory. In particular, the FMID does not distinguish well between images that are similar in their low spatial frequency structure, but differ in their high spatial frequency structure. This is a particular problem for discriminating between images of text documents that have a largely similar structural appearance, but differ in their textual content, as the textual content is represented by high spatial frequency phase structures in the image that do not survive the FMID generation process.
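The loss of phase information may be illustrated with a short Python sketch using NumPy. The helper name `swap_phase` and the random test images are assumptions made for illustration. The sketch builds an image having the Fourier magnitude of one image and the Fourier phase of another, and shows that the magnitude step (step 120) cannot tell it apart from the first image, even though the two images differ in the spatial domain.

```python
import numpy as np

def swap_phase(a, b):
    """Return a real image with the Fourier magnitude of a and the phase of b.

    Hypothetical helper for illustration; a and b must have the same shape.
    """
    magnitude = np.abs(np.fft.fft2(a))
    phase = np.angle(np.fft.fft2(b))
    # The magnitude of a real image is even and the phase is odd, so the
    # product is Hermitian-symmetric and the inverse transform is real
    # up to rounding error.
    return np.fft.ifft2(magnitude * np.exp(1j * phase)).real

rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = rng.random((32, 32))
c = swap_phase(a, b)

# a and c are different images, yet their Fourier magnitudes agree, so every
# later step of the FMID pipeline receives the same input for both.
images_differ = not np.allclose(a, c)
magnitudes_match = np.allclose(np.abs(np.fft.fft2(a)), np.abs(np.fft.fft2(c)))
```

Any key computed solely from the output of step 120 is therefore identical for `a` and `c`, which is consistent with the limited discrimination described above.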