1. Field of the Invention
The present invention relates to increasing the retrieval performance of images by providing relevance feedback on word images contained in the images.
2. Description of the Related Art
In information retrieval systems where text is recovered from raster images through optical character recognition (OCR), errors in the recognized text occur in even the most accurate systems, and these errors lower the effective retrieval performance of keyword searches. OCR is the machine recognition of printed characters. OCR is used, for example, in the banking industry to process checks and credit card slips. OCR systems can recognize many different OCR fonts, as well as typewritten and computer-printed characters. When text documents are scanned into a computer, they are “photographed” and stored as pictures in the computer. These pictures are raster images, the category of images into which all bitmapped images and video frames fall, such as GIF, JPEG, and MPEG images.
OCR software analyzes the light and dark areas of raster images in order to identify each alphabetic letter and numeric digit. When OCR software recognizes a character, it converts the character into an actual ASCII text character. This conversion is performed because actual text characters take up considerably less room on a computer disk than images.
Users can perform queries on the OCR data to find and retrieve full-page images of multi-page documents in which the query terms are located. Errors in the OCR data, however, lower the effective retrieval performance of keyword searches. Further, systematic recognition errors exist in OCR text. For example, the name “Hilbert” is far more often misrecognized as “Hubert” than it is correctly recognized as “Hilbert.” In this particular example, users would miss most of the relevant results.
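By way of illustration only, the following sketch (with hypothetical page texts and a deliberately simplified search function, neither drawn from any particular system) shows how a single systematic OCR error causes a keyword search to miss relevant pages:

```python
# Hypothetical OCR text for three pages; on two of them the name
# "hilbert" was systematically misrecognized as "hubert".
ocr_pages = {
    1: "the hilbert transform is applied",
    2: "the hubert transform is applied",   # OCR error
    3: "hubert spaces are complete",        # OCR error
}

def keyword_search(pages, term):
    """Return the page numbers whose OCR text contains the query term."""
    return [page for page, text in pages.items() if term in text.split()]

# A query for "hilbert" retrieves only page 1; pages 2 and 3,
# though relevant, are missed because of the recognition error.
print(keyword_search(ocr_pages, "hilbert"))  # [1]
```

The user searching for “hilbert” thus sees one result where three exist, with no indication that anything was missed.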
There are numerous automatic approaches that can be used to improve search performance on imperfect OCR data, but they work best when queries are long, for example, five or more words; when documents are long enough to provide context and term redundancy; and when vocabularies are relatively static.
Some methods try to correct OCR errors before users issue queries. For example, voting-based methods run a number “n” of different OCR systems on the same image and automatically decide the “right” spelling by outputting the text produced by the majority of the systems. These methods are based on the premise that different OCR systems tend to make different mistakes. Besides being “n” times slower, this approach will not eliminate all errors, because in some cases all of the OCR systems produce incorrect text for a given image.
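A minimal sketch of the majority-voting scheme described above (the OCR outputs shown are hypothetical) also illustrates its stated failure case:

```python
from collections import Counter

def vote(ocr_outputs):
    """Return the spelling produced by the majority of the OCR systems."""
    spelling, _count = Counter(ocr_outputs).most_common(1)[0]
    return spelling

# Three hypothetical OCR systems read the same word image; two agree.
print(vote(["Hilbert", "Hubert", "Hilbert"]))  # Hilbert

# When a majority of the systems make the same mistake, voting
# still outputs the incorrect text.
print(vote(["Hubert", "Hubert", "Hilbert"]))   # Hubert
```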
Other methods assume the existence of a dictionary that is used to automatically correct words that are not found in the dictionary. For new words that are unlikely to be in any dictionary, these methods force the new words to become one of the words from the dictionary, thereby over-correcting. In other words, if the OCR recognizes a word correctly but the word is not in the dictionary, the method will still change the word into the dictionary entry closest to the text produced by the OCR. Over-correction is undesirable, especially in the scientific domain, where it is very likely that new terms are defined in slides, for example, project names, people's last names, and acronyms. Such methods also under-correct: assuming that a word was correctly recognized simply because it was found in a dictionary is itself a source of error.
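A sketch of such a dictionary-based method (the dictionary entries and the closest-match rule, here Levenshtein edit distance, are illustrative assumptions, not taken from any specific prior-art system) makes the over-correction problem concrete:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def dictionary_correct(word, dictionary):
    """Force any out-of-dictionary word to its closest dictionary entry."""
    if word in dictionary:
        return word  # assumed correct: a possible under-correction
    return min(dictionary, key=lambda w: edit_distance(word, w))

dictionary = ["filbert", "albert"]
# "hilbert" was recognized correctly, but it is a new term absent from
# the dictionary, so the method over-corrects it into a nearby entry.
print(dictionary_correct("hilbert", dictionary))  # filbert
```

The correctly recognized new term is silently replaced, which is precisely the over-correction the passage above describes.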
Yet other methods show that OCR data might in fact not significantly degrade the performance of information retrieval systems. Unfortunately, these results are valid only when the queries are long and the documents contain hundreds or thousands of terms. Examples of long queries are Text Retrieval Conference (TREC) queries, which have five or more terms.
Some OCR tools for manually correcting OCR errors in scanned documents show users the original image instead of the misrecognized text-based version. These systems, however, are used to proof a single term at a time and have not been designed for a document retrieval setting, in particular one in which users are presented with many terms.
What is needed is an interactive solution for increasing the retrieval performance of images that works well when queries are short, for example, one to two words; when documents are short, with little context and term redundancy; and when vocabularies are relatively dynamic. It would be further desirable to create a user interface that increases retrieval performance by allowing users to provide relevance feedback on word images.