Organizations typically have large collections of paper documents. Such collections may be held in electronic data storage systems, wherein the documents are stored as digital images (i.e., electronic representations of the documents).
Searching collections of digital document images for documents containing specific content, such as user-defined text, can be difficult and time-consuming.
An existing approach to searching digital document images for a specified keyword involves the use of optical character recognition (OCR) to extract text information from one or more digital document images. A keyword search is then performed on the extracted text information. This OCR-based technique is prone to recognition errors, especially for low-quality document images, and a keyword that has been misrecognized will not be matched by the search.
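
The following is a minimal sketch of the keyword-search step in the OCR-based approach described above, and of how a single recognition error can hide a matching document. It assumes the OCR step has already run; the function name keyword_search, the document identifiers, and the sample text are hypothetical and are used for illustration only.

```python
import re


def keyword_search(ocr_text_by_doc, keyword):
    """Return the IDs of documents whose extracted text contains the keyword.

    The match is case-insensitive and restricted to whole words.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [doc_id for doc_id, text in ocr_text_by_doc.items()
            if pattern.search(text)]


# Hypothetical OCR output for three document images (illustrative data only).
ocr_output = {
    "doc_001": "Invoice number 4471 issued to Acme Corp.",
    # Simulated OCR error: the capital "I" was recognized as a lowercase "l".
    "doc_002": "lnvoice number 4472 issued to Beta LLC.",
    "doc_003": "Meeting minutes for the quarterly review.",
}

hits = keyword_search(ocr_output, "invoice")
print(hits)  # doc_002 is missed because of the simulated OCR error
```

In this sketch both doc_001 and doc_002 are invoices, but only doc_001 is returned, because the search operates on the extracted text rather than on the image itself.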