The following relates to the image processing, analysis, classification, comparison, detection, and related arts. The following is described with illustrative reference to spotting applications such as word spotting, logo spotting, signature spotting, and so forth, but will be useful in numerous other applications.
Optical character recognition (OCR) is a known technique for converting an optically scanned handwritten or typed document to an ASCII, XML, or other text-based format. Existing commercial OCR products include, for example, FineReader™ (available from ABBYY USA Software House, Fremont, Calif.). The OCR-converted document can readily be searched for words of interest. OCR has numerous advantages, but is computationally intensive and sensitive to image quality.
Word spotting relates to identification of a certain word of interest in a document image or collection of document images without resort to OCR. More generally, spotting can apply to words and to objects such as logos, signatures, and so forth, and is sometimes also referred to as detection (word detection, logo detection, and so forth) or as matching (word matching, logo matching, and so forth). Documents processed with word spotting may then be totally or partially processed by OCR, indexed, or flagged for other review or processing. In some applications, the information extracted by word spotting techniques is used for annotating, routing, filtering, and redirecting documents. Word spotting operates in image space without conversion to text, and therefore can be computationally efficient as compared with OCR, and can provide a good alternative to manual review of incoming documents.
A typical word spotting algorithm starts by segmenting the document in image space into image blocks corresponding to individual words. The document is typically generated by an optical scanner or digital camera, although the document may be generated or acquired in other ways. The segmenting is suitably done by identifying whitespace that typically surrounds each word and using the whitespace to delineate boundaries of extracted word images. Prior or inferred structural information can also be used at this stage, for instance the knowledge that the word can only be present in the document header. Word segmentation can also include global document pre-processing such as orientation correction, de-noising, illumination normalization, document region segmentation, etc. Features are derived from each extracted image, and the features are processed by a classifier to determine whether any extracted image corresponds to a word of interest.
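By way of illustration only, the whitespace-based segmentation described above can be sketched as follows for a single binarized text line. The function name `segment_words`, the `min_gap` parameter, and the representation of the line as rows of 0/1 pixels are assumptions made for this sketch and are not drawn from any particular word spotting system.

```python
def segment_words(line_image, min_gap=2):
    """Return (start, end) column spans, end exclusive, of the words in a
    binarized text line given as a list of equal-length rows (1 = ink)."""
    if not line_image:
        return []
    width = len(line_image[0])
    # Vertical projection: a column counts as ink if any pixel in it is set.
    has_ink = [any(row[x] for row in line_image) for x in range(width)]

    spans = []
    start = None      # first ink column of the word being built
    last_ink = None   # most recent ink column seen
    for x, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # A run of at least min_gap blank columns ends the current word.
            spans.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


# Hypothetical 2-row line: narrow gaps stay inside a word, wide gaps split words.
example_line = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
]
example_spans = segment_words(example_line, min_gap=2)
```

In this sketch the single blank column at index 2 is treated as intra-word spacing, while the three blank columns at indices 4 to 6 are treated as a word boundary. A practical segmenter would typically apply the pre-processing mentioned above before any such projection.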
Cascaded classifiers, which include at least two classifier stages, are a suitable type of classifier for word spotting and other detection algorithms. Some cascade arrangements include one or more fast rejection stages that are computationally efficient, and one or more additional stages that are more computationally intensive but only process the relatively few segmentation blocks that pass through the fast rejection stages. In such an arrangement, any fast rejection stage should produce a low rate of false rejections, since any false rejection is not passed on to the downstream stage and hence is irrevocably lost. On the other hand, the fast rejection stage can have a relatively high rate of false positives, since any false positive is likely to be corrected (i.e., rejected) by the slower but more accurate downstream stage or stages. It is desirable for the fast rejection stage to be readily configurable to process various different types of words. For example, an environmental agency may want to be able to spot documents containing the word “carcinogenic” and also documents containing the very different word “sulfur”. In some applications in which it is only desired to screen out documents that clearly do not include the word, object, or so forth that is of interest, the classifier may include only a fast rejection stage to provide such screening.
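The cascade arrangement described above can be illustrated with a minimal sketch, under stated assumptions. The names `run_cascade`, `fast_width_stage`, and `slow_stage` are hypothetical, as are the width limits; the `"text"` field merely stands in for whatever an expensive downstream classifier would infer from the pixels.

```python
def run_cascade(candidates, stages):
    """Pass candidates through the stages in order; each stage sees only
    the candidates accepted by every earlier stage."""
    survivors = list(candidates)
    for stage in stages:
        survivors = [c for c in survivors if stage(c)]
        if not survivors:
            break
    return survivors


slow_stage_inputs = []  # records which candidates reach the expensive stage


def fast_width_stage(candidate):
    # Cheap global test with a deliberately loose tolerance (assumed limits),
    # so that true matches are rarely rejected.
    return 100 <= candidate["width"] <= 200


def slow_stage(candidate):
    # Placeholder for a costly, more accurate classifier.
    slow_stage_inputs.append(candidate["text"])
    return candidate["text"] == "sulfur"


candidates = [
    {"width": 30, "text": "a"},
    {"width": 150, "text": "sulfur"},
    {"width": 160, "text": "copper"},
    {"width": 400, "text": "carcinogenic"},
]
hits = run_cascade(candidates, [fast_width_stage, slow_stage])
```

Here the fast stage lets through one false positive ("copper"), which the slow stage then rejects, and the slow stage is invoked on only two of the four candidates. Passing a single-element stage list corresponds to the screening-only configuration mentioned above.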
Existing fast rejection stages used in word spotting have typically been based on global features such as the aspect ratio or width of the extracted image. These global features are fast to compute, and can produce low false rejection rates. However, these features are not strongly discriminatory and tend to produce high false positive rates in the initial classifier. The effectiveness of such features for classification can also be highly dependent on the word to be spotted. For example, the aspect ratio feature is highly discriminatory for words of interest that have an unusual aspect ratio, but is less effective for “typical” words that have typical aspect ratios similar to numerous other words. In general, features for detecting a particular object type should exhibit large variation between objects of different types, and small variation amongst objects of the particular object type to be detected. The effectiveness of the features is also related to how well they deal with the variations present in the object they describe. In the case of word spotting one such variation is writing style. For example, the aspect ratio can strongly vary for the same word between different writers. A robust feature exhibits small variation for the same word written by different writers, but large variation for different words even if written by the same writer.
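As a hedged illustration of a global-feature fast rejection test of the kind discussed above, the following sketch derives an accepted aspect ratio range from a few example images of the word of interest and widens it by a margin to tolerate writer variation. The function names, the `margin` parameter, and the example dimensions are all assumptions made for this sketch.

```python
def aspect_ratio(width, height):
    """Width-to-height ratio of an extracted word image."""
    return width / height


def fit_ratio_range(examples, margin=0.25):
    """Derive an accepted ratio range from (width, height) examples of the
    word of interest, widened by `margin` to keep false rejections low."""
    ratios = [aspect_ratio(w, h) for w, h in examples]
    return min(ratios) * (1 - margin), max(ratios) * (1 + margin)


def passes_fast_rejection(width, height, ratio_range):
    lo, hi = ratio_range
    return lo <= aspect_ratio(width, height) <= hi


# Hypothetical sizes of one long word as written by three different writers.
long_word_examples = [(240, 40), (300, 42), (210, 38)]
ratio_range = fit_ratio_range(long_word_examples)

same_word_new_writer = passes_fast_rejection(260, 40, ratio_range)   # kept
short_word = passes_fast_rejection(60, 40, ratio_range)              # rejected
other_long_word = passes_fast_rejection(250, 40, ratio_range)        # kept
```

The sketch exhibits the trade-off noted above: the widened range keeps a new writer's rendering of the word and rejects a short word, but it also accepts a different word of similar length, which is a false positive that a later stage would have to remove.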
On the other hand, localized features computed using a sliding window or the like can be strongly discriminatory, but are computationally intensive, and therefore typically not well suited for use in an initial fast rejection stage of a cascaded classifier.
While word spotting is presented herein as an illustrative application, it will be appreciated that other applications would benefit from a features generator for generating features corresponding to an image that is readily configurable for different applications and provides features of substantial discriminatory value without concomitant computational complexity. Such a features generator would have value in numerous systems, including classification systems operating in conjunction with suitable classifiers, indexing and search systems, and so forth.