The following relates to the document processing and related arts.
Document images are sometimes generated in digital image format through the use of optical scanners that create pixel map representations of document pages, or the like. Such document images are challenging to process since the information content is not readily extracted from the pixel maps. In the case of document images generated from typed document pages, it is known to employ optical character recognition (OCR) to identify characters (e.g., letters, numerals, punctuation, or the like) and strings of characters. OCR processing is facilitated by the general standardization and uniformity of type fonts.
Document images generated from handwritten documents (including documents containing handwritten portions) are more challenging, due to substantial person-to-person variations in writing style, and even some variation in writing style by a single writer. Handwriting recognition is a process analogous to OCR, in which individual handwritten characters are recognized. Handwriting recognition optionally includes preprocessing such as normalization of slant, skew, aspect ratio, or other variations of handwritten characters. Handwriting recognition has found application in processing of optically scanned document images, as well as in other applications such as electronic tablets or tablet computers having a touch-sensitive pad or screen on which text can be handwritten using a suitable stylus.
Since the handwriting recognition system outputs strings of characters (for example, in ASCII format), it is straightforward to identify any word of interest by searching for the corresponding character string. However, the character-based approach does not readily take into account connectors between characters or identifying aspects of character groupings. It is known to extend the character-based approach by modeling certain commonly occurring character groupings, as described, for example, in U.S. Pat. No. 7,266,236. Handwriting recognition is also computationally intensive, as each character on the page image is individually processed and converted to ASCII or another character representation.
Word spotting applications have the more modest objective of identifying whether a particular word or phrase occurs in an image of a handwritten (or partially handwritten) document and, optionally, of locating said word or phrase. A typical word spotting application is intended to have high throughput, and serves as a categorizer to categorize each input document as either containing the word of interest or not. Handwriting recognition can in principle be used for word spotting, but its computational intensiveness can adversely affect word spotting throughput. Instead, word spotting typically employs faster pattern recognition or whole-word modeling in which a classifier is trained to match a characteristic pattern or word model using training images of the word of interest. Such algorithms are computationally efficient compared with handwriting recognition, and word-level pattern recognition or modeling can utilize connectors and character groupings throughout the handwritten word of interest in performing the matching.
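By way of illustration only, the categorizer role of word spotting can be sketched as follows. The sketch assumes that each candidate word image in a document has already been reduced to a fixed-length feature vector, and it uses a mean template with cosine similarity as a stand-in for a trained whole-word model. The names mean_vector, cosine, and spot_word, the feature representation, and the threshold value are hypothetical and are not drawn from any particular word spotting system.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(samples: List[Vector]) -> List[float]:
    """Average the training feature vectors of the word of interest."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spot_word(
    doc_words: List[Vector], template: Vector, threshold: float = 0.95
) -> Tuple[bool, Optional[int]]:
    """Categorize one document: does any candidate word match the template?

    Returns (contains_word, index_of_best_match_or_None). The threshold is
    an assumed value that would be tuned on held-out data in practice.
    """
    best_index: Optional[int] = None
    best_score = threshold
    for index, features in enumerate(doc_words):
        score = cosine(features, template)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index is not None, best_index


# Toy data: two training samples of the word of interest, two documents.
template = mean_vector([[1.0, 0.0, 1.0], [0.9, 0.1, 1.1]])
doc_a = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]   # second candidate resembles the word
doc_b = [[0.0, 1.0, 0.0], [1.0, -1.0, 0.0]]  # no candidate resembles the word
result_a = spot_word(doc_a, template)
result_b = spot_word(doc_b, template)
```

Because each candidate is scored as a whole word, no per-character recognition is performed, which is the source of the throughput advantage noted above.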
Accounting for person-to-person variations in handwriting recognition and word spotting applications is expected to improve the accuracy of the handwriting recognition or word spotting. U.S. Pat. No. 6,256,410 discloses an approach for adapting handwriting recognition for an individual writer. A first pass is performed using a universal handwritten characters model, which enables most handwritten characters in the document to be correctly identified. The universal handwritten characters model is then iteratively refined using the initially recognized handwritten characters as a database to improve the recognition accuracy.
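The two-stage idea of a universal first pass followed by iterative refinement can be illustrated with a generic self-training loop. The following is a minimal sketch under stated assumptions and is not the method of U.S. Pat. No. 6,256,410: characters are assumed to be represented as feature vectors, the universal model is assumed to be one centroid per character, and the refined model is a fixed blend of the universal centroid with the mean of the writer's own samples. The names sq_dist, classify, and adapt_to_writer, and the weight parameter, are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(sample: Vector, models: Dict[str, List[float]]) -> str:
    """Label a character sample with the nearest centroid."""
    return min(models, key=lambda label: sq_dist(sample, models[label]))


def adapt_to_writer(
    samples: List[Vector],
    universal: Dict[str, List[float]],
    iterations: int = 3,
    weight: float = 0.5,
) -> Dict[str, List[float]]:
    """Refine universal centroids using one writer's recognized characters."""
    models = {label: list(centroid) for label, centroid in universal.items()}
    for _ in range(iterations):
        # Recognition pass: label every sample with the current models.
        groups: Dict[str, List[Vector]] = {label: [] for label in models}
        for sample in samples:
            groups[classify(sample, models)].append(sample)
        # Refinement pass: pull each centroid toward the writer's own samples.
        for label, members in groups.items():
            if not members:
                continue  # no samples recognized for this character; keep model
            writer_mean = [sum(col) / len(members) for col in zip(*members)]
            models[label] = [
                (1 - weight) * u + weight * w
                for u, w in zip(universal[label], writer_mean)
            ]
    return models


# Toy data: a writer whose "a" and "b" are both shifted toward each other.
universal = {"a": [0.0, 0.0], "b": [4.0, 4.0]}
samples = [[1.0, 1.0], [1.2, 0.8], [3.0, 3.0], [3.2, 2.8]]
adapted = adapt_to_writer(samples, universal)
```

The refinement step depends on each character having enough recognized samples; a character with no samples in the document is simply left at its current model.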
Unfortunately, the writer-adaptation approach of U.S. Pat. No. 6,256,410 is not readily adapted to word spotting applications. There are usually few or no occurrences of the handwritten word of interest in any given document, and since word spotting is performed on a per-word basis rather than on a per-character basis, this means that a first pass will typically provide zero, one, or at most a few candidate word pattern matches. This small or non-existent population of samples is insufficient for performing the iterative refinement operation.