The ability to detect and recognize handwritten words in handwritten documents is important for several applications. While the strategic importance of such a capability in current commercial handwriting recognition products is clear, its use in applications such as digital libraries and document management cannot be ignored. With digital libraries, for example, there is a major concern over the preservation and electronic conversion of historical paper documents. Often, these documents are handwritten and in calligraphic styles, as in a sample of a church record used in genealogy studies illustrated in FIG. 1. An important aspect of the use of electronic versions of such documents is their access based on word queries. Handwritten keyword extraction and indexing can also be a valuable capability for document management, in handling a variety of irregular paper documents such as handwritten notes, marks on engineering drawings, memos and legacy documents.
While an OCR algorithm can be used to extract text keywords for index creation of scanned printed text documents, such a process is not yet an option for handwritten documents due to a lack of robust handwriting recognition algorithms. One difficulty is that the same word may be written differently at different locations in a document, even when the document is written by a single author. In cursive script, this often means that a word is written as a collection of word segments separated by intra-word separations that are characteristic of the author. FIGS. 2A-C illustrate this situation, where the word "database" is written differently by the same author in the various instances in which it occurs. Further, the different word instances could exhibit different amounts of global skew, because lines of handwritten text are often not parallel as they are in printed text. This latter fact makes the detection of lines of handwritten text a further difficulty during recognition.
The present method of grouping handwritten words was motivated by an application that required image indexing of old calligraphic handwritten church record documents for purposes of tracing genealogy. These documents were written against a tabular background, as shown in FIG. 1. Given a query about a person's name, the task was to locate the relevant records. While the formulation of query word patterns for these documents is an interesting problem, for the purposes of this disclosure the relevant problem is that of matching handwritten words after they have been formulated by a user, perhaps by a training process that generates such pattern queries from actual typed text queries, or perhaps by deriving such queries from the handwritten document itself.
A method of localizing handwritten word patterns in documents exploiting a data structure, called the image hash table, to succinctly represent feature information needed to localize any word without a detailed search of the document, is presented in U.S. Pat. No. 5,953,451 issued to Syeda-Mahmood on Sep. 14, 1999. The use of an image hash table to localize objects draws upon ideas of geometric hashing, which has been used in the past for identification of objects in pre-segmented image regions. These concepts are discussed in articles by Y. Lamdan and H. J. Wolfson entitled "Geometric hashing: A general and efficient model-based recognition scheme", Proceedings of the International Conference on Computer Vision, pages 218-249, 1988, and "Transformation invariant indexing", Geometric Invariants in Computer Vision, MIT Press, pages 334-352, 1992. More work has been done in extending the basic geometric hashing scheme for use with line features, as described in an article by F. C. D. Tsai entitled "Geometric hashing with line features", Pattern Recognition, Vol. 27, No. 3, pages 377-389, 1994. An extensive analysis of the geometric hashing scheme has been done in an article by W. E. L. Grimson and D. Huttenlocher entitled "On the sensitivity of geometric hashing", Proceedings International Conference on Computer Vision, pages 334-339, 1990. Finding good geometric hash functions has also been explored in an article by G. Bebis, M. Georgiopolous and N. Lobo entitled "Learning geometric hashing functions for model-based object recognition", Proceedings International Conference on Computer Vision, pages 543-548, 1995, and an extension of geometric hashing using the concept of rehashing the hash table has been discussed in an article by I. Rigoustos and R. Hummel entitled "Massively parallel model matching: Geometric hashing on the connection machine", IEEE Computer, pages 33-41, February 1992.
All the prior work has used the geometric hashing technique for purposes of model indexing in object recognition, where the task is to determine which of the models in a library of models is present in the indicated region in the image. The localization of handwritten words in unsegmented handwritten documents is an instance of image indexing (rather than model indexing), for which no prior work on using geometric hashing is known. Work that uses a serial search of the images for localizing handwritten words, as described in an article by R. Manmatha, C. Han and E. Riseman, entitled "Word spotting: A new approach to indexing handwriting", Proceedings IEEE Computer Vision and Pattern Recognition Conference, pages 631-637, 1996, only begins to address the need.
U.S. Pat. No. 5,640,466 issued to Huttenlocher et al. on Jun. 17, 1997, entitled "Method of Deriving Wordshapes for Subsequent Comparison", describes a method for reducing an image of a character or word string to one or more one dimensional signals, including steps of determining page orientation, isolating character strings from adjacent character strings, establishing a set of references with respect to which measurements about the character string may be made, and deriving a plurality of measurements with respect to the references in terms of a single variable signal, from which information about the symbol string may be derived.
Localization or indexing of a specific word in the document is done by indexing the hash table with information derived from the word in such a manner that the prominent hits in the table directly indicate candidate locations of the word in the document, thus avoiding a detailed search. This method accounts for changes in appearance of the handwritten word in terms of orientation, skew, and intra-word separation that represent the way a single author may write the same word at different instances. More specifically, localizing any word in the image hash table is done by indexing the hash table with features computed from the word pattern. The top hits in the table are the candidate locations most likely to contain the word. Such indexing automatically gives pose information, which is then used to project the word at the indicated location and verify it. Verification involves determining the extent of match between the underlying word and the projected word. The generation and indexing of image hash tables takes into account the changes in appearance of the word under 2D affine transforms, changes in the orientation of the lines of text, overall document skew, and changes in word appearance due to occlusions, noise, or intra-word handwriting variations made by a single author.
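The indexing idea described above can be illustrated with a minimal sketch in the general style of geometric hashing. It is not the implementation of the referenced patent. It assumes that features are plain 2D points already grouped into candidate regions, that bases are formed from consecutive feature triples, and that affine coordinates are quantized with an arbitrary bin size. The names affine_coords, build_table, query_table and BIN are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

BIN = 0.25  # assumed quantization step for affine coordinates


def affine_coords(p, basis):
    """Return (a, b) with p - p0 = a*(p1 - p0) + b*(p2 - p0), or None if the basis is collinear."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    ux, uy = x1 - x0, y1 - y0
    vx, vy = x2 - x0, y2 - y0
    det = ux * vy - uy * vx
    if abs(det) < 1e-9:
        return None
    dx, dy = p[0] - x0, p[1] - y0
    return (dx * vy - dy * vx) / det, (ux * dy - uy * dx) / det


def quantize(coords):
    """Map continuous affine coordinates to a discrete hash key."""
    return round(coords[0] / BIN), round(coords[1] / BIN)


def build_table(regions):
    """Hash every non-basis feature of each region under each consecutive basis triple."""
    table = defaultdict(list)
    for region_id, pts in regions.items():
        for i in range(len(pts) - 2):
            basis = pts[i:i + 3]
            for j, p in enumerate(pts):
                if i <= j <= i + 2:
                    continue  # skip the basis points themselves
                coords = affine_coords(p, basis)
                if coords is not None:
                    table[quantize(coords)].append((region_id, i))
    return table


def query_table(table, query_pts):
    """Vote for (region, basis index) pairs using the first three query points as the basis."""
    votes = Counter()
    basis = query_pts[:3]
    for p in query_pts[3:]:
        coords = affine_coords(p, basis)
        if coords is None:
            continue
        # Each query feature votes at most once per (region, basis) entry.
        for entry in set(table.get(quantize(coords), ())):
            votes[entry] += 1
    return votes.most_common()


# Two made-up regions of feature points (illustrative data only).
regions = {
    "A": [(0, 0), (4, 0), (0, 4), (2, 2), (6, 1), (1, 7)],
    "B": [(0, 0), (5, 1), (1, 3), (7, 7), (3, 9), (9, 2)],
}
table = build_table(regions)

# Query: region A's features under an arbitrary affine map (skew, scale, shift).
query = [(2 * x + y + 10, -x + 3 * y + 5) for x, y in regions["A"]]
hits = query_table(table, query)
print(hits[:3])
```

Because affine coordinates do not change when the basis and the feature undergo the same affine transform, the transformed query votes most heavily for the region and basis it was derived from, which is the sense in which prominent hits indicate candidate locations without a serial search. A practical system would also need the verification step described above, since quantization and noise can produce spurious hits.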
Generally, localization and detection of handwritten words involves four stages: (1) Pre-processing, where features for word localization are extracted; (2) Image hash table construction; (3) Indexing, where query word features are used to look up the hash table for candidate locations; and (4) Verification, where the query word is projected and registered with the underlying word at the candidate locations. The focus of the present disclosure is on stage (1) of this processing, namely, the stage in which features for word localization are generated. Therefore, a feature of the present invention is the ability to recognize and generate handwritten word regions for purposes of feature generation used ultimately for handwritten word indexing.
Disclosures of all of the references cited and/or discussed above in this Background are incorporated herein by reference for their teaching.