“Digital ink database” as used herein refers to a database which stores handwritten characters, for example a string of handwritten characters forming a handwritten letter.
Overview
Pen-based computing systems provide a convenient and flexible means of human-computer interaction. Most people are very familiar with using pen and paper. This familiarity is exploited by known systems which use a pen-like device as a data entry and recording mechanism for text, drawings or calculations which are quite naturally supported by this medium. Additionally, written ink is a more expressive format than digital text, and ink-based systems can be language-independent.
The increasing use of pen computing and the emergence of paper-based interfaces to networked computing resources (for example see: P. Lapstun, Netpage System Overview, Silverbrook Research Pty Ltd, 6 Jun., 2000; and, Anoto, “Anoto, Ericsson, and Time Manager Take Pen and Paper into the Digital Age with the Anoto Technology”, Press Release, 6 Apr., 2000) have highlighted the need for techniques which are able to store, index, and search (raw) digital ink. Pen-based computing allows users to store data in the form of notes and annotations, and subsequently search this data based on hand-drawn queries. However, searching handwritten text is more difficult than searching traditional text (e.g. ASCII text) due to inconsistencies in the production of handwriting and the stylistic variations between writers.
Digital Ink Database Searching
The traditional method of searching handwritten data in a digital ink database is to first convert the digital ink database and corresponding search query to standard text using pattern recognition techniques, and then to match the query text with the converted standard text in the database. Fuzzy text searching methods that perform text matching in the presence of character errors, similar to those produced by handwriting recognition systems, have been described (see P. Hall and G. Dowling, “Approximate String Matching”, Computing Surveys, 12(4), pp. 381-402, 1980).
However, handwriting recognition accuracy remains low, and the number of errors introduced by handwriting recognition (both for the database entries and for the handwritten query) means that this technique does not work well. The process of converting handwritten information into text results in the loss of a significant amount of information regarding the general shape and dynamic properties of the handwriting. For example, some letters (e.g. ‘u’ and ‘v’, ‘v’ and ‘r’, ‘f’ and ‘t’, etc.) are handwritten with a great deal of similarity in shape. Additionally, in many handwriting styles (particularly cursive writing), the identification of individual characters is highly ambiguous.
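By way of illustration only, the fuzzy text matching relied upon by the recognition-based approach can be sketched as an edit-distance comparison between the recognised query text and each recognised database entry. The sketch below is not taken from the cited reference; the function names `edit_distance` and `fuzzy_search` and the `max_errors` threshold are assumptions introduced for this example, and the Levenshtein distance is used as one common form of approximate string matching.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn `a` into `b`."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_search(query: str, entries: list[str], max_errors: int = 1) -> list[str]:
    """Return the entries within `max_errors` edits of the query text.

    `max_errors` is a hypothetical tolerance chosen for illustration."""
    return [e for e in entries if edit_distance(query, e) <= max_errors]


# Hypothetical recognised text: 'ualue' stands for 'value' with 'v' misread as 'u'.
recognised_entries = ["ualue", "valve", "table"]
matches = fuzzy_search("value", recognised_entries, max_errors=1)
```

In this sketch the misrecognised entry 'ualue' is still retrieved for the query 'value', but so is the unrelated word 'valve', which suggests how tolerating recognition errors can also admit false matches.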
Various techniques for directly searching and indexing a digital ink database are known in the prior art, see for example: A. Poon, K. Weber, and T. Cass, “Scribbler: A Tool for Searching Digital Ink”, Proceedings of the ACM Computer-Human Interaction, pp. 58-64, 1994; I. Kamel, “Fast Retrieval of Cursive Handwriting”, Proceedings of the 5th International Conference on Information and Knowledge Management, Rockville, Md. USA, Nov. 12-16, 1996; W. Aref, D. Barbara, P. Vallabhaneni, “The Handwritten Trie: Indexing Electronic Ink”, The 1995 ACM SIGMOD International Conference on Management of Data, San Jose, Calif., May 1995; W. Aref, D. Barbara, D. Lopresti, and A. Tomkins, “Ink as a First-Class Datatype in Multimedia Databases”, Database System—Issues and Research Direction, pp. 113-163, 1996; and, R. Manmatha, C. Han, E. Riseman, and W. Croft, “Indexing Handwriting Using Word Matching”, Proceedings of the First ACM International Conference on Digital Libraries, pp. 151-159, 1996.
These systems use a similarity measure to compare a feature vector derived from a set of query pen strokes with a database of feature vectors derived from the digital ink database. The entries in the database that exhibit the greatest degree of similarity with the query are returned as matches. Additionally, some approaches create an index or use a partitioning scheme to avoid a sequential search of all entries in the database. See for example: D. Barbara, W. Aref, I. Kamel, and P. Vallabhaneni, “Method and Apparatus for Indexing a Plurality of Handwritten Objects”, U.S. Pat. No. 5,649,023; D. Barbara and I. Kamel, “Method and Apparatus for Similarity Matching of Handwritten Data Objects”, U.S. Pat. No. 5,710,916; D. Barbara and H. Korth, “Method and Apparatus for Storage and Retrieval of Handwritten Information”, U.S. Pat. No. 5,524,240; D. Barbara and W. Aref, “Method for Indexing and Searching Handwritten Documents in a Database”, U.S. Pat. No. 5,553,284; R. Hull, D. Reynolds, and D. Gupter, “Scribble Matching”, U.S. Pat. No. 6,018,591; A. Poon, K. Weber, and T. Cass, “Searching and Matching Unrecognized Handwriting”, U.S. Pat. No. 5,687,254; and, W. Aref and D. Barbara, “Trie Structure Based Method and Apparatus for Indexing and Searching Handwritten Databases with Dynamic Search Sequencing”, U.S. Pat. No. 5,768,423.
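By way of illustration only, the similarity-based matching described above can be sketched as a ranking of stored feature vectors by their distance from the query feature vector. The cited systems each define their own features and similarity measures; in the sketch below the Euclidean distance, the function name `rank_matches`, the `top_k` parameter, and the two-element feature vectors are all assumptions introduced for this example.

```python
import math


def rank_matches(query_vec, database, top_k=3):
    """Return the keys of the `top_k` database entries whose feature
    vectors lie closest to `query_vec` (smaller distance = more similar).

    `database` maps an entry identifier to its feature vector. Every
    entry is compared in turn, i.e. this is the sequential search that
    an index or partitioning scheme is intended to avoid."""
    scored = [(math.dist(query_vec, vec), key) for key, vec in database.items()]
    scored.sort()  # ascending distance, so the best matches come first
    return [key for _, key in scored[:top_k]]


# Hypothetical feature vectors derived from stored digital ink.
ink_database = {
    "note_a": [0.0, 0.0],
    "note_b": [1.0, 1.0],
    "note_c": [5.0, 5.0],
}
best = rank_matches([0.9, 1.1], ink_database, top_k=2)
```

Because every stored vector is examined for each query, the cost of this sketch grows linearly with the size of the database.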
Other studies describe efforts to model the physical properties of handwriting for handwriting synthesis, see for example: J. Hollerbach, “An Oscillation Theory of Handwriting”, Biological Cybernetics, pp. 139-156, 1981; and, Y. Singer and N. Tishby, “Dynamical Encoding of Cursive Handwriting”, IEEE Conference on Computer Vision and Pattern Recognition, 1993.