“ink query” as used herein refers to a series of hand-drawn digital ink strokes prepared by a user as a search term or phrase.
The increasing use of pen computing and the emergence of paper-based interfaces to networked computing resources (for example see: P. Lapstun, Netpage System Overview, Silverbrook Research Pty Ltd, 6 Jun. 2000; and, Anoto, “Anoto, Ericsson, and Time Manager Take Pen and Paper into the Digital Age with the Anoto Technology”, Press Release, 6 Apr. 2000) has highlighted the need for techniques which are able to store, index, and search (raw) digital ink. However, searching handwritten text is more difficult than searching traditional text (e.g. ASCII text) due to inconsistencies in the production of handwriting and the stylistic variations between writers.
The traditional method of searching handwritten data in a digital ink database is to first convert the digital ink database and corresponding search query to standard text using pattern recognition techniques, and then to match the query text with the converted standard text in the database. Fuzzy text searching methods have been described, see P. Hall and G. Dowling, “Approximate String Matching”, Computing Surveys, 12(4), pp. 381-402, 1980, that perform text matching in the presence of character errors, similar to those produced by handwriting recognition systems.
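By way of illustration only, the fuzzy text matching referred to above can be sketched with a standard edit-distance (Levenshtein) computation. The function names edit_distance and fuzzy_search, the max_errors threshold, and the sample words are assumptions made for this example and are not taken from the cited reference.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_search(query, entries, max_errors=1):
    """Return the entries within max_errors edits of the query (hypothetical helper)."""
    return [e for e in entries if edit_distance(query, e) <= max_errors]


# Recognized database text may contain character errors such as 'a' for 'u'.
print(fuzzy_search("clump", ["clump", "dump", "clamp", "stamp"]))
```

With a tolerance of one error, the query still matches an entry in which a single character was misrecognized, which is the behaviour such methods rely on.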
However, handwriting recognition accuracy remains low, and the number of errors introduced by handwriting recognition (both for the database entries and for the handwritten query) means that this technique does not work well. The process of converting handwritten information into text results in the loss of a significant amount of information regarding the general shape and dynamic properties of the handwriting. For example, some letters (e.g. ‘u’ and ‘v’, ‘v’ and ‘r’, ‘f’ and ‘t’, etc.) are handwritten with a great deal of similarity in shape. Additionally, in many handwriting styles (particularly cursive writing), the identification of individual characters is highly ambiguous.
The Netpage System
Pen-based computing systems provide a convenient and flexible means of human-computer interaction. Most people are very familiar with using pen and paper. This familiarity is exploited by known systems which use a pen-like device as a data entry and recording mechanism for text, drawings or calculations which are quite naturally supported by this medium. Additionally, written ink is a more expressive format than digital text, and ink-based systems can be language-independent. Moreover, the majority of published information is distributed in paper form, and most people prefer reading printed material to reading information on screen-based terminals. However, online applications and publishing systems have a number of advantages over pen and paper, such as the ability to provide information on demand, document navigation via hypertext, and the ability to search and personalize the information.
The Netpage system, see Silverbrook Research, Netpage System Design Description, 8 Sep. 2000, provides an interactive paper-based interface to online information by utilizing pages of invisibly coded paper (also referred to herein as an interactive page) and an optically imaging pen. Each interactive page generated by the Netpage system is uniquely identified and stored on a network server, and all user interaction with the interactive page (i.e. paper) using the Netpage pen is captured, interpreted, and stored. Memjet digital printing technology, see Silverbrook Research, Memjet, 1999, facilitates the on-demand printing of Netpage documents, allowing interactive applications to be developed. The Netpage printer, pen, and network infrastructure provide a paper-based alternative to traditional screen-based applications and online publishing services, and support user-interface functionality such as hypertext navigation and form input.
Netpage is a three-tiered system comprising a client layer, a service layer, and an application layer, as depicted in FIG. 1. The client layer contains the Netpage pen, Memjet printer, and a digital ink relay. Typically, the printer receives a document from a publisher or application provider via a broadband connection, and prints it with an invisible pattern of infrared tags that encodes each page with a unique identifier and the location of each tag on the page. As a user writes on the page, the imaging pen decodes these tags and converts the motion of the pen into digital ink, see Silverbrook Research, Netpage Pen Design Description, 27 Apr. 2000. The digital ink is transmitted over a wireless channel to a relay base station, and then sent to the service layer for processing and storage.
The service layer consists of a number of services that provide functionality for application development, with each service implemented as a set of network servers that provide a reliable and scalable processing environment. The infrastructure provides persistent storage of all documents printed using the Netpage system, together with the capture and persistent storage of all digital ink written on an interactive page. When digital ink is submitted for processing, the system uses a stored description of the page to interpret the digital ink, and performs the requested actions by interacting with the applications that generated the document.
The application layer provides content to the user by publishing documents, and processes the digital ink interactions submitted by the user. Typically, an application generates one or more interactive pages in response to user input, which are transmitted to the service layer to be stored, rendered, and finally printed as output to the user. The Netpage system allows sophisticated applications to be developed by providing services for document publishing, rendering, and delivery, authenticated transactions and secure payments, handwriting recognition, and user validation using biometric techniques such as signature verification.
There are some existing techniques for matching hand-drawn ink queries with handwritten text databases, hand-drawn sketches, and image databases, as mentioned below.
Chans et al. (Y. Chans, Z. Lei, D. Lopresti, and S. Kung, “A Feature Based Approach For Image Retrieval by Sketch”, Proceedings of SPIE Volume 3229: Multimedia Storage and Archiving Systems II, 1997), match hand-drawn sketches with image features based on “edge segments modeled by Implicit Polynomials (IP)”. A similarity computation is based on calculating the distances between pairs of feature sets (called curvlets) using an elastic matching procedure.
Lopresti and Tomkins (D. Lopresti and A. Tomkins, “Temporal-Domain Matching of Hand-Drawn Pictorial Queries”, Handwriting and Drawing Research: Basic and Applied Issues, IOS Press, pp. 387-401, 1996; and, D. Lopresti, A. Tomkins, and J. Zhou, “Algorithms for Matching Hand-Drawn Sketches”, Proceedings of the 5th International Workshop on Frontiers in Handwriting Recognition, pp. 223-238, 1995), describe a system for matching hand-drawn sketches against a database of sketches. Global features such as stroke length and angle traversed are extracted for each stroke in the database, from which a stroke codebook is created using vector quantization. Input sketches are matched against a database using a string block-editing algorithm that uses vector-quantized codes as primitives, see D. Lopresti and A. Tomkins, “Block Edit Models for Approximate String Matching”, Proceedings of the 2nd Annual South American Workshop on String Processing, pp. 11-26. A similar approach using dynamic programming for ink searching is described by Poon et al. in A. Poon, K. Weber, and T. Cass, “Scribbler: A Tool for Searching Digital Ink”, Proceedings of the ACM Computer-Human Interaction, pp. 58-64, 1994.
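As a rough sketch of the feature extraction and vector quantization steps described above, the following computes two global features per stroke (path length and total angle traversed) and maps each stroke to its nearest codebook entry. The function names, the two-feature vector, and the hand-picked codebook are assumptions for illustration; in the cited work the codebook is created from the database strokes, and the features would normally be scaled before distances are compared.

```python
import math


def stroke_features(points):
    """Return (path length, total absolute turning angle) for one stroke."""
    length, turning, prev_heading = 0.0, 0.0, None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # skip repeated samples
        length += math.hypot(dx, dy)
        heading = math.atan2(dy, dx)
        if prev_heading is not None:
            # Wrap the heading change into [-pi, pi) before accumulating.
            delta = (heading - prev_heading + math.pi) % (2 * math.pi) - math.pi
            turning += abs(delta)
        prev_heading = heading
    return (length, turning)


def quantize(feature, codebook):
    """Index of the codebook vector nearest to the feature (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])))


def encode_sketch(strokes, codebook):
    """Turn a sketch (list of strokes) into a sequence of codebook indices."""
    return [quantize(stroke_features(s), codebook) for s in strokes]


# Hypothetical two-entry codebook: "long straight" and "short with a corner".
CODEBOOK = [(5.0, 0.0), (2.0, 1.5)]
sketch = [[(0, 0), (3, 4)], [(0, 0), (1, 0), (1, 1)]]
print(encode_sketch(sketch, CODEBOOK))
```

The resulting code sequence is the kind of symbol string on which an approximate string matching algorithm can then operate.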
In D. Lopresti and A. Tomkins, “Pictographic Naming”, Proceedings of the INTERCHI '93 Conference, 1993, Lopresti and Tomkins describe an automatic index creation algorithm for handwritten notes. Ink strokes are grouped into words and re-sampled so that the points are equidistant along the ink trajectory. A set of angular and curvature features is extracted for each stroke, and the feature vectors are clustered using hierarchical clustering. A Chi-squared statistic is used to select useful index terms.
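The re-sampling step mentioned above can be illustrated as follows. This is a minimal sketch, assuming a stroke is given as a list of (x, y) points and that new points are placed by linear interpolation at a fixed arc-length spacing; the function name resample and the spacing parameter are introduced here for illustration only.

```python
import math


def resample(points, spacing):
    """Resample a polyline so successive points are `spacing` apart along the path."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if not points:
        return []
    out = [points[0]]
    carried = 0.0  # path length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        d = spacing - carried  # distance into this segment of the next sample
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carried = seg - (d - spacing)
    return out


# An L-shaped stroke resampled every 2 units of path length.
print(resample([(0, 0), (3, 0), (3, 3)], 2.0))
```

Because the spacing is measured along the trajectory rather than in time, the result does not depend on how fast the pen was moving when the stroke was captured.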
D. Lopresti and A. Tomkins, “Pictographic Naming”, Proceedings of the INTERCHI 1993 Conference, 1993, discuss using Hidden Markov Models (HMMs) for matching pictograms, and describe a system of inexpensive discriminants that give a rough indication of similarity between ink drawings useful for database pruning. Also described is the use of a windowed dynamic-programming approach to allow a user to search a pictographically named file system.
Del Bimbo et al. (A. Del Bimbo, P. Pala, and S. Santini, “Image Retrieval by Elastic Matching of Shapes and Image Patterns”, Proceedings of IEEE Multimedia, pp. 215-218, 1996), describe an image retrieval algorithm that uses an elastic matching shape-similarity procedure. Schomaker et al. (L. Schomaker, L. Vuurpijl, and E. de Leau, “New Use for the Pen: Outline-Based Image Queries”, Proceedings of the 5th International Conference on Document Analysis and Recognition, pp. 293-296, 1999), present an image query technique based on hand-drawn image outlines. This algorithm uses a feature-set containing normalized point coordinates and running angles, together with an angular histogram. For recognition, a Euclidean-distance nearest-neighbor classifier is used. Muller et al. (S. Muller, S. Eickeler, and G. Rigoll, “Multimedia Database Retrieval Using Hand-Drawn Sketches”, 5th International Conference on Document Analysis and Recognition, Bangalore, India, September 1999), describe a multimedia database retrieval system that supports hand-drawn sketches of items using both shape and color. Database entries are represented as HMMs based on scale- and rotationally-invariant features, and database-pruning techniques are used to reduce search overhead.
Pavlidis et al. (I. Pavlidis, R. Singh, and N. Papanikolopoulos, “Recognition of On-Line Handwritten Patterns Through Shape Metamorphosis”, Proceedings of the 13th International Conference on Pattern Recognition, Vol. 3, pp. 18-22, 1996), use shape metamorphosis (i.e. morphing) to match both online handwritten text and online hand-drawn line figures. The signal is segmented at areas of high and low curvature, and these segmentation points are used as features to perform shape metamorphosis between the input and target shapes. The final similarity score is based on the degree of morphing required to convert the input signal to the target.
Manmatha et al. (R. Manmatha, C. Han, E. Riseman, and W. Croft, “Indexing Handwriting Using Word Matching”, Proceedings of the First ACM International Conference on Digital Libraries, pp. 151-159, 1996), create text indices of handwritten documents by segmenting the text into words and performing similarity matching on the words by matching (using a bitmap exclusive-or) the word image with all other word images in the text. Groups of similar words are then formed into indices. Mahmood (see T. Mahmood, “Indexing of Handwritten Document Images”, Proceedings of the 1997 Workshop on Document Image Analysis, 1997) uses a technique called geometric hashing (see Y. Lamdan and H. Wolfson, “Geometric Hashing: A General and Efficient Model-Based Recognition Scheme”, Proceedings of the International Conference on Computer Vision, pp. 218-249, 1988) to index offline handwritten text using a feature representation that is invariant under affine transformation.
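The bitmap exclusive-or comparison used for word matching can be sketched as below. It is assumed, for illustration, that word images have already been normalized to the same size and are stored as rows of integer bit masks; the function names, the greedy grouping strategy, and the max_diff threshold are hypothetical and are not taken from the cited paper.

```python
def xor_distance(a, b):
    """Count the pixels that differ between two equal-size binary bitmaps."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same number of rows")
    return sum(bin(row_a ^ row_b).count("1") for row_a, row_b in zip(a, b))


def group_similar(images, max_diff):
    """Greedily group image indices whose XOR distance to a group's first member is small."""
    groups = []
    for idx, img in enumerate(images):
        for group in groups:
            if xor_distance(images[group[0]], img) <= max_diff:
                group.append(idx)
                break
        else:
            groups.append([idx])  # no close group found, start a new one
    return groups


# Three tiny 2-row word images; the first two differ by a single pixel.
images = [[0b1110, 0b1000], [0b1110, 0b1001], [0b0001, 0b0111]]
print(group_similar(images, max_diff=1))
```

Each resulting group plays the role of one index entry, collecting the occurrences of what is presumed to be the same word.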
Kamel (see I. Kamel, “Fast Retrieval of Cursive Handwriting”, Proceedings of the 5th International Conference on Information and Knowledge Management, Rockville, Md. USA, Nov. 12-16, 1996) describes an approach to the fast indexing and retrieval of cursive handwriting. Strokes are segmented at “each local minimum in the x-y coordinates”, and converted into a feature vector based on geometric properties such as stroke length and angle traversed. The feature vectors are then mapped to a lower dimension using the Karhunen-Loeve transform (i.e. Principal Component Analysis, see R. Duda, P. Hart, and D. Stork, Pattern Classification, Second Edition, John Wiley & Sons, Inc., pp. 115-117, 2001), then indexed using an R-Tree (a multidimensional version of a B-Tree described in A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching”, Proceedings of the ACM SIGMOD, 1994). Searching uses a voting algorithm that matches each input stroke against the stroke index. In I. Kamel, D. Barbera, “Retrieving Electronic Ink by Content”, Proceedings of the 1996 International Workshop on Multi-Media Database Management Systems, 1996, this technique is expanded using a two-step indexing schema that includes a filtering step and a refinement step. The filtering step uses global features to locate a hyper-rectangle in the database that is then searched using a sequential algorithm to find the most similar matches.
Aref et al. (W. Aref, D. Barbera, P. Vallabhaneni, “The Handwritten Trie: Indexing Electronic Ink”, The 1995 ACM SIGMOD International Conference on Management of Data, San Jose, Calif., May 1995) use a combination of local and global features to train a set of HMMs that model letters in a handwritten trie. A beam search is used to traverse the trie, with the most promising nodes expanded at each point. W. Aref, D. Barbera, D. Lopresti, and A. Tomkins, “Ink as a First-Class Datatype in Multimedia Databases”, Database System: Issues and Research Direction, pp. 113-163, 1996, describe an algorithm (called ScriptSearch) to search a continuous stream of text for a handwritten phrase. The approach does not perform word segmentation; rather, it uses dynamic programming to match against a vector-quantized sequence of stroke primitives. Also described is a technique for searching large ink databases using a tree-structured index based on HMMs.
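One way to picture segmentation-free matching against a stream of vector-quantized stroke primitives is approximate substring matching by dynamic programming, in which a match is allowed to begin at any position in the stream. The sketch below uses unit edit costs and a hypothetical function name; it is not asserted to be the ScriptSearch algorithm itself, whose cost model is not given here.

```python
def best_substring_match(query, stream):
    """Return (edit distance, end index) of the best approximate occurrence of query.

    A match may start anywhere in the stream, so no prior word segmentation
    is required. The end index counts stream symbols consumed (1-based).
    """
    col = list(range(len(query) + 1))  # cost of matching against an empty stream
    best = (col[-1], 0)
    for j, s in enumerate(stream, start=1):
        new = [0]  # zero cost to start a match at any stream position
        for i, q in enumerate(query, start=1):
            cost = 0 if q == s else 1
            new.append(min(col[i] + 1,          # extra symbol in the stream
                           new[i - 1] + 1,      # query symbol missing from stream
                           col[i - 1] + cost))  # match or substitute
        col = new
        if col[-1] < best[0]:
            best = (col[-1], j)
    return best


# Stroke codes for a phrase, searched in a longer stream with one differing code.
print(best_substring_match([3, 1, 4], [2, 7, 3, 5, 4, 8]))
```

The only difference from ordinary edit distance is the zero-cost first row, which is what removes the need to know where words begin and end.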
Napper et al. (in a co-pending PCT application based on Australian Provisional Patent Application No. PR8243) describe a technique for searching digital ink databases using text-based queries. The procedure uses a handwriting model generated from a training database to map query text into a writer-dependent feature set, which is then used to perform a sequential similarity search on the database.
Lopresti et al. in D. Lopresti, Y. Ma, and J. Zhou, “Document Search and Retrieval System with Partial Match Searching of User-Drawn Annotations”, U.S. Pat. No. 5,832,474, disclose an automatic ink matching system. The specification describes the process of stroke segmentation, feature extraction, vector quantization, and a fuzzy matching technique using an edit-distance sequential search.
Bricklin et al. in D. Bricklin et al. “Graphic Indexing System”, U.S. Pat. No. 5,867,150, describe a method of creating handwritten note indices under the direction of a user. In this system, the user indicates regions of ink to be indexed using a lasso gesture, and subsequent searching is performed manually by a user browsing the ink index gallery.
Barbera et al. in D. Barbara, W. Aref, I. Kamel, and P. Vallabhaneni, “Method and Apparatus for Indexing a Plurality of Handwritten Objects”, U.S. Pat. No. 5,649,023, describe a B-Tree data structure used to index a set of left-to-right HMMs, with each HMM representing a handwritten object. In D. Barbara and I. Kamel, “Method and Apparatus for Similarity Matching of Handwritten Data Objects”, U.S. Pat. No. 5,710,916, they describe another indexing system that uses a set of global stroke features and an R-Tree for indexing. In D. Barbara and H. Korth, “Method and Apparatus for Storage and Retrieval of Handwritten Information”, U.S. Pat. No. 5,524,240, and, D. Barbara and W. Aref, “Method for Indexing and Searching Handwritten Documents in a Database”, U.S. Pat. No. 5,553,284, they describe a number of less sophisticated HMM-based indexing methods. In W. Aref and D. Barbara, “Trie Structure Based Method and Apparatus for Indexing and Searching Handwritten Databases with Dynamic Search Sequencing”, U.S. Pat. No. 5,768,423, disclosure is made of a combined HMM and trie-structure searching technique (see W. Aref, D. Barbera, P. Vallabhaneni, “The Handwritten Trie: Indexing Electronic Ink”, The 1995 ACM SIGMOD International Conference on Management of Data, San Jose, Calif., May 1995).
Mahmood in T. Mahmood, “Method of Indexing Words in Handwritten Document Images Using Image Hash Tables”, U.S. Pat. No. 5,953,451, discloses a method of indexing handwritten documents using geometric hashing (see T. Mahmood, “Indexing of Handwritten Document Images”, Proceedings of the 1997 Workshop on Document Image Analysis, 1997).
Hull et al. in R. Hull, D. Reynolds, and D. Gupter, “Scribble Matching”, U.S. Pat. No. 6,018,591, describe a technique for scribble matching that uses velocity minima for stroke segmentation. Three matching algorithms are defined: an elastic matcher, a matcher based on shape information (called a syntactic matcher), and a matcher based on height encoding using reference-line zoning (called a word matcher).
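Segmentation at velocity minima can be sketched generically as follows. It is assumed that ink is captured as (x, y, t) samples with strictly increasing timestamps and that pen speed at each interior sample is estimated by a central difference; these choices and the function name are illustrative assumptions and are not drawn from the cited patent.

```python
import math


def segment_at_velocity_minima(samples):
    """Split (x, y, t) samples at local minima of pen speed.

    Returns (cut indices, list of segments). Timestamps are assumed to be
    strictly increasing, so the time differences below are never zero.
    """
    n = len(samples)
    speed = [None] * n
    for i in range(1, n - 1):
        (x0, y0, t0), (x1, y1, t1) = samples[i - 1], samples[i + 1]
        speed[i] = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
    # A cut is a sample slower than its predecessor and no faster than its successor.
    cuts = [i for i in range(2, n - 2)
            if speed[i] < speed[i - 1] and speed[i] <= speed[i + 1]]
    segments, start = [], 0
    for c in cuts:
        segments.append(samples[start:c + 1])  # the cut point closes one segment
        start = c                              # and opens the next
    segments.append(samples[start:])
    return cuts, segments


# The pen slows down around the fourth sample, then speeds up again.
xs = [0, 4, 8, 9, 10, 14, 18]
samples = [(x, 0, t) for t, x in enumerate(xs)]
print(segment_at_velocity_minima(samples)[0])
```

Pen speed tends to drop at corners and cusps, so cuts placed this way tend to fall at perceptually meaningful points of the stroke.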
Poon et al. in A. Poon, K. Weber, and T. Cass, “Searching and Matching Unrecognized Handwriting”, U.S. Pat. No. 5,687,254, describe a method for searching and matching gesture-based handwriting using dynamic time warping. This technique is used for creating indices of handwritten documents and for performing “find and replace” functions on handwritten text.
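Dynamic time warping, as referred to above, aligns two sequences that may differ in length or writing speed. The following is a textbook formulation given only as a sketch; it assumes each ink sample is an (x, y) point and uses Euclidean distance as the local cost, neither of which is taken from the cited patent. It relies on math.dist, available in Python 3.8 and later.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of (x, y) points."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] holds the best cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a advances, b repeats
                                 d[i][j - 1],      # b advances, a repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]


# The same trace written with one repeated sample still aligns at zero cost.
slow = [(0, 0), (1, 0), (1, 0), (2, 0)]
fast = [(0, 0), (1, 0), (2, 0)]
print(dtw_distance(fast, slow))
```

Because repeated or skipped samples can be absorbed by the warping path, two renditions of the same handwriting written at different speeds can still compare as similar.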
This highlights a need for an electronic filing system using pen-based computing that allows users to store or index data in the form of notes or annotations, etc., and subsequently search this data based on handwritten (i.e. hand-drawn) queries.