Pen-based computers and personal digital assistants (PDAs) offer a more natural means of input and are becoming increasingly common for capturing handwriting, annotations, sketches, and other types of free-form input. The growing number of ink documents, both on personal computers and on the internet, has created a need for efficient indexing and retrieval of ink documents. Unfortunately, traditional search and retrieval systems either lack such capabilities or are too slow and cumbersome for practical, anytime use, particularly when searching through more than a few hundred ink documents.
Searching handwritten cursive text is a challenging problem. The same word written by different people can look very different. Further, multiple instances of the same word written by the same person are similar but not identical. In general, variations between instances of a word written by the same person are smaller than variations between different writers of the same word, which in turn are smaller than variations between different words. Thus, to be effective, electronic ink retrieval strategies must be robust to such handwriting variations.
Current approaches for ink retrieval fall into three categories: a) recognition based, b) template matching based, or c) shape matching based. Recognition-based retrieval methods combine the output of a handwriting recognizer with approximate string matching algorithms to retrieve similar ink. They are quite robust to handwriting styles and generalize well over both print and cursive writing. However, they have high computational requirements, as the recognizer needs to be run once on every entry in the data store during indexing. Moreover, during the recognition process much information about the particular shape of the letters (allographs), writing style, etc., is lost. Good retrieval rates require that the complete recognition lattice be stored and used during matching, which adds significantly to the memory overhead. Furthermore, since handwriting recognition is heavily dependent on the associated lexicon, recognition-based retrieval approaches are inherently limited by the accuracy of the recognizer and the applicability of the lexicon used. With a good lexicon, state-of-the-art handwriting recognizers achieve 80-90% word recognition accuracy. However, as soon as the lexicon is removed, word accuracy drops to 60-70%. In addition, such recognition-based retrieval techniques often break down for arbitrary ink input such as pen gestures and line graphics (for example, flow charts and hand-drawn maps).
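The approximate string matching step of recognition-based retrieval can be illustrated with a minimal sketch. It assumes the recognizer has already reduced each ink entry to a single best-guess string (a simplification, since the text above notes that good retrieval needs the full recognition lattice), and it uses plain Levenshtein edit distance as the approximate matcher. The names `edit_distance`, `retrieve`, and `max_dist` are hypothetical and chosen only for illustration.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def retrieve(query, recognized, max_dist=2):
    """Return (distance, entry_id) pairs within max_dist edits, best first.

    `recognized` maps an ink entry id to its (assumed) recognizer output.
    """
    hits = ((edit_distance(query, text), entry_id)
            for entry_id, text in recognized.items())
    return sorted(hit for hit in hits if hit[0] <= max_dist)
```

For instance, with recognizer outputs `{"a": "retreival", "b": "recognition", "c": "retrieval"}`, the query `"retrieval"` matches entry `c` exactly and entry `a` within two edits, so a misrecognized word can still be retrieved.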
Unlike other forms of shape data, electronic ink has time information that can be effectively used in template matching algorithms. Dynamic time warping (DTW) is the most prevalent matching technique for producing reliable similarity scores between handwritten words. DTW is very accurate, but it is an O(n^2) algorithm, where n is the number of points. As a result, it is computationally prohibitive for use in retrieval applications involving large databases of words. DTW-based template matching techniques are well suited for searching through all words in a single page or document and are commonly used for building the find-and-replace type of features supported by modern ink-capable document editors and word processors.
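The quadratic cost of DTW is easiest to see in code. The following is a minimal sketch of the textbook DTW recurrence, assuming each handwritten word is given as a time-ordered list of (x, y) pen points and that Euclidean distance is the per-point cost. Both are assumptions made for illustration, as is the function name `dtw_distance`.

```python
import math


def dtw_distance(a, b):
    """DTW cost between two time-ordered point sequences a and b.

    Every cell of a len(a) x len(b) table is filled once, which is the
    source of the quadratic running time.
    """
    prev = [0.0] + [math.inf] * len(b)  # row for the empty prefix of a
    for p in a:
        curr = [math.inf]
        for j, q in enumerate(b, 1):
            step = math.dist(p, q)  # local cost of aligning p with q
            curr.append(step + min(prev[j],       # p repeats q's predecessor
                                   curr[j - 1],   # q repeats p
                                   prev[j - 1]))  # both advance
        prev = curr
    return prev[-1]
```

Because the warping path may stretch or compress time, two renderings of the same stroke sampled at different rates can still score a cost of zero, for example `[(0, 0), (1, 0), (2, 0)]` against `[(0, 0), (1, 0), (1, 0), (2, 0)]`.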
Shape matching algorithms decompose the input shape into a bag of shape features. Two shapes are compared for similarity based on the minimum cost of matching the features of one shape to the features of the other. The lower the matching cost, the better the match (with zero as a lower bound). Unlike DTW matching, the shape features typically discard all time information. Unfortunately, as in the case of DTW, computing the optimal matching for a single shape comparison is expensive, with a complexity that is super-polynomial in the number of features. Thus, shape matching algorithms have been effective over small database sizes but impractical over anything larger than a few hundred words.
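A minimal sketch of minimum-cost feature matching follows. It assumes each shape feature is a numeric tuple and that Euclidean distance is the cost of pairing two features; neither choice comes from the text above. The search is a brute-force enumeration of assignments, chosen only for clarity: its cost grows factorially with the number of features, so it is usable only for very small bags. The name `matching_cost` is hypothetical.

```python
import itertools
import math


def matching_cost(features_a, features_b):
    """Minimum total cost of pairing each feature of the smaller bag with a
    distinct feature of the larger bag (order within a bag is ignored)."""
    small, large = sorted((features_a, features_b), key=len)
    best = math.inf
    # Try every way of assigning distinct features of `large` to `small`.
    for assignment in itertools.permutations(large, len(small)):
        cost = sum(math.dist(f, g) for f, g in zip(small, assignment))
        best = min(best, cost)
    return best
```

Since the features form an unordered bag, `matching_cost([(0, 0), (1, 1)], [(1, 1), (0, 0)])` is zero even though the features are listed in a different order, which reflects the loss of time information noted above.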
All of the above approaches for ink retrieval rely on a linear scan through the database for each query, which tends to be slow. To avoid long query times, sequential evaluation combined with early termination is commonly employed while computing match scores.
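As an illustration of a linear scan with early termination, the sketch below scores every database entry against the query with DTW but abandons an entry as soon as it can no longer beat the best score found so far. The pruning rule (stop when every cell in the current DTW row exceeds the cutoff) is one common choice and is an assumption here, as are the point-list representation and the names `dtw_early_abandon` and `linear_scan`.

```python
import math


def dtw_early_abandon(a, b, cutoff):
    """DTW cost between point sequences a and b, or math.inf once every
    partial alignment already costs more than `cutoff`."""
    prev = [0.0] + [math.inf] * len(b)
    for p in a:
        curr = [math.inf]
        for j, q in enumerate(b, 1):
            curr.append(math.dist(p, q)
                        + min(prev[j], curr[j - 1], prev[j - 1]))
        # Local costs are non-negative, so the row minimum is a lower bound
        # on the final score; give up if it is already too high.
        if min(curr) > cutoff:
            return math.inf
        prev = curr
    return prev[-1]


def linear_scan(query, database):
    """Visit every entry once and return (best_id, best_score)."""
    best_id, best_score = None, math.inf
    for entry_id, points in database.items():
        score = dtw_early_abandon(query, points, best_score)
        if score < best_score:
            best_id, best_score = entry_id, score
    return best_id, best_score
```

Early termination shortens the work per entry, but the scan still touches every entry, so query time continues to grow linearly with the size of the database.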