The present invention relates generally to computer devices, and more particularly to computer devices arranged to receive handwritten input.
Contemporary computing devices allow users to enter handwritten words (e.g., in cursive handwriting and/or printed handwritten characters) and symbols (e.g., a character in Far East languages). The words and symbols can be used as is, e.g., to function as readable notes and so forth, or can be converted to text for more conventional computer uses. To convert to text, for example, as a user writes strokes representing words or other symbols onto a touch-sensitive computer screen or the like, a handwriting recognizer (e.g., trained with millions of samples, employing a dictionary, context and other rules) is able to convert the handwriting data into dictionary words or symbols. In this manner, users are able to enter textual data without necessarily needing a keyboard.
When dealing with typewritten input entered into a word processing program, it is relatively straightforward to implement a "find" or "search" feature as part of the program. With text, a user types in a search string and possibly enters some properties of the string (e.g., bold typeface), and the program searches for a string in a document that exactly matches the word and any specified properties. Such a search is straightforward because typewritten input entered into a word processing program is defined by a limited set of codes, e.g., ASCII numeric values represent alphanumeric characters, and there is a limited set of properties a string can have. In general, the word processing program simply advances through the document attempting to match the full set of entered codes of the search string with a string of codes in a document in order to find an exact (allowing for any wildcards) match.
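By way of illustration only, the code-by-code scan described above may be sketched as follows. The function name `find_exact` and the choice of `?` as a single-character wildcard are assumptions made for this sketch and are not taken from any particular word processing program.

```python
def find_exact(document: str, pattern: str, wildcard: str = "?") -> list[int]:
    """Return every start offset at which `pattern` matches `document`.

    Characters are compared by numeric code, mirroring the description
    above. `wildcard` (an assumed convention) matches any single code.
    """
    if not pattern:
        return []
    doc_codes = [ord(c) for c in document]
    pat_codes = [ord(c) for c in pattern]
    wild = ord(wildcard)

    hits = []
    # Advance through the document one position at a time.
    for start in range(len(doc_codes) - len(pat_codes) + 1):
        window = doc_codes[start:start + len(pat_codes)]
        # Every code must match unless the pattern holds a wildcard there.
        if all(p == wild or p == d for p, d in zip(pat_codes, window)):
            hits.append(start)
    return hits
```

For example, `find_exact("the cat sat on the mat", "?at")` returns `[4, 8, 19]`, the offsets of "cat", "sat" and "mat". String properties such as typeface are omitted from this sketch.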
However, when entering handwritten ink, e.g., via an electronic ink processing program, it is virtually impossible for a user to write a word exactly the same way twice. Thus, searching is not possible via the simple "exact-string-match-or-not" operation. One attempted search method featurizes the electronic ink (e.g., handwritten data in the form of coordinates and other information) entered by a user, and searches through the document to find another piece of ink with similar features. This method is not very reliable because, for example, the same user can write two sets of ink, each intended to be the same word, whose features nevertheless differ significantly from each other from the computer's perspective. A second method uses simple string comparison, using the translated text word that appears for any handwritten input. This second method is also relatively unreliable, because such a search depends on a recognizer making a correct translation for each translated word, despite the reality that recognizers are not one hundred percent accurate.
Briefly, the present invention provides a system and method for finding matches for recognized handwritten words, by comparing a given search word (a typed-in character set or handwritten word that has been recognized) against the words in a document, including recognized words and any possible alternates for those recognized words as returned by a recognizer. For handwritten (ink) words, one implementation may look for an exact match between an entered search word (and possibly alternates of the search word) and the recognized words and their alternates stored in a handwritten document. To this end, the recognized word and each alternate associated therewith are examined against an entered search word and possibly its alternates.
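By way of illustration only, the exact-match comparison over recognized words and their alternates may be sketched as follows. The `InkWord` structure, the function `find_matches`, and the case-insensitive comparison are assumptions made for this sketch; the description above does not prescribe a data layout.

```python
from dataclasses import dataclass, field


@dataclass
class InkWord:
    """A handwritten word as stored in a document (hypothetical layout)."""
    recognized: str                                       # recognizer's top choice
    alternates: list[str] = field(default_factory=list)   # other candidates

    def candidates(self) -> set[str]:
        # The recognized word and each alternate are all examined.
        return {w.lower() for w in [self.recognized, *self.alternates]}


def find_matches(document, search_word, search_alternates=()):
    """Return indices of document words matching the search word.

    A word matches when the search word, or any of its alternates,
    exactly equals the recognized word or any stored alternate.
    Lower-casing is an assumption of this sketch.
    """
    wanted = {w.lower() for w in (search_word, *search_alternates)}
    return [i for i, word in enumerate(document) if wanted & word.candidates()]
```

For example, given the document `[InkWord("clog", ["dog", "cloy"]), InkWord("cat", ["cot"]), InkWord("dog", ["clog"])]`, a search for "dog" returns `[0, 2]`: the first word is found even though the recognizer's top choice for it was "clog", because "dog" is among its alternates.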
Numerous other variations are possible because of the use of alternates, which also may be returned with a probability ranking. For example, rather than a strict exact match test on the alternates, a scheme that looks for a percentage of matching characters can be implemented, with the user optionally adjusting the percentage, e.g., from loose to exact. Other variations include the weighting of certain characters (e.g., the first character has to exactly match, with only a percentage of others needed), and/or factoring in the number of syllables. Since alternates are returned with a probability, the probabilities of alternates may be used, e.g., a looser match is adequate on a highly probable word, while an exact match is required on a less probable word. Other variations include length-of-word weighting, Bayesian combination of probabilities to determine weighting, alternate-to-alternate exact match, percentage of alternate-to-alternate matches, the percentage of the percentages and so on, and the use of word/alternate matching in conjunction with ink/feature/bitmap/image matching. Various combinations of these variations are also feasible.
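By way of illustration only, one combination of these variations (percentage of matching characters, a first-character requirement, and a threshold that loosens as recognizer probability rises) may be sketched as follows. The position-by-position comparison, the linear interpolation, the default `loose` value of 0.6, and all function names are assumptions made for this sketch; the description above leaves these choices open.

```python
def char_match_fraction(a: str, b: str) -> float:
    """Fraction of positions at which `a` and `b` hold the same character.

    Dividing by the longer length penalizes length differences. This
    positional measure is only one assumed way to compute a percentage.
    """
    if not a and not b:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))


def required_fraction(probability: float, loose: float = 0.6,
                      exact: float = 1.0) -> float:
    """Looser threshold for highly probable words, exact for improbable ones."""
    return exact - (exact - loose) * probability


def fuzzy_match(search: str, candidates, require_first_char: bool = True) -> bool:
    """`candidates` is an iterable of (text, probability) pairs covering
    the recognized word and its alternates."""
    for text, probability in candidates:
        if require_first_char and (not text or not search
                                   or text[0] != search[0]):
            continue  # first character must match exactly
        if char_match_fraction(search, text) >= required_fraction(probability):
            return True
    return False
```

Under these assumptions, `fuzzy_match("hello", [("hallo", 0.9)])` succeeds because four of five characters match and the candidate is highly probable, whereas the same candidate with probability 0.1 fails because a near-exact match is then required. A user-adjustable percentage corresponds to varying `loose`.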
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which: