Advanced input methods such as handwriting, speech, and Input Method Editors (IMEs) may use generic input samples to train their recognizers. For instance, a handwriting recognizer may be trained on writing samples collected from a randomly selected group of people. This training method has the advantage that the recognizer is tuned to work for as many people as possible. The more samples from different people that are used to train the recognizer, the more robust the recognizer may become for general use.
However, this training method has several disadvantages. Because the recognizer is trained on generic writing samples, an individual user may experience recognition errors, especially when the user writes a character that resembles a different character in the generic samples. For example, the way a user writes the letter “u” may look like the letter “n” in the generic samples. Unless the user can modify their handwriting to suit a recognizer trained on generic samples, the user may have no way to fix such recognition errors. The user may only be able to correct the text translation of an individual word containing the error, such as changing “yon” to “you”. This does not fix the underlying problem, which is that the recognizer misrecognizes a particular input character. Correcting each misrecognized word may be tedious for the user, especially when the same errors recur and the recognizer does not appear to learn the nuances of the user's individual writing from these corrections.
Advanced input methods may also provide a generic language model used by their recognizers. This language model may not include vocabulary unique to the user. Moreover, key sources of a user's vocabulary, such as emails, documents, and URLs authored by the user, may not be represented in the generic language model. For example, email addresses do not conform to the language rules or vocabulary of a specific language: English language rules, such as the requirement of a space between words, do not apply to an email address. Similarly, a Uniform Resource Locator (URL) does not conform to the language rules or vocabulary of a specific language. As a result, a generic language model is limited in its ability to accurately recognize these types of input, and a user may have an unsatisfactory experience because of poor recognition accuracy for them.
What is needed is a way for advanced input methods to be made aware of how an individual user writes and what that user writes, so that higher recognition accuracy may be achieved. Additionally, such a system should support dynamic adaptation of a recognizer as the user writes to the system and authors text.