Users of electronic devices around the world enter text in various languages. A wide variety of language recognition systems are designed to enable users to enter text on such devices via one or more modes of input such as keyboard text entry, speech, and/or handwriting. Such language recognition systems often provide predictive features that suggest word completions, corrections, and/or possible next words in supported languages.
Language recognition systems typically rely on one or more language models, which contain information that helps the system recognize or produce text in a particular language. Such information is typically based on statistical linguistic analysis of an extensive corpus of text in that language. It may include, for example, lists of individual words (unigrams) and their relative frequencies of use in the language, as well as the frequencies of word pairs (bigrams), triplets (trigrams), and higher-order n-grams. For example, a language model for English that includes bigrams would indicate a high likelihood that the word “degrees” will be followed by “Fahrenheit” and a low likelihood that it will be followed by “Chanukah”. In general, language recognition systems rely upon such language models (one or more for each supported language) to supply a lexicon of textual objects that the system can generate based on the input actions performed by the user, and to map those input actions to one or more of the textual objects in the lexicon. Language models thus enable language recognition systems to perform next word prediction for user text entry.
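The unigram and bigram statistics described above can be illustrated with a minimal sketch. It assumes a toy four-sentence corpus and plain relative-frequency estimates with no smoothing. The function names (build_bigram_model, next_word_probability, predict_next) are hypothetical and chosen only for illustration; they do not describe any particular system.

```python
from collections import Counter, defaultdict

def build_bigram_model(corpus):
    """Count unigrams and bigrams over a list of sentences (toy tokenizer)."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)  # bigrams[prev][next] = count
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams

def next_word_probability(bigrams, prev, candidate):
    """Relative frequency of `candidate` among words seen after `prev`."""
    followers = bigrams.get(prev)
    if not followers:
        return 0.0
    return followers[candidate] / sum(followers.values())

def predict_next(bigrams, prev, k=3):
    """Return up to k most frequent words seen after `prev`."""
    followers = bigrams.get(prev, Counter())
    return [word for word, _ in followers.most_common(k)]

# Toy corpus (an assumption for illustration; a real model uses an extensive corpus).
corpus = [
    "it is 70 degrees fahrenheit outside",
    "water boils at 212 degrees fahrenheit",
    "water freezes at 0 degrees celsius",
    "we light candles on chanukah",
]
unigrams, bigrams = build_bigram_model(corpus)

p_fahrenheit = next_word_probability(bigrams, "degrees", "fahrenheit")
p_chanukah = next_word_probability(bigrams, "degrees", "chanukah")
suggestion = predict_next(bigrams, "degrees", k=1)
print(p_fahrenheit, p_chanukah, suggestion)
```

In this toy corpus “fahrenheit” follows “degrees” in two of three occurrences and “chanukah” never does, so the former is suggested. A practical model would typically add smoothing so that unseen pairs receive a small nonzero probability rather than zero.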
Once a language model has been developed for a language and provided to users, language recognition systems typically allow users to build on or train their local language models, both to recognize additional words in that language and to remove undesired words, according to their individual vocabulary use. The language recognition system may thus improve its predictive ability for a particular user.
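One way to picture this per-user adaptation is a thin user layer kept alongside the shipped model. The sketch below is an assumption made for illustration, not a description of any actual system: the class UserLanguageModel, its methods, and the word counts are all hypothetical, and only unigram counts are modeled.

```python
from collections import Counter

class UserLanguageModel:
    """Hypothetical per-user layer over a shipped unigram model."""

    def __init__(self, base_counts):
        self.base = Counter(base_counts)  # shipped counts, never mutated
        self.learned = Counter()          # counts contributed by this user
        self.removed = set()              # words this user has rejected

    def learn(self, word, weight=1):
        """Add a new word, or boost an existing one, for this user."""
        self.removed.discard(word)
        self.learned[word] += weight

    def remove(self, word):
        """Stop suggesting a word for this user."""
        self.removed.add(word)
        self.learned.pop(word, None)

    def complete(self, prefix, k=3):
        """Suggest up to k completions, most frequent first."""
        merged = self.base + self.learned
        candidates = [
            (word, count)
            for word, count in merged.items()
            if word.startswith(prefix) and word not in self.removed
        ]
        # Sort by descending count, then alphabetically for stable ties.
        candidates.sort(key=lambda wc: (-wc[1], wc[0]))
        return [word for word, _ in candidates[:k]]

# Illustrative counts only (assumed, not taken from any real corpus).
model = UserLanguageModel({"there": 50, "their": 40, "theremin": 1})
before = model.complete("the")
model.learn("theremin", weight=60)  # this user writes about theremins often
model.remove("their")               # this user rejects the suggestion
after = model.complete("the")
print(before, after)
```

Keeping the learned and removed words separate from the shipped counts is a design choice of this sketch: it lets the base model be updated or replaced without discarding what the user has taught the system.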