Electronic devices and the ways in which users interact with them are evolving rapidly. Changes in size, shape, input mechanisms, feedback mechanisms, functionality, and the like have introduced new challenges and opportunities relating to how a user enters information, such as text. Statistical language modeling can play a central role in many text prediction and recognition problems, such as speech or handwriting recognition and keyboard input prediction. An effective language model can be critical to constrain the underlying pattern analysis, guide the search through various (partial) text hypotheses, and/or contribute to the determination of the final outcome. In some examples, statistical language modeling has been used to estimate the probability of occurrence, in a given language, of all possible strings of n words.
Given a vocabulary of interest for the expected domain of use, the probability of occurrence of all possible strings of n words can be determined using a word n-gram model, which can be trained to provide the probability of the current word given the n−1 previous words. Training has typically involved large machine-readable text databases, comprising representative documents in the expected domain. It can, however, be impractical to enumerate the entire contents of a word n-gram model or dictionary after every keystroke. In addition, n-gram models can fail to account for character-by-character changes as a user enters new information, and can thus fail to provide reliable results in some circumstances.
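As an illustration of the training described above, the following is a minimal sketch of a word n-gram model (here, a bigram model, n = 2) estimated by counting word pairs in a toy corpus. The corpus, the `<s>` sentence-start token, and the function names are hypothetical choices for this example, not part of the original description.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for a large machine-readable
# text database of representative documents in the expected domain.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

# Count bigrams: bigram_counts[prev][cur] is the number of times
# word `cur` immediately follows word `prev` in the corpus.
bigram_counts = defaultdict(lambda: defaultdict(int))
context_counts = defaultdict(int)
for sentence in corpus:
    words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
    for prev, cur in zip(words, words[1:]):
        bigram_counts[prev][cur] += 1
        context_counts[prev] += 1

def bigram_probability(prev, cur):
    """Maximum-likelihood estimate of P(cur | prev) from the counts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][cur] / context_counts[prev]
```

For example, `bigram_probability("the", "cat")` divides the count of "the cat" by the count of all words following "the". A production model would use a much larger corpus and smoothing for unseen word pairs, which this sketch omits.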
Unigram language models have similarly been used in text prediction and like applications. Unigram language models can produce a probability of a word in a target language. In some examples, unigram language models can accept a prefix of a word (e.g., one or more characters) and produce candidate words beginning with that prefix along with probabilities associated with the candidate words. Unigram language models, however, can fail to account for previously entered words or other context, and can thus fail to provide reliable results in some circumstances.
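The prefix-completion behavior described above can be sketched as follows. The word frequencies and the `complete` helper are hypothetical stand-ins for a trained unigram model, introduced only for this example.

```python
from collections import Counter

# Hypothetical word frequencies standing in for a trained unigram model.
word_counts = Counter({
    "the": 50, "they": 20, "them": 15, "there": 10, "dog": 5,
})
total = sum(word_counts.values())

def complete(prefix, k=3):
    """Return up to k candidate words beginning with `prefix`,
    each paired with its unigram probability, most probable first."""
    candidates = [
        (word, count / total)
        for word, count in word_counts.items()
        if word.startswith(prefix)
    ]
    return sorted(candidates, key=lambda wc: wc[1], reverse=True)[:k]
```

Note that `complete("th")` ranks candidates purely by corpus-wide frequency; the ranking is identical regardless of what words the user entered previously, which illustrates the context-blindness noted above.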
Accordingly, using either a word n-gram model or a unigram model alone for text prediction can limit overall prediction accuracy and reliability.