The present application relates generally to word predictions and, more specifically, to improving the accuracy of word predictions for highly inflected languages.
Electronic devices and the ways that users interact with them are evolving rapidly. Changes in size, shape, input mechanisms, feedback mechanisms, functionality, and the like have introduced new challenges and opportunities relating to how a user enters information, such as text. Statistical language modeling may play a central role in input prediction and/or recognition, such as keyboard input prediction and speech (or handwriting) recognition. Effective language modeling may thus play a critical role in the overall quality of an electronic device as perceived by the user.
However, to achieve acceptable levels of coverage and robustness, language models may require extensive training on very large text databases. As a result, it may be burdensome or impractical to gather and/or store enough training data to train the language models effectively. Relatedly, because such databases are finite, many word strings may occur only rarely in the training data, if at all, yielding unreliable prediction results for all but the shortest word strings.
Further, the resulting language models may be too large to deploy reasonably onto portable electronic devices. Although it may be possible to prune training data sets and/or language models to an acceptable size, pruned models may have reduced predictive power and accuracy. Additionally, grammatically incorrect predictions are particularly problematic, as a poor prediction may often be more distracting to the user than no prediction at all.