In natural language processing (NLP) applications, it is often considered critical that the language or languages of an input, such as an input provided to an electronic device, be accurately identified. Proper language identification, for instance, facilitates the use of various NLP features, such as auto-correction, auto-completion, and word prediction.
Many approaches to language identification exist, but they are often impractical for particular implementations. Lexical approaches, for example, are not practical on devices, such as mobile devices, that have limited storage and/or computational ability. As another example, syntactic approaches require large amounts of evidence and accordingly are typically restricted to long documents. As yet another example, generative statistical approaches are limited by the conditional independence assumptions implicit in the Markov strategy.
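By way of illustration only, the Markov-based limitation noted above can be made concrete with a minimal sketch of a generative character-bigram language identifier. The sketch is not drawn from any particular implementation: the function names, the toy training strings, and the add-one smoothing constant `alphabet_size=64` are all assumptions introduced here. Each character is scored given only the single character before it, which is the conditional independence assumption in question.

```python
import math
from collections import Counter


def train_bigram_model(text):
    """Count character bigrams and the left-context characters they start from."""
    padded = " " + text.lower() + " "
    bigrams = Counter(zip(padded, padded[1:]))
    contexts = Counter(padded[:-1])
    return bigrams, contexts


def log_likelihood(text, model, alphabet_size=64):
    """Log-probability of text under a first-order Markov (bigram) model.

    alphabet_size is a hypothetical smoothing constant for add-one smoothing,
    not a measured alphabet size.
    """
    bigrams, contexts = model
    padded = " " + text.lower() + " "
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Markov assumption: cur is conditioned on prev only; all earlier
        # characters are treated as irrelevant.
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + alphabet_size)
        total += math.log(prob)
    return total


def identify(text, models):
    """Return the language whose model assigns the highest likelihood to text."""
    return max(models, key=lambda lang: log_likelihood(text, models[lang]))


# Toy training strings, assumed purely for illustration.
TOY_CORPORA = {
    "en": "the quick brown fox jumps over the lazy dog and then the other "
          "thing with this that which",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et puis "
          "les autres choses dans une maison",
}
MODELS = {lang: train_bigram_model(text) for lang, text in TOY_CORPORA.items()}

if __name__ == "__main__":
    for sample in ("the", "le"):
        print(sample, identify(sample, MODELS))
```

Because each factor depends on one preceding character only, such a model cannot use longer-range context within the input, which is one way to read the limitation attributed to generative statistical approaches above.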