Named entity recognition (NER) is a step in document understanding in many natural language processing (NLP) applications. Contextual, lexical, morphological, syntactic (e.g., part-of-speech (POS) tagging), and semantic (e.g., semantic-role labelling) pre-processing have all proven useful when performing NER.
However, such pre-processing tends to be language-dependent and difficult to extend to new languages, since it requires either (1) gazetteers or (2) large training data sets and sophisticated methods (e.g., clustering techniques such as Brown clustering) to learn models that extract named entities automatically (e.g., using dependency trees). Moreover, for many languages, off-the-shelf (OTS) software to perform this pre-processing is not available.
Consequently, scalable multilingual NER remains an active area of research and experimentation.