Supervised part-of-speech (“POS”) taggers are available for more than twenty languages and achieve accuracies of around 95% on in-domain data. Supervised taggers are routinely employed in many natural language processing (“NLP”) applications, such as syntactic and semantic parsing, named-entity recognition, and machine translation. Unfortunately, the resources required to train supervised taggers are expensive to create and unlikely to exist for the majority of written languages. The necessity of building NLP tools for these resource-poor languages has been part of the motivation for research on unsupervised learning of POS taggers.
Recently, learning POS taggers with type-level tag dictionary constraints has gained popularity. Tag dictionaries noisily projected via word-aligned bitext have helped bridge the gap between purely unsupervised and fully supervised taggers, resulting in an average accuracy of over 83% on a benchmark of eight Indo-European languages. A further improvement employs a tag dictionary source, resulting in the hitherto best published result of almost 85% in the same setup.