Several techniques have been developed for part-of-speech (POS) tagging. The function of a POS tagger is to associate each word in a sequence of words with a POS category, tag or label. As many words can have multiple parts of speech, the POS tagger must be able to determine the POS category of a word based on the context of the word in the text.
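By way of illustration only, the context dependence described above can be sketched with a toy tagger. The lexicon, the tag names, and the two context rules are hypothetical assumptions made for this example; they are not taken from any existing POS tagger and do not represent the technique of this disclosure.

```python
# Hypothetical toy lexicon: each word maps to the POS categories it may take.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "they": ["PRON"],
    "book": ["NOUN", "VERB"],  # ambiguous: "the book" vs. "they book"
    "flight": ["NOUN"],
}


def tag(words):
    """Associate each word with one POS category, using the previous tag as context."""
    tags = []
    for word in words:
        # Assumption: words missing from the lexicon default to NOUN.
        candidates = LEXICON.get(word.lower(), ["NOUN"])
        prev = tags[-1] if tags else None
        if len(candidates) == 1:
            choice = candidates[0]
        elif prev == "DET" and "NOUN" in candidates:
            choice = "NOUN"  # illustrative rule: a determiner is followed by a noun
        elif prev == "PRON" and "VERB" in candidates:
            choice = "VERB"  # illustrative rule: a pronoun is followed by a verb
        else:
            choice = candidates[0]  # no usable context: fall back to the first category
        tags.append(choice)
    return list(zip(words, tags))


print(tag(["the", "book"]))
print(tag(["they", "book", "a", "flight"]))
```

In this sketch the word "book" receives the category NOUN after a determiner and VERB after a pronoun, which is the kind of context-based decision a POS tagger must make for every ambiguous word.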
In addition, certain words in the text are ambiguous because they can be used as nouns, verbs, adjectives, or adverbs. In such cases, state-of-the-art POS taggers may be unable to disambiguate the text or phrase and may produce inaccurate results. Therefore, there is a need for an improved technique to determine the POS category of a word in a sequence of words.
Generally, POS tagging is more complex in the case of incomplete or incorrect sentences. In real-world applications, documents often contain text composed of incomplete sentences, for example, titles, lists of items, subheadings, and the like. In such cases, POS taggers often determine the POS category incorrectly.
Further, existing POS taggers rely on statistical methods, as a result of which the results (the POS category assigned to a given word) are not tractable at the word level. This makes existing POS taggers less usable in generalized contexts. In addition, existing POS taggers cannot be extended without computer programming or without rebuilding the underlying statistical models, which further restricts their usefulness.