Part-of-speech (“POS”) tagging is used in many natural language processing (“NLP”) tasks. Because POS tags augment the information contained within words by indicating some of the structure inherent in language, their accuracy is often critical to NLP applications. In text-to-speech (“TTS”) synthesis, POS information is often relied upon to determine how to pronounce a word properly. A word may be pronounced differently depending on its part of speech and/or tense. For example, the word “read” may be pronounced differently depending on its tense. The word “advocate” may be pronounced differently depending on whether it is a noun or a verb.
POS tags may also help to decide whether a synthesized word should be accented. For example, a noun may be accented more than a verb. Accordingly, POS tags may greatly influence how natural synthetic speech sounds. Typically, a POS tag is assigned to a word based on the local information contained in a text. For example, to assign a POS tag to a word in the text, the adjacent words are typically considered.
Conceptually, the POS tags may be assigned to words in a text according to predetermined rules. For example, if a determiner, such as “the” or “a”, precedes a word in the text, then the word may be assigned an adjective or a noun tag. In another example, if the word “to” precedes a word in the text, then the word may be assigned a verb tag.
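By way of illustration only, the two example rules above may be sketched as follows. The function name `tag_by_rules`, the tag labels, and the determiner list are assumptions made for this sketch and are not part of any particular tagger.

```python
def tag_by_rules(words):
    """Assign coarse POS tags using the two context rules described above.

    The tag labels ("DET", "TO", "ADJ|NOUN", "VERB", "UNK") are
    hypothetical and chosen only for this illustration.
    """
    determiners = {"the", "a", "an"}  # assumed, non-exhaustive list
    tags = []
    for i, word in enumerate(words):
        lower = word.lower()
        prev = words[i - 1].lower() if i > 0 else None
        if lower in determiners:
            tags.append("DET")
        elif lower == "to":
            tags.append("TO")
        elif prev in determiners:
            # Rule 1: a word preceded by a determiner is an adjective or a noun.
            tags.append("ADJ|NOUN")
        elif prev == "to":
            # Rule 2: a word preceded by "to" is a verb.
            tags.append("VERB")
        else:
            # No rule applies; only local context is consulted.
            tags.append("UNK")
    return tags
```

For instance, `tag_by_rules(["the", "advocate"])` tags “advocate” as an adjective or noun, whereas `tag_by_rules(["to", "advocate"])` tags it as a verb.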
In the past, numerous rules were manually generated for the POS tagging. The result of one rule, however, may conflict with the result of another rule. Accordingly, the POS tagging may strongly depend on how the rules are ordered. As a result, the accuracy of rule-based POS tagging may be poor.
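The dependence on rule ordering may be illustrated with a minimal sketch in which the first matching rule determines the tag. The suffix rule `ing_suffix` is a hypothetical rule introduced only to create a conflict; it and all other names below are assumptions for this example.

```python
def after_determiner(words, i):
    """Rule A: a word preceded by a determiner is tagged as a noun."""
    if i > 0 and words[i - 1].lower() in {"the", "a", "an"}:
        return "NOUN"
    return None


def ing_suffix(words, i):
    """Rule B (hypothetical): a word ending in "ing" is tagged as a verb."""
    if words[i].lower().endswith("ing"):
        return "VERB"
    return None


def first_match(rules, words, i):
    """Return the tag of the first rule that fires, in the given order."""
    for rule in rules:
        tag = rule(words, i)
        if tag is not None:
            return tag
    return "UNK"


words = ["the", "building"]
# Both rules fire on "building", so the ordering decides the outcome.
tag_a = first_match([after_determiner, ing_suffix], words, 1)
tag_b = first_match([ing_suffix, after_determiner], words, 1)
```

Here `tag_a` is `"NOUN"` and `tag_b` is `"VERB"` for the same word in the same context, which is the kind of conflict described above.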
Current methods of POS tagging involve sophisticated statistical models, such as maximum entropy Markov models (“MEMMs”) and conditional random fields (“CRFs”). Both types of modeling rely on a set of feature functions to ensure that important characteristics of the empirical training distribution are reflected in the trained model. These types of modeling, however, may suffer directly or indirectly from the so-called “label bias problem”, whereby certain characteristics are unduly favored over other characteristics.
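For illustration, a set of feature functions of the kind used by such models may be sketched as a routine that maps a word and its local context to a set of active binary features. The particular features chosen, the boundary symbols, and the name `extract_features` are assumptions for this sketch; an actual MEMM or CRF would additionally learn a weight for each feature, which is not shown.

```python
def extract_features(words, i, prev_tag):
    """Return the active binary features for the word at position i.

    Each string names one feature function that evaluates to 1 for this
    context. The feature templates are hypothetical examples, and each
    token is assumed to be a non-empty string.
    """
    word = words[i]
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    next_word = words[i + 1].lower() if i + 1 < len(words) else "</s>"
    return {
        f"word={word.lower()}",
        f"prev_word={prev_word}",
        f"next_word={next_word}",
        f"suffix3={word[-3:].lower()}",
        f"prev_tag={prev_tag}",
        f"is_capitalized={word[0].isupper()}",
    }


features = extract_features(["to", "advocate"], 1, "TO")
```

Each additional feature template multiplies the number of distinct feature functions by the number of values it can take over the training text, which is one reason the choice of templates tends to be application-specific.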
Hence, the tagging accuracy of both MEMMs and CRFs may depend on how many feature functions are selected and how relevant they are to the task at hand. Such selection may require application-specific linguistic knowledge, complicating deployment across different applications. Moreover, it is practically impossible to specify a set of feature functions that will work well in every environment. For example, a set of feature functions that is selected for the POS tagging of text from the Wall Street Journal may not be appropriate for the POS tagging of text from the World Book Encyclopedia, or from a web blog. Typically, the accuracy of both MEMMs and CRFs may increase as the number of feature functions increases. Increasing the number of feature functions used to assign POS tags to words in the text, however, dramatically increases the processing time and/or the workload on the processing resources, and may be very expensive.