Part-of-speech disambiguation is the process of assigning the correct part of speech to each word in a sentence, based on the word's usage in the sentence. For example, the part of speech of the English word "record" may be either noun or verb, depending on the context in which the word is used; in the sentence "John wants to record a record", the first occurrence of "record" is used as a verb and the second is used as a noun. The accurate recognition of this distinction is particularly important in a text-to-speech system, because "record" is pronounced differently depending on whether it is a noun or verb.
As shown in FIG. 1, numeral 100, to disambiguate the parts of speech of words in a text, part-of-speech disambiguation systems typically use the following three-step process. Step 1 is the tokenization step, in which a text stream (101) is tokenized into a sequence of text tokens (104) by a text tokenizer (102) as specified by a tokenization knowledge database (103). The tokenization knowledge database typically contains predetermined rules that are used to identify textual elements, which are classifiable by part of speech. Examples of such textual elements are words, punctuation marks, and special symbols such as "%" and "$". Step 2 is the lexicon access step, in which each text token is looked up in a lexicon (106) by a lexicon accessor (105). The lexicon consists of a static lexicon (107) that contains a plurality of textual elements and corresponding part-of-speech tags, and a dynamic lexicon (108) that can generate part-of-speech tags for the textual elements that are not stored in the static lexicon. Because some textual elements (e.g., the word "record") have more than one part of speech, the lexicon access step will result in at least one part-of-speech tag being assigned to each text token; the output of the lexicon access step is therefore a sequence of ambiguously tagged text tokens (109). Step 3 is the disambiguation step, in which all part-of-speech ambiguities in the sequence of ambiguously tagged text tokens are resolved by the disambiguator (110) as specified by the disambiguation knowledge database (111), thus resulting in a sequence of unambiguously tagged text tokens (112).
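By way of illustration only, the tokenization step and the lexicon access step described above may be sketched as follows. The tokenization rule, the lexicon entries, the tags other than "NNS" and "VBZ", the fallback heuristics of the dynamic lexicon, and all function names are assumptions introduced for this sketch; they are not taken from the tokenization knowledge database or the lexicon of any particular system.

```python
import re

# Hypothetical tokenization rule: a word (optionally with an apostrophe part),
# a number, or any single non-space symbol such as "%" or "$".
TOKEN_RULE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|\S")

# Hypothetical static lexicon: textual element -> possible part-of-speech tags.
STATIC_LEXICON = {
    "john": ["NNP"],
    "wants": ["NNS", "VBZ"],
    "to": ["TO"],
    "record": ["NN", "VB"],
    "a": ["DT"],
    ".": ["."],
}


def tokenize(text):
    """Step 1: split a text stream into a sequence of text tokens."""
    return TOKEN_RULE.findall(text)


def dynamic_lexicon(token):
    """Generate candidate tags for a token missing from the static lexicon.

    These heuristics are illustrative guesses only.
    """
    if token[0].isdigit():
        return ["CD"]
    if token[0].isupper():
        return ["NNP"]
    if token.endswith("s"):
        return ["NNS", "VBZ"]
    return ["NN"]


def lexicon_access(tokens):
    """Step 2: attach at least one part-of-speech tag to every token."""
    return [
        (tok, STATIC_LEXICON.get(tok.lower()) or dynamic_lexicon(tok))
        for tok in tokens
    ]


if __name__ == "__main__":
    for token, tags in lexicon_access(tokenize("John wants to record a record.")):
        print(token, tags)
```

In this sketch, both occurrences of "record" leave the lexicon access step carrying two candidate tags each; resolving that ambiguity is left to the disambiguation step.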
An example of the application of the above process is presented in FIG. 2, numeral 200. A text stream (201) is input into the tokenization step, which yields a sequence of untagged text tokens (202) as its output. The sequence of untagged text tokens is input into the lexicon access step, which yields a sequence of ambiguously tagged text tokens as its output. As may be seen in FIG. 2, several text tokens have more than one tag associated with them; for example, "wants" is an ambiguously tagged text token (204), because it may be used as either a plural noun (tag "NNS") or a third-person, present tense verb (tag "VBZ"). The set of all possible tag sequences based on the sequence of ambiguously tagged text tokens is represented by a directed acyclic graph of tag sequences (203). The sequence of ambiguously tagged text tokens is input into the disambiguation step, which determines a best path (205) through the directed acyclic graph of tag sequences, thus yielding a sequence of unambiguously tagged text tokens (206).
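The selection of a best path through the directed acyclic graph of tag sequences may be illustrated by the following minimal sketch. It assumes a Viterbi-style search over tag-bigram scores, which is one common stochastic approach and is not stated to be the method used by the disambiguator; the score values, the tags other than "NNS" and "VBZ", and the names `BIGRAM`, `DEFAULT` and `best_path` are hypothetical and chosen only for this example.

```python
# Sequence of ambiguously tagged text tokens (tags are illustrative).
AMBIGUOUS = [
    ("John", ["NNP"]),
    ("wants", ["NNS", "VBZ"]),
    ("to", ["TO"]),
    ("record", ["NN", "VB"]),
    ("a", ["DT"]),
    ("record", ["NN", "VB"]),
]

# Hypothetical log-score for each (previous tag, current tag) pair.
BIGRAM = {
    ("<s>", "NNP"): -0.5,
    ("NNP", "VBZ"): -0.7,
    ("NNP", "NNS"): -2.5,
    ("VBZ", "TO"): -0.9,
    ("NNS", "TO"): -1.5,
    ("TO", "VB"): -0.4,
    ("TO", "NN"): -3.0,
    ("VB", "DT"): -0.6,
    ("NN", "DT"): -2.0,
    ("DT", "NN"): -0.3,
    ("DT", "VB"): -5.0,
}
DEFAULT = -10.0  # assumed score for any tag pair not listed above


def best_path(tagged):
    """Return the highest-scoring tag sequence through the tag graph."""
    # best maps a tag at the current position to (score, tags so far).
    best = {"<s>": (0.0, [])}
    for _token, tags in tagged:
        nxt = {}
        for tag in tags:
            # Keep only the best-scoring predecessor for this tag.
            score, path = max(
                (
                    (s + BIGRAM.get((prev, tag), DEFAULT), p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda cand: cand[0],
            )
            nxt[tag] = (score, path + [tag])
        best = nxt
    return max(best.values(), key=lambda cand: cand[0])[1]


if __name__ == "__main__":
    for (token, _), tag in zip(AMBIGUOUS, best_path(AMBIGUOUS)):
        print(token, tag)
```

Because each score depends only on the immediately preceding tag, a search of this kind uses local context alone, which is why it can resolve the two occurrences of "record" here but would not capture dependencies between distant tokens.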
It is known in the art that local context is a strong indicator of a word's part of speech; hence stochastic systems based on the statistical modeling of word and tag collocations have proven successful. However, these systems fail predictably for syntactic structures that involve non-local dependencies. Because non-local dependencies are beyond the limits of stochastic systems, such effects must be accounted for by systems that can process expanded context. Two problems to be considered in developing such systems are: identifying and placing appropriate limits on the amount of expanded context to be processed, and balancing the contribution of the evidence provided by local and expanded context processing.
Hence, there is a need for a method, device and system for part-of-speech disambiguation that advantageously combines the processing of both local and expanded context.