1. Field of Invention
The present invention relates generally to the field of natural language processing. More specifically, the present invention is related to word sense ambiguity reduction based on automatic theme prediction.
2. Discussion of Prior Art
Word sense disambiguation is the process of selecting the correct sense of each word in a sentence, based on the word's usage (or context) in the sentence. For example, the word “bank” used as a noun in the English language means either “a building for keeping money safely” or “land along the side of a river”, depending on the context in which the word occurs. Accurate recognition of this distinction is particularly important in machine translation systems, because “bank” as a noun is translated differently depending on which of the two senses is meant.
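By way of illustration only, context-based sense selection can be sketched as a simple word-overlap heuristic, loosely in the spirit of Lesk-style approaches. This is not a technique described in this application. The sense inventory `SENSES`, its cue words, and the function `pick_sense` are hypothetical names introduced solely for this example.

```python
import re

# Hypothetical toy sense inventory (an assumption for illustration):
# each sense of a noun is paired with a few context cue words.
SENSES = {
    "bank": {
        "a building for keeping money safely": {"money", "deposit", "loan", "account"},
        "land along the side of a river": {"river", "water", "fishing", "shore"},
    }
}


def pick_sense(word, sentence):
    """Return the sense whose cue words overlap most with the sentence context."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = SENSES[word]
    # On a tie, max() keeps the first sense listed in the inventory.
    return max(candidates, key=lambda sense: len(candidates[sense] & context))
```

Under these assumptions, `pick_sense("bank", "We went fishing on the bank of the river.")` selects the river sense, because "fishing" and "river" appear only among that sense's cue words.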
FIG. 1 illustrates the general structure of natural language analysis systems. A natural language analysis system 100 is conventionally composed of two types of processes: processes that present possible alternatives (ambiguities) 102 for words, particularly nouns, in a sentence, and processes that select the correct alternatives (disambiguation) 104 for those words based on the context of the sentence under analysis.
FIG. 2 illustrates the various types of ambiguities associated with prior art natural language analysis systems. Ambiguities in natural language analysis come in three basic forms:
Morphological ambiguity 202 occurs when a word has more than one part-of-speech. For example, the word “play” can be used as a verb or noun.
Semantic ambiguity 204 occurs when a word/part-of-speech pair has more than one sense (meaning). For example, the word “bank” when used as a noun can have two different senses as described above.
Syntactic (structural) ambiguity 206 occurs when a sentence (or a group of words) has more than one syntactic structure. For example, in the phrase, “a French book writer”, the term “French” may be an adjective modifying the word “book” or the word “writer”.
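The three forms of ambiguity can be made concrete with a minimal sketch. The toy lexicon `LEXICON`, the helper functions, and the nested-tuple encoding of syntactic structure are all assumptions made for illustration; they do not come from any particular prior art system.

```python
# Hypothetical toy lexicon: word -> part of speech -> list of senses.
LEXICON = {
    "play": {
        "verb": ["engage in a game"],
        "noun": ["a dramatic work"],
    },
    "bank": {
        "noun": [
            "a building for keeping money safely",
            "land along the side of a river",
        ],
    },
}


def is_morphologically_ambiguous(word):
    """Morphological ambiguity: the word has more than one part of speech."""
    return len(LEXICON[word]) > 1


def semantically_ambiguous_pairs(word):
    """Semantic ambiguity: a word/part-of-speech pair has more than one sense."""
    return [(word, pos) for pos, senses in LEXICON[word].items() if len(senses) > 1]


# Syntactic ambiguity: the same phrase admits more than one structure.
# Nested tuples are an assumed, simplified stand-in for parse trees.
FRENCH_BOOK_WRITER = [
    ("a", (("French", "book"), "writer")),  # "French" modifies "book"
    ("a", ("French", ("book", "writer"))),  # "French" modifies "writer"
]
```

In this toy lexicon, "play" is morphologically ambiguous but each of its word/part-of-speech pairs has a single sense, whereas "bank" has a single part of speech but two senses as a noun.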
FIG. 3 illustrates a prior art system 300 for natural language sentence analysis. The input to the system is a natural language sentence 302, which is first segmented into separate word tokens using a tokenizer 304. Each word token is then morphologically analyzed by a morphological analyzer (stemmer/lemmatizer) 306, which identifies all valid parts of speech for each input word according to predefined stemming rules and based on lexicon 312 of the language (which contains, for each stem, all possible parts of speech). It should be noted that ‘stem’, as described in this patent application, is the basic form of any word token (e.g., the stem of “went” is “go”). The sentence, consisting of morphologically ambiguous part-of-speech tagged word tokens, then passes through a part-of-speech preliminary ambiguity resolver 308, which disambiguates parts of speech in a quasi-deterministic fashion. Many conventional rule-based and statistical techniques can be used for this step. The part-of-speech tagged word tokens then pass through a lexicalizer 310, which assigns each word/part-of-speech pair one or more senses by accessing the language lexicon 312. The sentence generated by lexicalizer 310, which is now fully part-of-speech tagged and sense tagged, is presented to syntactic & semantic analyzer 314, which resolves all embedded ambiguities in the input sentence by accessing a source with knowledge of grammar and word sense disambiguation and, as a result, generates a sentence with no ambiguities on the morphological, semantic and syntactic levels.
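The stages of system 300 up to lexicalizer 310 can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of any actual prior art system: the stem table `STEMS`, the toy `LEXICON` standing in for lexicon 312, the single determiner rule used in place of resolver 308, and all function names are hypothetical. Syntactic & semantic analyzer 314 is deliberately omitted.

```python
import re

# Hypothetical toy data standing in for the stemming rules and lexicon 312.
STEMS = {"went": "go", "banks": "bank", "plays": "play"}
LEXICON = {
    "he": {"pron": ["male person"]},
    "go": {"verb": ["move"]},
    "to": {"prep": ["toward"]},
    "the": {"det": ["definite article"]},
    "bank": {"noun": ["money building", "river side"], "verb": ["tilt"]},
    "play": {"noun": ["drama"], "verb": ["engage in a game"]},
}


def tokenize(sentence):
    """Tokenizer 304: split the sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def analyze_morphology(token):
    """Morphological analyzer 306: return the stem and all valid parts of speech."""
    stem = STEMS.get(token, token)
    return stem, sorted(LEXICON.get(stem, {}))


def resolve_pos(tagged):
    """Resolver 308 (assumed toy rule): after a determiner, prefer a noun reading."""
    resolved, prev_tags = [], []
    for stem, tags in tagged:
        if prev_tags == ["det"] and "noun" in tags:
            tags = ["noun"]
        resolved.append((stem, tags))
        prev_tags = tags
    return resolved


def lexicalize(resolved):
    """Lexicalizer 310: attach every lexicon sense to each word/part-of-speech pair."""
    return [(stem, {pos: LEXICON[stem][pos] for pos in tags}) for stem, tags in resolved]


def analyze(sentence):
    """Run the stages in order; the result is what analyzer 314 would receive."""
    tagged = [analyze_morphology(token) for token in tokenize(sentence)]
    return lexicalize(resolve_pos(tagged))
```

For the input "He went to the bank", the last element of the result is `("bank", {"noun": ["money building", "river side"]})`: the part of speech has been resolved, but both noun senses are still passed on to the downstream analyzer.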
The main function of syntactic & semantic analyzer 314 is to disambiguate the input sentence, that is, to select the correct possibilities out of the multitude of presented possibilities (ambiguities). Reducing the number of ambiguities presented to the analyzer would enhance both the accuracy and the performance of the disambiguation process. Hence, there is a need for a method and system that reduces the semantic ambiguity presented to the syntactic & semantic analyzer. Whatever the precise merits, features and advantages of the above-mentioned prior art systems, none of them achieves or fulfills the purposes of the present invention.