One of the most challenging problems in information technology today is improving the methods through which people interact with computer systems. This problem is made difficult because, among other reasons, humans typically rely overwhelmingly on natural language, e.g., English, for transfer of information. Although people deal with natural language with relative ease, processing of natural language using computer systems has turned out to be an exceedingly difficult problem. One source of this difficulty is the ambiguous meaning of many natural language terms.
There are several sources of ambiguity in language; however, the most pervasive is ambiguity in term meaning, which is referred to as polysemy. All human languages exhibit a high degree of polysemy.
In English, most frequently used terms have several common meanings. For example, the word fire can mean: a combustion activity; to terminate employment; to launch; or to excite (as in fire up). For the 200 most-polysemous terms in English, the typical verb has more than 12 common meanings, or senses. The typical noun from this set has more than eight common senses. For the 2000 most-polysemous terms in English, the typical verb has more than eight common senses and the typical noun has more than five.
Polysemy presents a major obstacle for all computer systems that attempt to deal with human language, whether written or spoken. To achieve commercially acceptable performance, some means must be found to reliably discern the presence of a word sense, reliably distinguish between different senses of the same term, and reliably determine the correct sense for the terms that are encountered. The task of discerning, distinguishing, or determining the sense of terms in written or spoken language is referred to generally as word sense disambiguation (WSD).
The effects of polysemy in computer processing of language are ubiquitous. The March 1998 issue of the journal Computational Linguistics is a special issue on word sense disambiguation. In their introduction to the issue, Ide and Veronis state: “Sense disambiguation . . . is necessary at one level or another to accomplish most natural language processing tasks. It is obviously essential for language understanding applications, such as man-machine communication; it is at least helpful, and in some instances required, for applications whose aim is not language understanding.” Ide, N., and Veronis, J., Introduction to the Special Issue on Word Sense Disambiguation: the State of the Art, Special Issue on Word Sense Disambiguation, Computational Linguistics, Volume 24, No. 1, March 1998, pp. 1–40.
The importance of word sense disambiguation can be seen in the case of machine translation systems. An incorrect choice of sense for even a single term can change the meaning of a passage and yield an incorrect translation. A well-known example is the (perhaps apocryphal) machine translation system that reportedly rendered hydraulic rams as water goats. Although there are other sources of ambiguity, polysemy remains a major problem in machine translation today.
Polysemy also is a critical problem in the area of information retrieval. For example, a database query on the word strike intended to find information on labor disputes would also return large amounts of information on the following:
sports (strikes in bowling and baseball);
military actions (air strikes, etc.);
objects that strike one another;
striking of matches and oil;
cases where people strike up a conversation;
events that strike people as funny; and
a range of other topics in which this particularly polysemous term occurs.
Polysemy creates difficulties in all aspects of text filtering and retrieval, including query interpretation, categorization, summarization, result matching, and ranking.
Polysemy limits the performance of systems in other application areas, including but not limited to:
speech recognition;
text-to-speech conversion;
content and thematic analysis;
grammatical analysis;
text categorization;
text summarization; and
natural language understanding.
In each of these applications, differentiating among multiple word senses is key to improving performance. In some applications, it is sufficient to identify which occurrences of a given term have the same sense. That is, it is sufficient to associate each occurrence of a given term with the collection of other occurrences of that term that share its sense. This is referred to as sense discrimination. In other applications, each such collection must also be labeled with the specific meaning of the term that it represents. This is referred to as sense tagging. The present invention addresses each of these objectives.
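The difference between the two outputs can be pictured with a minimal Python sketch. It assumes that some upstream step, not shown and not the method of the present invention, has already assigned each occurrence an anonymous cluster identifier; the function names, cluster identifiers, and sense labels are hypothetical.

```python
from collections import defaultdict


def discriminate(cluster_ids):
    """Sense discrimination: group occurrence positions that share a sense.

    cluster_ids[i] is an anonymous cluster identifier for occurrence i,
    assumed to come from some upstream grouping step. No meaning is attached.
    """
    groups = defaultdict(list)
    for position, cluster in enumerate(cluster_ids):
        groups[cluster].append(position)
    return dict(groups)


def tag(groups, sense_labels):
    """Sense tagging: attach a specific meaning to each discriminated group."""
    return {sense_labels[cluster]: positions for cluster, positions in groups.items()}


# Four occurrences of "fire"; the cluster ids "A" and "B" are hypothetical.
groups = discriminate(["A", "B", "A", "B"])
tags = tag(groups, {"A": "combustion", "B": "terminate employment"})
print(groups)  # {'A': [0, 2], 'B': [1, 3]}
print(tags)    # {'combustion': [0, 2], 'terminate employment': [1, 3]}
```

Discrimination alone answers "which occurrences go together"; tagging additionally answers "what each group means".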
The importance of word sense disambiguation was noted in one of the first papers ever published on computer processing of language. Weaver, W., Translation, mimeographed, 12 pp., Jul. 15, 1949. Reprinted in Locke, W., and Booth, A. D., (Eds.), Machine Translation of Languages, John Wiley & Sons, New York, 1955, pp. 15–23. Since that time, authorities in the area of natural language understanding and related fields have consistently cited both the importance of the topic and the significant limitations of available techniques. In the published version of his doctoral dissertation (1989), Cottrell noted: “Lexical ambiguity . . . is perhaps the most important problem facing an NLU (Natural Language Understanding) system.” Cottrell, G. W., A Connectionist Approach to Word Sense Disambiguation, Morgan Kaufmann Publishers, 1989. In their 1997 SIGLEX presentation, Resnik and Yarowsky declared: “Word sense disambiguation is perhaps the great open problem at the lexical level of natural language processing.” Resnik, P., and Yarowsky, D., A Perspective on Word Sense Disambiguation Methods and their Evaluation, in: Proceedings of SIGLEX '97, Washington, D.C., 1997, pp. 79–86. In his 1997 paper, Ng similarly noted: “WSD is a challenging task and much improvement is still needed.” Ng, H. T., Getting Serious about Word Sense Disambiguation, in: Tagging Text with Lexical Semantics: Why, What, and How?, workshop, Apr. 4–5, 1997, Washington, D.C. In their introduction to the March 1998 special issue of the journal Computational Linguistics devoted to the topic of word sense disambiguation, Ide and Veronis state: “the problem of word sense disambiguation has taken center stage, and it is frequently cited as one of the most important problems in natural language processing research today”. In the summary to their introduction, they state: “in the broad sense, relatively little progress seems to have been made in nearly 50 years”.
In the conclusion to a chapter on word sense disambiguation in their landmark 1999 book on natural language processing, Manning and Schutze note: “Much research remains to be done on word sense disambiguation”. Manning, C., and Schutze, H., Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Some approaches to word sense disambiguation rely upon the existence of dictionaries or thesauri that contain the terms to be disambiguated. In these methods, the sense of a given term is estimated based upon the terms in a window surrounding the given term. The sense, or meaning, is chosen based upon the correlation of these surrounding terms with the terms that happen to be used in the various dictionary definitions or thesaurus entries. The accuracy of these approaches has been poor, in the 50% to 70% range, even for terms with small numbers of distinctly differentiated senses. Accuracy can be improved in those specific cases where a term to be disambiguated appears more than once in a text segment of interest, by assuming that all such occurrences correspond to the same sense of that term (the “one sense per discourse” assumption). The applicability of this assumption is limited, however, by the fact that term occurrences follow a hyperbolic distribution (Zipf's Law). That is, on average, more than half of the distinct terms that appear in a given passage of text of reasonable length occur only once in that passage.
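A minimal Python sketch of the dictionary-overlap idea described above follows, assuming a toy two-entry dictionary for the word fire. The glosses, the function names, and the window size k=3 are hypothetical choices for illustration; the sketch omits stemming and stop-word removal and breaks ties in favor of the first sense listed.

```python
def context_window(tokens, index, k):
    """Return up to k tokens on each side of tokens[index], excluding the target."""
    return tokens[max(0, index - k):index] + tokens[index + 1:index + 1 + k]


def choose_sense(tokens, index, glosses, k=3):
    """Pick the sense whose gloss shares the most words with the context window.

    Simplified overlap scoring: no stemming, no stop-word removal, and ties
    go to the first sense listed in `glosses`.
    """
    window = {t.lower() for t in context_window(tokens, index, k)}
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(window & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical two-sense dictionary entry for "fire".
GLOSSES = {
    "combustion": "the state of burning with flames heat and smoke",
    "dismiss": "to terminate the employment of a worker",
}

sentence = "the manager decided to fire the worker yesterday".split()
print(choose_sense(sentence, sentence.index("fire"), GLOSSES))  # dismiss
```

Because the choice rests entirely on words that happen to appear in both the window and a gloss, a context sharing no words with any gloss gives the sketch no basis for a decision; it simply falls back to the first sense listed.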
These approaches to disambiguation are fundamentally limited by the vocabulary coverage of the dictionaries and thesauri employed. The approaches exhibit significantly degraded performance in domains that employ specialized terminology. The static nature of the dictionaries and thesauri used also makes them inherently unsuited for dealing with changes in language over time.
Approaches have been proposed for automatic (statistical) extension of thesauri. Thesaurus-based techniques, however, fundamentally are unable to deal with word senses that are not strongly correlated with the specific categories used in the thesaurus.
Other approaches to word sense disambiguation exploit the fact that multiple senses of a term in a given language may translate as distinct terms in another language. These approaches require a large, carefully translated parallel corpus together with a bilingual dictionary for the language pair in question. Generation of such corpora is resource-intensive and inherently constrained in terms of domain. For a given term to be disambiguated, the technique requires that the specific phrase in which that term occurs also be present in the parallel corpus. Thus, either very large corpora are required or many term occurrences will not be amenable to disambiguation. In addition, there is rarely a perfect one-to-one mapping between the word senses in one language and distinct terms in another language. Thus, many senses cannot be disambiguated even if the exact phrase is found. For other senses, the technique yields only the most probable of several candidate senses, and often that probability is less than 50%.
Prior statistical approaches to word sense disambiguation typically have required the existence of a training set in which the terms of interest are labeled as to sense. The cost of creating such training sets has severely limited the application of these approaches. Such approaches are very limited in domain and are not well suited for dealing with changes in language over time. In addition, these approaches invariably employ a fixed context window (typically plus or minus a few terms), which significantly compromises performance. Some approaches to statistical disambiguation have employed unsupervised techniques. The advantage of these techniques is that the training set does not need to be labeled as to sense. However, they are applicable only to sense discrimination. Their effectiveness depends heavily on how closely the term distribution and usage in the training set match the term distribution and usage in the text to be disambiguated. The technique also is weak for infrequently used senses and for senses that have few distinctive collocated terms. Performance of these approaches has been reported as 5% to 10% lower than that of dictionary-based approaches.
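The following Python sketch illustrates why unsupervised methods yield only sense discrimination. It groups unlabeled context windows around the word strike by term overlap; the greedy Jaccard-threshold clustering, the threshold of 0.2, and the example contexts are assumptions chosen for brevity and are not intended to reproduce any particular published system. The output is a list of anonymous groups of occurrence positions; attaching a meaning such as labor dispute or bowling would require a separate sense-tagging step.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discriminate_unsupervised(contexts, threshold=0.2):
    """Greedy single-pass clustering of unlabeled context windows.

    Each context joins the first cluster whose accumulated term set is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    Returns lists of occurrence positions; the clusters carry no sense labels.
    """
    clusters = []  # each entry: [accumulated term set, member positions]
    for position, context in enumerate(contexts):
        terms = set(context)
        for cluster in clusters:
            if jaccard(terms, cluster[0]) >= threshold:
                cluster[0] |= terms
                cluster[1].append(position)
                break
        else:
            clusters.append([terms, [position]])
    return [members for _, members in clusters]


# Hypothetical fixed-size context windows around four occurrences of "strike".
contexts = [
    ["union", "workers", "wages", "picket"],
    ["bowling", "pins", "spare", "lane"],
    ["workers", "union", "contract", "picket"],
    ["pins", "lane", "bowling", "ball"],
]
print(discriminate_unsupervised(contexts))  # [[0, 2], [1, 3]]
```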
One class of approaches to word sense disambiguation relies on the parsing of sentences. Accurate parsing for this purpose depends upon the prior existence of a world model. These approaches suffer from two drawbacks: first, they require a high degree of manual effort in constructing the necessary world models; and second, tractable world models are inherently limited in the extent of the domain that they can cover.
Other approaches to WSD rely on analysis of sentences based on sets of processing rules. To obtain even modest accuracy, these collections of rules become elaborate, involving templates such as Agent-Action-Object triples. An Agent-Action-Object triple is a set of three terms that can stand in the relationship of an agent that carries out the specified action on the specified object. Once again, these approaches require large amounts of initial manual work and are inherently limited in the scope of their applicability. Some such systems attempt to achieve reasonable coverage and accuracy through the use of word experts, which are handcrafted rule-based expert systems for each term to be encountered. Information is exchanged among these word experts iteratively until a plausible estimate of sentence meaning is generated. The complexity of such systems indicates the lengths to which people have been willing to go to obtain useful levels of word sense disambiguation. The individual word experts in such systems can constitute pages of code. Even with such elaboration, however, the rapid combinatorial growth significantly limits the extent of the context that can be taken into account.
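A toy Python sketch of rule-based disambiguation with Agent-Action-Object triples follows. The semantic-class lexicon, the rule table, and all names are hypothetical and hand-written, which mirrors the manual effort such approaches require. When no rule matches, as for the excite sense of fire in the third call, the sketch returns None, illustrating the limited scope of coverage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """An Agent-Action-Object triple assumed to be extracted from a sentence."""
    agent: str
    action: str
    obj: str


# Hypothetical hand-built lexicon: semantic class of each object term.
SEMANTIC_CLASS = {
    "worker": "person",
    "employee": "person",
    "missile": "projectile",
    "rocket": "projectile",
}

# Hypothetical hand-built rules: (action, object class) -> sense of the action.
RULES = {
    ("fire", "person"): "terminate employment",
    ("fire", "projectile"): "launch",
}


def sense_of_action(triple: Triple) -> Optional[str]:
    """Return the rule-assigned sense of the action, or None if no rule applies."""
    object_class = SEMANTIC_CLASS.get(triple.obj)
    return RULES.get((triple.action, object_class))


print(sense_of_action(Triple("manager", "fire", "worker")))      # terminate employment
print(sense_of_action(Triple("soldier", "fire", "missile")))     # launch
print(sense_of_action(Triple("speech", "fire", "imagination")))  # None
```

Extending coverage means adding lexicon entries and rules by hand for every new term and context, which is the scaling problem described above.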
Connectionist approaches to word sense disambiguation suffer many of the same drawbacks as the rule-based approaches. Such approaches typically require a large amount of manual effort to establish the networks. As the extent of the context that is taken into account increases, the complexity of the networks grows rapidly. Thus, both the domain of applicability and the extent of context are constrained.
A large-scale evaluation of existing word sense disambiguation systems has been conducted under the auspices of ACL-SIGLEX, EURALEX, ELSNET, and the EU projects ECRAN and SPARKLE. This project entailed testing of 19 different approaches to the problem. University of Brighton (England), Information Technology Research Institute. SENSEVAL, Evaluating Word Sense Disambiguation Systems [online]. 2001 [retrieved on 2001-08-10]. Retrieved from the Internet: <URL: http://www.itri.bton.ac.uk/events/senseval/>.