A dictionary is a resource that lists, defines, and gives usage examples of words and other terms. For example, a conventional dictionary might contain the following entries:
TABLE 1
______________________________________
flow, sense 100 (verb, intransitive): to run smoothly with unbroken continuity -- "honey flows slowly"
run, sense 100 (verb, intransitive): to stride quickly
run, sense 115 (verb, intransitive): (of liquids, sand, etc.) to flow
run, sense 316 (noun): a movement or flow
run, sense 329 (verb, transitive, slang): to control -- "the supervisor runs the flow of assignments"
run, sense 331 (verb, intransitive): (of computer program) execute -- "analyze the efficiency of the flow of the program when the program runs"
______________________________________
The entries above include an entry for one sense of flow and entries for each of five different senses of run. Each entry identifies a word, a sense of the word, a part of speech, a definition, and, in some cases, a usage example. For example, the first entry above is for an intransitive verb sense of the word flow, sense 100. ("Transitive" characterizes a verb that takes an object, while "intransitive" characterizes a verb that does not take an object.) This sense of flow has the definition "to run smoothly with unbroken continuity," and the usage example "honey flows slowly."
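As an illustration only, the entry structure just described (word, sense, part of speech, definition, optional usage example) could be held in a small record type. The following is a minimal sketch; the names `Entry`, `ENTRIES`, and `senses_of` are hypothetical and are not drawn from any particular dictionary or system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One dictionary entry: a single sense of a single word (hypothetical layout)."""
    word: str
    sense: int
    part_of_speech: str
    definition: str
    example: Optional[str] = None  # some entries have no usage example


# The six entries of Table 1, transcribed as data.
ENTRIES: List[Entry] = [
    Entry("flow", 100, "verb, intransitive",
          "to run smoothly with unbroken continuity", "honey flows slowly"),
    Entry("run", 100, "verb, intransitive", "to stride quickly"),
    Entry("run", 115, "verb, intransitive", "(of liquids, sand, etc.) to flow"),
    Entry("run", 316, "noun", "a movement or flow"),
    Entry("run", 329, "verb, transitive, slang", "to control",
          "the supervisor runs the flow of assignments"),
    Entry("run", 331, "verb, intransitive", "(of computer program) execute",
          "analyze the efficiency of the flow of the program when the program runs"),
]


def senses_of(word: str, entries: List[Entry] = ENTRIES) -> List[int]:
    """Return the sense numbers listed for `word`, in entry order."""
    return [e.sense for e in entries if e.word == word]
```

Under this sketch, `senses_of("run")` returns the five sense numbers listed for run, and `senses_of("flow")` returns the single sense listed for flow.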
Many languages contain polysemous words--that is, words that have multiple senses or meanings. Because different definitions and usage examples are appropriate for different senses of the same polysemous word, many dictionaries take care to subdivide polysemous words into their senses and provide separate entries, including separate definitions and usage examples, for each sense as shown above.
Dictionaries are generally produced for human readers, who are able to use their understanding of some words and word senses of a language in order to understand entries for other words or word senses with which they are not familiar. For example, a reader might know the different senses of the word run shown above, but might not know the word flow. To learn more about the word flow, the human reader would look up the entry shown above for flow. In reading the definition "to run smoothly with unbroken continuity" for flow, a human reader would employ his or her understanding of the different senses of run to determine that this definition of flow refers to the sand and liquid flowing sense of run (sense 115) rather than any of the other senses of run.
The field of natural language processing is directed to discerning the meaning of arbitrary natural language expressions, such as phrases, sentences, paragraphs, and longer documents, in a computer system. Given the existence of conventional dictionaries intended for human readers as described above, it is desirable to utilize such dictionaries as a basis for discerning the meaning of natural language expressions in a natural language processing system. The information in such a dictionary is not optimized for use by a natural language processing system, however. As noted above, the meaning of the occurrence of the word run in the definition for flow is ambiguous, thus rendering the entire definition for flow ambiguous. That is, the definition for flow may mean any of the following, depending upon the sense of run that is selected:
TABLE 2
______________________________________
Sense of run employed    Meaning
______________________________________
100    to <stride quickly> smoothly with unbroken continuity
115    to <flow like liquid or sand> smoothly with unbroken continuity
316    to <a movement or flow> smoothly with unbroken continuity
329    to <control> smoothly with unbroken continuity
331    to <execute, as a computer program> smoothly with unbroken continuity
______________________________________
While it is clear to a typical human reader that the second of these interpretations is by far the most plausible, a computer-based natural language processing system does not share the human intuitions that provide a basis for resolving the ambiguity among these five possible meanings. An automated method for augmenting a representation of a conventional dictionary, by adding word sense characterizations to occurrences of words whose sense is not characterized in that representation, would therefore have significant utility for natural language processing systems. With such characterizations in place, a natural language processing system need not select among multiple meanings of text strings in the dictionary representation that contain polysemous words. Such an augmented dictionary representation represents the relationships between word senses, rather than merely the relationships between orthographic word shapes.
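To make the ambiguity concrete, the five candidate readings of the definition for flow can be enumerated mechanically by substituting a gloss of each sense of run into the definition. The sketch below does only that enumeration; it does not choose among the readings, which is the hard part. The names `RUN_GLOSSES`, `FLOW_DEFINITION`, and `candidate_readings` are hypothetical, and the glosses are the paraphrases used in Table 2.

```python
from typing import Dict

# Paraphrased glosses for each listed sense of "run" (hypothetical data layout).
RUN_GLOSSES: Dict[int, str] = {
    100: "stride quickly",
    115: "flow like liquid or sand",
    316: "a movement or flow",
    329: "control",
    331: "execute, as a computer program",
}

FLOW_DEFINITION = "to run smoothly with unbroken continuity"


def candidate_readings(definition: str, word: str,
                       glosses: Dict[int, str]) -> Dict[int, str]:
    """Map each sense number to the definition with the first whole-token
    occurrence of `word` replaced by that sense's gloss in angle brackets."""
    tokens = definition.split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not occur in the definition")
    i = tokens.index(word)
    return {
        sense: " ".join(tokens[:i] + [f"<{gloss}>"] + tokens[i + 1:])
        for sense, gloss in glosses.items()
    }


if __name__ == "__main__":
    for sense, reading in candidate_readings(
            FLOW_DEFINITION, "run", RUN_GLOSSES).items():
        print(sense, reading)
```

Nothing in this enumeration ranks one reading above another: every sense of run yields an equally well-formed string, which is why a dictionary representation that records the intended sense of each word occurrence would be useful.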