Word sense disambiguation (“WSD”) can be a useful stage in an automated process for identifying the meaning of a text. WSD is the process of identifying which of a word's multiple distinct senses is being used in a given passage of text. In a semantically based search engine, WSD may be used to determine and index an author's intended sense of an ambiguous word in a passage. The search engine can then return the passage, or a document containing it, in response to a query that indicates that particular sense, while not returning the passage or document for queries related to the word's other senses.
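As a rough illustration of sense-based indexing, the following Python sketch keys an inverted index on (word, sense) pairs instead of on words alone, so a query that names one sense does not retrieve passages indexed under another. The `SenseIndex` class, its methods, and sense labels such as `print#text` are hypothetical names chosen for this example, and the WSD step that picks each sense is assumed to have already run.

```python
from collections import defaultdict


class SenseIndex:
    """Hypothetical inverted index keyed by (word, sense) rather than by word alone."""

    def __init__(self):
        # (word, sense) -> set of passage ids
        self._postings = defaultdict(set)

    def add(self, passage_id, word, sense):
        # Index the passage under the sense that WSD (assumed, not shown) chose for this word.
        self._postings[(word, sense)].add(passage_id)

    def query(self, word, sense):
        # Only passages indexed under the requested sense are returned.
        # .get() avoids inserting an empty entry for unseen keys.
        return sorted(self._postings.get((word, sense), set()))


index = SenseIndex()
# Sense labels like "print#text" are invented for illustration.
index.add("p1", "print", "print#text")
index.add("p2", "print", "print#film_copy")

print(index.query("print", "print#film_copy"))  # ['p2']
print(index.query("print", "print#text"))       # ['p1']
```

Because the posting key includes the sense, the passage about a film copy is never returned for the "text in a book" sense, which mirrors the selective retrieval described above.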
Because automatic WSD systems are uncertain, a particular word in a document may be assigned several possible senses, each with its own probability; these probabilities are called word sense probabilities. For example, as a noun, the word “print” may refer to the text appearing in a book, a picture printed from an engraving, or a copy of a movie on film. A WSD system may therefore assign one probability that the word, in context, refers to the text appearing in a book, another that it refers to a picture printed from an engraving, and a third that it refers to a copy of a movie on film.
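The word sense probabilities for a single occurrence of “print” can be pictured as a mapping from each candidate sense to a probability. In the minimal sketch below, the probability values are invented for illustration, the check that they sum to 1 holds only under the assumption that the WSD system normalizes its output over the listed senses, and `most_probable_sense` is a hypothetical helper.

```python
import math

# Hypothetical WSD output for one occurrence of the noun "print".
# The probability values are invented for illustration only.
word_sense_probabilities = {
    "text appearing in a book": 0.70,
    "picture printed from an engraving": 0.20,
    "copy of a movie on film": 0.10,
}


def most_probable_sense(probs):
    """Return the sense with the highest word sense probability."""
    return max(probs, key=probs.get)


# Assumption: the WSD system normalizes its output, so the probabilities sum to 1
# (math.isclose tolerates floating-point rounding in the sum).
assert math.isclose(sum(word_sense_probabilities.values()), 1.0)

print(most_probable_sense(word_sense_probabilities))  # text appearing in a book
```

Keeping the full distribution, rather than only the top sense, is what lets a search engine weigh the less likely senses at query time.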
For a semantically based search engine to use word sense probabilities at query time, the probabilities must be stored in the semantic index that the search engine uses. Because word sense probabilities are typically represented as real numbers, however, storing them for every word identified in a semantic index can consume an enormous amount of storage capacity.
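A back-of-the-envelope estimate shows how quickly real-valued probabilities add up. The sketch below assumes one stored probability per (word occurrence, candidate sense) pair and ignores all other index overhead; the corpus size and the three-senses-per-word figure are hypothetical, and `raw_storage_bytes` is an illustrative helper, not part of any particular index format.

```python
import struct


def raw_storage_bytes(num_occurrences, senses_per_occurrence, fmt):
    """Bytes needed to store one real-valued probability per (occurrence, sense) pair.

    fmt is a struct format string: "<f" for a 32-bit float, "<d" for a 64-bit double.
    Postings overhead (ids, offsets, etc.) is deliberately ignored in this rough estimate.
    """
    return num_occurrences * senses_per_occurrence * struct.calcsize(fmt)


# Hypothetical corpus: one billion indexed word occurrences,
# each with three candidate senses (as in the "print" example).
n, k = 1_000_000_000, 3
print(raw_storage_bytes(n, k, "<d") / 1e9, "GB as 64-bit doubles")  # 24.0 GB as 64-bit doubles
print(raw_storage_bytes(n, k, "<f") / 1e9, "GB as 32-bit floats")   # 12.0 GB as 32-bit floats
```

Even in single precision, the probabilities alone in this hypothetical index would occupy gigabytes, which is the storage concern raised above.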
It is with respect to these considerations and others that the disclosure made herein is presented.