Word correlations broadly refer to semantic relationships existing among words. For example, a measure of semantic similarity may be defined between two words to reflect how likely they are to be semantically related to each other. Two words may be closely related as synonyms or antonyms, but may also be related in less direct ways. Word correlations may be used to expand queries in general web search or to refine keyword annotations in image search.
Current methods for defining and finding word correlations are text-based. Text-based methods typically estimate the correlation between two words via statistical or lexical analysis. For example, corpus-based co-occurrence statistics may provide useful solutions. Lexical analysis based on WORDNET, a semantic lexicon for the English language, may also provide useful solutions. WORDNET is considered to be better suited than conventional dictionaries for computerized analysis of semantic relationships because WORDNET groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. These text-based methods all have limitations. Corpus-based methods utilize only textual information. Semantic lexicons such as WORDNET can handle only a limited number of words.
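By way of illustration only, a corpus-based co-occurrence statistic of the kind mentioned above might be sketched as follows. The sketch assumes a Jaccard coefficient over the sets of documents in which each word appears; this particular measure, the function name cooccurrence_similarity, and the toy corpus are hypothetical choices made for the example and are not prescribed by any specific text-based method.

```python
from collections import defaultdict


def cooccurrence_similarity(documents):
    """Build a word-to-documents index and return a similarity function.

    The returned function scores two words by the Jaccard coefficient of
    the sets of documents containing them (an assumed measure).
    """
    doc_sets = defaultdict(set)  # word -> ids of documents containing it
    for doc_id, text in enumerate(documents):
        for word in set(text.lower().split()):
            doc_sets[word].add(doc_id)

    def similarity(a, b):
        docs_a = doc_sets.get(a, set())
        docs_b = doc_sets.get(b, set())
        union = docs_a | docs_b
        # Words that never appear in the corpus get a score of 0.0.
        return len(docs_a & docs_b) / len(union) if union else 0.0

    return similarity


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the car drove down the road",
    "the automobile drove down the highway",
    "a car is an automobile",
    "the cat sat on the mat",
]
sim = cooccurrence_similarity(corpus)
print(sim("car", "automobile"))  # share one of three documents
print(sim("car", "cat"))         # never co-occur
```

Because such a score depends entirely on which words happen to appear together in the text corpus, it reflects the limitation noted above: only textual information contributes to the estimate.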
Although corpus-based co-occurrence statistics and semantic lexicons help to produce a combination of dictionary and thesaurus that is more intuitively usable, and help to support automatic text analysis and artificial intelligence applications, further improvement in methods for defining and finding word correlations is desirable, particularly in the search-related context.