A thesaurus is a resource for identifying synonyms of a given word; that is, additional words having the same or nearly the same meaning as the given word. Thesauri are generally compiled manually, and in many cases require the investment of thousands of person-hours by skilled linguists. Because of the significant time and monetary costs of manually compiling a thesaurus, thesauri are seldom compiled for the special vocabularies used in particular subject matter domains, such as firefighting, microbiology, or software development. Such subject matter domain thesauri may be useful for understanding relationships between words that are present in a subject matter domain vocabulary, but not present in general vocabularies, as well as relationships between words that have connotations in a subject matter domain vocabulary that are different from those in general vocabularies.
Conventional thesauri may be used to determine whether or not two specified words are considered synonyms. They cannot, however, quantify how similar the meanings of a pair of synonyms are, nor how similar the meanings of a pair of non-synonyms are. Given these shortcomings, an automated approach to constructing thesauri and quantitatively determining the similarity between words would have significant utility.
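The distinction between a yes-or-no synonym lookup and a quantitative similarity determination may be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `SYNONYMS`, `VECTORS`, `are_synonyms`, and `similarity` are hypothetical, the word lists and numeric vectors are invented toy data, and cosine similarity is used only as a stand-in measure, since the text above does not specify how similarity would be computed.

```python
import math

# Hypothetical toy thesaurus: each word maps to the set of its listed synonyms.
SYNONYMS = {
    "blaze": {"fire", "inferno"},
    "fire": {"blaze", "inferno"},
    "inferno": {"blaze", "fire"},
}

# Hypothetical numeric representations of words (invented values, not real data).
VECTORS = {
    "blaze": [0.9, 0.1, 0.0],
    "fire": [0.8, 0.2, 0.1],
    "hydrant": [0.3, 0.9, 0.1],
}


def are_synonyms(a: str, b: str) -> bool:
    """Conventional lookup: answers only whether b is listed as a synonym of a."""
    return b in SYNONYMS.get(a, set())


def similarity(a: str, b: str) -> float:
    """Graded score (cosine similarity, assumed here) between two word vectors."""
    va, vb = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # The lookup yields a binary answer for each pair.
    print(are_synonyms("blaze", "fire"))
    print(are_synonyms("fire", "hydrant"))
    # The graded score distinguishes degrees of relatedness, including
    # for a pair that the lookup reports as non-synonyms.
    print(round(similarity("blaze", "fire"), 3))
    print(round(similarity("fire", "hydrant"), 3))
```

In this sketch the lookup reports only that "fire" and "hydrant" are not synonyms, whereas the graded score still assigns the pair a nonzero similarity that is lower than the score for "blaze" and "fire".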