Performing a word search over a large set of documents can be time consuming. The search may be sped up by creating an index of the document set that lists, for every word, the documents in which that word appears. However, in a simple index, all the documents that contain a search word appear equally related to it, because the individual documents are not rated. The word search may be sped up even further by using a relevance index, which indicates to what extent each word is relevant in each document. For example, a relevance index may list the number of occurrences of each word in each document.
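The occurrence-count form of a relevance index described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name `build_relevance_index`, the sample documents, and the simple lowercase tokenization are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

def build_relevance_index(documents):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Assumed tokenization: lowercase runs of letters and digits.
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for word, n in counts.items():
            index[word][doc_id] = n
    return index

# Hypothetical sample documents.
documents = {
    "doc1": "The laptop is on the desk. The laptop is new.",
    "doc2": "A notebook computer is a portable computer.",
}

index = build_relevance_index(documents)

# Documents containing "laptop", most occurrences first.
ranked = sorted(index["laptop"].items(), key=lambda kv: kv[1], reverse=True)
```

A lookup in this structure returns both the matching documents and a count that can be used to order them, which a simple index that only records presence cannot do.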
However, a relevance index does not include associated words, for example, synonyms, hyponym/hypernym pairs and meronym/holonym pairs. A synonym of a target word is an alternative word for the target word; for example, “notebook” and “laptop” are synonyms and alternatives for one another. An association of the type “is a kind of” is a hyponym/hypernym pair, also called a child/parent pair. For example, “laptop” is a kind of “computer,” and here “laptop” is the hyponym (child) and “computer” is the hypernym (parent). In the case of verbs, the association is better understood as “is one way to”; for example, “to type” is one way “to input.” An association of the type “is an instance of” is also a hyponym/hypernym pair. For example, “Einstein” is an instance of a “physicist.” An association of the type “is a member of,” “is a part of,” or “is a substance of” is a meronym/holonym pair, also called a member/group pair. In these cases the meronym in some way belongs to the holonym. For example, a “key” is a part of a “keyboard” and “microprocessor” is a substance of “computer.”
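One possible way to hold the association types described above is a nested mapping from a target word to a relation type to the set of associated words. The sketch below is only an assumption about how such data might be organized; the name `ASSOCIATIONS`, the relation labels, and the helper `associated_words` are hypothetical, and the entries simply restate the examples given in the text.

```python
# Hypothetical association data: target word -> relation type -> associated words.
ASSOCIATIONS = {
    "laptop": {
        "synonym": {"notebook"},
        "hypernym": {"computer"},        # "laptop" is a kind of "computer"
    },
    "notebook": {"synonym": {"laptop"}},
    "computer": {
        "hyponym": {"laptop"},
        "meronym": {"microprocessor"},   # "microprocessor" is a substance of "computer"
    },
    "type": {"hypernym": {"input"}},     # "to type" is one way "to input"
    "einstein": {"hypernym": {"physicist"}},  # instance-of relation
    "keyboard": {"meronym": {"key"}},    # "key" is a part of "keyboard"
    "key": {"holonym": {"keyboard"}},
}

def associated_words(word, relation):
    """Return the words related to `word` by `relation`, or an empty set."""
    return ASSOCIATIONS.get(word.lower(), {}).get(relation, set())
```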
An association dictionary for indexed terms allows each word to be linked to documents through its associated words, that is, through words associated with the search term that appear in the document. For any search word in any document, a relevance index can then show the number of occurrences of the word and the number of occurrences of each type of associated word. An algorithm based on these occurrences can generate a value that indicates the relevance of each document for each word. When there is more than one search word, the occurrence algorithm can generate more than one value, and these values can be combined to indicate the overall relevance of the document to the search words.
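The occurrence-based scoring described above might look like the following sketch. The text does not specify how the counts are turned into a value or how per-word values are combined, so everything concrete here is an assumption: the weights in `WEIGHTS`, the weighted-sum formula, the use of a plain sum to combine values for several search words, and the names `word_relevance` and `overall_relevance`.

```python
import re
from collections import Counter

# Hypothetical association dictionary: word -> relation type -> associated words.
ASSOCIATIONS = {
    "laptop": {"synonym": {"notebook"}, "hypernym": {"computer"}},
}

# Assumed weights: an exact match counts more than an associated word.
WEIGHTS = {"exact": 1.0, "synonym": 0.8, "hypernym": 0.5}

def word_relevance(word, text):
    """Weighted count of a search word and its associated words in a document."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    score = WEIGHTS["exact"] * counts[word]
    for relation, related in ASSOCIATIONS.get(word, {}).items():
        occurrences = sum(counts[w] for w in related)
        score += WEIGHTS.get(relation, 0.0) * occurrences
    return score

def overall_relevance(words, text):
    """Combine per-word values; a plain sum is assumed here."""
    return sum(word_relevance(w, text) for w in words)

# Hypothetical document that never mentions "laptop" directly.
text = "A notebook computer is a portable computer."
laptop_score = word_relevance("laptop", text)
combined_score = overall_relevance(["laptop", "portable"], text)
```

In this example the document receives a nonzero value for “laptop” even though the word itself never occurs, because one synonym and two hypernym occurrences contribute to the score. Other combinations, such as a maximum or a normalized average, would fit the description equally well.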