In information searching, many search queries are ambiguous. A search query is ambiguous when more than one interpretation of the query is possible. The search phrase “java,” for example, may relate to coffee or to computer programming. In the context of Internet searching, the search term “java” may be used in connection with offering tips on learning the programming language, selling coffee, travel to Indonesia, or contractors who offer Java development services to any willing client.
Another type of ambiguity occurs when a relatively unambiguous phrase appears with too little context to determine what the user is seeking. For example, a user who searches on “Benjamin Franklin” could be looking for his biography, picture, discoveries, sayings, etc.
Yet another type of ambiguity arises when a search query is matched to one or more shorter phrases. For example, if a search engine can produce result sets for “vintage hat” or for “hat pin” and a user searches on “vintage hat pin,” what results or combination of results from the shorter phrases should be shown?
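By way of illustration only, the following minimal sketch enumerates the shorter phrases of a query for which result sets exist. The function name `matching_subphrases` and the set `indexed_phrases` are hypothetical and are not part of any described system; the sketch also assumes that only contiguous word sequences are considered.

```python
def matching_subphrases(query, indexed_phrases):
    """Return contiguous sub-phrases of `query` that have result sets.

    `indexed_phrases` is a hypothetical set of phrases for which the
    search engine can produce results. Longer matches are listed first.
    """
    words = query.lower().split()
    matches = []
    # Try every contiguous window of words, from longest to shortest.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            phrase = " ".join(words[start:start + length])
            if phrase in indexed_phrases:
                matches.append(phrase)
    return matches


if __name__ == "__main__":
    indexed = {"vintage hat", "hat pin"}
    print(matching_subphrases("vintage hat pin", indexed))
```

For the query “vintage hat pin,” the sketch finds both “vintage hat” and “hat pin.” It does not decide which result set, or which combination of the two, should be shown.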
Interpretation clusters may be used to direct the presentation of the search results to the user. An interpretation cluster is a subset of the search results for an ambiguous search phrase in which all of the results share the same meaning. Search listings in a result set may be ordered so that the user may select a result that satisfies the user's intended meaning. Ordering the listings in this way can maximize the relevance of the search results.
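As a hedged illustration of the concept, the sketch below groups listings into interpretation clusters and then orders the result set so that each interpretation is represented near the top. The interpretation labels are assumed to be given, the function names are hypothetical, and the round-robin ordering is only one possible choice rather than a required one.

```python
from itertools import zip_longest


def group_by_interpretation(listings):
    """Group (title, interpretation) pairs into interpretation clusters.

    The interpretation label on each listing is assumed to be known.
    """
    clusters = {}
    for title, interpretation in listings:
        clusters.setdefault(interpretation, []).append(title)
    return clusters


def interleave(clusters):
    """Order listings round-robin across clusters (an assumed policy)."""
    ordered = []
    for row in zip_longest(*clusters.values()):
        # Shorter clusters are padded with None; skip the padding.
        ordered.extend(title for title in row if title is not None)
    return ordered


if __name__ == "__main__":
    listings = [
        ("Learn Java in 21 days", "programming"),
        ("Java compiler download", "programming"),
        ("Gourmet java beans", "coffee"),
        ("Java island tours", "travel"),
    ]
    for title in interleave(group_by_interpretation(listings)):
        print(title)
```

With this ordering, the first three listings cover the programming, coffee, and travel interpretations of “java,” so a user with any of those meanings in mind finds a matching result without scanning past a run of listings for a different meaning.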
Improving the relevance of search results reduces the search time for the user. Further, once the intent of the user is captured, it can be used to provide the user with additional relevant results.
Clustering techniques as applied to web content providers have focused on text analysis and link analysis. Text analysis techniques utilize word frequency or usage within documents or web pages/sites to form clusters, but require that the documents be sufficiently verbose to be recognizably distinct from one another. Link analysis utilizes existing hyperlinks between web pages/sites for clustering. A useful technique for “Efficient Identification of Web Communities” is presented by Flake et al. in Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD-2000), pp. 150–160, August 2000, herein incorporated by reference in its entirety. One limitation of link analysis in general is that it requires the existence of meaningful links between web pages.
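The verbosity requirement of text analysis can be illustrated with a minimal sketch that compares documents by word frequency. Cosine similarity over raw word counts is used here as an assumed, simplified stand-in for text analysis techniques generally; it is not the method of any cited work, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count word occurrences in a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two word-frequency vectors."""
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[term] * freq_b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    coffee = term_frequencies("java coffee beans roasted fresh daily")
    code = term_frequencies("java programming tutorials for beginners")
    terse_a = term_frequencies("java")
    terse_b = term_frequencies("java")
    print(cosine_similarity(coffee, code))      # verbose listings differ
    print(cosine_similarity(terse_a, terse_b))  # terse listings do not
```

The two verbose listings share only the word “java” and therefore score low, whereas two one-word listings for “java” are indistinguishable by word frequency even if one concerns coffee and the other concerns programming.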